The SAGE Handbook of Comparative Studies in Education

Table of Contents
Contents
List of Figures
List of Tables
Notes on the Editors and Contributors
Preface
Methods and Practice in Comparative Education Research: Models, Ontologies and Epistemologies
Part I: The Status of Comparative Education Research
1: The Status of Comparative Education Research in the 21st Century: An Empiricist’s View
2: Critical Challenges in Approaches and Experience in Comparative Education Research
3: Enduring Issues in Education: Comparative Perspectives
4: Riddled with Gaping Wounds: A Methodological Critique of Comparative and International Studies in Education: Views of a Professor
Part II: Measurement Methods in Comparative Education Research
5: Challenges in International Large-Scale Educational Surveys
6: Non-Cognitive Attributes: Measurement and Meaning
7: Methodological Challenges to Measuring Heterogeneous Populations Internationally
8: The Participation of Latin American Countries in International Assessments: Assessment Capacity, Validity, and Fairness
9: Validity Issues in Qualitative and Quantitative Research of Cross-National Studies
10: Mixed Methods in Education: Visualising the Quality of Quantitative Data
Part III: Research Practices in Comparative Studies of Education
11: Growth and Development of Large-Scale International Comparative Studies and their Influence on Comparative Education Thinking
12: The Meaning of Motivation to Learn in Cross-National Comparisons: A Review of Recent International Research on Ability, Self-Concept, and Interest
13: Examining Change over Time in International Large-Scale Assessments: Lessons Learned from PISA
14: Qualitative Comparative Education Research: Perennial Issues, New Approaches and Good Practice
15: Methodological Challenges in Conducting International Research on Teaching Quality Using Standardized Observations
16: The Measurement and Use of Socioeconomic Status in Educational Research
Part IV: Lessons from International Comparisons of Student Behavior
17: Early Childhood Care and Education in the Era of Sustainable Development: Balancing Local and Global Priorities
18: Equity of Access to Pre-Primary Education and Long-Term Benefits: A Cross-Country Analysis
19: Primary Education Curricula across the World: Qualitative and Quantitative Methodology in International Comparison
20: Outside-School-Time Activities and Shadow Education
21: Measuring Opportunity: Two Perspectives on the ‘Black Box’ of School Learning
22: What Can International Comparative Tests Tell Us about the Future Supply of Highly Skilled STEM Workers?
Part V: International Comparisons of Instruction
23: Comparative Research on Teacher Learning Communities in a Global Context
24: Teaching Instructional Practices: Play-Based Learning – Supporting the Transition from Early Years to Primary Education
25: Challenges in Practice: A Critical Examination of Efforts to Link Teacher Practices and Student Achievement
26: Global Higher Education Trends: Implications for Policy and Practice
27: Digital Technologies and Educational Transformation
28: International Large-Scale Student Assessments and their Impact on National School Reforms
Part VI: Influence of Large-Scale Assessments on Policy
29: Changes in the World-Wide Distribution of Large-Scale International Assessments
30: The Uneasy Relation between International Testing and Comparative Education Research
31: Global and Local Dissonance when Comparing Nation-States and Educational Performance
32: Building Learning Assessment Systems in Latin America and the Caribbean
Index


The SAGE Handbook of Comparative Studies in Education

LIST OF REVIEWERS

Kai-Ming Cheng, University of Hong Kong, Hong Kong
Michael Crossley, University of Bristol, UK
Erwin Epstein, Loyola University Chicago, USA
Barry McGaw, The University of Melbourne, Australia
Barbara Means, Digital Promise, USA
Lynn Paine, Michigan State University, USA
Aaron Pallas, Teachers College, Columbia University, USA
Francisco Ramirez, Stanford Graduate School of Education, Stanford University, USA
Margaret Wu, The University of Melbourne, Australia

The SAGE Handbook of Comparative Studies in Education

Edited by

Larry E. Suter, Emma Smith and Brian D. Denman

SAGE Publications Ltd
1 Oliver's Yard, 55 City Road
London EC1Y 1SP

SAGE Publications Inc.
2455 Teller Road
Thousand Oaks, California 91320

SAGE Publications India Pvt Ltd
B 1/I 1 Mohan Cooperative Industrial Area
Mathura Road
New Delhi 110 044

SAGE Publications Asia-Pacific Pte Ltd
3 Church Street
#10-04 Samsung Hub
Singapore 049483

Editor: Jude Bowen
Editorial Assistant: Orsod Malik
Production Editor: Manmeet Kaur Tura
Copyeditor: Sarah Bury
Proofreader: Derek Markham
Indexer: Cathryn Pritchard
Marketing Manager: Joanna McDowall
Cover Design: Bhairvi Gudka
Typeset by: Cenveo Publisher Services
Printed in the UK

Introduction & editorial arrangement © Larry E. Suter, Emma Smith and Brian D. Denman, 2019
Preface © Larry E. Suter
Chapter 1 © Larry E. Suter, 2019
Chapter 2 © Brian D. Denman, 2019
Chapter 3 © Wing On Lee, 2019
Chapter 4 © Rui Yang, 2019
Chapter 5 © Fons J. R. van de Vijver, Nina Jude & Susanne Kuger, 2019
Chapter 6 © Mary Ainley & John Ainley, 2019
Chapter 7 © Leslie Rutkowski & David Rutkowski, 2019
Chapter 8 © Guillermo Solano-Flores, 2018
Chapter 9 © Jae Park, 2019
Chapter 10 © David A. Turner, 2019
Chapter 11 © Larry E. Suter, 2019
Chapter 12 © Ming-Te Wang, Jessica L. Degol & Jiesi Guo, 2018
Chapter 13 © Christine Sälzer & Manfred Prenzel, 2019
Chapter 14 © Michele Schweisfurth, 2019
Chapter 15 © Anna-Katharina Praetorius, Wida Rogh, Courtney Bell & Eckhard Klieme, 2019
Chapter 16 © J. Douglas Willms & Lucia Tramonte, 2019
Chapter 17 © Abbie Raikes, Dawn Davis & Anna Burton, 2019
Chapter 18 © Gerard Ferrer-Esteban, Larry E. Suter & Monica Mincu, 2019
Chapter 19 © Dominic Wyse & Jake Anders, 2019
Chapter 20 © Siyuan Feng & Mark Bray, 2019
Chapter 21 © Leland S. Cogan & William H. Schmidt, 2019
Chapter 22 © Larry E. Suter & Emma Smith, 2019
Chapter 23 © Motoko Akiba, Cassandra Howard & Guodong Liang, 2019
Chapter 24 © Tanya Hathaway, 2019
Chapter 25 © Laura M. O'Dwyer & Catherine Paolucci, 2019
Chapter 26 © Christopher C. Blakesley, Donna J. Menke & W. James Jacob, 2019
Chapter 27 © Nancy Law & Leming Liang, 2019
Chapter 28 © Gustavo E. Fischman, Pasi Sahlberg, Iveta Silova & Amelia Marcetti Topper, 2019
Chapter 29 © Larry E. Suter, 2019
Chapter 30 © Martin Carnoy, 2019
Chapter 31 © Colin Power, 2019
Chapter 32 © Adriana Viteri & Pablo Zoido, 2019

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act, 1988, this publication may be reproduced, stored or transmitted in any form, or by any means, only with the prior permission in writing of the publishers, or in the case of reprographic reproduction, in accordance with the terms of licences issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers.

At SAGE we take sustainability seriously. Most of our products are printed in the UK using FSC papers and boards. When we print overseas we ensure sustainable papers are used as measured by the PREPS grading system. We undertake an annual audit to monitor our sustainability.

Library of Congress Control Number: 2018962496

British Library Cataloguing in Publication data
A catalogue record for this book is available from the British Library

ISBN 978-1-5264-1946-0

Contents

List of Figures ix
List of Tables xi
Notes on the Editors and Contributors xiii
Preface (Larry E. Suter) xxix
Methods and Practice in Comparative Education Research: Models, Ontologies and Epistemologies (Larry E. Suter, Emma Smith and Brian D. Denman) xxxiii

PART I: THE STATUS OF COMPARATIVE EDUCATION RESEARCH 1

1 The Status of Comparative Education Research in the 21st Century: An Empiricist's View (Larry E. Suter) 3
2 Critical Challenges in Approaches and Experience in Comparative Education Research (Brian D. Denman) 25
3 Enduring Issues in Education: Comparative Perspectives (Wing On Lee) 50
4 Riddled with Gaping Wounds: A Methodological Critique of Comparative and International Studies in Education: Views of a Professor (Rui Yang) 64

PART II: MEASUREMENT METHODS IN COMPARATIVE EDUCATION RESEARCH 81

5 Challenges in International Large-Scale Educational Surveys (Fons J. R. van de Vijver, Nina Jude and Susanne Kuger) 83
6 Non-Cognitive Attributes: Measurement and Meaning (Mary Ainley and John Ainley) 103
7 Methodological Challenges to Measuring Heterogeneous Populations Internationally (Leslie Rutkowski and David Rutkowski) 126
8 The Participation of Latin American Countries in International Assessments: Assessment Capacity, Validity, and Fairness (Guillermo Solano-Flores) 141
9 Validity Issues in Qualitative and Quantitative Research of Cross-National Studies (Jae Park) 164
10 Mixed Methods in Education: Visualising the Quality of Quantitative Data (David A. Turner) 178

PART III: RESEARCH PRACTICES IN COMPARATIVE STUDIES OF EDUCATION 195

11 Growth and Development of Large-Scale International Comparative Studies and their Influence on Comparative Education Thinking (Larry E. Suter) 197
12 The Meaning of Motivation to Learn in Cross-National Comparisons: A Review of Recent International Research on Ability, Self-Concept, and Interest (Ming-Te Wang, Jessica L. Degol and Jiesi Guo) 224
13 Examining Change over Time in International Large-Scale Assessments: Lessons Learned from PISA (Christine Sälzer and Manfred Prenzel) 243
14 Qualitative Comparative Education Research: Perennial Issues, New Approaches and Good Practice (Michele Schweisfurth) 258
15 Methodological Challenges in Conducting International Research on Teaching Quality Using Standardized Observations (Anna-Katharina Praetorius, Wida Rogh, Courtney Bell and Eckhard Klieme) 269
16 The Measurement and Use of Socioeconomic Status in Educational Research (J. Douglas Willms and Lucia Tramonte) 289

PART IV: LESSONS FROM INTERNATIONAL COMPARISONS OF STUDENT BEHAVIOR 305

17 Early Childhood Care and Education in the Era of Sustainable Development: Balancing Local and Global Priorities (Abbie Raikes, Dawn Davis and Anna Burton) 307
18 Equity of Access to Pre-Primary Education and Long-Term Benefits: A Cross-Country Analysis (Gerard Ferrer-Esteban, Larry E. Suter and Monica Mincu) 326
19 Primary Education Curricula across the World: Qualitative and Quantitative Methodology in International Comparison (Dominic Wyse and Jake Anders) 342
20 Outside-School-Time Activities and Shadow Education (Siyuan Feng and Mark Bray) 359
21 Measuring Opportunity: Two Perspectives on the 'Black Box' of School Learning (Leland S. Cogan and William H. Schmidt) 374
22 What Can International Comparative Tests Tell Us about the Future Supply of Highly Skilled STEM Workers? (Larry E. Suter and Emma Smith) 390

PART V: INTERNATIONAL COMPARISONS OF INSTRUCTION 417

23 Comparative Research on Teacher Learning Communities in a Global Context (Motoko Akiba, Cassandra Howard and Guodong Liang) 419
24 Teaching Instructional Practices: Play-Based Learning – Supporting the Transition from Early Years to Primary Education (Tanya Hathaway) 445
25 Challenges in Practice: A Critical Examination of Efforts to Link Teacher Practices and Student Achievement (Laura M. O'Dwyer and Catherine Paolucci) 471
26 Global Higher Education Trends: Implications for Policy and Practice (Christopher C. Blakesley, Donna J. Menke and W. James Jacob) 492
27 Digital Technologies and Educational Transformation (Nancy Law and Leming Liang) 508
28 International Large-Scale Student Assessments and their Impact on National School Reforms (Gustavo E. Fischman, Pasi Sahlberg, Iveta Silova and Amelia Marcetti Topper) 532

PART VI: INFLUENCE OF LARGE-SCALE ASSESSMENTS ON POLICY 551

29 Changes in the World-Wide Distribution of Large-Scale International Assessments (Larry E. Suter) 553
30 The Uneasy Relation between International Testing and Comparative Education Research (Martin Carnoy) 569
31 Global and Local Dissonance when Comparing Nation-States and Educational Performance (Colin Power) 586
32 Building Learning Assessment Systems in Latin America and the Caribbean (Adriana Viteri and Pablo Zoido) 600

Index 619

List of Figures

1.1 Number of bibliographic references per year 8
2.1 Mårtensson et al.'s concept hierarchy of research quality 31
2.2 Paulston's conceptual map of perspectivism, constructivism, and rationalisms 32
2.3 Research 'stream' trajectories (characteristics) 33
2.4 Data of seminal scholars and their approaches/experiences 34
7.1 Difference in per capita GDP in 2015-adjusted US dollars, estimated as GDP_OECD – GDP_Partner (dollar amounts in thousands) 128
8.1 Process of an international assessment and the impact of the national context 146
8.2 An example of the rubric used to assess one of the 112 elements included in the PISA for Development capacity needs analytical framework: Guatemala 150
8.3 Percentage of elements rated as 'Advanced' within each of the three dimensions considered in the capacity needs analysis 151
8.4 A hypothetical item. Are students from all Latin American countries taught to represent fractions this way? 156
10.1 Education statistics for Australia 180
10.2 Education statistics for Kyrgyzstan 181
10.3 Rotating an image (Australia, Figure 10.1) to see trends and errors 182
10.4 Education statistics for Latvia 183
10.5 Education statistics for Malaysia 183
10.6 Education statistics for New Zealand 186
11.1 Number of individual large-scale international surveys from all sources by world region: 1980 to 2016 199
11.2 Number of comparative education publications per year by author type and whether large-scale or not: 1960 to 2017 213
11.3 Number of different authors for large-scale and small-scale researchers 214
16.1 Differential item function curves for two indicators of SES 294
16.2 Socioeconomic gradient for Peru 296
16.3 School profile for Peru 298
16.4 A model for assessing equality, equity and school effects 298
18.1 Probability of students attending pre-primary education (odds ratios) 334
18.2a and 18.2b Probability of students attending pre-primary education by immigrant background and SES: native versus second-generation students 335
18.3a and 18.3b Probability of students attending pre-primary education by immigrant background and SES: native versus first-generation students 335
18.4 Cross-country analysis: difference in students' performance according to attendance in pre-primary education 336
18.5 Probability of students attending pre-primary education: average marginal effects 340


20.1 Key parameters in the ecology of OST activities 364
20.2 Percentages of 15-year-old students receiving supplementary education, 2012 367
21.1 Conceptual model of school learning (student achievement) 375
21.2 Model of potential educational experiences 380
21.3 Carroll's model of school learning 381
22.1 Number of countries in PISA 2015 by GDP per capita in PISA 2015 399
22.2 Percentage of 15-year-olds who aspire to a STEM, health, or education profession for 13 world regions 399
22.3 Percentage of students aspiring to any STEM occupation by age 30 by log per-capita country income 400
22.4 Percentage of students aspiring to a career as a scientist by per-capita level of country 400
22.5 Number of PISA countries at intervals of achievement in mathematics and science: 2015 402
22.6 Mean science achievement by average GDP per capita: 68 PISA 2015 countries 404
22.7 Average science score by log of per capita GDP 404
22.8 Proportion of students aspiring to a STEM career by age 30 by country-level average science achievement: PISA 2015 406
22.9 Maximum and minimum bilateral correlations between pairs of six PISA attitude scales for participating countries: PISA 2015 408
22.10 Minimum, maximum and average correlation between student choice of scientist or engineer by age 30 with six attitude scales for 68 countries 409
23.1 Conceptual framework: teacher learning communities in a global context 422
27.1 A diagrammatic representation of the interrelationship among the different levels of indicators 528
29.1 Year countries first entered large-scale survey by year and region (whole countries) 560
29.2 Number of participants in large-scale surveys by World Bank income level, 1990–2015 (countries counted for each study) 561
29.3 Number and percentage of countries participating in large-scale surveys by population size 562
29.4 Percentage of world countries participating in large-scale surveys by average income per person per country 563

List of Tables

1.1 Major journals in international comparative education 13
3.1 Progress evaluation of the attainment of the EFA goals 54
3.2 Gini index of high-performing education systems in PISA, 2007 59
7.1 Number of OECD and partner countries since 2000 with GDP in 2015 USD 127
7.2 PISA 2012 booklet design 134
8.1 Participation of Latin American countries in four international assessments (1995–2018) and per capita gross national income (2017) 147
9.1 Comparison of emic and etic approaches 167
11.1 Number of comparative education publications by academic researchers and large-scale research organizations by source of data: 1960 to 2017 212
14.1 Summary of the contrasts between traditional comparative approaches and contemporary revisions 262
15.1 Overview of major international observation-based studies on teaching quality 271
15.2 Overview of pivotal challenges in conducting international studies on teaching 275
18.1 Probability of students attending pre-primary education: coefficients, odds ratios and marginal effects 333
18.2 Cross-country analysis: difference in students' performance according to attendance in pre-primary education by sub-samples (low versus high SES students) 337
18.3 Probability of students attending pre-primary education: coefficients and odds ratios 341
19.1 Extract from PIRLS 2016 Reading Achievement Distribution 351
19.2 Percent of PIRLS 2006 and PIRLS 2011 countries reporting levels of emphasis on reading for pleasure 353
19.3 Transition matrix of PIRLS 2006 and PIRLS 2011 of levels of emphasis on reading for pleasure 353
19.4 Percentage of PIRLS 2006 and PIRLS 2011 countries where inspection is used to assess implementation of primary reading curriculum at each time point 354
19.5 Transition matrix of PIRLS 2006 and PIRLS 2011 of inspection use to assess implementation of primary reading curriculum 354
21.1 Sociological perspectives on education opportunity 378
21.2 Student-level correlations of mathematics achievement with SES and its components in three international mathematics studies: TIMSS 1995 (T95), TIMSS 2011 (T11), and PISA 2012 (P12) 378
21.3 Psychological perspectives on opportunity to learn (OTL) 381
21.4 Classroom/school-level correlations of OTL with mathematics performance and OTL with SES in three international mathematics studies 383


21.5 Three three-level models predicting PISA mathematics literacy 386
22.1 Occupation aspiration by age 30 (sample size: PISA 2015) 397
22.2 Number of 2015 PISA countries/economies by ISO region and sub-region 398
22.3 Correlation coefficients between average country science achievement and proportion of students seeking careers in 11 fields: 68 PISA countries 2015 403
22.4 Average correlation (across individual countries) between student career choice at age 30 and achievement in science 405
23.1a Comparison of teacher collaboration activities from TALIS 2013 teacher survey 429
23.1b Comparison of teacher collaboration activities from TALIS 2013 teacher survey (continued) 430
23.2 Comparison of teacher collaboration activities by regions 431
23.3 Relationship between national mean frequency of teacher collaboration activities 432
24.1 Comparison of early learning principles 452
24.2 Comparisons in the frameworks' perspectives on children's learning and play 458
24.3 The goals and principles of the British Columbia primary curriculum 460
24.4 Teaching instructional practices of the British Columbia primary curriculum 461
24.5 Teaching instructional practices of the Singapore primary curriculum 'Learning through Play' 463
28.1 Most common international large-scale student assessments 533
28.2 PISA-based education goals 540
28.3 The use of international large-scale student assessments in national education policy contexts 542
29.1 Number of countries that participated in a large-scale international assessment study (LSAS) by world region, participants weighted by population 554
29.2 Large-scale international assessment studies (LSAS) grouped by subject and region 557
29.3 Large-scale comparative studies, 1960–2015, by year first initiated 567
32.1 Learning assessment units in Latin America 604
Annex 32.1 Regional and international learning assessment studies in Latin America 614
Annex 32.2 National learning assessment studies across countries in Latin America 616

Notes on the Editors and Contributors

THE EDITORS

Larry E. Suter is a visiting scholar at the University of Michigan and lives in Woodstock, Maryland, where he works as a consultant for educational research. His expertise is in analyzing large-scale international surveys, informal education, measurement of student achievement, and measurement of 'soft skills' such as motivation, identity, and career interest. He retired from the US National Science Foundation in 2011. He received his undergraduate education in sociology at the College of Idaho and advanced degrees (MA and PhD) in sociology from Duke University in 1968 and 1975. He was employed for 21 years as a statistician at the US Census Bureau, as chief of the Education Statistics Branch, and at the National Center for Education Statistics, where he expanded the program of international education to include large-scale cross-national surveys of science, mathematics, and reading. His published works cover research methods, international comparative studies, informal learning, and indicators of science education. He has been an adjunct professor at the University of Maryland and Georgetown University, and a visiting scholar at Stanford University.

Emma Smith is a Professor of Education and Director of the Centre for Education Studies at the University of Warwick, UK. Her teaching and research are mainly in the area of educational inequality, and she is interested in the role that education policy can play in improving social justice and making lives fairer. She has also recently worked with SAGE in co-editing The BERA/SAGE Handbook of Educational Research and on the second edition of her book Key Issues in Education and Social Justice.

Brian D. Denman holds the title Senior Lecturer in Higher Education, Comparative and International Education, and Training and Development at the University of New England, where he also serves as Deputy Head of School. He has also worked as faculty director of an overseas university branch campus in China, director of international development for the first prime ministerial library in Australia (The John Curtin Centre), and study abroad coordinator, overseas opportunities coordinator, and principal/head teacher in the United States. His research has been published widely and supported by a number of international organizations, multilateral agencies and associations. Currently, he serves as Secretary-General of GlobalCIE, Secretary-General of the World Council of Comparative Education Societies (WCCES), UNESCO Fellow, UNE Council member (Board of Trustees), President of the Australian and New Zealand Comparative and International Education Society (ANZCIES), and Editor-in-Chief of the International Education Journal: Comparative Perspectives.


THE CONTRIBUTORS

John Ainley is a Principal Research Fellow (and formerly Deputy CEO) at the Australian Council for Educational Research (ACER). He has contributed to the IEA Civic and Citizenship Education Study (ICCS), the IEA Computer and Information Literacy Study and the OECD Teaching and Learning International Survey 2018 (TALIS 2018). He also contributed to the Australian National Assessment Program sample studies of Civics and Citizenship and ICT Literacy. As director of the national and international surveys research program at ACER he directed many policy-oriented research and evaluation studies for national and state education authorities. Dr Ainley was a member of the Consortium Advisory Group for the Longitudinal Study of Australian Children from its beginning in 2003 until 2016. He is a member of the Publication and Editorial Committee of the International Association for the Evaluation of Educational Achievement (IEA).

Mary Ainley is an Honorary Fellow with the Melbourne School of Psychological Sciences (The University of Melbourne). Her research interests and experience are in the areas of developmental and educational psychology. A major strand of her research has been around understanding the nature and development of interest, curiosity and information processing as they support student engagement with learning. This research has been published in a number of respected journals, such as Learning and Instruction, Journal of Educational Psychology and Contemporary Educational Psychology, and Dr Ainley has served on the Editorial Boards of these journals. Most recently, in collaboration with Dr John Ainley, she has investigated motivational constructs, most notably interest in science, from a cross-national perspective through secondary analyses of PISA data. This work has been published in the International Journal of Science Education and Contemporary Educational Psychology.

Motoko Akiba is a Professor and the Chair of the Department of Educational Leadership and Policy Studies at Florida State University. She received her BA from the University of Tsukuba, Japan, and a dual-title PhD in Educational Theory & Policy and Comparative & International Education from Pennsylvania State University-University Park. Dr Akiba's areas of research expertise are teacher policy and reform, teacher professional development, and comparative education policy. Her publications include International Handbook of Teacher Quality and Policy (Routledge/Taylor & Francis, 2018), Teacher Reforms around the World: Implementations and Outcomes (Emerald Publishing, 2013) and Improving Teacher Quality: The U.S. Teaching Force in Global Context (Teachers College Press, 2009). Her research program has been funded by the National Science Foundation, the Institute of Education Sciences, and the AERA Grants Program, among others.

Jake Anders is an Associate Professor of Educational and Social Statistics in the Department of Learning and Leadership at UCL Institute of Education, and Director of CREATE (Conducting Research, Evaluations and Trials in Education) in UCL's Centre for Education Improvement Science. Jake's research interests focus on understanding the causes and consequences of educational inequality and the evaluation of policies and programmes aiming to reduce it. His research, which has been funded by the Economic and Social Research Council, the Nuffield Foundation, the Education Endowment Foundation, the Sutton Trust, and the UK Department for Education, among others, has been published in education, economics, sociology and psychology journals. Recent projects include experimental and quasi-experimental evaluations of school-based interventions, investigations of the importance of curriculum in explaining inequality in university access, and explorations of continuing inequalities into the labour market.

Courtney Bell is Principal Research Scientist in ETS's Global Assessment Center. She completed her doctorate at Michigan State University in Curriculum, Teaching, and Educational Policy after earning her BA in Chemistry at Dartmouth College. A former high school science teacher and teacher educator, Courtney's research looks across actors in the educational system to better understand the intersections of research, policy and practice. Her studies use mixed methods to analyze the measurement of teaching and the validity of measures of teaching quality, especially observational measures. Current and recent studies investigate how administrators learn to use a high-stakes observation protocol, how raters use subject-specific and general protocols, how measures of teaching compare across countries, and the ways in which observation protocols capture high-quality teaching for students with special needs. She has published in a variety of scholarly journals and co-edited the 5th edition of the AERA's Handbook of Research on Teaching.

Christopher C. Blakesley works as a Learning Engineer at the Eberly Center of Carnegie Mellon University. Dr Blakesley consults with faculty and manages projects to improve learning opportunities with innovative technologies. Chris holds a Bachelor of Fine Arts degree in Media Arts from Brigham Young University, a Master of Science degree in Instructional Technology & the Learning Sciences from Utah State University, and a PhD in Curriculum & Instruction from the University of Wisconsin-Madison. His work focuses on designing curricula and evaluating learning programs to improve effectiveness and facilitate transformative experiences.

Mark Bray is a Distinguished Chair Professor in the Faculty of Education of East China Normal University (ECNU), Shanghai. Prior to joining ECNU in 2018, he worked for over three decades at the University of Hong Kong, where he still holds the title of Emeritus Professor. His career commenced as a teacher in Kenya and Nigeria before moving to the Universities of Edinburgh, Papua New Guinea and London. Between 2006 and 2010, Professor Bray took leave from Hong Kong to work in Paris as Director of UNESCO's International Institute for Educational Planning (IIEP). Professor Bray is known for his pioneering studies of the so-called shadow education system of private supplementary tutoring in a wide range of countries.

Anna Burton is a doctoral student in Child, Youth and Family Studies at the University of Nebraska-Lincoln. Her research focuses on early childhood care and education quality, teacher training, and child development. She has consulted for the World Bank on studies of early childhood quality in preschools in several countries. Prior to beginning her doctoral studies, she was a lead teacher in a laboratory child development center at Baylor University. Ms Burton received the Rising Star award from the National Coalition for Campus Children's Centers for her work. She previously chaired the kindergarten readiness working group of Prosper Waco, a Collective Impact organization.

Martin Carnoy is Vida Jacks Professor of Education and Economics at Stanford University. He is co-director of the Lemann Center for Brazilian Education at Stanford, a former president of the Comparative and International Education Society, a fellow of the National Academy of Education and of the International Academy of Education, and an associate of the Higher School of Economics' Institute of Education in Moscow. He has written 40 books and more than 150 articles on the economic value of education, the political economy of educational policy, educational production, and higher education. Much of his work is comparative and international and investigates the way educational systems are organized. Recent books include Cuba's Academic Advantage (2007), Vouchers and Public School Performance (2007), University Expansion in a Changing Global Economy (2014), and Transforming Comparative Education (2018).

Leland S. Cogan is a Senior Researcher in the Center for the Study of Curriculum Policy at Michigan State University. He earned undergraduate degrees in microbiology and psychology and a PhD in educational psychology from Michigan State University. He coordinated data collection and analyses for the Survey of Mathematics and Science Opportunities (SMSO), a multinational project that researched and developed the instruments used in the Third International Mathematics and Science Study (TIMSS). He has co-authored technical reports, articles, and books, including Characterizing Pedagogical Flow (1996), Facing the Consequences (1999), Why Schools Matter (2001), The Preparation Gap: Teacher Education for Middle School Mathematics in Six Countries (2007) and Schooling across the Globe: What We Have Learned from 60 Years of Mathematics and Science International Assessments (2019). His research interests include evaluation of mathematics and science curricula, mathematics and science classroom instruction, and the preparation of mathematics and science teachers.

Dawn Davis has a PhD in Child, Youth and Family Studies from the University of Nebraska, where she is currently an Early Childhood Research Project Manager. Her research focuses on child development, measurement, program evaluation, and early interventions. Dr Davis has been part of the leadership teams on several large-scale intervention, evaluation, and longitudinal studies of low-income children and families in the US, including the Educare Learning Network and an IES-funded Reading for Understanding grant. In addition, she has consulted for the World Bank on studies of early childhood development and quality in preschools in several countries. Her responsibilities have included developing and facilitating trainings on measurement, adapting measurements to specific contexts, disseminating research findings, applying findings to policy, and working with programs to provide individualized professional development opportunities for staff to improve instructional practices and support program improvement.

Jessica L. Degol is an Assistant Professor of Human Development and Family Studies at Penn State Altoona. She received her PhD in Applied Developmental Psychology from the University of Pittsburgh and completed a two-year postdoctoral fellowship there under the guidance of Dr Ming-Te Wang. Her research interests include understanding how the structure of child care environments impacts teacher stress and emotional well-being, as well as the development of preschool children's cognitive abilities and executive functioning skills. She also studies the sociocultural factors that shape women's decisions to pursue STEM (science, technology, engineering, and mathematics) majors and careers.

Siyuan Feng is a PhD candidate in the Faculty of Education at the University of Hong Kong. He holds an MSc in education from the Moray House School of Education at the University of Edinburgh and worked as a private admissions consultant prior to his PhD study. His research interest is in private tutors' perceptions of their occupational identities, as well as the relevant sociocultural factors shaping such identities. His other research interests and projects include policy research on Asia-Pacific countries' regulations on private tutors' occupational standards, competency modelling in outside-school education, and international student mobility and admissions.

Gerard Ferrer-Esteban is a Marie Skłodowska-Curie research fellow in the Department of Sociology at the Autonomous University of Barcelona, Catalonia, Spain. His educational background includes a PhD in Sociology and a BA in Pedagogy (UAB). He is involved in the ReformEd project of the Globalization, Education and Social Policies research center (GEPS), in which he complements the qualitative approach with quantitative analyses to disentangle the effects of school autonomy and performance-driven accountability policies on relevant outcomes, such as effectiveness, equity, and other non-cognitive outcomes. He worked for seven years as a researcher on education policy and school effectiveness at the Agnelli Foundation in Italy. He has also worked as a full-time assistant professor in the Department of Social Pedagogy at UAB, and as a part-time instructor in the Department of Pedagogy at the University of Girona. His main fields of research are comparative education, education policy, and equity in school systems.

Gustavo E. Fischman is a Professor of Education Policy and Director of EdXchange, the knowledge mobilization initiative at Mary Lou Fulton Teachers College, Arizona State University. His scholarship has won several awards, and he has been a visiting scholar at several universities in Europe and Latin America. He has authored over 100 articles, chapters, and books. He has been the lead editor of Education Policy Analysis Archives and is the editor of Education Review. Among his best-known works are Imagining Teachers: Rethinking Teacher Education and Gender; Dumb Ideas Won't Create Smart Kids, co-authored with Eric M. Haas; and Made in Latin America: Open Access, Scholarly Journals, and Regional Innovations, co-edited with Juan P. Alperin.

Jiesi Guo is a Senior Research Fellow at the Institute for Positive Psychology and Education (IPPE) at the Australian Catholic University. His areas of interest include educational and developmental psychology, with a particular focus on how cultural, social, motivational, and behavioural systems in youth development shape individual and gender differences in achievement choices. Jiesi completed his PhD and post-doctoral training at the Australian Catholic University and has held multiple fully funded international research projects in collaboration with leading researchers in the United States, Germany, and Finland. Jiesi has published in leading international journals in the fields of psychology and education.

Tanya Hathaway is a Lecturer at Coleg Llandrillo in North Wales, UK, and an online instructor for Laureate Education B.V. She has a PhD in Higher Education from Bangor University. She has extensive experience lecturing in the UK and Australia in teacher professional development and master's level education. She has pioneered master's level programmes in teaching and learning and educational research methods, and led the development of the Master's in Teaching and Learning at the University of Plymouth, UK, under the inspirational leadership of Professor Michael Totterdell. She has taught in a wide range of undergraduate and postgraduate courses in the areas of theories of learning, science research methods, values and beliefs in education, professional learning and development, and early childhood education. She also has extensive experience of supervising postgraduate research studies. Her main research interests are in the field of personal epistemologies, teaching and learning in higher education and, more recently, young children's play.

Cassandra Howard obtained a PhD from the Department of Educational Leadership and Policy Studies at Florida State University. She received her BA from the University of Mississippi and her MA in Latin American Studies from the University of Florida. She has taught in elementary, middle, and high schools and is interested in how to support and prepare teachers in ways that lead to enhanced, meaningful learning opportunities for all students. Her areas of research expertise include teacher professional learning, teacher leadership, teacher agency, and teacher policy and reform.

W. James Jacob is the Vice President of Innovation and International at the Collaborative Brain Trust (CBT). He has extensive administrative experience in higher education professional development and training programs, establishing international partnerships, and external research and program funding opportunities. His international networks span every major global region, where he has helped forge sustainable partnerships with universities, government agencies, nongovernmental organizations, and alumni groups. Dr Jacob holds master's degrees in Organizational Behavior (Marriott School of Management) and International Development (Kennedy Center for International Studies) from Brigham Young University and a PhD in Education from the University of California, Los Angeles.

Nina Jude is a Senior Researcher at the Leibniz Institute for Research and Information in Education (DIPF) in Frankfurt, Germany. Since 2001, Nina Jude has been involved in large-scale assessments, developing measures for cognitive and non-cognitive variables in national and international settings, as well as managing these projects. She was the international project manager for questionnaire development in the Programme for International Student Assessment (PISA) 2015 and 2018, and for PISA 2021 she chairs the international questionnaire expert group. Her research focuses on the dimensionality of constructs in multi-level settings and the relevance of context factors for education. Recent publications include secondary analysis of large-scale context questionnaire data over time to track countries' progress in different areas of educational quality.

Eckhard Klieme trained as a mathematician and a psychologist. He is now Professor of Educational Research at Goethe University, Frankfurt am Main, Germany, and has been the Director of the Center for Research on Educational Quality and Evaluation at the German Institute for International Educational Research (DIPF) since 2001. His research interests focus on educational effectiveness and quality of teaching, classroom assessment, and international comparative educational research. Starting with TIMSS-Video 1995 (Trends in International Mathematics and Science Study) in Germany, Professor Klieme has led several video-based studies on teaching in mathematics, science, and language education. He has served as a consultant for national and international agencies and has been involved in international large-scale assessment programmes such as the Programme for International Student Assessment (PISA), the OECD Teaching and Learning International Survey (TALIS), and currently the TALIS video study.


Susanne Kuger is Head of the Department of Social Monitoring and Methodology at the German Youth Institute in Munich, Germany. Her research interests and teaching topics include the effects of family, early childhood education and care, and school and out-of-school environments on child and youth development; survey methodology; international comparisons in education; and refining modelling techniques for complex quantitative data analyses in education research.

Nancy Law is a Professor in the Division of Information and Technology Studies, Faculty of Education, at the University of Hong Kong. She served as the Founding Director of the Centre for Information Technology in Education (CITE) for 15 years from 1998, and led the Science of Learning Strategic Research Theme at the University of Hong Kong (2013–17). She is known globally as a learning scientist with a strong record and expertise in the integration of digital technology in learning and teaching to promote student-centred pedagogical innovations. Her research interests include international comparative studies of technology-enabled learning innovations, models of ICT integration in schools and change leadership, computer-supported collaborative learning, the use of expressive and exploratory computer-based learning environments, learning design and learning analytics. She received a Humanities and Social Sciences Prestigious Fellowship Scheme Award from the HKSAR Research Grants Council in 2014 in recognition of her outstanding research.

Wing On Lee is a Distinguished Professor at Zhengzhou University, China. He is concurrently serving as Director of the International and Comparative Education Research and the Central Plains Education Research Centre at the School of Education. In addition, he has been appointed Director of the Citizenship Education Research Centre, a National-based Centre established at Zhengzhou University. Professor Lee has over 20 years of senior management experience in higher education in different countries. He was previously Vice-President and Chair Professor of Comparative Education at the Open University of Hong Kong (2014–17) and Dean of Education Research at the National Institute of Education, Singapore (2010–14). He has also served at the Hong Kong Institute of Education as Vice President (Academic) and Deputy to the President, Acting President, Chair Professor of Comparative Education, Founding Dean of the School of Foundations in Education, and Head of two Departments and the Centre for Citizenship Education (2007–2010). In 2005, he was invited by the University of Sydney to be Professor and Director (International). Prior to his service in Australia, he served at the University of Hong Kong as Associate Dean of Education and Founding Director of the Comparative Education Research Centre. His public service includes serving as Chair of the Research Ethics Board on Population Health for the National Healthcare Group and Conference Ambassador for the Singapore Tourism Board in Singapore, and on the Education Commission, Central Policy Unit, Curriculum Development Council and Quality Education Fund in Hong Kong. Currently, Professor Lee is appointed by the Hong Kong government to serve as Chair of the Award for Outstanding Practice in Moral Education (Primary Sector), Chair of the Steering Committee of PISA 2018, and member of the Task Force on Curriculum Review.
Leming Liang is a PhD candidate completing his thesis entitled 'A Multilevel and Multiscale Exploration of Teacher Learning in Technology-enhanced Pedagogical Innovations'. He is also the project manager for a learning design and analytics project named An Open Learning Design, Data Analytics and Visualization Framework for E-learning, which focuses on learning design, learning analytics, teacher inquiry, and developing a technology platform to support the synergistic process between the three aforementioned components in actual teaching practice. His research interests include technology-enabled pedagogical innovations, learning design, teacher learning, and technology use for teacher inquiry of student learning.

Guodong Liang is a Research Specialist at the Community Training and Assistance Center (CTAC) in Boston, Massachusetts, USA. He received his BA from the University of Science and Technology of China (USTC) and a PhD in Educational Policy Analysis from the University of Missouri. Dr Liang's research focuses on educational policy (e.g., performance-based compensation, teacher professional development, teacher evaluation, and principal leadership), especially from a comparative and international education perspective. His work has been published in such journals as Educational Policy, Journal of Educational Administration, Journal of Educational Research, Journal of School Leadership, and International Journal of Educational Research.

Donna J. Menke is an Assistant Professor at the University of Memphis College of Education in the Department of Leadership. She teaches courses in Higher Education and Student Affairs. Her research areas include the student athlete experience, college student academic advising, and career development. Articles from her research appear in the NACADA Journal, Journal for the Study of Sports and Athletes in Education, and the Journal of Loss and Trauma. She currently serves on the editorial review boards of the First Year Experience and the Journal of College and Character, and recently completed service on the NACADA Journal editorial board. She maintains active memberships in NACADA and NASPA.

Monica Mincu is an Associate Professor in the Department of Philosophy and Educational Sciences at the University of Turin, Italy. She has published in high-profile journals, such as Comparative Education, Oxford Review of Education, and History of Education. She has engaged with education politics and governance from a social change and reform perspective, and with teacher education in Europe in various contexts. Her professional experience revolves around education politics and teacher education, approached comparatively.

Laura M. O'Dwyer is a Professor in the Measurement, Evaluation, Statistics, and Assessment department. Her expertise is in quantitative research methods and design and advanced data analysis, and her research focuses primarily on examining the relationships between the organizational characteristics of schools and teachers and student outcomes. She has contributed to numerous studies that examined ways of improving teacher quality and student outcomes, and her work has been funded by the NSF, the US Department of Education, and the Institute of Education Sciences. O'Dwyer has extensive experience in the design of large-scale observational, experimental, and evaluation studies, and in the analysis of large-scale data sets such as PISA and TIMSS.

Catherine Paolucci is an Affiliate Research Scientist in the STEM Education Center at Worcester Polytechnic Institute. She began her career as a secondary mathematics teacher and earned an EdD in Mathematics Education from Teachers College, Columbia University (New York) in 2008. She has since served as a director of teacher education programs and professional development programs in the United States, Ireland, and South Africa. Her research and project work support program and policy development for mathematics teacher education, both in the US and abroad. Her current research in mathematics teacher education focuses on teachers' development of mathematical knowledge and the impact of innovative field experiences in teacher education.

Jae Park reads at the Education University of Hong Kong. His research interests are in sociology and philosophy of education. He recently published in Comparative Education Review, Educational Philosophy and Theory, International Studies in Sociology of Education, Comparative Education, and Ethics & Behavior. He serves as the President of the Comparative Education Society of Hong Kong and as the Head of the International Education Research Group in the Centre for Lifelong Learning Research and Development of the Education University of Hong Kong. He is the Editor-in-Chief of the International Journal of Comparative Education and Development and an Editorial Board member of the book series 'Educational Leadership Theory' for Springer.

Colin Power was Deputy Director-General of UNESCO from 1999 to 2000 and Assistant Director-General for Education from 1989 to 1998. As such, he was responsible for the overall policy and management of the education programmes of UNESCO, playing a central role in all of its major initiatives, such as International Literacy Year, Education for All and the International Commission on Education for the 21st Century, and in the UN's struggle to alleviate poverty, defend human rights, protect world heritage sites, and promote education for sustainable development and a culture of peace and non-violence. Dr Power began his career teaching science and mathematics before taking up an academic post at the University of Queensland, where he is now an Adjunct Professor and was named Alumnus of the Year in 2002; for ten years he was Professor of Education at the Flinders University of South Australia. He is author or co-author of 13 books and over 250 published works on education, learning and development. Currently he is Chair of the Commonwealth Consortium for Education and Director of the Eidos Institute (an international research network and think tank on social policy issues).

Anna-Katharina Praetorius is a Professor of Research on Learning, Instruction, and Didactics at the University of Zurich (Switzerland). She completed her doctorate at the University of Koblenz-Landau (Germany) after studying Educational Science, Psychology, and Elementary Educational Science at the University of Erlangen-Nuremberg (Germany). In her research, she focuses on issues around conceptualizing and measuring instructional quality, at both the national and the international level. Additionally, she conducts research on teacher motivation and teachers' judgment accuracy. She has received several publication awards for her work.

From 2005 until 2011 Manfred Prenzel was a Member of the European Science Foundation (ESF) Standing Committee Social Sciences, and from 2003 until 2009 a

xxii

THE SAGE HANDBOOK OF COMPARATIVE STUDIES IN EDUCATION

Member of the Senate and Joint Grants Committee of the German Research Foundation (DFG). Manfred Prenzel served also as Chair of the German Council of Science and Humanities (Wissenschaftsrat) from 2014 to 2017. Abbie Raikes  is an Associate Professor at the College of Public Health, University of Nebraska Medical Center. Dr Raikes’ recent work has focused on improving early childhood programs and policies in low- and middle-income countries. Her research background also includes a strong focus on young children’s social/emotional development and leadership of the Measuring Early Learning and Quality Outcomes project. Previously, Abbie contributed to early childhood policy development in several countries as a program specialist for the United Nations Education, Science and Culture Organization (UNESCO) in Paris, where she also participated in UNESCO’s process to develop indicators for the Sustainable Development Goals. Abbie was a senior program officer at the Bill & Melinda Gates Foundation, and has advised several organizations on early childhood development and education. Wida Rogh is a Research Associate at the Department for Research on Learning, Instruction, and Didactics at the University of Zurich (Switzerland). She studied Educational Science, Art History and Psychology at the University of Münster (Germany) and completed her Master of Arts in Educational Science in 2014. Between 2013 and 2015 she worked as a Consultant at the Directorate for Education and Skills of the Organization for Economic Cooperation and Development (OECD). Since 2015 she has been working in various research projects on the creativity development and measurement in adolescence. Her current research focuses on the observational-based measurement of teaching and instruction. David Rutkowski is an Associate Professor in Educational Policy and Educational Inquiry at Indiana University. Previously he was a Professor of Education at the Center for Educational Measurement (CEMO) at the University of Oslo, Norway, and a researcher for the International Association for the Evaluation of Educational Achievement (IEA) in Hamburg, Germany. He earned a PhD in educational policy with a research specialization in evaluation from the University of Illinois at Urbana-Champaign. His main areas of research are in the area of educational policy and educational measurement with specific emphasis on how large-scale assessments are used within policy debates. He has consulted for national and international organizations, including the US State Department, USAID, UNESCO, the IEA and the OECD, and has conducted evaluations in over 20 countries. He is the editor of the IEA policy brief series, serves on the IEA publication editorial committee (PEC) and is a board member of several academic journals. He teaches courses in evaluation, education policy, statistics and large-scale assessment. Leslie Rutkowski is Associate Professor of Inquiry Methodology at Indiana University. She earned her PhD in Educational Psychology, specializing in Statistics and Measurement, from the University of Illinois at Urbana-Champaign. Leslie’s research is in the area of international large-scale assessment. Her interests include latent variable modeling and examining methods for comparing heterogeneous populations in international surveys. 
In addition to a recently funded Norwegian Research Council FINNUT grant on developing international measurement methods, Leslie published the edited volume Handbook of International Large-Scale Assessment (Rutkowski, von Davier, and Rutkowski, 2014) with Chapman & Hall, and is currently co-authoring a textbook on large-scale assessment with David Rutkowski and Eugene Gonzalez for Guilford Press. She teaches quantitative methods courses, including structural equation modeling/covariance structure analysis and related topics.

Pasi Sahlberg is a Professor of Education Policy and Research Director at the Gonski Institute for Education, University of New South Wales in Sydney, Australia. He has worked as a schoolteacher, teacher-educator, researcher, and policy advisor in Finland and has analyzed education policies and advised education policy makers around the world. He has gained working experience in over 60 countries, and is a former senior education specialist at the World Bank in Washington, DC, lead education specialist at the European Training Foundation, director general at the Ministry of Education in Finland, and visiting professor at Harvard University. He is a recipient of the 2012 Education Award in Finland, the 2013 Grawemeyer Award in the US, the 2014 Robert Owen Award in Scotland, the 2016 Lego Prize in Denmark, and a Rockefeller Foundation Bellagio Resident Fellowship in Italy in 2017. He has published widely in academic journals, professional magazines and the public media on educational issues. His most recent books include Finnish Lessons: What Can the World Learn from Educational Change in Finland (2015), Hard Questions on Global Educational Change (2017), FinnishED Leadership: Four Big, Inexpensive Ideas to Transform Education (2018), and Let the Children Play: How More Play Will Save Our Schools and Help Children Thrive (with William Doyle, 2019). Pasi is a Finnish citizen, now living with his family in Sydney, Australia.

Christine Sälzer is Professor of Education and Co-Director of the Professional School of Education at the University of Stuttgart, Germany. Prior to this position, Christine served as Germany's National Project Manager for PISA 2012, 2015 and 2018. In 2016, she completed her habilitation at the Technical University of Munich on large-scale student assessments as an empirical point of reference for educational policy making. Christine's main research topics are large-scale student assessments, educational monitoring, student absenteeism and students with special educational needs. She focuses on teacher education in her university teaching and works on connecting educational research with educational practice.

William H. Schmidt is a University Distinguished Professor of Statistics and Education at Michigan State University, where he directs the Education Policy Center and holds faculty appointments in Statistics and Education. He served as National Research Coordinator and Executive Director of the US National Center, which oversaw the participation of the United States in the IEA-sponsored Third International Mathematics and Science Study (TIMSS). He has published in numerous journals, including the Journal of the American Statistical Association, Journal of Educational Statistics, EEPA, Science, Educational Researcher and the Journal of Educational Measurement. He has co-authored 10 books, including Why Schools Matter (2001), Inequality for All (2012), and Schooling across the Globe: What We Have Learned from Sixty Years of Mathematics and Science International Assessments (2019). His current writing and research concern issues of academic content in K-12 schooling, including the Common Core State Standards for Mathematics, assessment theory and the effects of curriculum on academic achievement.
He is particularly concerned with educational policy related to mathematics.

Michele Schweisfurth is Professor of Comparative and International Education and Director of the Robert Owen Centre for Educational Change at the University of Glasgow in Scotland, where she also leads a course on comparative and international education. Her research interests include 'best practice' pedagogies as travelling policies and ideas, global citizenship education, and the experiences of international students. She has published widely on comparative education as a field and methodology, and on the relationship between education and various forms of development in the Global South. She has lived, worked and researched in a wide range of countries in North America, Europe, Asia, and Africa. She is a former Chair of the British Association for International and Comparative Education and a former editor of the journal Comparative Education.

Iveta Silova is a Professor and Director of the Center for Advanced Studies in Global Education at Mary Lou Fulton Teachers College. Her research focuses on the study of globalization, post-socialist transformations, and knowledge production and transfer in education. More recently, she has been exploring the intersections of postsocialist, postcolonial, and decolonial perspectives in envisioning education beyond Western modernity. She is co-editor of 'European Education: Issues and Studies' and Associate Editor of 'Education Policy Analysis Archives'.

Guillermo Solano-Flores is Professor of Education at Stanford University. He specializes in the intersection of educational assessment, language, culture, and cognition. His research on assessment development, translation, localization, and review integrates reasoning and methods from psychometrics, sociolinguistics, semiotics, and cognitive science. He is the author of the theory of test translation error – a theory of the inevitability of translation error and its impact on validity. He has also used generalizability theory – a psychometric theory of measurement error – to estimate the amount of measurement error due to language factors in testing. He has advised countries in Latin America, Asia, Europe, the Middle East, and Northern Africa on the development of national assessment systems and the translation and cultural adaptation of assessments. His current research projects examine formative assessment in linguistically diverse science classrooms and the design of illustrations as visual accessibility resources in computer-administered tests for linguistically diverse student populations.

Amelia Marcetti Topper is an Assessment and Evaluation Specialist in the University of Rhode Island's Office of Student Learning, Outcomes Assessment and Accreditation, and an education consultant. She currently works with undergraduate, graduate, and general education program faculty to advance an effective and meaningful university-wide assessment process, and has managed and collaborated on numerous quantitative, qualitative, and mixed-methods studies examining student access and success for the U.S. Department of Education and other governmental and non-governmental organizations. Her independent research draws on critical and human development frameworks to examine the conceptualization and measurement of student learning and outcomes at the institutional, national, and global levels. She completed her PhD in Education Policy and Evaluation at Arizona State University, and her dissertation received the American Educational Research Association's Division J (Postsecondary Education) 2016 Outstanding Dissertation of the Year award.

Lucia Tramonte is Professor of Sociology at the University of New Brunswick (UNB).
Her research focuses on comparative education, equity, and equality in educational systems. She works on large-scale international assessments from two perspectives: she analyzes existing data to tease out the inequalities and inequities associated with access and transition in education, and she designs contextual questionnaires, measures, and new tools. With Dr Jon Douglas Willms, she designed and developed the framework and questionnaires for the contextual assessment of 15-year-olds, in and out of school, and the statistical analyses for the national and international reports of PISA for Development, an initiative for low- and middle-income countries aimed at tracking international educational targets in the post-2015 UN framework. As Co-Director of the Canadian Research Institute for Social Policy (CRISP) at UNB, she led the analytical work on the Successful Transition Project for Human Resources and Skills Development Canada (HRSDC). Since 2004, she has worked internationally with large organizations such as the OECD, AFD, and UNESCO IIPE, with national governments, and with universities on questionnaire construction, secondary data analysis, measurement, and the multilevel modelling of cross-sectional and longitudinal data.

David A. Turner is a Professor at the Institute for International and Comparative Education, Beijing Normal University. He gained his PhD from the University of London Institute of Education in 1981. He is a Fellow of the Academy of Social Sciences in the UK. His research interests include higher education policy, technical and vocational education, quality assurance, and leadership in international contexts. He has written dozens of scholarly articles and a number of books, including Theory and Practice of Education (2007). He has been a consultant to the Ministries of Education of the Slovak Republic and Mexico. He has lived and worked in the UK, China, Japan and Mexico, and has been an invited lecturer at conferences and institutions in many other countries. His book Theory of Education (2005) was awarded the World Education Fellowship Book Award in 2007.

Fons J. R. van de Vijver is Professor Emeritus in cross-cultural psychology at Tilburg University, the Netherlands, and holds an extraordinary chair at North-West University, South Africa, and the University of Queensland, Australia. He is a senior researcher at the Higher School of Economics in Moscow, Russia. He has (co-)authored more than 550 publications, mainly in the domain of cross-cultural psychology. The main topics of his research are bias and equivalence, psychological acculturation and multiculturalism, response styles, and translations and adaptations. He is a former editor of the Journal of Cross-Cultural Psychology and serves on the boards of various journals. He is a former president of Division 2 (Assessment and Evaluation) of the International Association of Applied Psychology and of the European Association of Psychological Assessment, and Past-President of the International Association for Cross-Cultural Psychology. He has received various national and international prizes for his work, and has been a consultant to various large-scale international assessment projects.

Adriana Viteri is an economics consultant at the Inter-American Development Bank (IDB). Before joining the IDB, she worked as a technical specialist at UNESCO's Latin American Laboratory for Assessment of the Quality of Education (LLECE), where she took part in the organization of the fourth learning assessment study implemented across Latin American and Caribbean countries.
From 2014 to mid-2018, she was responsible for providing technical assistance to international teams on the elaboration of several studies, including comparative large-scale assessments and national reports. She has worked in the public and private sectors, as well as in civil society and international organizations, primarily on quantitative topics.


Ming-Te Wang is a Professor of Psychology and Education and a Research Scientist in the Learning Research and Development Center at the University of Pittsburgh. He received a doctorate in Human Development and Psychology from Harvard University. His research program aims to inform practice and policy that improve human learning and development and address educational disparities across childhood and adolescence, in school and family contexts. His current research is centered on two primary domains: (a) creating supportive, responsive, and inclusive learning environments that can buffer students' stress, foster engagement and resilience, and support positive development; and (b) elucidating the mechanisms and processes by which inequalities are propagated in learning environments, especially for students from disadvantaged backgrounds. His specific research interests include achievement motivation and engagement, racial/ethnic and gender disparities and biases, school/classroom climate and school discipline, and parental involvement in education and ethnic-racial socialization.

J. Douglas Willms is the President of The Learning Bar Inc. He is also the President of the International Academy of Education and a Member of the US National Academy of Education. From 2002 to 2017, he held the Tier 1 Canada Research Chair in Literacy and Human Development. Since receiving his PhD from Stanford in 1983, Dr Willms has published over 300 research articles and monographs pertaining to child development, children's health, youth literacy, the accountability of schooling systems, and the assessment of national reforms. He and his colleagues designed the Early Years Evaluation (EYE), an instrument for the assessment of children's early developmental skills; the OurSCHOOL evaluation system for the continuous monitoring of student outcomes; and Confident Learners, a whole-school literacy program. Dr Willms developed the assessment framework Educational Prosperity, which several countries are using in their capacity-building efforts and for the development of educational policy.

Dominic Wyse is Professor of Early Childhood and Primary Education at University College London (UCL), Institute of Education (IOE), and Academic Head of the Department of Learning and Leadership. Dominic is a Fellow of the Academy of Social Sciences (FAcSS), incoming Vice-President, then President (2019–21), of the British Educational Research Association (BERA), and a fellow of the Royal Society for the encouragement of Arts, Manufactures and Commerce (RSA). The main focus of Dominic's research is curriculum and pedagogy. Key areas of his work are the teaching of writing, reading and creativity (e.g. How Writing Works: From the Invention of the Alphabet to the Rise of Social Media, Cambridge University Press). Dominic has extensive experience of funded research projects, which he has disseminated in numerous peer-reviewed research journal articles and books (e.g. his research paper 'Experimental trials and "what works?" in education: The case of grammar for writing' (2017)). These books include major international research volumes for which he is the lead editor (e.g. The BERA/SAGE Handbook of Educational Research (2017) and The SAGE Handbook of Curriculum, Pedagogy and Assessment (2016)), and bestselling books for students, teachers and educators (e.g. Teaching English, Language and Literacy (4th edition, Routledge, 2018) and A Guide to Early Years and Primary Teaching (Sage, 2016)).
He has been an editor of, and served on the editorial boards of, internationally recognised research journals. From 2012 to 2018 he was an editor of the Curriculum Journal, one of the journals of the British Educational Research Association (BERA).


Rui Yang is a Professor in the Faculty of Education at The University of Hong Kong. With nearly three decades of academic experience in China, Australia and Hong Kong, he has contributed to leadership in the field and built an impressive track record of research at the interface of Chinese and Western traditions in education. He has established his reputation among scholars writing in English and Chinese in the fields of comparative and international education and Chinese higher education. Frequently called on to deploy his cross-cultural knowledge and expertise globally, his international reputation is evidenced by his extensive list of publications, research projects, invited keynote lectures at international and regional conferences, leadership in professional associations and membership of the editorial boards of scholarly journals. His research bridges the theoretical thrust of comparative education and the applied nature of international education.

Pablo Zoido is a lead education specialist at the Inter-American Development Bank (IDB), where he works to improve education systems in Latin America and the Caribbean. Before joining the IDB, Pablo worked as an analyst at the Directorate for Education and Skills of the Organisation for Economic Co-operation and Development (OECD), where he advised governments and education stakeholders on how to use assessment and evaluation tools such as the Programme for International Student Assessment (PISA) to improve the quality, equity and efficiency of education systems. Pablo has published in academic journals in economics and education on issues ranging from democratic checks and balances and the informal economy to educational equality and opportunity to learn.


Preface
Larry E. Suter

The field of Comparative Education research continues to produce a large volume of studies about all aspects of the practice of education in different cultures. One purpose of this new handbook is to critically examine the quality of claims about cross-national discoveries concerning learning and instruction. Another purpose is to provide guidance to anyone tempted to seek knowledge about educational practices in other nations. The rationale for the handbook rests on the editors' belief that comparative studies in education are frequently used to influence government and education policies at both national and international levels, but that the interpretation of evidence by policy makers may not always be reliable, in part because authors and educational policy makers do not always share common understandings of the content of the studies that are the source of their discussion.

Other scholars have assembled syntheses of Comparative Education research, and there are several excellent textbooks, encyclopedias and handbooks of international education that introduce the history and knowledge base of Comparative Education. Each of these works has particular strengths unique to its editors and writers, who, along with those who write for this handbook, provide evidence of great diversity in approaches to the study of educational practices in different cultures. This volume critically assesses the status of research methodology and knowledge of educational practices in the field of Comparative and International Education from multiple points of view. The authors of the chapters have been chosen for their expertise in methods of research and for their specific expert knowledge of disciplines such as economics, sociology, psychology, educational policy, philosophy, and political science. The authors cast a critical lens on how curriculum, assessments and policies are organized today in Comparative Education research. They discuss theoretical diversity within the discipline and examine its integrity and intellectual coherence with a view to guiding future research. This assessment provides practical guidance for students and experienced researchers on conducting future research about educational policies and practices across countries.

The editors were stimulated to produce this volume by the expansive application of large-scale survey research methods to measuring student achievement in recent years. The study of educational practices around the world, once limited to observational descriptions like travel reports, has been injected with a large number of repeated 'empirical' surveys of students, teachers and parents. While these studies constitute a small segment of the total number of publications about comparative education, they have received significant attention from policy makers in nearly all countries. By the turn of the 21st century, survey methods had matured to reach high levels of quality in operationalization, yet the results from these studies were often at risk of oversimplified presentation and rested on many hidden assumptions about student responses to self-report surveys. This handbook was initiated by the three editors' belief that a significant number of misunderstandings were to be found in the public media and among academic researchers about how to interpret studies of the nature and practice of education in other nations.

Handbooks such as this one provide a source of direction for other researchers about what is important or what has been shown to be of less value. They contain definitions of complex areas of analysis and critical reviews of the current state of research that may not always be definitive but carry authority through the quality of thinking represented in their pages. This handbook was developed within this tradition of exploring the current state of knowledge about how we learn from one another in cross-national studies.

Who is the audience for a new handbook? We believe that novice researchers in every country now have easy access to statistical and qualitative data about education in other countries, because modern technology allows access to all forms of information. The number of such researchers is large and growing. From an organizational point of view, the field of Comparative Education is defined by the members of more than 70 professional associations and journals of Comparative Education around the world, who publish in journals such as Compare, Comparative and International Education and Comparative Education, and by courses on International Comparative Studies taught in the world's leading universities. They, and their instructors, could benefit from the wisdom of other researchers, as contained in this handbook.

The editors do not claim that the authors of this book have reached final agreement on the complex issues of how educational knowledge is obtained; instead, they represent a sample of the approaches to knowledge building. Significant divisions in philosophies of knowledge are apparent among the authors of published articles about international comparative studies. The diversity of beliefs is especially large between those who conduct or use the results of large-scale survey research and those who examine the broad aspects of global economic, political, and educational systems through participatory observation or the examination of historical records. Healthy debate about the tendency of government policy makers to rely on the results of large-scale comparative research studies to drive educational policies has stimulated new research efforts to define more clearly the meaning of survey results. The history and consequences of different approaches to Comparative Education have been noted in several systematic reviews by leaders of the field (Bray, 2010; Cowen, 2014). Researchers with a background in one scientific or historical domain may be less likely to communicate frequently with researchers in other domains. Therefore, critical analyses of the intent and use of all types of comparative studies will be useful for sorting out their implications for the best approaches to student learning. By including as many examples of these different approaches as possible, along with critical analyses of each, we have attempted to address some of these gaps in communication and understanding.

The editors of this handbook worked together across great distances to organize and write its chapters.
Each editor brought unique professional experience and personal commitment to a project that required broad knowledge of educational content areas, familiarity with scholars in different geographic areas, and openness to the varying ideologies about methods of research found in the field of Comparative Education. The editors identified 55 authors for the 32 chapters who were at least partially representative of the broad spectrum of researchers in the field. Since the authors resided in 13 countries, writing and editing a handbook of such scope was possible only because of the internet: communication between authors and editors about the content of the chapters occurred over long distances and across a dozen time zones.


Our goal is to provide a bridge between those who industriously conduct all forms of research on student achievement, attitudes, and performance and those who outline the broader social, economic and political systems that also drive the development of educational theory and policy. The interaction of these different approaches to how the world prepares its next generation may inspire young researchers to create a new synthesis of educational knowledge.

REFERENCES

Bray, M. (2010). Comparative education and international education in the history of Compare: Boundaries, overlaps and ambiguities. Compare: A Journal of Comparative and International Education, 40(6), 711–725.

Cowen, R. (2014). Comparative education: Stones, silences, and siren songs. Comparative Education, 50(1), 3–14. doi:10.1080/03050068.2013.871834


Methods and Practice in Comparative Education Research: Models, Ontologies and Epistemologies
Larry E. Suter, Emma Smith and Brian D. Denman

INTRODUCTION

This handbook is addressed to educational researchers who are interested in the study of education theory and practice in countries around the world. This introductory chapter is an overview of the handbook chapters and a reflection on the status of research in comparative education in 2018. The editors have had a chance to review each chapter and to discuss the implications of their content for informing future research activities in comparative education. Here, the editors of the volume review the contributions of all authors and integrate their remarks with the other contributions to the field of comparative education.

One premise of this review is that the different methods of research found in comparative education were established through lengthy development and are supported by networks of scholars who share common experiences and technical knowledge. Therefore, each of these approaches to understanding educational processes across cultures should be acknowledged as a legitimate means of investigation and given an opportunity for explanation and debate. The hope of the handbook editors is that, by bringing different approaches to comparative education research together in one resource volume, a better understanding of the goals, methods, and values of all researchers may be gained. The chapters give evidence that the field of comparative education research is a lively enterprise, driven by a desire to improve educational practices and equity of access to educational opportunities in all corners of the world.

Altogether, the chapters contain an overview of comparative educational research methods, ideologies, and policy uses. The subject matter includes views on the status of the field itself; reviews and analyses of instruction and learning practices across different cultures; commentary on social justice issues such as gender and socio-economic equity; and discussion of the consequences of the dramatic growth in the number of large-scale comparative surveys of education. Other topics include cross-national studies of pre-school practices, the content of primary and secondary curriculum, experiences in after-school learning, uses of technology, preparation for science and technology careers, the development of student attitudes, and the preparation of teachers.

The chapters were written by scholars who have extensive experience in conducting international comparative studies of education. Most of the authors are instructors in higher education institutions with specialties in the fields of educational psychology, social psychology, economics, statistics, teaching methods, curriculum, global education, and qualitative methods of research. Collectively, the authors have lived in, worked in and studied diverse cultures around the world. Comparative researchers are therefore especially sensitive to the differences in living conditions across the globe and frequently discuss the responsibility of scholars to address with consistency the conditions of education in all systems. While many of the authors share common interests in how comparative studies can be useful for gaining knowledge about human behavior, they do not necessarily share common beliefs about how that knowledge can be gained or applied. Indeed, the recommendations for specific research designs are nearly always accompanied by cautions about the dilemmas and paradoxes encountered in conducting comparative research. Most researchers have a heightened recognition that over-simplification of country differences or inferences about causation could lead to errors in policy guidance.

The different chapters illustrate how comparative education researchers apply differing ontologies and paradigms in constructing their research agendas. We try here to identify the views of these authors and to find sources of conflicting conclusions. What can be learned from the accumulation of essays prepared by 57 diverse authors about what constitutes 'good' comparative education research (and what is good about it)? What have these chapters contributed to expanding the content and methods of 'good research' in comparative education? Have they, for example, defined standards of quality that can usefully guide future research endeavors in a field that includes diverse tribes and personal motivations for engaging in cross-national research (Becher & Trowler, 2001; Schweisfurth, 2014a)? The quantity of comparative education research has increased across all research paradigms. Has the increase in the number of cross-national large-scale surveys provided useful frameworks for exchanging knowledge about learning, motivation, and teaching practices across diverse cultures? Or has interjecting large amounts of survey data into the field simply created new diversions into arcane methodological topics that increase the complexity of understanding rather than settling issues?

The following discussion is organized around some of the most frequently addressed topics of debate and methods of research presented in this volume: the status of the field of comparative education; the variety of epistemologies found among the authors; the status of cross-national student assessment methodology; the use and abuse of qualitative and quantitative methods; the status of large-scale international research; what constitutes good research; and speculation about the types of studies and theory development that are likely to be conducted in the future.

STATUS OF THE FIELD OF COMPARATIVE STUDIES

The history, development, and theoretical assumptions of researchers in the field of comparative education are addressed in several chapters (Suter, Denman, Carnoy, Wang, Wang & Fischman, and Power). This collection of views is representative of the diverse approaches to research currently practiced because they reflect differences in ideology and in basic beliefs about what should be accepted as knowledge of educational practice. Even though the number and scope of empirical studies of education across countries have increased greatly, there is little evidence in this volume that standardized statistical surveys have accomplished the hoped-for consensus among educators. More than one author finds, either from empirical examination of country educational practices or from ideological reasoning, that large-scale comparative studies of education tend to be homogeneous by design and are therefore likely to be blind to uniqueness in student learning and motivation in different countries. Some authors argue that qualitative and quantitative empirical observations should be interpreted as proof that countries and individuals do not easily conform to simple generalizations. Yet a significant proportion of comparative researchers continue to approach the study of comparative education using models of scientific discovery that seek to establish generalizations about human behavior. Some analysts in this volume warn researchers to pay as much attention to differences in educational outcomes as to common features. Others focus their attention on enduring problems of inequality of educational opportunity between countries and within their borders, and express concern that the research models of the field have not yet led to solutions to these social issues.

Some authors warn that educational theory, methods, and policies may too often be driven by a few dominant models of explanation developed in 'Western' nations without acknowledging the cultural uniqueness of other, specifically 'Eastern', cultures. Others point out that the complexity and diversity of a global concept of learning and educational practices may require the development of completely new theories, frameworks, and practices that would maintain the uniqueness of individual cultures while acknowledging that macro changes in world access to information affect all countries.

A review of the scope of published research studies in comparative education over the past 25 years shows that the geographic breadth, theoretical scope, and application of research methods of international comparative studies have all expanded; that evidence gathered by researchers has been applied to support aspects of educational practice in some countries; and that scholarly debates about the meaning of and value of cross-national analysis have contributed conceptual development and new measurement methods to the general field of educational research.

In a review of the history of research on comparative education, economics and psychometrics, Carnoy describes the changes in focus of educational policies that developed along with the creation and publication of comparisons of student achievement. He provides a sweeping analysis of the theoretical and policy changes that occurred as the role of education in economic development became more apparent through the empirical measurement of educational outcomes across countries. He concludes that the empirical measurements from comparative surveys provide useful descriptive information without an accompanying causal understanding of which activities lead to higher student achievement in a given country. The existing studies are limited by their cross-sectional design and by measurement errors in the important educational practices that may be the cause of student performance. Thus, comparative studies as currently practiced do not provide a perfect source of scientific knowledge, but they do provide empirical evidence that enhances the development of new ideas and provides a check on beliefs that were formed without evidence. Empirical evidence gathered by either qualitative or quantitative methods about topics such as the relationship between education and economic growth is most useful when it fails to support claims made from ideological beliefs. Developing a rationale for explaining negative evidence is a stronger guide to new knowledge than is seeking evidence to support an existing claim.

CONFLICTING THEORIES, FRAMEWORKS, ONTOLOGIES, AND METHODOLOGIES

The evolution of the field of comparative and international education continues to reflect ongoing challenges in bringing together the science of measurement (empirical studies) and discovery (qualitative studies). Feyerabend (1987) once argued that the distinct contexts of justification and discovery create a necessary blessing in dualism: the former concerns a systematic approach to explaining content and the reasons for accepting it, while the latter tells the history of a particular piece of knowledge. While positivism is often associated historically with Auguste Comte (1798–1857), the founding father of positivism and of sociology, comparative education continues to maintain its interdisciplinary approaches, its historical roots in the social sciences, and its breadth and scope in modes of inquiry, suggesting that differences in space, time, reality, measurement, and change are all relative.

While much of this handbook problematizes large-scale measurement, its qualitative chapters tend to reflect how the field continues to mature and evolve. Park, Lee, and Yang stress that sound measurement is necessary for comparative education research to realize its importance at different stages of educational development and advancement, and Denman reflects on the fleeting 'shelf life' of research, contending that there is still much opportunity to build greater robustness and veracity in comparative education research through increased collaboration, discussion, and depth. Turner, Hathaway, and Ferrer-Esteban, Suter, & Mincu actively engage in qualitative and quantitative studies in comparative education, taking 'mixed methods' design to the next level by addressing problems of experimental design, sampling, and data analysis through application.

Dissatisfaction continues unabated concerning the rate and speed of change in research in comparative studies in education, and most comparative scholars question the likelihood that the field will reach higher levels of understanding and awareness of its significance. Notwithstanding the need to explore how comparative education is taught as a field – with research centers promoting global, international, and multicultural education to help differentiate approaches – there is an increasing body of evidence suggesting that research elevates the potential within the field. Accordingly, the task of the field is not necessarily to delimit, restrict, or control types of comparative education research, but to recognize that there may be limitations to existing methods and techniques for exploring educational phenomena, different perspectives, and different discourses. This dissatisfaction is also a driver to excel and to help the field realize its potential. As such, the field will continue to evolve and mature.

MEASUREMENT EPISTEMOLOGIES

About a fourth of the chapters in this volume contain discussions about the qualities of different research methods. This significant attention to research methodology illustrates how much scholars in comparative education hold varying beliefs about the fundamental basis of knowledge. Those who conduct research in and about cultures other than their own may be particularly drawn to holding a broad view of the nature of knowledge. Different epistemologies are clearly present across the entire field of comparative education, ranging from empirical positivism to ideological dogmatism. The epistemology of education research has been defined as 'understanding about its core nature as a scientific endeavor' (Shavelson & Towne, 2002: 15). A thorough review of methods of research about all forms of education found that even the basic understanding of the nature of knowledge has itself changed as knowledge of human behavior has accumulated and the depth of understanding of human nature has evolved (Shavelson & Towne, 2002). Thus, the chapters in this volume reflect the recommendations of each author about ways to improve methods of observation and the reporting of educational practices in settings of widely diverse languages, historical backgrounds, geographic settings, and economic conditions.

The methods of observation are an important aspect of comparative research that is continually evolving. One set of research methods attempts to design a common framework of learning and teaching that is applied rigorously in the study of diverse cultures, while other methods accept the diversity of learning and teaching as a given and seek ways to describe them within their own context. The recent rapid growth of large-scale surveys, acknowledged and documented in these chapters, may have exacerbated differences of opinion among scholars about the source of knowledge. Certainly, the increased number of publications and the sophistication of large-scale cross-national studies have increased the need for a larger number of researchers to appreciate the strengths and limitations of survey data collection and analysis. At least ten chapters in this volume (Ainley & Ainley, Praetorius, Rogh, Bell, & Klieme, Law & Liang, Rutkowski & Rutkowski, Sälzer & Prenzel, Cogan & Schmidt, Solano-Flores, van de Vijver, Jude, & Kuger, Wang, Degol, & Guo, and Willms & Tramonte) present discussions of particular aspects of quantitative research methods in cross-national research. These authors reflect on the measurement of achievement, attitudes toward education, change over time, curriculum, language influence, and social status in large-scale survey research. Their recommendations are based on extensive experience of conducting cross-national research, and their comments include cautions about drawing too many conclusions from cross-national studies as well as clarifications of techniques that improve quantitative analysis.

THE APPLICATION OF QUALITATIVE AND QUANTITATIVE RESEARCH METHODS

As in all social sciences, research methods may be divided into two research traditions: quantitative research, which uses measurement and numbers, and qualitative research, which emphasizes meaning and words (Wikipedia, 2018; Wyse et al., 2017). A review of the content of the chapters in this handbook shows that the comparative study of educational practices around the world cannot be undertaken without an understanding of the limitations and strengths of both qualitative and quantitative research methods. As Schweisfurth points out, comparisons between countries' educational practices have been made for centuries (Schweisfurth, 2014b: 15) by means of travel reports. In the field of comparative education today, individual researchers differ in their degree of emphasis on words or numbers because researchers conduct and publish their studies using the traditions under which they were trained, such as the humanities (history and philosophy), the social sciences (psychology, anthropology, sociology, economics, political science), or mathematics (statistics). Each of these disciplines represents a complex collection of belief systems, training periods, and research practices (Becher & Trowler, 2001; Schweisfurth, 2014a). No single set of established rules for research analysis has been consistently applied to the evaluation of research in this field (Cowen, 2006).

Some leaders in educational theory and research have argued that qualitative and quantitative research methods can co-exist in the same paradigm of scientific research in education (Feuer, Towne, & Shavelson, 2002). Yet, for practical reasons as well as reasons of basic belief, individual researchers are likely to be more expert in one technique than the other. Professional life is limited by the time available to learn the rules of applying different models; the peer review process requires sophisticated knowledge and the specific vocabulary of a method; and the types of research question that interest an investigator may not invite the use of one method over another.

Several chapters in this volume address specific issues in making quantitative measurements valid and reliable (Rutkowski & Rutkowski, van de Vijver, Jude, & Kuger, Willms & Tramonte, Ainley & Ainley, Praetorius, Rogh, Bell, & Klieme, and Power). The authors define research procedures and techniques that quantify cognitive, non-cognitive, and student or teacher behaviors reliably across cultures, and assess the successes of existing quantitative studies. Some of the authors are skeptical that students' self-reports of their own characteristics can be consistent across different cultures (Rutkowski & Rutkowski). Others recognize that errors of reporting occur naturally, and they seek to provide guidance for improving the reliability of the survey process in light of potential random errors (Willms & Tramonte and van de Vijver, Jude, & Kuger). The originating psychometricians and policy makers who initiated large-scale comparative research in the 1950s believed that quantitative measurement was preferable to the collection of stories and anecdotes about education; the objectivity of common survey instruments was believed to improve the accuracy of observation (Suter, this volume). However, some areas of educational outcomes, such as writing skills, were found to be too closely intertwined with cultural patterns to be analyzed across countries by a common evaluation framework – although certainly such efforts were made (Gorman, Purves, & Degenhart, 1989).

One chapter in this handbook addresses the meaning and use of qualitative methods directly. Schweisfurth writes that some researchers who apply qualitative methods believe that isolating specific variables, as is regularly done in quantitative research, loses the meaning of educational experiences. She explains that subjectivity is 'paramount in understanding why students are successful'. Subjectivity would include how education is valued, the meaning of relationships between teachers and learners, and meanings shared between learners. She argues that comparative research must include both insider and outsider cultural perspectives to understand the 'inner workings and influence of context', and that 'to understand why people are doing what they do, we need to go beyond what is readable, observable and countable to understand the wider and more subtle workings of context'. Methods such as ethnography can help the researcher to understand these through the perspectives of the actors involved and to glean why differences between policy and practice, and between different settings, are evident. Qualitative research may be particularly valuable in comparative studies of education for showing how different contextual factors work together holistically (not simply exist or correlate, and not necessarily as 'causal factors'). Since educational practices may be influenced by several different levels of policy making (national, local, classroom, other students), it is necessary to examine the relationships between each of these levels to learn how individual policies are interpreted and acted upon. Finally, Schweisfurth points out that the methods of qualitative analysis in cross-national studies have been changing, and she describes how traditional approaches to analysis are being replaced by new revisions. Yet, she says, as others have also proclaimed, quantitative and qualitative methods may be used together for a better understanding of educational policies.

The research studies reported in this handbook are rarely classifiable as purely qualitative or purely quantitative. Although more chapters profess to discuss quantitative methods of data collection and the interpretation of cross-national surveys, the content of the chapters often includes uses of qualitative methods, such as case studies, syntheses, or classroom observation, to generate hypotheses or explanations. The analysis of multiple research studies is itself a qualitative process, one that depends more upon the interpretation of words than on the analysis of numerical measurement. 'Mixed methods' has become a term that appears frequently in publications: a single research project is likely to apply both methods that analyze words and methods that count actions. Case studies and ethnographic analyses are sometimes summarized by counting the frequencies of behaviors, for example; and, vice versa, quantitative studies occasionally seek examples from case studies to illustrate the behaviors and overall systems that are captured by a single statistical measurement. As Darwin found in his exploration of finches in the Galapagos Islands, when one species of bird is confined to a single island, over time it is likely to multiply into many species rather than emerge as a single weeded-out species. So it may be, too, for research methods in comparative education: over time, the number of combinations of different approaches to investigation increases as the types of discoveries and problems change.


ASSESSING STUDENT ACHIEVEMENT ACROSS COUNTRIES

Large-scale international comparative surveys have been conducted at least once in over half of all countries in the world. These assessments of students in primary and secondary schools have been supported by a broad range of funding agencies from individual governments, international organizations (e.g. OECD, UNESCO, World Bank), and private foundations. Measures of average student achievement are no longer confined to nations with large populations or advanced economies, and the use of comparative assessments of student performance in countries of lower economic status has increased. Moreover, new comparative studies are being designed while previous studies are still being analyzed and interpreted (for example, PISA 2021, TIMSS, and the new PISA study for lower-income countries). Consequently, the body of knowledge about research practices for making comparisons is growing as researchers learn from experience and develop improved technologies and conceptual frameworks for cross-national understanding.

Whether knowledge about educational achievement or school management has increased because of the availability of extensive survey information about students is a question addressed by many of the chapters in this volume. Researchers, educators, and psychometricians are concerned with establishing the validity and meaning of cognitive assessments designed in one culture and administered in another. The assessments are based on student reports of what they know about a field of study (such as mathematics, science, reading, or geography). Chapters in this volume address topics such as whether student assessments in mathematics, science, or reading are comparable from one country to another, and whether relationships between student school experiences and achievement in one country (typically a large, highly developed country) are of value for policy making in other, perhaps less developed, countries.

Several chapters review the history and growth of large-scale international surveys of student achievement (Carnoy, Rutkowski & Rutkowski, van de Vijver, Jude, & Kuger, Fischman, Sahlberg, Silova, & Topper, Suter, and O'Dwyer & Paolucci). Each author reviews a particular aspect of the growth of these surveys from a different analytical point of view, but all acknowledge that the methods of study have evolved over time in response to both policy needs and growth in research knowledge about the measurement and practice of student assessment. Chapters by van de Vijver et al. and Rutkowski and Rutkowski address specific issues in using cross-national studies to measure student achievement reliably and validly. Van de Vijver et al. conclude that while valid measurement of achievement is theoretically possible, many details in the use of language must be considered when designing and conducting cross-national assessments. They find that the frameworks and systems of data collection have improved and progressed over the years of international comparative assessment, but caution that differences occur in content coverage, in student interpretation of items (because of the dependence on self-response), and in perspectives on what education is, and that these perspectives change over time; adjustments to measurement methods will therefore be necessary on a continuous basis. The Rutkowskis note that the large-scale surveys PISA and TIMSS have expanded to as many as 90 countries at once, yet the test items have been developed at an international rather than a local level, so the assessment of student performance is less likely to be valid. They point out that these studies have progressed over the years and that further changes will be necessary as more countries are engaged. In a similar vein, O'Dwyer and Paolucci note that in studies of the relationship between teaching practices and student assessment, the relationships found in large-scale surveys are weak and contradictory – probably, they say, because the assessments do not adequately reflect the unique educational experiences of students at the local level.

The discussion of the expansion of student assessment comparisons in Latin America by Viteri and Zoido illustrates the significant influence of global agencies in urging governments in all countries to apply methods of assessment and evaluation for purposes of accountability. Their presentation is a useful illustration of how individual countries and regions may adopt portions of educational assessments from other countries while adapting them to the specific circumstances of local cultures.

ASSESSING ATTITUDES While student achievement has been defined mainly by student performance on a test of skills in mathematics, science, reading, or technology, also of concern to educators are the ‘non-cognitive’ attributes or attitudes of students. As the authors of both chapters on attitudes in the volume state, comparative studies have shown a persistent across country relationship between a student’s positive attitudes toward a subject and their test performance on that subject. Wang, Degol, and Guo explore the differences in both theory and evidence between Eastern and Western cultures in how student motivation is linked with achievement. They employ the ‘expectancy value’ theory as a means of untangling the relationships between attitudes and achievement. Expectancy beliefs are those that are related to an individual’s expectations for success in the future and include a self-concept of ability. Task values, on the other hand, include attainment value, utility value, intrinsic value, and cost associated

METHODS AND PRACTICE IN COMPARATIVE EDUCATION RESEARCH

with a particular subject domain. They cite research studies that have tested the relationship of these motivational concepts to achievement in Eastern and Western cultures and find that the concepts apply equally, more or less, but that the task value measurements are especially important for predicting the achievement levels of females. They also cite evidence that expectations and task goals developed early in childhood are likely to predict the level of motivation later in life. However, in most cultures, interest in mathematics tends to decline between primary and secondary school, especially for boys. Likewise, Ainley and Ainley discuss the development of motivational constructs that have been used in international comparative studies of reading, mathematics, and science. Their chapter pays particular attention to the achievement paradox first observed in TIMSS, in which the average level of achievement of a country is negatively related to the average level of interest in a subject; whereas, within countries, the relationship between interest and achievement is positive. Attempts to explain this paradox by examining biases in student reporting on the various attitude and achievement scales were unsuccessful. Ainley and Ainley then propose that cultural differences in the conception of attitudes explain the paradox. They write: In sum, part of the explanation of the attitude– achievement paradox lies in the ways that macrocultural values provide a context within which students develop their attitudes in relation to schooling domains. The examples described here locate these cultural context effects in the degree to which science and technology have been adopted and are embedded in the culture, more particularly, students’ access to science-relevant learning opportunities.

This observation of how the macro-economic and social conditions of a country affect how a student forms interests and goals is a theme repeated throughout this volume. Individuals react to the culture around them in ways that affect not only their achievement but also the social-psychological processes that help determine their performance. Some of the country-level characteristics that affect students’ career choices are explored in greater detail in the chapter by Suter and Smith. Their analysis of PISA 2015, for example, shows that the relationships between the country-level average of students’ interest in specific careers, such as scientist, engineer, or health professional, and average achievement levels are positive for some occupations and negative for others. Interest among 15-year-olds in careers as natural scientists or software technicians is higher in countries with high general achievement levels, while students who are most likely to choose a career in the engineering or technology professions are in lower-achieving countries. Thus, the relationship of both the interest levels and the achievement levels of students to science career choices varies with the specific occupation and the country’s level of economic development. These observations and the investigations of Ainley and Ainley suggest a rich area for future research on the aggregate conditions of countries that affect the performance and career choices of students.
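The attitude–achievement paradox described above is, at its core, a difference between within-group and between-group correlations. The following sketch is a toy simulation under illustrative assumptions (invented country means and noise levels, not estimates from TIMSS or PISA data) showing how the two correlations can take opposite signs:

```python
# Toy simulation of the attitude-achievement paradox: interest and
# achievement correlate positively within each country, while the
# country-level means correlate negatively. All parameters are
# illustrative assumptions, not values estimated from TIMSS or PISA.
import numpy as np

rng = np.random.default_rng(42)
n_countries, n_students = 20, 500

# Higher-achieving countries report lower average interest (1-4 scale).
country_ach = rng.normal(500, 50, n_countries)
country_int = 3.0 - 0.004 * (country_ach - 500) + rng.normal(0, 0.1, n_countries)

within_r, mean_int, mean_ach = [], [], []
for mu_a, mu_i in zip(country_ach, country_int):
    # Within a country, more interested students score higher.
    interest = mu_i + rng.normal(0, 0.5, n_students)
    achievement = mu_a + 30 * (interest - mu_i) + rng.normal(0, 40, n_students)
    within_r.append(np.corrcoef(interest, achievement)[0, 1])
    mean_int.append(interest.mean())
    mean_ach.append(achievement.mean())

print(f"average within-country r: {np.mean(within_r):+.2f}")                     # positive
print(f"between-country r:        {np.corrcoef(mean_int, mean_ach)[0, 1]:+.2f}")  # negative
```

The two correlations answer different questions, which is one reason attempts to resolve the paradox through response biases alone were unlikely to succeed; the cultural explanation offered by Ainley and Ainley operates at the between-country level.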

EQUALITY OF OPPORTUNITY AND SOCIAL JUSTICE

Equality of opportunity encompasses many types of social and economic circumstances. Not all of these have been addressed specifically in this handbook on international comparisons in education. But several chapters address differences in student achievement and attitudes as they relate to family social status and the economic conditions of the countries in which students reside. Some of the discussions concern individual-level status, and others concern global-level economic differences between
countries and regions (e.g. Suter & Smith on careers, Willms & Tramonte on status measurement, Carnoy on economic development, and Lee on the balance of power). The chapter by Willms and Tramonte is a lesson in how to use, apply, and interpret the scales of social status found in most large-scale surveys of student performance. The measure of social status includes the educational level of parents and their occupation. The income level of the student’s family is also a potential measure of social status but is rarely collected in surveys of students in schools, because students cannot be expected to report such sensitive information reliably. This chapter is useful to those who wish to examine survey data from the point of view of assessing differences in access to education by students at different levels of social status. The authors examine the characteristics of the scale itself, its construction, and how cut-off points might be created to measure sub-populations, such as those in poverty. Status levels have been used to set attainable goals for aggregates of students (classrooms, schools, districts, or larger geographic units) but not for individual students. The levels may assist in assessing the equity of provisions among advantaged and disadvantaged groups and the equality of outcomes. Status levels may also be useful for designing interventions aimed at reducing inequalities. In his chapter on enduring issues, Wing On Lee evaluates the overall reduction in differences in educational opportunities. He particularly assesses the successes of the efforts of international organizations to set goals and achieve reductions in inequality. He finds that little has changed over the years because leaders have had a narrow vision of education, have ignored childhood care (see also Raikes, Davis, & Burton, this volume), and have not sufficiently emphasized the quality of learning. He finds that females have not achieved equality and says that investment in education worldwide
is insufficient. Studies conducted by PISA of the economic conditions of countries and the relationships between achievement scores and parental background have shown that equity is higher in some countries than in others. Lee seems to believe that the evidence across countries illustrates that those countries which have high student achievement levels are most likely to grow economically and to have fewer inequalities. His use of large-scale research to refine his understanding of the size of country-to-country differences and their relationship to underlying economic characteristics is worth reading. Some of the same points are made in the chapter by Colin Power, who writes about how international organizations use ‘indicators’ to point out the problems underlying a country’s social issues. He writes that indicators only suggest where to look for areas of need but are not agents of the political will to make the necessary changes. Like Lee, Power uses available empirical data about education on a worldwide basis.
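The scale construction that Willms and Tramonte discuss can be illustrated with a minimal sketch. The variable names, the equal weighting of components, and the bottom-quartile cut-off below are assumptions made for illustration; operational indices combine more components with model-based weights:

```python
# Minimal sketch of a composite social-status index: standardize parental
# education and an occupational-status score, average them, and define a
# cut-off for a disadvantaged sub-population. Scales and weights here are
# illustrative assumptions, not any survey's actual procedure.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
parent_educ = rng.integers(0, 7, n).astype(float)  # e.g. an ISCED-like 0-6 code
parent_occ = rng.normal(45, 15, n)                 # e.g. an ISEI-like status score

def zscore(x):
    return (x - x.mean()) / x.std()

ses = (zscore(parent_educ) + zscore(parent_occ)) / 2  # equal-weight composite

cutoff = np.quantile(ses, 0.25)       # bottom quartile treated as 'disadvantaged'
disadvantaged = ses < cutoff
print(f"share flagged as disadvantaged: {disadvantaged.mean():.0%}")
```

Note that family income is absent, consistent with the point above that students cannot reliably report it; the education and occupation components carry the scale.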

CONTENT OF EDUCATION

Specific educational activities of primary and secondary schools are discussed in eight chapters. The chapters discuss preparation for the early grades (pre-school), the issue of considering curriculum differences in analyses of achievement, the amount of after-school experience, the use of technology, the types of instruction, differences in preparation for science occupations and careers, the preparation of teachers, and preparation for careers in higher education. Only one of the chapters includes a statistical analysis of large-scale survey data; the others review case studies or international organizations’ policies toward educational practices and the monitoring of worldwide changes. No author found evidence of improvements in the level of access to educational opportunities (although some research methods for
such discovery are discussed by Willms and Tramonte). The analysis of cross-national differences in student interest in science careers finds paradoxes in the relationships between interest and opportunity. For example, students in some countries (such as Peru) had a high level of interest in science occupations but were offered few opportunities in science compared with those offered in other countries. On the other hand, highly advanced countries, such as France, have a lower proportion of students expecting to seek careers in science and technology. Wyse and Anders examined whether and how international comparative research has paid attention to the primary education curriculum as a matter of analysis. They point out that early discussions and analyses of curriculum were qualitative and focused on the development of language itself through proper curriculum design. Later studies of curriculum design and presentation in schools used quantitative designs. Their study shows, however, that common concepts of the primary curriculum that can be applied across countries have yet to be developed and used by educational researchers. In a similar vein, Cogan and Schmidt argue that many recent comparative studies fail to take into account changes in opportunities to learn based on the content of the classroom curriculum. They were engaged in measuring the curriculum coverage of many countries in TIMSS 1995 and in later PISA surveys. They believe that student achievement cannot be properly understood until the opportunities for students to learn specific content are accounted for in all large-scale studies of student performance, and that measuring the effects of the student time devoted to a topic requires measurement at more than one point in time. Two chapters include analyses of practices and change in how well young children are prepared for formal schooling around the world. One, by Hathaway (Chapter 24), is a case study analysis of pre-school practices
and policy in British Columbia, Canada, and in Singapore, examining policies regarding the transition into public education. The other, by Raikes, Davis and Burton, reviews global policies on early childhood development, examining the policies and indicators used by UNESCO, the World Bank and the United States to improve early child development. Both chapters use cross-national experiences to argue for how to develop an adequate model for analyzing early childhood preparation for school, and neither set of authors believes that current statistical monitoring is sufficient to inform public policy about early child preparation for schooling. Akiba, Howard, and Liang conducted a literature review of studies of teacher learning communities around the world (such as ‘Lesson Study’). They report that most papers were published in a few English-speaking countries (although some were published in Chinese) and limited their analysis to a single country. They found an insufficient exchange of ideas across countries to develop the knowledge of learning communities into a valuable tool for education. They also examined a recent large-scale survey of teacher practices (the OECD’s TALIS) to describe country-to-country differences in teacher discussions about students. They report that teachers in countries that reported concern about individual student performance were also more likely to exchange ideas with other teachers about how best to assess students. Their statistical portrait of country differences in the extent to which teachers have time to exchange teaching methods with each other is astounding and worthy of further study. The use of computers and other forms of technology in classrooms has been a subject of comparative research since the birth of small-scale computing in the 1980s. Law and Liang (this volume) reviewed the results of 12 international comparative studies by the IEA and OECD on technology use. They find that early studies were concerned with the preparedness of students and schools to
use ICT. Later studies focused more on the pedagogical adoption patterns of schools and teachers and measured aspects of student knowledge of computer and information technology. They report that educational technologies have changed the priorities of student-centered learning to include a greater amount of collaboration and inquiry. They propose that future ICT studies examine the amount of progress in educational transformations of school systems. Educational experiences outside of formal schooling occur widely around the world in informal settings, such as museums and libraries, but also in organized study programs. Asian societies are especially likely to provide some form of tutoring to students who are preparing for exams. Feng and Bray (Chapter 20) provide a thorough review of the types of outside-school-time (OST) activity, sometimes called ‘shadow education’, in which students in different countries have participated. They review the various ways in which students may have experiences that affect their educational growth, and they outline specific research areas that must be addressed in studying the significance of shadow education.

CONCLUSION: INFLUENCE OF INTERNATIONAL COMPARISONS

The authors of this volume are more likely to see the study of other cultures for the purpose of learning about ‘good educational practices’ as a matter of contention than as a guide for immediate policy implementation. Establishing causality with cross-national comparisons is difficult because of the wide range of conditions within countries that are not identified a priori and therefore not included in analyses. Much comparative research leads to possible dead ends and misdirection. It will be a challenge for future researchers to invent better methods of identifying the specific properties of countries that provide useful insight about practice.

The introduction of survey research into the province of cross-national comparison and policy making has created a small industry of method development and policy development. The OECD and IEA promote further analysis. Many countries have participated, and the number of student assessments continues to grow. The impact of these studies on educational policy may be debated, but their influence cannot be completely ignored. National leaders and education policy makers will need to become well informed about the strengths and limitations of large-scale quantitative studies (see Power, this volume). Will differences in the interpretation of observations be solved by a more rigorous application of the scientific method? The authors of this volume did not find the testing of hypotheses to be a common activity in the field. The acquisition of knowledge occurs through the exchange of ideas and argument more frequently than through the use of clearly aligned data. What constitutes ‘good’ comparative education research? One of the leaders of the field of comparative education argues that the field is an academic exercise without the need to be relevant to the policy world. He wrote:

Comparative education as an academic field of study does not fix educational things when they are broken; it does not service the needs of Ministries of Education; it is not a branch of policy studies; it is not reducible to sociology, or to political science, or to history; it has not yet succumbed to the one true way of a specified methodology; nor has it accepted the seductive but corrosive position of claiming for itself disciplinary status in the terms defined so carefully by London philosophers of education. (Cowen, 2006: 592)

The use of any form of empirical approach to evaluate the effectiveness of social programs in education, health, or other systems has been found to be extremely difficult (Rossi, 1987). After years of evaluating United States social interventions during the 1970s and 1980s, Peter Rossi proposed his ‘metallic laws’ of evaluation. His Iron Law holds that the expected net impact of a large-scale social program
is zero (Rossi, 1987). He later revised the Iron Law to state that evaluations of social programs are generally not believable (‘The findings of the majority of evaluations purporting to be impact assessments are not credible,’ Rossi, 2003: 5). While the wisdom of a researcher and evaluator developed four decades ago may appear dour today, Rossi’s expressions of distress with the capability of empirical research to reach solutions should be taken seriously by those who are engaged in cross-national research. Educational research in general must contend with an extremely wide range of human issues, and it draws its conceptual frameworks, research methods, and techniques for studying student learning and teaching practices from psychology, sociology, political science, economics, computer science, engineering, the physical sciences, and humanities such as history and philosophy. Each of these fields has its own set of standards and social networks that interact with research in education. Moreover, the corpus of knowledge about education resides not only in university settings but also in government and non-governmental agencies that monitor, evaluate, and set goals for educational institutions. Therefore, a single definition of what is ‘good’ in the field is nearly always the result of a compromise of goals and a recognition of the significant social networks that ultimately receive the results of the research, whether passively or actively engaged in using them. Recommendations for how to set standards for educational research can be found in many academic and policy-making resources. Publishing companies, such as Sage, have created ‘handbooks’ in nearly every field of science and social science to synthesize research methods (such as The BERA/SAGE Handbook of Educational Research; Wyse et al., 2017). Shavelson and Towne, writing for a panel of the US National Academy of Sciences that investigated the status of educational research in order to respond to
attacks on it by politicians, claim that even the rigors of the scientific method are not agreed upon everywhere. The result: changes have occurred in understanding the elements of human nature and how human knowledge grows and develops over time (Shavelson & Towne, 2002). In the history of comparative educational research, several writers have recognized that examining other cultures may involve observational biases and that means must be established to improve the quality of these observations. In 1848, Jullien de Paris argued for producing an ‘education science’ (Gautherin, 1993). In 1900, Sir Michael Sadler recognized that observers of educational practices were not all likely to agree (Sadler, 1900/1964). In the 1960s, Noah and Eckstein again encouraged the field to adopt a ‘scientific method’ approach to the study of comparative education (Noah & Eckstein, 1969). At the beginning of the 21st century, the US National Academy of Sciences sought to define the components of ‘scientific discovery’ in education rather broadly (Shavelson & Towne, 2002), with the understanding that all rigorous research – quantitative and qualitative – embodies the same underlying logic of inference (King, Keohane, & Verba, 1994). This inferential reasoning is supported by clear statements about how the research conclusions are reached: What assumptions are made? How is evidence judged to be relevant? How are alternative explanations considered or discarded? How are the links between data and the conceptual or theoretical framework made? A high-quality research study depends heavily on how well the research question itself is formulated. Well-respected research studies have formulated a research question that addresses a specific need for information. The questions must be clearly defined and articulated to ensure that they address all aspects of the issues under investigation; their meaning, implications, and assumptions should be clear. The research problem should reduce broad questions into their constituent
parts to make the problem more accessible to other researchers. The research question should reflect the content of the relevant literature. All research questions should be falsifiable and capable of being answered through practical data collection. The statement of purpose should not be biased toward a particular outcome of the research. Although none of the chapters in this handbook contains a recommendation for standards of ideal qualities in comparative educational research, the authors have provided several different models of good research design and execution in this field. For example, those supporting the use of empirical research methods of large-scale surveys, case studies, and rigorous observation of student activities in multiple cultures have adopted a model of discovery based on scientific methods. In scientific discovery, a claim is made about an educational phenomenon that is supported by previous research evidence or derived from theory, and that claim is examined against new evidence, such as survey data, a case study, or rigorous observation. On the other hand, some chapter authors argue that many assumptions made by educational researchers are inappropriate and should not be used to guide policy in local educational systems. Their belief is that the standards adopted have been taken from dominant world cultures that do not accurately represent the meanings of educational interactions as understood by local cultures (Lee, this volume). Some proposed goals for international comparative educational research that may be taken from the chapters in this volume are these:

• First, contribute to the accumulation of knowledge of educational practices and methods of research in the field of comparative education;
• Second, provide guidance for educational policy and practice;
• Third, challenge, test, and replicate previous research studies;
• Fourth, foster the development of new theories and hypotheses about social, psychological, or economic conditions and educational achievement;
• Fifth, take into account ethical considerations about how researchers report on the behavior of others.

Park suggests that ‘our desired goal is not only external accountability but also the justification of moral coherence, integrity, and emotional satisfaction of the work performed.’ He, like others in the field, is especially sensitive to the ethics of explaining the behavior of others from the point of view of an outsider.

After a period of public discussion that followed the release of international comparisons of student achievement in the 1990s, some government officials believed that international comparisons were more authoritative than other forms of educational research. Thus, the US National Research Council (NRC) was asked to investigate whether the methodologies of large-scale surveys had become sufficiently sophisticated to improve understanding of educational practices. In their conference discussion, the question was raised as to whether countries might ‘stand as existence proofs for the possibility of higher levels of achievement’. The reporters of the conference noted that countries may differ in so many ways that a simple interpretation of cause and effect would not be practical. Still, they believed that the studies could provide insights that would lead to hypotheses about more effective educational practices, and that these could be tested for feasibility (Porter & Gamoran, 2002: 5).

THE FUTURE

The discussions found in this handbook provide some clues for predicting what types of change may occur in the field of comparative education and how the field is likely to affect both policies and educational research
topics in the future. The extensive and still expanding application of large-scale quantitative assessments of student achievement will continue and is likely to expand to new countries. The psychometricians working on these studies are likely to develop new techniques (new surveys are already in the planning stages), and as long as the results of these studies are adopted by those in positions of power to enact policies based on them, they will cause debate. So far, nearly all large-scale surveys are cross-sectional analyses of student performance measured at one point in time. While these studies permit the measurement of change in the overall achievement of a population over time, they do not allow researchers to describe the amount of change in performance or attitudes within the same student over time. Some countries conduct their own longitudinal studies of student change, but few of these allow comparisons of the nature of these changes across countries. Conducting longitudinal studies that follow individual students over several years will be necessary to draw causal inferences. Longitudinal studies are very complex to administer, requiring methods to track individuals, and they are expensive. New methods of monitoring may become possible with the development of appropriate technological tools. Future studies are likely to use new forms of observation, such as capturing classroom instruction through video methods (see Praetorius, Rogh, Bell, & Klieme, this volume). New technologies promise to provide cleaner information about observations of classroom practices, and they will also bring new issues of interpretation. The future of research in comparative education is likely to include continued discussion of the consequences of gradual globalization as it impacts local institutions of education. The development of international organizations that monitor the condition of educational institutions in all parts of the world may lead toward a better understanding of how
economic growth and political stabilization are affected by the provision of education. The future will likely continue to include strong differences in views about how to apply the observations from cross-national studies to educational policies. Diversity is more obvious than similarity. Researchers examining either quantitative or qualitative descriptions of education in different countries cannot avoid noting that many combinations of forms of organization and behavior exist. Ideological and political issues will also continue to be discussed in the literature of comparative education. Some thinkers see individuals as integrated members of a large world system, while others see the world system as a problem that dominates little people. The problem of lagging economic development and how it affects education will continue to be a dominant concern in the studies of comparative education researchers. A continuing theme among nearly all the authors of this volume is that some aspects of educational systems and individual student characteristics are common across countries, but that the differences between countries in achievement, and in the cultural factors that lead to that achievement, are dominating conditions that comparative researchers must find ways to incorporate into their analysis of global education.
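The limitation noted above, that repeated cross-sections track populations rather than individual students, can be made concrete with a small simulation. The sketch below uses invented parameters for illustration and assumes simple random sampling in both designs:

```python
# Sketch contrasting repeated cross-sections with a panel (longitudinal)
# design. Both recover the population's mean change between two waves;
# only the panel identifies the distribution of within-student change.
# All parameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(7)
N = 100_000
wave1 = rng.normal(500, 100, N)   # scores at time 1
gain = rng.normal(10, 30, N)      # heterogeneous individual growth
wave2 = wave1 + gain              # scores at time 2

# Repeated cross-sections: different random samples at each wave.
xs1 = rng.choice(wave1, 2000)
xs2 = rng.choice(wave2, 2000)
print(f"cross-sectional estimate of mean change: {xs2.mean() - xs1.mean():+.1f}")

# Panel: the same students observed at both waves.
idx = rng.choice(N, 2000, replace=False)
gains = wave2[idx] - wave1[idx]
print(f"panel mean gain: {gains.mean():+.1f}; sd of individual gains: {gains.std():.1f}")
```

Only the panel yields the spread of individual gains, which is the quantity that causal questions about schooling require.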

REFERENCES

Becher, T. & Trowler, P. R. (2001) Academic Tribes and Territories. Buckingham, UK: The Society for Research into Higher Education and Open University Press. Downloaded from www.mheducation.co.uk/openup/chapters/0335206271.pdf (10/03/2018).
Cowen, R. (2006) Acting comparatively upon the educational world: Puzzles and possibilities. Oxford Review of Education, 32(5), 561–573.
Feuer, M. J., Towne, L., & Shavelson, R. J. (2002) Scientific culture and educational research. Educational Researcher, 31(8), 4–14. http://doi.org/10.3102/0013189X031008004.
Feyerabend, P. (1987) Farewell to Reason. London: Verso.
Gautherin, J. (1993) Marc-Antoine Jullien de Paris. Perspectives: Revue trimestrielle d’éducation comparée, 23(3–4), 783–798. Downloaded from http://www.nie.edu.sg/research/publication/cieclopedia-org/cieclopedia-org-a-to-z-listing/jullien-de-parismarc-antoine (02/05/2017).
Gorman, T. P., Purves, A. C., & Degenhart, R. E. (1989) The IEA Study of Written Composition I: The International Writing Tasks and Scoring Scales. College Composition and Communication, 40(2). DOI: 10.2307/358148.
King, G., Keohane, R., & Verba, S. (1994) Designing Social Inquiry: Scientific Inference in Qualitative Research. Princeton, NJ: Princeton University Press.
Noah, H. & Eckstein, M. (1969) Toward a Science of Comparative Education (Part II, pp. 85–122). London: Macmillan.
Porter, A. C. & Gamoran, A. (Eds.) (2002) Methodological Advances in Cross-National Surveys of Educational Achievement. Washington, DC: National Academy Press.
Rossi, P. (1987) The Iron Law of Evaluation and other metallic rules. Research in Social Problems and Public Policy, 4, 3–20. Retrieved from www.gwern.net/docs/sociology/1987rossi (03/10/2018).
Rossi, P. (2003) The ‘Iron Law of Evaluation’ reconsidered. Paper presented at the AAPAM (American Association for Public Administration and Management) Research Conference, Washington, DC. Retrieved from http://welfareacademy.org/rossi/Rossi_Remarks_Iron_Law_Reconsidered.pdf (10/05/2018).
Sadler, M. (1900/1964) How far can we learn anything of practical value from the study of foreign systems of education? Comparative Education Review, 7(3), 307–314.
Schweisfurth, M. (2014a) Among the comparativists: Ethnographic perspectives. Comparative Education, 50(1), 102–111.
Schweisfurth, M. (2014b) Data and dogma: Knowledge and education governance in the global age. Comparative Education, 50(2), 127–128. http://doi.org/10.1080/03050068.2014.892754.
Shavelson, R. J. & Towne, L. (2002) Scientific Research in Education. Washington, DC: National Academy of Sciences. http://doi.org/10.17226/10236.
Wikipedia (2018) Quantitative and Qualitative Research Methods. https://en.wikipedia.org/wiki/Quantitative_research.
Wyse, D., Selwyn, N., Smith, E., & Suter, L. (Eds.) (2017) The BERA/SAGE Handbook of Educational Research. London: Sage.

PART I

The Status of Comparative Education Research


1
The Status of Comparative Education Research in the 21st Century: An Empiricist’s View
Larry E. Suter

INTRODUCTION

Several thousand university scholars, organizational leaders, college students, and government employees regularly join professional associations concerned with comparative education and thereby identify themselves as ‘comparativists’ in education (Schweisfurth, 2014). This chapter attempts to assess the current and past status of this ‘field’ of study. It examines what scholars have produced and what topics are of concern to educational researchers in this field. All that one chapter can accomplish is to attempt to present an honest accounting of research conducted by other scholars. Indeed, entire handbooks and encyclopedias have been created to provide summaries of significant research about educational practices and outcomes in the world at large. Any review of a large domain of academic research and development will be influenced by the background and experience of the author (see, for example, Heyneman &
Lykins, 2007, for another view). This chapter introduces the field of international comparative analysis in education from the experience of a former government administrator who contributed to decisions about the direction of educational research in large-scale international studies and as a researcher of statistical surveys in comparative education. The results of educational research are not necessarily assigned to dustbins of unread journals. They may be applied to making changes in teaching, administration, or student assessment. Research studies may have consequences beyond the publication career of a particular researcher. Consequently, a review of how comparative education scholars have approached the study of education across countries must include a critical examination of the theory, methods, and policies used for understanding all forms of educational practice. Any contribution that may emerge from an examination of research should identify unsuccessful as well as successful methods of studying educational systems.


BACKGROUND

The field of comparative education contains an extremely wide variety of philosophical approaches, research methods, disciplinary foundations, and understandings of basic concepts because its scholars may be trained in any disciplinary background and may reside in any region of the world. Even the meaning of ‘national’ in ‘international’ is likely to be defined differently, such as by formal boundaries or by cultural identities. Researchers and courses taught in the field of comparative education may have an orientation that leans either toward the humanities (history, philosophy, languages) or toward the sciences (educational psychology, sociology, economics, and scientific research methods), and the content of the subject may have a political ideology associated with it (Cowen, 2006). Modern computer technology has altered international research by giving easier access to the information of other countries and providing immediate and extensive access to researchers around the globe. Changes in the approaches to comparative educational analysis have been dramatic, especially since the introduction of computer technology into the research laboratory, which permits a rapid exchange of ideas across large distances. Also, the growth of an industry of large-scale international surveys of primary and secondary education may be reshaping the research enterprise of comparative studies, as these surveys continue to introduce an extensive body of statistical information about differences and similarities in educational practices across the globe (see Suter, Chapter 11). The continuing release of new international rankings of student achievement through the public media around the world has increased public and political figures’ attention to education. Academic researchers cannot ignore the role of comparative education in affecting the direction of and contributions to knowledge, nor its effect on public policy in education (Cowen, 2011). Some researchers in this
field have suggested that comparative large-scale student surveys may have become a driver of change as much as they are a recorder of educational experiences across countries (Lafontaine & Baye, 2012). Therefore, one intent of this chapter, and of the entire Handbook, is to provide a resource for expanding the lines of communication between the different philosophies, epistemologies, and methods of research used in this field in the 21st century. A significant number of handbooks and encyclopedias have been published in the field of comparative education: about 50 titles have addressed international aspects of educational research in the past 20 years (a few are included in the references). Scholarly fields change rapidly, and they need continuous critical review of the meaning and direction of recent research (a theme expressed by Cowen, 2011). This need is particularly acute now because the number of published papers in this field has expanded exponentially (as discussed later in this chapter). New research findings, survey results, case study analyses, and changes in methods of research have led to the development of new frameworks of basic concepts. The topics covered in this chapter include a review of research in the field of large-scale survey studies, of researchers who reject the value of ‘quantitative’ research models, and of practitioners who seek educational insights from comparative education research (Moss et al., 2009; Phillips, 2010). The subject of comparing educational systems across countries has become a large academic and policy-oriented enterprise (Easton, 2016). There are significant divisions within the field between those who rely on large-scale survey research, those who examine the broad aspects of global economic, political, and educational systems, and researchers who seek evidence for policy changes at a micro level within a single country (Becher & Trowler, 2001; Bray & Thomas, 1995; Cowen, 2006, 2014; Crossley, Broadfoot, & Schweisfurth, 2007; Fairbrother,
2005; Tony et al., 2012). Some researchers in comparative education conduct analyses of large macro-level systems, such as nations and the entire globe, to study globalization, political socialization, the transfer of educational practices, and inequality. Other researchers seek evidence for introducing new models of educational practices into new settings (Phillips, 2000; Steiner-Khamsi, 2015). Importantly, national and local leaders may use the results of comparative studies to justify revisions of their educational policies (Crossley, 2016; Dossey & Wu, 2012; Heyneman, 2003). If an accumulated body of knowledge from comparative education is to be achieved, methods of research must be developed that can be shared and acknowledged as valid across different disciplines. Otherwise, research teams would only be writing for themselves. The field of comparative education includes research themes found in sociology, political science, economics, psychology, and history. Publications of research apply a variety of methods, such as scholarly inquiry, mixed methods, measurement of cognitive and non-cognitive behaviors, and varieties of qualitative investigation. Some research may rely on large-scale statistical analysis or smaller-scale observational studies of global education; other work examines the economic and political factors behind changes within educational institutions. Research models may range from macro-level trends of world dynamics to specific educational policy issues, such as the transfer of ideas, the alignment of education and employment, and the alignment between local needs and global information flows. Many comparative research articles include discussions of theory, policy, and methods of study that are intended to be applied to particular countries and world regions in ways that will assist in the accumulation of knowledge of practice. This Handbook is mostly limited to research about primary and secondary levels of education because international research on
post-secondary education would require an effort equal to the size of this volume.

INTRODUCTION TO RESEARCH IN THE FIELD OF COMPARATIVE EDUCATION

In modern societies, youth and adults in all countries are educated through formal and informal institutions supported largely by the taxation of local and national governments. The involvement of government funding often brings with it a desire for accountability. Comparative education researchers examine the variety and efficacy of educational practices at all levels through the comparative study of their similarities and differences across cultures. Comparative education studies have examined the influence of the growth and decline of educational institutions through the study of history (Benavot, Resnik, & Corrales, 2004; Cowen, 2006; Lingard, 2014; Meyer, Kamens, & Benavot, 1992; Ramirez & Boli, 1987), through analysis of pedagogical and curriculum influences on teaching and learning (Houang, Schmidt, & Cogan, 2004; Lingard & Ladwig, 1998; Tramonte, 2015; Zuzovsky, 2003), and through analysis of individual student differences in learning (Cai, 2003; Jones, 2005; Mesa, Gómez, & Cheah, 2012; Nunes, 1999; OECD, 2010). The field of comparative research may include officials of international agencies who monitor educational programs and implement policies (such as the World Bank, UNESCO, and OECD) and university scholars who conduct individual research projects of history, observation, and large-scale data analysis that are submitted for peer-reviewed publication (Bray, Kwo & Jokić, 2015). Philosophers of education have described education in these terms:

All human societies, past and present, have had a vested interest in education. … For one thing, it is obvious that children are born illiterate and
innumerate, and ignorant of the norms and cultural achievements of the community or society into which they have been thrust; but with the help of professional teachers and the dedicated amateurs in their families and immediate environs (and with the aid, too, of educational resources made available through the media and nowadays the internet), within a few years they can read, write, calculate, and act (at least often) in culturally-appropriate ways. Some learn these skills with more facility than others, and so education also serves as a social-sorting mechanism and undoubtedly has enormous impact on the economic fate of the individual. (Phillips & Siegel, 2013: 1)

This definition of education is broad enough to encompass the variety of cultural norms, resources, instruction, media, and individual learning differences of systems and individuals across many nations. The human experience of educating and being educated includes the formal systems of schooling that emphasize growth in cognitive understanding of the world around us, as well as non-cognitive growth in developing successful human relationships, experiences of joy and happiness, and motivation for further study and career development. The global growth and development of educational institutions over the centuries is also a subject of intense examination and reflection (Baker & LeTendre, 2005; Benavot, Resnik, & Corrales, 2004; Ramirez & Boli, 1987) because these institutions have expanded at different rates and with different functions for cultivating growth in the cognitive and non-cognitive domains of human development. While international educational differences provide a basis for comparison that informs our understanding of their structure, function, and development, as is true in all sciences, observations about country patterns do not necessarily lead to a common interpretation. Rather than settling complex political, economic, and social issues of education, cross-national studies expand the range of research questions pursued by drawing frameworks from many disciplines instead of contracting investigations into a single model of learning or practice.

What is Comparative Education about?

Research that is conducted across many countries must deal with the complexities of language barriers, varieties of cultural habits, and differences in the execution of political power, as well as geographic diversity. Cowen summarizes the intent of the field of comparative education as dealing with the context of international differences and the transfer of ideas (Cowen, 2006). The content of a research project in comparative education may be intended either for academic theoretical development or for improving the practices of schools and governmental programs. Conceptual frameworks, theories, and research methods are borrowed from the disciplinary perspectives of history, philosophy, sociology, economics, psychology, political science, science education, research methods, and education policy (Bray, Adamson & Mason, 2014; Cowen, 2014; Furlong, 2013). Thus, the field of comparative education is eclectic, contains competing epistemological stances, conceptual frameworks, and research methods, and is conducted in an environment of multiple social-cultural-economic settings all at once.

HISTORY AND GROWTH OF RESEARCH IN COMPARATIVE EDUCATION

Systematic consideration of international differences in education began in Europe, as early as the 18th century with Jullien de Paris and in the 19th century with Sir Michael Sadler (Gautherin, 1993; Sadler, 1900). Even in these early years, the epistemologies to be used for comparative research were a matter of open debate (Bray & Manzon, 2014). Thus, for well over 100 years scholars have recommended that educational theory and practice would benefit from studying educational institutions across cultures. And that they do so empirically. For example, in 1848, Jullien wrote that,
Education, like all other arts and sciences, is composed of facts and observations. It thus seems necessary to produce for this science, as has been done for the other branches of knowledge, collections of facts and observations arranged in analytical tables, so that these facts and observations can be compared and certain principles and definite rules deduced from them, so that education may become an almost positive science. (quoted in Gautherin, 1993: 784)

Later, Sir Michael Sadler suggested at a conference at Guildford, England, in 1900, that it was necessary to establish a common meaning for basic concepts, such as ‘Systems of Education’, and to ‘prevent possible misunderstanding and differences of opinion if we ask ourselves in passing what we mean by Education’ (Bereday, 1964a). Yet he also understood that such a field of study would be controversial. In remarks answering the question ‘How Far Can We Learn Anything of Practical Value from the Study of Foreign Systems of Education?’ he recognized the challenge of conducting comparative studies. He said,

I am inclined to think that, on the subject which we are about to discuss, if we are quite frank with ourselves and with other people, we shall have to confess that we are not all of one mind. We are going to ask ourselves whether really there is anything of practical value to be got from studying foreign systems of Education. To that question, I suspect, we shall not all be disposed to give the same answer. (Cited in Bereday, 1964b: 307)

Sadler’s observations over a century ago that ‘we are not all of one mind’ about the practical value of studying other systems accurately describe the state of the field in the 21st century. The contents of the handbooks and college textbooks of comparative education, as well as summative reviews by researchers in the field, illustrate the extensive variety of approaches to the research process (Alexander, Broadfoot, & Phillips, 1999; Anderson-Levitt, 2015; Baker & LeTendre, 2005; Benavot, Resnik, & Corrales, 2004; Bray, Adamson, & Mason, 2014; Carnoy,
2001; Cowen, 2014; Cowen & Kazamias, 2009; Crossley & Jarvis, 2000; Epstein & Carroll, 2011; Fairbrother, 2005; Mundy, Green, Lingard, & Verger, 2016; Phillips & Schweisfurth, 2016; Postlethwaite, 2004; Torney-Purta, 1990). Bray and Manzon’s review of the status of Asian comparative educational developments also summarizes the differing views of research philosophies in this field, noting that some believe that the diversity of viewpoints increases the number of ideas expressed in meetings of researchers (Bray & Manzon, 2014). A review of the growth of comparative education for various purposes in England is provided by Crossley and Watson (2011), who specifically examine how comparative and international education played a role in shaping research in ‘teacher education, in postgraduate studies, and in the realms of policy and practice, theory and research’ in England during the 20th century and into the 21st. The origin and development of comparative education research did not occur in only one region of the world. For example, Bray reviews the growth of comparative education societies in Greater China in recent years (Bray & Gui, 2001; Bray & Manzon, 2014). These authors, and Denman and Higuchi (2013), document a rapid institutionalization of comparative education research in Asia during the past 20 years. They note that a significant number (23) of comparative education research journals are currently published in the Asian region (Denman & Higuchi, 2013). Denman and Higuchi also identified authors of ‘foreign education’ in Japan and Korea who pre-date the 19th-century scholars Jullien and Sadler (Denman & Higuchi, 2013). They explain:

In Japan, Nakajima Hanjiro (1871–1926) recently has been identified as one of a few comparative educators who predated the founding fathers of comparative education such as Kandel, Bereday, and Sadler (Otsuka, 2001). His 1916 publication, Comparative Study on Education in Germany, England, and America, was the first book in Japan that included ‘comparative’ in its title. Another
contemporary of that era is Higuchi Osaichi (1871–1945), who published two important books, Comparative Education (1928) and Comparative Study on Education Systems (1936). He is noted for differentiating general, vocational, and higher education in the interest of targeted research and analysis (Otsuka, 2011). (Denman & Higuchi, 2013: 6)

In recent years, the number of research studies in comparative education has grown especially rapidly. The number of publications on comparisons between countries has grown from about 800 a year in the 1980s to about 3,000 a year by 2015 (see Figure 1.1) (Easton, 2015, 2016; Raby, 2010). However, this count of academic publications ignores the large number of reports by international organizations such as UNESCO, OECD, and the World Bank. Between 1972 and 2016, the Comparative Education Review (the official journal of the Comparative and International Education Society, CIES) conducted an annual analysis of the books and articles published about comparative education. The organization stopped the effort in 2016 because of the size of the task (Easton, 2016). Easton reported that, ‘This year the bibliography includes over 4,300 references drawn from 345 different publications’ (Easton, 2016). The topics of the collected papers range from economic growth to sexuality. About 70 journals are focused specifically on comparative education
research papers (Easton, 2016; Raby, 2007). The wide distribution of comparative education research across different fields of study is illustrated by the large number of journals that accept such papers. For example, about 230 journals contained at least two articles about comparative education, and 125 journals published at least 10 papers in 2015. The number of publications has grown so large that the collection and classification of the articles has become a major effort and has been moved to a database made available to members of the CIES (Raby, 2010). The published 2015 CIES bibliography contained 3,000 journal entries, and the online bibliography held at Florida State University contained 4,300 (Easton, 2016). Even this large accumulation of journal research papers is incomplete. For example, the list of journals in the CIES bibliography does not include journals on assessment, which publish many technical studies of cross-national comparison (such as Large-scale Assessments in Education and Educational Assessment, Evaluation and Accountability, established in 2013 and 1988 respectively), nor the 23 journals from Asia listed by Bray and Manzon (2014) (see also Denman & Higuchi, 2013). This rapid growth is evidence that research in the field of international comparative education continued to accelerate in the first years of the 21st century.

Figure 1.1  Number of bibliographic references per year (Easton, 2016: 834)
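A back-of-envelope calculation makes the growth rate behind Figure 1.1 explicit. Taking the counts in the text, roughly 800 publications a year in the 1980s and about 3,000 by 2015, and assuming 1985 as the baseline year (an assumption made for illustration), the implied compound growth is modest but sustained:

```python
# Implied compound annual growth of the comparative education literature,
# from roughly 800 publications a year to about 3,000. The 1985 baseline
# year is an illustrative assumption.
base, final, years = 800, 3000, 2015 - 1985
cagr = (final / base) ** (1 / years) - 1
print(f"implied growth: {cagr:.1%} per year over {years} years")  # about 4.5%
# At that rate the literature roughly doubles every 16 years.
```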


RESEARCH METHODS FOR COMPARATIVE EDUCATION

Research studies in the field of comparative education are conducted with a very wide variety of educational research methods encompassing all strands of epistemology. Moreover, the leaders in the field are eclectic in their choices of methods for guiding educational practices (Bray, Adamson & Mason, 2014; Cowen, 2006). Historians of education have pointed out that in the earliest development of the field of comparative education, history, philosophy, and politics played a large role in shaping the field (Cowen & Kazamias, 2009). In more recent years, empirical observation and survey evidence are represented in studies of education across cultures, but those methods for analyzing cross-national educational systems continue to be as much a matter of discussion and debate as an accepted doctrine (Jerrim & Choi, 2016). For example, Cowen cited a section of the 1900 lecture by Sadler to point out that the study of education must be broader than the study of easily observed elements of schooling. Cowen writes, ‘Sadler reminded us that:

… if we propose to study foreign systems of education, we must not keep our eyes on the brick and mortar institutions, nor on the teachers and pupils only, but we must also go outside into the streets and into the homes of the people, and try to find out what is the intangible, impalpable, spiritual force which, in the case of any successful system of Education, is in reality upholding the school system and accounting for its practical efficiency. … In studying foreign systems of Education we should not forget that the things outside the schools matter even more than the things inside the schools, and govern and interpret the things inside. (Cowen, 2014: 282)

Consequently, many scholars conducted personal observational studies of ‘foreign’ education systems, describing the results within their own personal theories of context (Bray, 2014; Cowen, 2014; Schweisfurth, this volume). The practice of producing qualitative descriptions of educational activities in
regions of the globe continues in the 21st century (Bray, 2014), as shown in the most recent compilations of comparative education papers (Easton, 2016; Raby, 2006, 2008). In 1969, Noah and Eckstein published a textbook for comparative education entitled Toward a Science of Comparative Education, in which they outlined an approach that encouraged replacing reliance on intuition alone with scientific observation (Noah & Eckstein, 1969). The early reviews of the text were not always positive. For example, Mallinson comments that they ‘then conclude with the observation that “a method which incorporates the intuitive insights and speculative reflections of the observer, but submits them to systematic, empirical testing appears to offer the best hope for the progress of comparative education”. I am not convinced’ (Mallinson, 1969: 334). Mallinson asserts that intuition is a necessary form of interpretation that cannot be replaced by scientific observation (Mallinson, 1969). Nevertheless, the positivistic approach recommended by Noah and Eckstein was encouraged frequently by other researchers, even by historians. In the 80-chapter International Handbook of Comparative Education by Cowen and Kazamias (2009), for example, the Noah and Eckstein volume is cited 32 times. The large differences between scholars in their beliefs about what is real may inhibit the advancement of the field toward an integration of knowledge about educational practices. The canons of the scientific method have provided guidance on how to organize ideas and behaviors into logical forms of testing and replication. Widely recognized steps are: (1) make an observation that describes a problem, (2) create a hypothesis, (3) test the hypothesis, and (4) draw conclusions and refine the hypothesis (Toulmin, 1958; Zarębski, 2009). According to Toulmin, science does not discover new facts or regularities in nature, but rather offers new ways of seeing and understanding the physical world. Its basic, fundamental purpose is
to achieve understanding through a relevant theory of phenomena with which we are already familiar. Toulmin points out that each field of science develops its own methods and ‘working logic’, which are closely connected with the methods of representation accepted in that field. Agreeing on the rules of argument is a first step toward achieving consensus (Toulmin, 1958). Toulmin’s early writings elaborated the method of ‘substantial argument’, which consists of six components: claims, data, warrants, backing for warrants, rebuttals, and modal qualifiers (Toulmin, 1958). Not all comparativists (or educational philosophers) accept the tenets of the scientific method as the only method of investigation. In fact, some comparativist academics eschew completely the rules of scientific investigation (Stenhouse, 1979). For example, Stenhouse commented that ‘Comparative education is not, I think, a science seeking general laws; nor is it a discipline of knowledge either in the sense that it provides a structure to support the growth of mind, or in the sense that it has distinctive conventions by which its truths are tested’ (Stenhouse, 1979: 5). Another interesting and extreme view is presented by Stronach, who titled his volume Globalizing Education, Educating the Local: How Method Made Us Mad (2010). These post-positivist views are frequently found in the comparative education literature. For example, Dennis Phillips cites the philosopher Carr as claiming that ‘the forms of human association characteristic of educational engagement are not really apt for scientific or empirical study at all’ (Phillips & Siegel, 2013: 13; see also Carr, 2003). His reasoning is that educational processes cannot be studied empirically because they are processes of ‘normative initiation’. Another philosopher also named Carr (Carr, 2006) writes that educational theory is based on an unfounded assumption that a foundation of knowledge exists to support all other knowledge (such as practice); but he asserts that such foundations do not
exist in modern culture and, therefore, educational theory has no place in academic discussions (Carr, 2006). These philosophers are examples of a ‘post-foundation’ in which knowledge is not assumed to have a basic foundation of known facts. On the other hand, several reviews of Stronach’s book critique his stance (Edwards, Carney, Ambrosius, & Lauder, 2012; Gannon, 2012). These reviews illustrate the extreme differences in argument that may be found in this field. The five reviewers of Stronach run the gamut of strong support to extreme criticism. Some reviewers believed that Stronach was writing satire and reflecting on his own career of research in comparative education. Others believed that he was stating a needed point of view about the large number of possible extreme interpretations of comparisons. Researchers in the field of comparative education will find widely different epistemologies regarding the meaning of evidence in the field. While the researchers who participate in this field of research recognize that they are a part of a singular field of study, they must accept that different points of view even about the source of knowledge is an accepted form of scholarship and worthy of acceptance into the field and debated as necessary (similar points are made by Shavelson & Towne, 2002). Clearly from these examples, the field of comparative international education contains researchers who represent a large number of competing philosophies. Individual researchers approach a topic with established views of what is real or not (ontology) and thus create an epistemology for approaching their research. Often these underlying philosophies of thought are not made manifest by the researchers themselves. For a useful discussion of the difficulties in arriving at a singular conclusion in education research, examine the reflections of Moss and Phillips (Moss et al., 2009; Phillips, 2010). Changes in the dominance of one perspective over another over the past century has been affected by the availability of funds

The type of research funded shifted significantly after 1945. For a significant period beginning in the 1960s, the social sciences emphasized the role of data collection and analysis. A popular textbook by Noah and Eckstein (1969) documented and encouraged a shift in the field's epistemic core toward the methods of the social sciences. In summarizing the significance of Noah and Eckstein's (1969) text for the field, Hawkins and Rust wrote:

They claimed that the potential of the field lies in four spheres. First, it promises to extend the generality of social and educational propositions beyond the confines of a single society. Second, it has the potential to test propositions that can only be tested in the cross-national context. Third, it has the capacity to further cross-disciplinary activities. Fourth, and most importantly for our discussion, it has the potential to serve 'as an instrument for planners and policy makers'. In other words, while comparative education has significant theoretical potential, it also has important instrumental potential. (Hawkins & Rust, 2001: 504)

A pragmatic view of research methods recognizes the difficulty of describing human behavior by counting it with positivistic measurement methods, while accepting the challenge of repeated and re-evaluated observation. Measurement of student and teacher performance necessitates an a priori definition of the events to be observed. This selection process reduces the possibilities of observation to a level that can be managed by a single researcher and thereby eliminates some potential explanations. Only with repetition and revision of measurement methods over a long period will it be possible to see the benefits of applying scientific methods of observation to create comprehensive theories of human learning behavior. Fortunately, the experience of the past 50 years provides a guide for what might be learned. Further examples of the many types of research approaches to the study of education have been collected by the editors of The BERA/SAGE Handbook of Educational Research (Wyse, Selwyn, Smith & Suter, 2017).

METHOD OF COMPARISON

Making comparisons between similar social systems is a significant method of inquiry that defines the field of comparative education itself. Comparisons may be conducted at high levels of aggregation, such as nations, educational systems, or world regions, or at a more micro level, such as individual student behavior or teacher practices (Bray & Thomas, 1995; Cummings, 1999). Comparison of ourselves to others is a natural form of human understanding, and it is a key element in the analysis of social behavior. Learning how other humans in unseen places around the globe go about the instruction of reading, mathematics, history, and the sciences can show us how an existing system might be unjust, inefficient, or lackluster.

The field of comparative education may be approached as either a science or a humanity. The professional standards and associations provide frameworks, methods, and a history of prior studies that together form a necessary scaffold for human understanding. The comparative researchers who publish their observations and analyses bring concepts and methodological procedures from an extremely wide variety of disciplines, which requires great effort on the part of entry-level researchers to grasp the breadth of vocabulary and epistemologies. Moreover, changes in the ability to communicate across cultures (through technology and ease of travel) have increased the level and frequency of exchanges across cultures, so that the number of possible combinations of analyses has grown to such an extent that no single conceptual framework has yet appeared. Multiple comparisons of many elements of education seen in vastly different settings can be overwhelming.

The extreme variety of observations in cross-national research interferes with the ability to quickly reduce conflicting trends to a generalization. Aggregating many small pieces into a larger construct aids this process but also introduces other sources of observational error, such as the ecological fallacy of assuming that aggregate-level patterns of behavior are representative of individual behavior. Comparing whole countries with each other is a convenient way to reduce complex behaviors to manageable concepts or, as in large-scale surveys, to a single statistic. The reduction of variation to a singular form may simultaneously produce insight and confusion, because countries are composed of many smaller social, economic, and political systems that may be acting in opposite directions within the same social system. Therefore, the discipline of comparative education must contain both those who make the effort to create detailed observations and those who theorize about the patterns among them. The result may not be a single or simple answer to a question about educational processes, but such comparative analysis refines questions about complex processes, such as how to educate the next generation of children.
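The ecological fallacy can be made concrete with a small simulation. The sketch below (Python with NumPy) uses invented data and an invented homework–achievement framing, under the assumption of a positive within-country relationship and opposing country baselines; it is an illustration of the statistical point, not a finding from any survey. It shows how a relationship that holds for individuals in every country can reverse when whole countries are compared only through their means.

    import numpy as np

    rng = np.random.default_rng(0)

    # Three hypothetical 'countries'. Within each, extra homework hours
    # are associated with HIGHER achievement (individual-level slope +3).
    students, slopes = [], []
    for base_hours, base_score in [(2, 70), (4, 55), (6, 40)]:
        hours = base_hours + rng.normal(0.0, 0.5, 500)
        score = base_score + 3.0 * (hours - base_hours) + rng.normal(0.0, 5.0, 500)
        students.append((hours, score))
        slopes.append(np.polyfit(hours, score, 1)[0])  # within-country slope

    # Between countries, however, higher mean homework goes with LOWER
    # mean achievement, so the country-level correlation is negative.
    means = np.array([(h.mean(), s.mean()) for h, s in students])
    r_between = np.corrcoef(means[:, 0], means[:, 1])[0, 1]

    print("within-country slopes:", np.round(slopes, 2))        # all near +3.0
    print("between-country correlation:", round(r_between, 2))  # near -1.0

An analyst who saw only the three national means would conclude the opposite of what is true for every student in the simulation, which is why country-level statistics cannot be read as statements about individuals.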

HOW DOES COMPARATIVE EDUCATION FIT INTO GENERAL ACADEMIC RESEARCH?

Should the field of international comparative education be officially recognized as a singular profession? Or is it too eclectic to be awarded its own identity (Hawkins & Rust, 2001)? According to Darity (2008), a profession is said to exist when it creates:

• methods for recruitment and training of members;
• requirements and standards of practice;
• regulation of practices;
• means to disseminate knowledge;
• official recognition of its members;
• a developed code of ethics;
• a formalized educational process that serves a gatekeeping function and makes exclusive claims on qualifications, expertise, and jurisdiction.

Comparative education researchers would apparently agree that sufficient organization and merit exist to identify comparative international education as a recognized field of study. First, researchers in the field identify themselves as 'comparativists' (Crossley, Broadfoot, & Schweisfurth, 2007). Second, field leaders have openly discussed the properties of comparative education as a field of study in over 50 papers published between 1960 and 2015. These reflections on the status of the field (Arnove, 1982; Bray, 2003; Cowen, 2014; Fairbrother, 2005; Grant, 2000; Holmes, 1998; Noah, 1988; Rust et al., 1999; Theisen & Adams, 1990) demonstrate that, while the field has many diverse points of view, it is acknowledged by participants as a recognizable field of research. Third, the field has established methods of research employed by researchers and a shared body of knowledge (Bray, Adamson & Mason, 2014). Fourth, courses and degrees in international comparative education are offered in many major universities around the world (Drake, 2011; Larsen, 2010); indeed, every discipline in the social sciences and humanities, such as sociology, economics, and political science, has recognized sub-fields that conduct comparisons of elements of that discipline among nations (Cowen, 2014). Fifth, the findings of comparative education research are recognized by government officials and professional organizations that set policies for educational institutions. By this definition, the field of comparative international education is a recognized profession. It is represented in universities that provide formal training in research practices, members of the profession regulate the quality of published research through peer review, and the associations provide recognition to their members through awards. In 2017, 40 societies were members of an umbrella organization, the World Council of Comparative Education Societies (WCCES, www.wcces-online.org/).

Table 1.1  Major journals in international comparative education

1 Comparative Education
2 Comparative Education Review
3 Compare: A Journal of Comparative and International Education
4 Current Issues in Comparative Education
5 Globalization, Societies and Education
6 International Education Journal: Comparative Perspectives
7 International Journal of Comparative Education and Development
8 Journal of Research in International Education
9 Research in Comparative and International Education

This level of organization demonstrates that, through university courses, membership societies, and collaboration between professional societies, the profession of comparative international research is strong. The major journals of the field are listed in Table 1.1.

RESEARCH TOPICS IN COMPARATIVE EDUCATION

Common research topics in comparative education are found in the handbooks and textbooks published over the past two decades. A list of handbooks is presented separately from the reference list in this chapter (Bascia, Cumming, Datnow, Leithwood, & Livingstone, 2005; Bray, Adamson, & Mason, 2014; Cowen & Kazamias, 2009; Hayden, Levy, & Thompson, 2015; Howie & Plomp, 2005; Keitel, Bishop, & Clements, 2013; Kennett, 2004; McRobbie, Fraser, & Tobin, 2012; Mundy, Green, Lingard, & Verger, 2016; Papanastasiou, Plomp, & Papanastasiou, 2011; Phillips & Schweisfurth, 2016; Rutkowski, von Davier, & Rutkowski, 2013; Sharpes, 2016; Smith, 2016; Zajda, 2005). The research topics in these handbooks include: discussion of the status of the field of comparative education itself; the meaning and consequences of globalization and industrialization; economic development, assessment, and testing; and program evaluation.

Other topics frequently addressed by comparative education authors include human rights, violence, school materials, pedagogy, technology, and economic development. Typical educational topics, such as teaching, curriculum, and study patterns, were rarely addressed in the cited handbooks. In another formulation of the sources of research topics in comparative education, Hawkins and Rust (2001: 501) observed that the field is derived from three large perspectives: 'area studies based, social science disciplinary based, or development/planning studies based'. An examination of the papers assembled into bibliographies by the Comparative and International Education Society (CIES) shows that published papers by researchers in comparative education address many issues of composition and change in social and economic institutions (Easton, 2014, 2015, 2016; Raby, 2007, 2010). Comparative education scholars address classical subjects of educational theory and practice, such as student achievement, teacher education, curriculum, pedagogical practices, education for disabled students, lifelong learning, technology, and mathematics and science education. Research articles also address issues in macro global domains, such as globalization (the integration of thought and commerce across nations), the role of international agencies, economic development, global leadership, global metrics, governance, and social justice. This research is of special interest to international organizations such as the United Nations and the World Bank and to non-governmental international organizations. Research papers also include topics of special interest to sociologists, anthropologists, and economists, such as social class, ethnicity, gender, immigration, cultural capital, indigenous cultures, and historical change. The methods of research presented in the research journals include case studies, comparison as a method, understanding context, student assessment, and historical analysis.

Published articles in the same journal may contain analyses of conflicting philosophies, such as postmodernism, positivism, constructivism, or pragmatism. The research topics most often addressed in cross-national education publications are these (Bray & Manzon, 2014; Denman, 2017; Masemann, Bray, & Manzon, 2007):

• Economic development;
• International agencies;
• Indigenous cultures;
• Metrics to inform national and international policy;
• Curriculum that is intended, implemented and attained;
• Equity of access to quality education;
• Levels of cognitive achievement;
• Varieties of non-cognitive traits;
• Pedagogy;
• Philosophy of science, especially epistemology;
• Quality of instruction in subject areas (mathematics, science, reading);
• Uses of time inside and outside formal classes;
• Reflections on the field of comparative education itself;
• Extensive descriptions of education institutions in world regions.

A useful introduction to the theoretical and political macro forces that shape educational values in different world regions, and their relationship to a global economy, is provided by Popkewitz in 'Globalization/Regionalization, Knowledge, and the Educational Practices' (2000). Resources for regional analysis of educational issues are provided in academic journals and by international organizations such as UNESCO (Bray, 2010; Cowen, 1990; Mundy & Madden, 2011; UNESCO Institute for Statistics, 2014).

THE INFLUENCE OF LARGE-SCALE COMPARATIVE EDUCATION SURVEYS

On the one hand, the motivation for exploring how students are taught in another country may be as simple as comparing known processes in one's own country with those in another, with the aim of exploiting an idea (Phillips, 2000).

But other motivations include the study of education in other countries with the purpose of developing an extensive body of common measurements of student learning and educational practices (Bottani, 2012; Husén, 1979a, 1979b; IEA, 2011; Postlethwaite, 2004; Schleicher, 2016). The introduction of large-scale surveys of primary and secondary students across countries has provided information about differences and similarities in educational outcomes and practices across the globe, but it has also sparked debates about the significance of reducing educational practices to numerical form (see Yang, this volume, for extensive discussion). The methods and results of large-scale surveys are perceived as affecting education as much as they are necessary for describing it. For example, Lafontaine asked whether large surveys such as PISA and TIMSS have been more a witness to educational events or a driver of change (Lafontaine & Baye, 2012). Whether large-scale comparative studies of student achievement have influenced policy and practices is discussed following each new round of survey results, and several chapters in this volume address this subject (Suter, Fischman et al., and Power). Sufficient evidence exists to demonstrate that national policies have been affected by these studies (Baird et al., 2017; Barrett & Crossley, 2015; Berliner, 2015; Berliner & Biddle, 1995; Brown, 1996; Carnoy, Hinchey, & Mathis, 2015; Cowen, 2011b; Cresswell, Schwantner, & Waters, 2015; Crossley, 2014; Dossey & Wu, 2012; Fairbrother, 2005; Ferrer, 2012; Figazzolo, 2009; Goldstein, 2004; Grek, 2009; Heyneman, 2003; Husén, 1987; Selwyn, 2016; Torney-Purta, 1990; Wiseman & Alexander, 2010). However, the essay by Husén (1979b), written after his experience with the first series of IEA studies in the 1970s, is sobering.

He points out that evidence from objective surveys has been used on both sides of the same argument. Survey data often do not settle arguments; instead, they present new topics for further elaboration (Cowen, 2011a). Or, as Fischman, Sahlberg, Silova and Topper point out in this volume, they are used to support existing practice.

A guiding assumption of the survey research of many countries is that the diversity of educational content and organization throughout the world provides an opportunity for making objective scientific studies of effective practices of teaching and learning (Husén, 1983). Husén referred to the variety of existing models of education as a 'natural experiment' that could be exploited by research to better understand the relationships that lead to good education. He wrote:

What is the rationale for embarking on a venture with such far reaching administrative and financial implications and such frustrating technical complexities? We, the researchers who almost 15 years ago decided to cooperate in developing internationally valid evaluation instruments, conceived of the world as one big educational laboratory where a great variety of practices in terms of school structure and curricula were tried out. We simply wanted to take advantage of the international variability with regard both to the outcomes of the educational systems and the factors which caused differences in those outcomes. (Husén, 1973: 10; see also Ross & Genevois, 2006)

Now that the studies envisioned by Husén and his colleagues in the 1950s have expanded to more countries and have been conducted continuously since the 1960s, is it possible to say whether, as Husén hoped, analysis of the variability of educational practices across the world has contributed positive knowledge to educational procedures and practices? An answer may be proposed by examining the substance of the research articles that have been published from these studies. Chapters in this volume critically address some of the topics of these studies.

Although large-scale empirical surveys have become a significant enterprise supported by international organizations and individual governments, analyses of comparative education that use these surveys are not necessarily reflected in the publications of the Comparative and International Education Society (CIES), as shown by the content of the bibliographic review of comparative education publications (Easton, 2016; Raby, 2009). Only 45 of the 2,953 articles listed in the 2015 CIES bibliography (about 1.5%) cited the international surveys conducted by the IEA and the OECD (PISA, TIMSS, or PIRLS). Although the sponsoring organizations publish hundreds of reports of their own, analyses of the surveys are less likely to appear in peer-reviewed journals (for more discussion, see the chapters in this Handbook by Power, Fischman et al., Cogan & Schmidt, and Suter). International organizations such as UNESCO and the World Bank seek evidence to guide policies for the development of education in low-income countries. Consequently, new comparative surveys have been conducted in specific world regions, such as Africa, Latin America, and Southeast Asia (see Yang, this volume). Benavot, Resnik and Corrales (2006) provide an example of how comparative educational researchers may address the role of international organizations, offering a detailed description and critical review of the history of educational institutional development and the growth of universal education. They outline how educational institutions developed legal authority and how international organizations address the problems of inequality in less developed countries.

Some comparative education researchers are concerned that the use of large-scale international comparative studies to guide improvement may itself lead to poor practices.

The use of large-scale surveys was the subject of a focused discussion among academic researchers and policy makers at a 2016 conference of comparative education researchers, who debated how measurement metrics of educational practices might best be created to provide guidance for pedagogy – or whether such indicators are even desirable (Teachers College, Arizona State University, 2016). Some researchers were concerned about the expansion of comparative education surveys for gathering information to advise policy makers in organizations such as the OECD, the IEA, and UNESCO. While some speakers were attentive to the value of large-scale survey data for guiding educational policy, others were more concerned that the types of analyses that might be done with the data could lead to negative outcomes. Gorur (2016), for example, argued that testing and measurement have led to conservative policies and practices in schools rather than to pedagogical innovation. She argues that the standardization of norms in schools runs counter to the risk-taking and innovation needed to improve schooling: it encourages gaming and narrowly framed curricula. By design, metrics are aimed at policies where there is a large gap between reform and practice; reform is not the same as innovation, because innovation must be led by teachers and administrators in schools. In a similar vein, Sahlberg and Hasak (2016: 2) expressed concern that the publication of results from large data sets does not spark sufficient insight about teaching and learning in classrooms because 'they are based on analytics and statistics, not on emotions and relationships that drive learning in schools'. Measurements produce outputs that are treated as outcomes, and they do not capture the impact of learning on lives and minds. Other speakers questioned whether cross-sectional surveys should be used to infer causal relationships between educational practices and student performance, as has been the practice of academic and policy-oriented organizations. Still others were concerned that large-scale surveys are not well designed to capture subjective or 'non-cognitive' aspects of education, such as agency, empathy, and respect for other humans.

Edwards (2016) pointed out that large-scale measurements should be used to inform specific decisions but not to set goals (similar to the recommendation made by Husén, 1983). Edwards said that survey indicators are only partially fit for purpose in capturing meaning: they cannot themselves point to practices that lead out of poverty or to gains in social responsibility (Barrett & Sorenson, 2015). Metrics should be designed to fit an agreed-upon collective target that could support evaluations of impact and reveal inequalities. These authors suggest that if educational researchers reduced the elements of practice to small component parts, the leaders of large organizations would ignore broad social approaches and focus their attention on small goals.

These debates about the role of large-scale international studies should be taken seriously by all scholars in the field. Survey research requires that topics of analysis be specified early in the research process, within frameworks describing the knowledge base and student activities. These frameworks must be developed as much as five years prior to the analysis and require extensive testing. Study developers and researchers conducting secondary analyses should recognize that survey research, like bridge building, may require extensive trials, evaluation, and redevelopment. A current example of the effort to measure a new topic is the 2015 OECD survey of after-school attendance and its consequences (Bray, Kobakhidze, & Suter, 2018). Although the PISA designers attempted in prior studies to account for all student time that might affect achievement, the large differences in the structure and use of non-formal school time have now been described more clearly. The results show that researchers do not yet fully understand the structure of after-school time at an international level, and many more revisions to the survey forms will therefore be necessary before common definitions are created that fit the variety of country practices. Validity and reliability in survey research require many years of experience and revision. Critics of the surveys' value provide useful advice but should also be aware of the nature of the methods required for good measurement.

SUMMARY: THE CURRENT STATUS OF RESEARCH IN COMPARATIVE EDUCATION

The author's purpose in this chapter was to introduce new scholars to issues and areas of research interest in the expanding field of comparative education. It presented a brief statement of the origins and directions of some of the types of research conducted by educational researchers who identify themselves as comparativists.

What is comparative education about? Originally, scholars who studied educational practices in other countries did so to learn how to improve their own educational system. While importing ideas from other countries remains a matter of focused attention in the field of comparative education (Phillips, 2000), the introduction of large-scale surveys in the 1960s was intended to bring a formal, scientific 'empirical' model of analysis to the field, with the expectation that many educational issues could be studied objectively with analytical and statistical models. The growth of these methods has not been smooth. The surveys have required repeated modification, and the analyses may not have led to great insights about how school systems might operate more effectively (except perhaps in how well they depict the extreme differences of achievement within as well as among countries). They have, however, provided many new hypotheses about why educational systems are as they are. These questions need to be explored with all available methods of research. The existing international comparative studies of education policies and practices have had consequences for national policies and for educational theory (Benavot, 2016).

In the field of comparative education, 'research' is a broad term applied to different forms of investigation, such as observation, analysis, experiment, description, statistical relationships, and comparisons. Research follows scientific rules of observation and analysis that can be replicated by others and that use verifiable forms of observation (Phillips, 2010). It has been defined as investigation that is novel, creative, uncertain regarding final outcomes, systematic, transferable, and reproducible (Frascati Manual, OECD, 2015: 45). Examination of education systems in different nations can be conducted to meet this standard through philosophical, historical, empirical, or ethical analyses. Engaging in research about students and teachers in cultures beyond one's own requires a significant degree of empathy and a setting aside of one's own vocabulary of understanding (epistemology). Most researchers in comparative education are sympathetic to the goal of improving access to educational institutions for all persons. A significant portion of professional comparativists are involved with theories of social and economic development and their application. Other researchers, as documented by David Phillips (2000), seek ideas from other countries to borrow for their own networks in order to improve their own system of education.

The introduction of large-scale educational surveys across many countries in the late 20th century has in many ways upended the goals and methods of comparative education. Statistical summaries of complex human behaviors are believed to oversimplify the complex networks in which individuals exist; therefore, the direct application of statistical models to the revision of human institutions will lead to critique and conflict, as reflected in the papers of the comparative education conference held at Arizona State University (Teachers College, Arizona State University, 2016). The originators of the large-scale surveys were well aware of the issues involved in collecting and summarizing educational systems into single sets of numbers and were uneasy about releasing the contents of Pandora's box. Once released and institutionalized by government and international agencies, these forms of investigation need to be fully understood before the numbers are converted into strategies.


CONCLUSION

The field of comparative education continually conducts critical assessments of the theoretical stances and research methods of its members. This chapter, and the remainder of this SAGE Handbook, introduces readers to the types of contributions and conflicts that occur in a field that draws ideas from many different scientific and humanistic disciplines. Furthermore, since the results of comparative research studies frequently affect governmental or international educational policy (partly because educational institutions are products of public financing), the methods and application of comparative research will continue to be affected by political ideology as well as by scientific and academic standards. Consequently, comparative researchers of teaching, learning, and education administration face a future of dynamic exchanges in conducting and interpreting results in this expanding field of study.

REFERENCES

Alexander, R., Broadfoot, P., & Phillips, D. (Eds.). (1999). Learning from comparing: New directions in comparative educational research. Oxford, UK: Symposium Books.
Anderson-Levitt, K. M. (2015). Comparative education review guide to searching for world literature. Comparative Education Review, 59(4), 765–777. http://doi.org/10.1086/683095
Arnove, R. F. (1982). Approaches and perspectives. In P. G. Altbach, R. F. Arnove and G. P. Kelly (Eds.), Comparative education (pp. 3–11). New York: Advent Books.
Baird, J., Johnson, S., Hopfenbeck, T. N., Isaacs, T., Stobart, G., & Yu, G. (2017). On the supranational spell of PISA in policy. Educational Research, 58(2), 121–138. http://doi.org/10.1080/00131881.2016.1165410
Baker, D., & LeTendre, G. K. (2005). National differences, global similarities: World culture and the future of schooling. Stanford, CA: Stanford University Press.
Barrett, A. M., & Crossley, M. (2015). The power and politics of international comparisons. Compare: A Journal of Comparative and International Education, 45(3), 467–470. http://doi.org/10.1080/03057925.2015.1027509
Bascia, N., Cumming, A., Datnow, A., Leithwood, K., & Livingstone, D. (Eds.). (2005). International handbook of educational policy. Springer International Handbooks of Education. Cham, Switzerland: Springer International.
Becher, T., & Trowler, P. R. (2001). Academic tribes and territories (2nd ed.). Oxford: The Society for Research into Higher Education, Oxford University Press and Open University Press.
Benavot, A. (2016). Assuring quality education and learning: Lessons from Education for All. Prospects, 46(1), 1–10. http://doi.org/10.1007/s11125-016-9386-1
Benavot, A., Resnik, J., & Corrales, J. (2006). Global educational expansion: Historical legacies and political obstacles. Washington, DC: American Academy of Arts and Sciences.
Bereday, G. (1964a). Notes of an address given at the Guildford Educational Conference, on Saturday, October 20, 1900, by M. E. Sadler, Christ Church, Oxford.
Bereday, G. (1964b). Sir Michael Sadler's 'Study of Foreign Systems of Education'. (Reprinted in) Comparative Education Review, 7(3), 307–314.
Berliner, D., & Biddle, B. (1995). The manufactured crisis. New York: Addison-Wesley.
Berliner, D. (2015). The many facets of PISA. Teachers College Record, 117(10), 519–521.
Bottani, N. (2012). Les origines à l'OCDE de l'enquête PISA. Revista Española de Educación Comparada, 0(19), 17. http://doi.org/10.5944/reec.19.2012.7579
Bray, M. (2003). Comparative education in an era of globalization: Evolution, mission and goals. Policy Futures in Education, 1(2), 209–219.
Bray, M. (2010). Comparative education and international education in the history of Compare: Boundaries, overlaps and ambiguities. Compare: A Journal of Comparative and International Education, 40(6), 711–725.
Bray, M., & Manzon, M. (Eds.) (2010). Common interests, uncommon goals: Histories of the World Council of Comparative Education Societies and its members. Hong Kong: Comparative Education Research Center [CERC]/Springer.
Bray, M., & Gui, Q. (2001). Comparative education in Greater China: Contexts, characteristics, contrasts and contributions. Comparative Education, 37(4), 451–473.
Bray, M., & Manzon, M. (2014). The institutionalization of comparative education in Asia and the Pacific: Roles and contributions of comparative education societies and the WCCES. Asia Pacific Journal of Education, 34(2), 228–248. http://doi.org/10.1080/02188791.2013.875646
Bray, M., & Thomas, R. M. (1995). Levels of comparison in educational studies: Different insights from different literatures and the value of multilevel analyses. Harvard Educational Review, 65(3), 472–490. Available at: www.edreview.org/harvard95/1995/fa95/f95bray.htm
Bray, M., Adamson, B., & Mason, M. (2014). Comparative education research: Approaches and methods (2nd ed.). Hong Kong: Springer.
Bray, M., Kobakhidze, N., & Suter, L. (2018). PISA research on after-school education. Paper presented at the Comparative and International Education Society, Mexico City, April 2018.
Bray, M., Kwo, O., & Jokić, B. (2015). Researching private supplementary tutoring: Methodological lessons from diverse cultures. CERC Studies in Comparative Education, 32. Retrieved from http://dx.doi.org/10.1007/978-3-319-30042-9
Brown, M. (1996). FIMS and SIMS: The first two IEA International Mathematics Surveys. Assessment in Education: Principles, Policy & Practice, 3(2). DOI: 10.1080/0969594960030206
Cai, J. (2003). Investigating parental roles in students' learning of mathematics from a cross-national perspective. Mathematics Education Research Journal, 15(2), 87–106. http://doi.org/10.1007/BF03217372
Carnoy, M. (2001). Globalization and educational reform: What planners need to know. Economics of Education Review, 20. http://doi.org/10.1016/S0272-7757(01)00023-1
Carnoy, M., Hinchey, P. H., & Mathis, W. (2015). International test score comparisons and educational policy: A review of the critiques. National Education Policy Center. Retrieved from http://nepc.colorado.edu/publication/international-test-scores
Carr, D. (2003). Making sense of education: An introduction to the philosophy and theory of education and teaching. London: Routledge Falmer.
Carr, W. (2006). Education without theory. British Journal of Educational Studies, 54(2), 136–159.
Cowen, R. (1990). The national and international impact of comparative education infrastructures. In W. D. Halls (Ed.), Comparative education: Contemporary issues and trends (pp. 321–352). London and Paris: Jessica Kingsley/UNESCO.
Cowen, R. (2006). Acting comparatively upon the educational world: Puzzles and possibilities. Oxford Review of Education, 32(5), 561–573. http://doi.org/10.1080/03054980600976155
Cowen, R. (2011a). Foreword. In M. Manzon (Ed.), Comparative education: The construction of a field (pp. xiii–xvi). Hong Kong: Comparative Education Research Centre, The University of Hong Kong, and Dordrecht: Springer.
Cowen, R. (2011b). Coda. In M. A. Pereyra, H.-G. Kotthoff and R. Cowen (Eds.), PISA under examination: Changing knowledge, changing tests, and changing schools (pp. 259–264). Netherlands: Sense Publishers.
Cowen, R. (2014). Ways of knowing, outcomes and 'comparative education': Be careful what you pray for. Comparative Education, 68(August), 1–20. http://doi.org/10.1080/03050068.2014.921370
Cowen, R., & Kazamias, A. (2009). International handbook of comparative education (Vol. 22). Dordrecht: Springer Netherlands. http://doi.org/10.1007/978-1-4020-6403-6
Cresswell, J., Schwantner, U., & Waters, C. (2015). A review of international large-scale assessments in education. Paris: OECD Publishing.
Crossley, M. (2014). Global league tables, big data and the international transfer of educational research modalities. Comparative Education, 50(1), 15–26. http://doi.org/10.1080/03050068.2013.871438
Crossley, M. (2016). Global goals versus local contexts: A particular challenge for small island developing states. NORRAG News, 54, 55–56. Retrieved from http://www.norrag.org/fileadmin/Full%20Versions/NN54.pdf
Crossley, M., & Jarvis, P. (2000). Introduction: Continuity and change in comparative and international education. Comparative Education, 36(3), 261–265.
Crossley, M., & Watson, K. (2011). Comparative and international education: Policy transfer, context sensitivity and professional development. In J. Furlong and M. Lawn (Eds.), Disciplines of education: Their role in the future of education research (pp. 103–121). London: Routledge.
Crossley, M., Broadfoot, P., & Schweisfurth, M. (2007). Changing educational contexts, issues and identities: 40 years of comparative education. London: Routledge. http://doi.org/10.4324/9780203961995
Cummings, W. K. (1999). The institutions of education: Compare, compare, compare! Comparative Education Review, 43, 413–437.
Darity, W. A. (2008). International encyclopedia of the social sciences (2nd ed., Vol. 6, pp. 515–517). Detroit, MI: Macmillan Reference USA. Retrieved from http://go.galegroup.com/ps/i.do?id=GALE%7CCX3045302074&v=2.1&u=red68720&it=r&p=GVRL&sw=w&asid=29f0093de5865da9e5b42a3731a5cc57
Denman, B. D., & Higuchi, S. (2013). At a crossroads? Comparative and international education research in Asia and the Pacific. Asian Education and Development Studies, 2(1), 4–21.
Denman, B. D. (2017). Post-worldview? A dialogic meta-narrative analysis of North-South, South-South, and Southern Theory. International Journal of Comparative Education and Development, 19(2/3), 65–77. Bingley, UK: Emerald Group Publishing.
Dossey, J. A., & Wu, M. L. (2012). Implications of international studies for national and local policy in mathematics education. In Alan J. Bishop, Christine Keitel, Jeremy Kilpatrick and Frederick K. S. Leung (Eds.), Third international handbook of mathematics education. Springer.
Drake, T. A. (2011). U.S. comparative and international graduate programs: An overview of programmatic size, relevance, philosophy, and methodology. Peabody Journal of Education, 86(2), 189–210.
Easton, P. B. (2014). Documenting the evolution of the field: Reflections on the 2013 Comparative Education Review Bibliography. Comparative Education Review, 58(4), 555–574. http://doi.org/10.1086/678047
Easton, P. B. (2015). Comparative Education Review Bibliography 2014: Catching up with the rapid growth of the field. Comparative Education Review, 59(4), 743.
Easton, P. B. (2016). Comparative Education Review Bibliography 2015: Galloping growth and concluding reflections. Comparative Education Review, 60(4), 833–843. http://doi.org/10.1086/688766
Edwards, D. (2016). Are global learning metrics desirable? That depends on what decision they are attempting to inform. CASGE working paper No. 1, "The Possibility and Desirability of Global Learning Metrics: Commentaries". Arizona State University. http://dx.doi.org/10.14507/casge1.2017
Edwards, R., Carney, S., Ambrosius, U., & Lauder, H. (2012). Review of Stronach's 'Globalizing Education, Educating the Local: How Method Made Us Mad'. British Journal of Sociology of Education, 33(3), 451–463. doi:10.1080/01425692.2012.664909
Epstein, E. H., & Carroll, K. T. (2011). Erasing ancestry: A critique of critiques of the 'postmodern deviation in comparative education'. In J. Jacobs and J. Weidman (Eds.), Beyond the comparative: Advancing theory and its application to practice. Pittsburgh, PA: University of Pittsburgh Press.
Fairbrother, G. P. (2005). Comparison to what end? Maximizing the potential of comparative education research. Comparative Education, 41(1), 5–24. http://doi.org/10.1080/03050060500073215
Ferrer, F. (2012). PISA: Aportaciones e incidencia sobre las políticas educativas nacionales. Revista Española de Educación Comparada, 19, 11–17.
Figazzolo, L. (2009). Testing, ranking, reforming: Impact of PISA (2006) on the education policy debate. Brussels: Education International.
Furlong, J. (2013). Education: An anatomy of the discipline: Rescuing the university project. London: Routledge.
Gannon, S. (2012). Book review of Stronach's 'Globalizing Education, Educating the Local: How Method Made Us Mad'. Journal of Education Policy, 27(6), 867–869. doi:10.1080/02680939.2012.666054
Gautherin, J. (1993). Marc-Antoine Jullien de Paris. Perspectives: Revue trimestrielle d'éducation comparée, 23(3–4), 783–798. Retrieved from http://www.nie.edu.sg/research/publication/cieclopedia-org/cieclopedia-org-a-to-z-listing/jullien-de-paris-marc-antoine (01/05/2017).
Goldstein, H. (2004). International comparisons of student attainment: Some issues arising from the PISA study. Assessment in Education, 11(3), 319–330. http://doi.org/10.1080/0969594042000304618
Gorur, R. (2016). Seeing like PISA: A cautionary tale about the performativity of international assessments. European Educational Research Journal, 15(5), 598–616.
Grant, N. (2000). Tasks for comparative education in the new millennium. Comparative Education, 36(3), 309–317.
Grek, S. (2009). Governing by numbers: The PISA 'effect' in Europe. Journal of Education Policy, 24(1), 23–37. http://doi.org/10.1080/02680930802412669
Hawkins, J., & Rust, V. D. (2001). Shifting perspectives on comparative research: A view from the USA. Comparative Education, 37(4), 501–506.
Hayden, M., Levy, J., & Thompson, J. J. (2015). The SAGE handbook of research in international education. London: Sage. http://dx.doi.org/10.4135/9781473943506
Heyneman, S. (2003). The history and problems in the making of education policy at the World Bank 1960–2000. International Journal of Educational Development, 23(3), 315–337.
Heyneman, S. P., & Lykins, C. R. (2007). The evolution of comparative and international education statistics. In Helen F. Ladd and Edward B. Fiske (Eds.), Handbook of research in education finance and policy (pp. 107–130). London: Routledge.
Holmes, B. (1998). Problems in education: A comparative approach. London: Routledge.
Houang, R. T., Schmidt, W. H., & Cogan, L. (2004). Curriculum and learning gains in mathematics: A cross-country analysis using TIMSS. In C. Papanastasiou (Ed.), Proceedings of the IRC-2004 TIMSS conference (Vol. 1, pp. 224–254). Nicosia, Cyprus: Cyprus University Press.
Howie, S., & Plomp, T. (2005). International comparative studies of education and large-scale change. New York: Springer.
Husén, T. (1973). Foreword. In L. C. Comber & J. P. Keeves (Eds.), Science education in nineteen countries. International Studies in Evaluation (Vol. 1). Stockholm: Almquist & Wiksell.
Husén, T. (1979a). Observations: Rather than a reply. Oxford Review of Education, 5(3), 267.
Husén, T. (1979b). An international research venture in retrospect: The IEA surveys. Comparative Education Review, 23.
Husén, T. (1983). The international context of educational research. Oxford Review of Education, 9(1), 21.
Husén, T. (1987). Policy impact of IEA research. Comparative Education Review, 31(1), 29. http://doi.org/10.1086/446654
International Association for the Evaluation of Educational Achievement (IEA). (2011). Brief history of the IEA. Retrieved from http://www.iea.nl/brief_history_of_iea.html
Jerrim, J., & Choi, A. (2016). The use (and misuse) of PISA in guiding policy reform: The case of Spain. Comparative Education: An International Journal of Comparative Studies, 52(2), 230–245.
Jones, G. A. (2005). What do studies like PISA mean to the mathematics education community? In H. L. Chick and J. L. Vincent (Eds.), Proceedings of the 29th conference of the International Group for the Psychology of Mathematics Education. Melbourne, Australia. https://www.emis.de/proceedings/PME29/PME29CompleteProc/PME29Vol3Fug_Mou.pdf
Keitel, C., Bishop, A. J., & Clements, M. A. (2013). Third international handbook of mathematics education. New York: Springer.
Kennett, P. (Ed.) (2004). A handbook of comparative social policy. Northampton, MA: Edward Elgar Publishing.
Lafontaine, D., & Baye, A. (2012). PISA, instrument ou témoin du changement: évolution des performances en lecture et des politiques éducatives dans cinq systèmes européens. Éducation Comparée, 7.
Lafontaine, D., Baye, A., Vieluf, S., & Monseur, C. (2015). Equity in opportunity-to-learn and achievement in reading: A secondary analysis of PISA 2009 data. Studies in Educational Evaluation, 47, 1–11. http://doi.org/10.1016/j.stueduc.2015.05.001
Larsen, M. (2010). New thinking in comparative education: Honoring Robert Cowen. Rotterdam/Boston/Taipei: Sense Publishers. Retrieved from http://books.google.be/books?id=26aqcQAACAAJ
Lingard, B. (2014). Historicizing and contextualizing global policy discourses: Test- and standards-based accountabilities in education. International Education Journal: Comparative Perspectives, 12(2), 122–131. Retrieved from http://openjournals.library.usyd.edu.au/index.php/IEJ/article/view/7459
Lingard, B., & Ladwig, J. (1998). School effects in postmodern conditions. In R. Slee, G. Weiner and S. Tomlinson (Eds.), School effectiveness for whom: Challenges to the school effectiveness and school improvement movements (pp. 84–100). London: Falmer Press.
Mallinson, V. (1969). Reviewed work: Toward a science of comparative education by Harold J. Noah and Max A. Eckstein. British Journal of Educational Studies, 17(3), 334. Retrieved from http://www.jstor.org/stable/3119655
Masemann, V., Bray, M., & Manzon, M. (Eds.) (2007). Common interests, uncommon goals: Histories of the World Council of Comparative Education Societies and its members. Springer, University of Hong Kong.
McRobbie, C. J., Fraser, B. J., & Tobin, K. (2012). Springer international handbooks of education. Dordrecht: Springer Netherlands.
Mesa, V., Gómez, P., & Cheah, U. H. (2012). Influence of international studies of student achievement on mathematics teaching and learning. In M. A. (Ken) Clements et al. (Eds.), Third international handbook of mathematics education (Springer International Handbooks of Education 27). New York: Springer Science+Business Media. DOI 10.1007/978-1-4614-4684-2_27
Meyer, J. W., Kamens, D., & Benavot, A. (1992). School knowledge for the masses: World models and national primary curricular categories in the 20th century. Washington, DC: Falmer.
Moss, P. A., Phillips, D. C., Erickson, F. D., Floden, R. E., Lather, P. A., & Schneider, B. L. (2009). Learning from our differences: A dialogue across perspectives on quality in education research. Educational Researcher, 38(7), 501–517. http://doi.org/10.3102/0013189X09348351
Mundy, K., & Madden, M. (2011). UNESCO and its influences on higher education. In Robert M. Basset and Alma Maldonado-Maldonado (Eds.), International organizations and higher education policy: Thinking globally, acting locally (pp. 46–63). New York and London: Routledge.
Mundy, K., Green, A., Lingard, B., & Verger, A. (Eds.) (2016). Handbook of global education policy. New York and London: Wiley-Blackwell.
Noah, H. J. (1988). Methods in comparative education. In T. N. Postlethwaite (Ed.), The encyclopedia of comparative education and national systems of education (pp. 10–12). Oxford: Pergamon Press.
Noah, H., & Eckstein, M. (1969). Toward a science of comparative education. London: Macmillan.
Nunes, T. (1999). Learning and teaching mathematics: An international perspective. London: Psychology Press.
OECD. (2010). PISA 2009 results: Learning to learn – Student engagement, strategies and practices (Vol. III). Paris: OECD Publishing. http://doi.org/10.1787/9789264083943-en
OECD. (2015). Frascati manual 2015: Guidelines for collecting and reporting data on research and experimental development, the measurement of scientific, technological and innovation activities. Paris: OECD Publishing. http://dx.doi.org/10.1787/9789264239012-en
Otsuka, Y. (2001). Education, culture and identity in twentieth century China. Ann Arbor, MI: The University of Michigan Press.
Papanastasiou, C., Plomp, T., & Papanastasiou, E. (2011). IEA 1958–2008: 50 years of experiences and memories (Vol. 1). Nicosia, Cyprus: International Association for the Evaluation of Educational Achievement (IEA).
Phillips, D. (2000). Learning from elsewhere in education: Some perennial problems revisited with reference to British interest in Germany. Comparative Education, 36(3), 297. http://doi.org/10.1080/713656617
Phillips, D. C. (2010). Empirical educational research: Charting philosophical disagreements in an undisciplined field. In H. Siegel (Ed.), The Oxford handbook of philosophy of education (pp. 1–28). Oxford: Oxford University Press. http://doi.org/10.1093/oxfordhb/9780195312881.003.0022
Phillips, D. C., & Siegel, H. (2013). Philosophy of education. Stanford Encyclopedia of Philosophy. https://plato.stanford.edu/entries/education-philosophy
Phillips, D., & Schweisfurth, M. (2016). Comparative and international education: An introduction to theory, method, and practice (2nd ed.). London: Bloomsbury.
Postlethwaite, T. N. (2004). Monitoring educational achievement. Paris: UNESCO International Institute for Educational Planning.
Raby, R. (2006). Comparative education review bibliography: Changing emphases in scholarly discourse. Comparative Education Review, 50(4), 695–704.
Raby, R. (2008). The globalization of journals: A review of the 2007 comparative education review bibliography. Comparative Education Review, 52(3), 461–475.
Raby, R. L. (2007). Fifty years of comparative education review bibliographies: Reflections on the field. Comparative Education Review, 51(3), 379–398.
Raby, R. L. (2009). Whither diversity in publishing? A review of the 2008 comparative education review bibliography. Comparative Education Review, 53(3), 435–450. http://doi.org/10.1086/600815
Raby, R. L. (2010). The 2009 comparative education review bibliography: Patterns of internationalization in the field. Comparative Education Review, 54(3), 415–427. http://doi.org/10.1086/653543
Ramirez, F. O., & Boli, J. (1987). The political construction of mass schooling: European origins and worldwide institutionalization. Sociology of Education, 60(1), 2–17. http://doi.org/10.2307/2112615
Ross, K. N., & Genevois, I. (2006). Cross-national studies of the quality of education: Planning their design and managing their impact. Paris: UNESCO Institute for Educational Planning.
Rust, V., Soumare, A., Pescador, O., & Shibuya, M. (1999). Research strategies in comparative education. Comparative Education Review, 43(1), 86–109. http://doi.org/10.1086/447546
Rutkowski, L., von Davier, M., & Rutkowski, D. (2013). Handbook of international large-scale assessment: Background, technical issues, and methods of data analysis. London: Chapman and Hall/CRC.
Sadler, M. (1900). How far can we learn anything of practical value from the study of foreign systems of education? In J. H. Higginson (Ed.) (1979), Selections from Michael Sadler (pp. 48–51). Liverpool: Dejall & Meyorre. Reprinted in 1964 in Comparative Education Review, 7(3), 307–314.
Schleicher, A. (2018). World class: How to build a 21st-century school system. Paris: OECD.
Schleicher, A., & Zoido, P. (2016). The policies that shaped PISA, and the policies that PISA shaped. In K. Mundy, A. Green, B. Lingard, & A. Verger (Eds.), Handbook of global education policy (pp. 374–384). New York and London: Wiley-Blackwell.
Schleicher, A. (2016). International assessments of student learning outcomes. In D. Wyse, L. Hayward and J. Pandya (Eds.), The SAGE handbook of curriculum, pedagogy and assessment (1st ed.). London: Sage.
Schweisfurth, M. (2014). Data and dogma: Knowledge and education governance in the global age. Comparative Education, 50(2), 127–128. http://doi.org/10.1080/03050068.2014.892754
Selwyn, N. (2016). 'There's so much data': Exploring the realities of data-based school governance. European Educational Research Journal, 15(1), 54–68. http://doi.org/10.1177/1474904115602909
Sharpes, D. (Ed.) (2016). Handbook on comparative and international studies in education. International Perspectives on Educational Policy, Research and Practice. Charlotte, NC: Information Age Publishing.
Shavelson, R. J., & Towne, L. (2002). Scientific research in education. Washington, DC: National Academy of Sciences. http://doi.org/10.17226/10236
Smith, W. C. (2016). The global testing culture: Shaping education policy, perceptions, and practice. Oxford Studies in Comparative Education. Oxford: Oxford University Press.
Steiner-Khamsi, G. (2015). Standards are good (for) business: Standardised comparison and the private sector in education. Globalisation, Societies, and Education, 14(2), 161–182. doi:10.1080/14767724.2015.1014883
Stenhouse, L. (1979). Case study in comparative education: Particularity and generalization. Comparative Education, 15(1), Special Number (3): Unity and Diversity in Education, 5–10.
Teachers College, Arizona State University (2016). The possibility and desirability of global learning metrics: Comparative perspectives on education research, policy and practice. Scottsdale, AZ: Mary Lou Fulton Teachers College. https://education.asu.edu/globalmetrics
Theisen, G., & Adams, D. (1990). Comparative education research. In R. M. Thomas (Ed.), International comparative education: Practices, issues, and prospects (pp. 277–300). Oxford: Pergamon.
Tony, B. F., Giacomo, D., Fishbein, B. G., & Buckley, V. W. (2012). International comparative assessments: Broadening the interpretability, application and relevance to the United States.
Torney-Purta, J. (1990). International comparative research in education: Its role in educational improvement in the US. Educational Researcher, 19(7), 32–35.
Toulmin, S. E. (1958). The uses of argument. Cambridge: Cambridge University Press.
Tramonte, L. (2015). An analysis of the learning resources in lower middle income countries: Lessons from PISA 2009.
UNESCO Institute for Statistics (2014). Learning Metrics Task Force, 3. Retrieved from www.uis.unesco.org/Education/Pages/learning-metrics-task-force.aspx
Wiseman, A. W., & Alexander, W. W. (2010). The impact of international achievement studies on national education policymaking. International Perspectives on Education and Society (Vol. 13, pp. xi–xxii). Bulgarian Comparative Education Society. https://www.researchgate.net/profile/Patricia_Almeida3/publication/236261199_International_Perspectives_on_Education/links/00b7d5176c814cd615000000/International-Perspectives-on-Education.pdf
Wyse, D., Selwyn, N., Smith, E., & Suter, L. (Eds.) (2017). The BERA/SAGE handbook of educational research. London: SAGE.
Zajda, J. (2005). International handbook on globalization, education and policy research. New York: Springer.
Zarębski, T. (2009). Toulmin's model of argument and the 'logic' of scientific discovery. Studies in Logic, Grammar and Rhetoric, 16(29), 267–283.
Zuzovsky, R. (2003). Curriculum as a determinant of learning outcomes: What can be learned from international comparative studies – TIMSS-1999. Studies in Educational Evaluation, 29(4), 279–292. http://doi.org/10.1016/S0191-491X(03)90014

2

Critical Challenges in Approaches and Experience in Comparative Education Research

Brian D. Denman

INTRODUCTION

Combining educational philosophy and methodology can be very contentious, as educational research contains numerous approaches that attempt, in any combination, to prove or disprove claims about educational phenomena. Broadly speaking, the field of education is interdisciplinary by nature and composed of many specialties and political aspects that point to various purposes and objectives. One of those specialties is comparative and international education. While ever expanding – at least in terms of research interest – there is ample and demonstrable evidence validating and legitimising its role in improving learning achievement and acquisition at both local and global levels and in influencing educational policy and practice.

This metanarrative analysis begins with the identification of major trends in comparative education research from the 1950s onwards, with the purpose of describing the evolutionary development of methodological and epistemological underpinnings during this period.

While there continues to be no internally consistent body of knowledge and no set of principles, theories, or canons of research that have been generally agreed upon since Nash (1977), there does appear to be a clear divide between qualitative and quantitative researchers in comparative education. This metanarrative analysis involves a systematic review of journal articles devoted exclusively to comparative education research, with the aim of analysing qualitative and quantitative methodologies. The significant journals used for this study are Comparative Education Review, International Review of Education, Comparative Education, and Compare, as they are considered 'gold standards' specific to the field. Distinctions between research purposes were not taken into account, in order to highlight the area study's reality as an interdisciplinary 'add-on' to other areas of study, one that can be viewed as (1) a stand-alone professional area of study and teaching (Wiseman & Matherly 2009), (2) an academic (scholarly) field of study (Laska 1973; Masemann 1990; Broadfoot 2003), or (3) an explicit method of research which is comparative (Mason 2007, p. 1).


area of study and teaching (Wiseman & Matherly 2009), (2) an academic (scholarly) field of study (Laska 1973; Masemann 1990; Broadfoot 2003), or (3) an explicit method of research which is comparative (Mason 2007, p. 1). The principal objective is to translate the positional objectivities of key researchers according to their modus operandi and to align them with their methodological approaches, in an attempt to collect and analyse thematic patterns of development and evolutionary trends and to facilitate a quantification of findings for further analysis. While it is difficult to gauge the essence of what a comparative educator thinks and to classify his/her research orientation, the accumulation of data over a 70-year period helps to determine changes in approaches and experiences over time, to trace the influence of like-minded scholars who have taken similar approaches and tacks, and to support a re-examination of the field's theoretical assumptions, research concentrations, and research trajectories. This approach reflects the research conducted by Larsen, Majhanovich and Masemann (2007), who defined three main stages in the development of comparative education research:

Establishment Phase: 1950s–1970s
Fragmentation Phase: 1980s–1990s
Expansion Phase: 2000s–present
Future: 2020 Forward?

In addition, the works of Rust et al. (1999) and Fairbrother (2005) analysed comparative education journals in attempts to explore strategies in content and interpretation, and Theisen and Adams (1990) and Schlager (1999) provided overviews of frameworks, theories, models, and methods for comparative education research. While Watson (1998) and Paulston (1997) devoted much of their scholarly work to mapping paradigmatic approaches and experiences through the metanarrative analysis of texts, much of the inspiration to pursue such a study was a need (1) to systematically identify and analyse distinct research planning and design orientations and

trajectories, and (2) to project ways forward for what constitutes 'good' and useful comparative education research. This study utilises elements of each to analyse the links between epistemological positions and the methodologies used, in order to identify paradigmatic streams in comparative education research. By identifying streams of approaches, there is a greater likelihood of developing more robust and verifiable research designs and methods to explore, explain and describe educational phenomena. A caveat relevant to this study starts with the premise that knowledge is not static – including the evolving nature of one's own epistemological stance – and, as such, its verifiable existence is dependent on its use and utility. Kuhn argued that comparative studies must not be a mere point-by-point juxtaposition (Kuhn 1970, p. 3), as the intentions are quite different when approaching research through a qualitative, as opposed to a quantitative, lens. With specific reference to comparative education research, while the field is ever evolving and expanding, its limits may require differentiating temporal 'streams of consciousness' along sound methodological pathways.

DEFINING ‘DISCIPLINE’, ‘FIELD’ AND ‘AREA STUDIES’ The interdisciplinarity of comparative education rests in its associated relationships with education, the traditional disciplines, and other academic fields of study (Denman 2017). Contrary to the predictions of many comparative educators, particularly those who were identified in the Establishment Phase (1950s–1970s), comparative education has evolved into a multidisciplinary area of study, associated both with education and the social sciences. In other words, comparative education is not a discipline, field of study or discourse, although it might quite properly be considered a subfield of education.


The classification of comparative education as discipline, field, or discourse has undergone a host of inadequate attempts at generalising and characterising research generated over a 70-year period. In the 1950s, UNESCO's Institute for Education recognised comparative education as a discipline, reflecting characteristic stages of development in 1955, 1963, and 1971 (see Edwards et al. 1973). At that time, the division of research was concentrated in four inward-looking themes and seven outward-seeking themes.

Inward-looking themes:

1 The establishment of curricular goals
2 The pattern and content of the curriculum
3 The process of curricular change and its constraints
4 The operation of the curriculum within the formal educational system, including the evaluator process. (Edwards et al. 1973, pp. 211–212)

Outward-seeking themes included:

1 International textbook comparison and revision
2 Youth culture as an international phenomenon
3 Polytechnical education as an international trend
4 Impact on national educational policy of international developments
5 Influence of international organisations
6 Viability and persistence of national characteristics
7 Educational colonialism through hardware (i.e. through the control of educational technology, communications, media, etc.). (Edwards et al. 1973, p. 237)

Initially, there were efforts to classify comparative education as a discipline for purposes of 'scientific legitimacy'. This definition was eventually replaced by one characterising it as a multidisciplinary field that examines education in cross-cultural contexts (Altbach 1991, p. 491). Later, it was seen as part of an area studies, social science, or development/planning studies 'orientation' (Hawkins & Rust 2001). E. Epstein contends that such 'orientations' have theoretical import by


relating the discourse in comparative education to the factors that influence student behaviour in varying settings, and have practical value in international education by furnishing knowledge regarding how learning structures for international students can be shaped most effectively to produce intended outcomes (E. Epstein 1994b, p. 922). Then and now, scholars in comparative education have expressed colourful rhetorical postulations about the area studies – Nash's variegated mosaic – and have argued fiercely for a clear definition of what the area study is and is to become. King (1975, p. 86) argued for an operational analysis of some kind – a 'critical path' analysis – which failed to find consensus and ignored trends (Kelly & Altbach 1986); Adams pleaded for probing the assumptions of educational planners and decision-makers (Adams 1990, p. 382); Benson emphasised the relationships between theoretical approaches and the world of practice (Benson 1990, p. 392); King identified comparative education as an entity of human choices in a range of complementary settings (King 1990, p. 392); and Paulston encouraged tolerance, reflection, and the utility of multiple approaches in knowledge production and use (Paulston 1990b, p. 396). All these scholars failed to realise that comparative education was not just about schooling, policy, and curriculum studies or societal trends. While Psacharopoulos (1990) expressed discontent at the failure to reach consensus among his peers in the Establishment Phase – particularly over over-generalisations and non-quantitative accounts – an emerging pattern of comparative education research across national borders began to take form in terms of scientific observation (Hilker 1964, p. 223). It was only about that time that the then-titled Comparative Education Society (CES) began to encourage scholarly research in comparative and international studies in education (Brickman 1977) and, with that development, international education became a bona fide 'Siamese twin', synthesising comparative


thesis with international antithesis (Wilson 1994, p. 450) and ultimately rendering obsolete the debates about the differences between comparative and international education (Ninnes & Mehta 2004) in the area. Initially, it was Fernig who felt that comparative education should use the methods of psychology and sociology to obtain objective, 'culture- and value-free' measures of the educational process (Fernig 1959, p. 344). While the Methodenstreit (methodological dispute) of the 1960s continued the dispute between 'quality' research and 'good' research in the field (Welch 2003, p. 25), vitality, relevance, and practicability in data, data collection, and analysis continue to be a shared value, but one that needs to be heavily scrutinised for verifiability. Barber echoed Noah and Eckstein's assertion that comparative education's most cogent justification and form lie along scientific rather than intuitive lines, expressing concern that comparative education tends to place its primary focus on overcoming its 'imagined inferiority' (Barber 1972, p. 424). Kazamias (2001) was also critical of empiricism, owing to the importance of pedagogy and teaching approaches (see Eckstein 1975, p. 79); yet the complementarities of approaches in applying theory to practice are still not well defined or synchronised, nor do they necessarily advance knowledge production. The general anxiety besetting education everywhere would be eased on the basis of universal principles (Cowen 1981, p. 381) if this could only be realised utilising subjects that were like-minded in interpretation for data collection and assessment. Alba laments:

…[S]plit subjects, in comparison with their singular and 'coherent' antecedents operating in the discursive ether of the grand-narrative, are constituted as subjects with a number of different positions and postures to the world. (Alba 1999, p. 484)

Hence, the age-old contestation about the veracity and quality of comparative education research remains ill-defined due to the variety, breadth, and depth of approaches or

‘orientations’. These may include research approaches involving inductive scientific observation of educational phenomena, traditional positivist approaches such as scientific method(s) to explain such phenomena, and critical discourse analysis that scrutinises and analyses research for faulty assumptions, questionable logic, weaknesses in methodology, inappropriate statistical analyses, and unwarranted conclusions. It is as if each study stands on its own merit, and it is in the eyes of the beholder to determine the usefulness and practicability of the research in question. As Sadler once wrote, things outside the schools matter more than those inside them (Sadler, quoted in Woody, 1955, p. 95). While Chen’s (1994) study on research trends in China helped amplify how that country internally focused on comparable systems, institutions, and pedagogical practices to help further develop their own stances, approaches, and customs, the impetus to ‘navel-gaze’ into research practices in existence abroad is now necessary to help set a course for further development and refinement. Whether existing knowledge is built, expanded upon, or in dispute, it is the subjugation of comparative education research that results in what is provable, beneficial, demonstrable, or context‑ ual. In one of Bereday’s earlier works, he argued that while the variety and indeterminate scope and methods employed in teaching comparative education render comparability all but impossible, the convergence of opinions in organising what is fundamental (both in terms of what should be taught and researched) is necessary. Otherwise, comparative education will suffer from the embarrassment of being unable to explain what it is supposed to encompass (Bereday 1959, p. 4) and lose its sense of usefulness.

SENSE OF PURPOSE AND IDENTITY

Perhaps Brembeck (1973) wrote most accurately when he suggested that comparative


education is alternately waxing and waning due to its unsteady sense of purpose. This waxing and waning has continued unabated over the last 70 years, due in part to comparative education research being a subset cousin of its parent, the field of education, and of related disciplines in the social sciences.

Truth is only a temporary victory over error. Comparative education is not a hidden source of pre-existing truth about education, but its scrupulous study may reveal blatant errors in our thought. (McClellan 1957, p. 9)

In his earlier works, Holmes linked Popper's critical dualism, a classifying framework for educational and societal data, with three broad relationship inquiries specific to comparative education (Popper 1990). These can be identified as follows:

a theoretical analyses of relations between concepts or normative statements;
b functional relations between institutions or subinstitutions; and
c relations between stated policy and institutional outcomes. (Holmes 1972, p. 211)

Later, Holmes modified his paradigmatic approach by adding Dewey's reflective method of thinking to Popper's hypothetico-deductive method of scientific inquiry, suggesting that verification lies in the difference-making operative between ontology and epistemology. Juxtaposing Dewey, with qualitative approaches that verify problems, and Popper, with quantitative approaches that attempt to refute solutions, yields a distinction between the two and perhaps an argument for the 'waxing and waning effect'.

Before explanation can begin, its object – the explanandum – must be described. Any description may be said to tell us what something 'is.' If we call every act of grasping what a certain thing is 'understanding,' then understanding is a prerequisite of every explanation, whether causal or teleological. This is trivial. But understanding what something is in the sense of 'is like' should not be confused with understanding what something is in the sense of means or signifies. The first is a characteristic preliminary of causal, the second of


teleological explanation. It is therefore misleading to say that understanding versus explanation marks the difference between two types of scientific intelligibility. But one could say that the intentional or non-intentional character of their objects marks the difference between two types of understanding and of explanation. (von Wright 1971, p. 135)

All this seems to suggest a genuine tension between interpretations of orthodoxy (stringent forms of methodology) and teleological revision, in which the relaxing of certain theory helps to interpret changes in purpose, goals, or outcomes. Von Wright contends that the law of the transmutation of quantity into quality is a good example, with Hegel and his tradition on one side and Marx as their opposite. Masemann further differentiated between the 'objective' and 'subjective' as forever separated by the 'is' and the 'ought to be' (Masemann 1982, p. 1), suggesting that the approaches taken to pursue comparative education research are divisive. Farrell (1970, p. 269) recognised the need for analysing quantitative data but questioned the nature of the problem and the nature of the data. He expressed concerns about the accuracy of cross-national data and the fact that the data frequently do not fit the assumptions underlying the statistical techniques used. He later asked how quantitative comparative studies should be (Farrell 1979, p. 3). Science is empirical in so far as it is backed by experience. The theoretical processes involved in the hypothetico-deductive method, including the intellectualisation of a problem, provide direction and attention to data of a certain kind that may offer possible solutions (Holmes 1972, p. 211). Popper's claim is that scientists act more responsibly when they attempt to refute hypothetical solutions (Holmes 1984, p. 588). King referred to Popper's 'logic of situations' when analysing the 'flabby inaccuracy of process' in qualitative research at local levels and 'causal relationships' in the form of law which constituted a


‘deterministic element’ and ‘predictive power’ in quantitative research (King 1977, p. 103). It appears that the Establishment Phase’s embrace of empiricism was ostensibly based to cooperate with or imitate various sciences for mutual gain. Circumstances faced by comparative education research, however, dictate the necessity of improving its robustness and rigour through systematic conceptual schemes but also to operationalise and experiment through multidisciplinary approaches. Anderson (1977) first observed that quantitative data was the missing link in comparative education. Eckstein (1977, p. 355) argued that comparative education research is influenced by the general intellectual and ideological movements of its time, citing a shift from self-interest and nation building in the nineteenth century to a more ethnocentric approach in the twentieth century, to arguably a complementarity of qualitative and quantitative studies in comparative education. Noah and Eckstein emphasised ‘systematic, controlled, empirical, and critical’ in exploring the interaction between school and society cross-nationally (Noah and Eckstein, in Gezi 1971, p. 5). Bereday suggested that ethnocentricism interferes more with the objective interpretation of data than with the objective collection of them (Bereday, in Holmes 1977, p. 16), but when the shift took place in integrating comparative education with international education, a need and desire to improve and refine the collection of statistics across nation-states and cultures began to take form. This included multicultural studies and with greater regularity in the 2000s and beyond (Ortiz de Montellano 2001). Even within the empirical and quantitative approaches, our measuring instruments are of varying precision, ranging all the way from the divining rod to the micrometer, a state of affairs not unknown in the social sciences in an earlier time. In those other fields the distinction between the methodologies of their major approaches have been considered divisive; it is urged that we examine the various

social sciences themselves to understand and avoid such tendencies in our own area (Edwards 1973, p. 81). The evolutionary development of ideas from experience, ways of thinking, and strategies helps demonstrate how comparative studies in education have drawn scholars out of their own paradigmatic stance to explore other modes of inquiry, whether through the influence of other notable scholars, political socialisation (Fairbrother 2005), or critical self-reflection. In extreme examples, there are many who concern themselves with comparative education but who do not compare at all. Rosselló (1963) describes a comparison 'in the mind' of the author, who is convinced that readers themselves will compare the situation described to them with the situation in their own country (Rosselló 1963, p. 103). Treating the area studies of comparative education as rooted in ideology may not necessarily be cause for alarm, but it is disturbing when it occupies a special place in shaping the field of education (E. Epstein 1994a, p. 401). Noah and Eckstein sought to provide rigour in comparative education by advocating the 'scientific method' of hypothesis formation, testing, and validation through quantitative data about educational systems and their outcomes (Noah and Eckstein, in Kelly 1992, p. 15), but Carney cautioned against a move towards functionalist epistemology (Carney 2009, p. 66). If comparativists could be more self-conscious about the theoretical implications of practical processes, there would be less likelihood of constructing unintentional hegemonies of knowledge (Cowen 1990, p. 323).

THE STUDY

The work of Mårtensson et al. (2016) (see Figure 2.1) provides focal points of comparative education research from notable scholars and their various approaches, linked by synchronising with Paulston's


conceptual map (see Figure 2.2). Attention is given to the relationships between philosophical underpinnings and methodologies. An analysis of whether comparative education research is seen as evolving or declining follows, with further discussion of how qualitative and quantitative research is or is not being used, how comparativists might pursue more collaborative research and mount more rigorous approaches, and how they might respond to the apparent need to establish teams of researchers who embody differing approaches to comparative education, to help complement its representation of vigour and breadth. The intersecting and overlapping relationships between perspectivist, constructivist, and rationalist positions suggest the need to probe more deeply into the limitations of the researcher's worldview, the approach taken, the research process, and the lens utilised for analysing research findings. Reflective

praxis helps to move beyond generalised, 'departmentalised' knowledge, allowing researchers to identify gaps or generic areas of increasing breadth outside known disciplinary boundaries. These may include variables such as individuals, nuances, records, thoughts, ideas, perspectives, dynamics, and sustainability. The researcher must differentiate critically between what can be measured and what cannot, and why. This is why certain methodological channels or 'streams of consciousness' follow an appropriate but studied plan of attack. Whereas most measurable variables tend to denote ranges of significance (Radnofsky 1996), qualitative variables humanise multiple meanings subject to interpretation and represent co-existing realities. The challenge is to combine such forces in the form of mixed methods for purposes of complementarity, validity, and substantiation.

Figure 2.1  Mårtensson et al.'s concept hierarchy of research quality (Mårtensson et al. 2016, p. 599) [hierarchy: Research Quality comprises Credible (internally valid, rigorous, consistent, reliable, contextual, coherent, transparent), Contributory (original idea, original procedure, original result, relevant, applicable, current, generalizable), Communicable (structured, consumable, accessible, understandable, readable, searchable), and Conforming (aligned with regulations, morally justifiable, ethical, open, equal opportunities, sustainable)]


Figure 2.2  Paulston’s conceptual map of perspectivism, constructivism, and rationalisms (Paulston 1997, p. 118)

Research ‘Stream’ Trajectories By analysing epistemological positions and paradigmatic approaches over a 70-year time period, it is possible to loosely categorise ‘streams of consciousness’ characterising comparative education research. While more in-depth treatment of these various streams will be explored further, the interpretation of these trajectories is in its infancy stage and may require further refinement. It is understood that in acknowledging trends in ‘stream’ trajectories, the researcher is presented with greater opportunity to develop sound research methodologies and research designs – from both qualitative and quantitative approaches – and the ability to delimit the research, which helps improve its validity and strength. Figure 2.3 depicts four research ‘stream’ trajectories that offer some guidance for further deliberation and discussion. If paradigmatic, epistemological leanings were aligned with research methodology, a

clarification of research strategies and designs could be formulated in an attempt to improve the processes of researching educational phenomena, the collection of evidence, analysis, interpretation, and implications. The data presented in Figure 2.4 are a 70-year collection of seminal works by comparative education researchers who explored the establishment, development, and future of comparative education as a quintessential area study.

Figure 2.3  Research 'stream' trajectories (characteristics) [figure: four streams – Hermeneutical Constructivist, Deconstructive Perspectivist, Critical Rationalist, Technical Rationalist – with characteristic purposes including to build, test or dismiss theory; to confirm and validate; to explain and predict; to describe views, meaningful structures, and the relationship between policy and practice; to align theory with historical thought; to explore, interpret and uncover reality; to identify understandable 'exceptions to the rule'; to relate to consumable and accessible knowledge and information; to produce evidence with a detached view; and to identify and discern ethical and/or sustainable anomalies or issues]

FINDINGS

Patterns in the Establishment Phase (1950s–1970s)

A number of observers, most of them Westerners and often working from afar, objectively researched aspects of educational phenomena from various countries (Hall 1967, p. 189). The pattern shifted to an increasing number of scholars coming from other areas of the world writing and describing their approaches and experiences at the coalface, since many comparative educators lacked the ability or the tools necessary to conduct fieldwork (Henry 1973, p. 232). Hall also notes a general lack of linguistic competence, and Farrell a lack of empirical studies, during this phase (Farrell 1970). Hilker (1964, p. 225) inventively produced a 'process of comparing' that has been useful in reflecting the scope, depth, and integration of qualitative and quantitative research in comparative studies. Its processes included: (a) the description of phenomena; (b) the (analytic) explication of underlying forces; and (c) comparison in the narrower and specific sense (the last deserves attention, as it is what Hilker believed was often missing in comparative education research). In its simplest form, comparison consists of the juxtaposition of quantitative data, which reflects decrease,

increase, or no change. When, however, qualitative differences are under study, it is necessary to find a common denominator (tertium comparationis) before a decision about the direction of further procedures and/or approaches can be made. When prompted to extend the scope of large-scale studies, Foshay argued that such studies could shed light on the nature of educational achievement according to national cultures and traditions (Foshay 1963, p. 267), which prompted the first development of large-scale studies, including the importing and exporting of educational ideas, systems, and techniques. In an attempt to avoid serious errors and dangerous consequences in transplantation, the identification of key theoretical reference points was needed, which required a well-developed comparative education studies programme in both qualitative and quantitative studies (Taba 1963, p. 171). Understanding that educational reforms are a country's own prerogative, what the comparativist can do is merely to help individuals in the research aspect of the problem and not in its application (Kazamias 1964, p. 218).

Figure 2.4  Data of seminal scholars and their approaches/experiences [matrix cross-classifying seminal scholars by research-quality dimension (Credible, Contributory, Communicable, Conforming) and research 'stream' (Hermeneutical Constructivist, Deconstructive Perspectivist, Critical Rationalist, Technical Rationalist); entries span Woody (1955) and Hans (1959) through Groux (2013) and Sellar (2015)]

While large-scale comparative studies may be valuable for decision makers, they rarely illuminate the actual decisions that

must be made at local levels (Noah 1974, pp. 345–346).

Patterns in the Fragmentation Phase (1980s–1990s)

As 'pure scientists', comparativists should attempt to formulate alternative policies, carefully analyse problems, and eliminate


those that will be less successful in particular countries. As 'applied scientists', comparativists should be prepared to help those responsible for policy to implement adopted policies and to anticipate the outcomes and problems which are likely to arise when a policy has been implemented (Holmes 1981, p. 54). Fragmentation arose in part from those comparativists who sought to broaden the research base in terms of research experimentation and discovery and those who sought to operationalise and compartmentalise it. Hackett (1988, p. 399) introduced the idea of the aesthetic dimension of comparison to improve cultural perceptions, resulting in a culture-as-difference approach (Hoffman 1999, p. 472) and a move away from positivist theories. In a further development, postmodernism gave rise to one of the quintessential features of 'modern man' – a 'cold rational' demeanour – used as a stick with which to beat more 'primitive' societies in which ethics formed (Welch 1985, p. 15) and hegemonic struggles ensued. Both reification and a technocratic scientism became regular features, if not functionalist explanations, of comparative education and the social sciences (ibid., p. 15). This extends the duality of endogenous and exogenous influences, which suggests that while the bipolarity of North–South issues continues, the advancement of technology is subjecting life to the supremacy of technology (Mitter 1997, p. 127). As Irving Epstein argues:

[Comparative education] has faced the prospect of growing de-institutionalisation and fragmentation due to its generalist orientation and the inherent ambiguity of its character: professional yet theoretical. (I. Epstein 1995, p. 7)

According to Gibbons et al. (1994), a new mode of knowledge production began to emerge. Unlike the familiar mode, which was discipline-based and carried a distinction between what is fundamental and what is


applied, the new mode is transdisciplinary, characterised by a constant flow back and forth between the fundamental and the applied, and between the theoretical and the practical (Gibbons et al. 1994, p. 19). It was at this time that comparative education research became more reflexive and eclectic, thus allowing new theory to emerge from often paradoxical combinations of existing theories (Paulston 1994, p. 932).

Patterns in the Expansion Phase (2000s to Present)

Grant argued that comparative education has the capacity to do in space what educational history does in time; it can provide the opportunity to understand more fully the workings of the educational process by providing a wider view than the here and now (Grant 2000, p. 317). Thus, comparative studies of present systems explore what previous educators inherited from the past, how they responded to need and feasibility, and how change has been perceived – both positively and negatively (King 2000, p. 268). Bray (2004a, p. 69) ruminated that comparative education is branching out further to other academics (beyond those in the social sciences). Crossley and Jarvis echo Bray's sentiment, suggesting that the rich variety of foundations, traditions, and possible futures for a diversity of comparative educations may well prove to be one of the field's greatest assets in meeting the complex challenges of the twenty-first century (2000, p. 264). However, in another publication, Crossley (2000, p. 327) expressed concern about the field responding too directly to changing disciplinary fashions, with the result that the phases of its own development mark the rejection of past practices rather than a cumulative advancement. Irving Epstein (1995) and Groux (2013) saw the field as exploiting or being interested in the 'other', while Hans (1959) and Dale (2005, p. 117) saw the field as moving towards a methodological


nationalism, which debunks the notion of a free-flowing form of researching freedoms. Certainly, the rules of global development and engagement have never been more open and/or dangerous. Where modern thought is largely shaped by notions of knowledge advancement and scientific rationality, there are powerful, new, and unsettling ideas that are permeating the consciousness and work of the comparativist:

• Changes in thinking about thinking
• Changes in identity and boundaries have been quickened by technological change
• Changes in how we choose to represent our separate 'truths'. (Anderson, in Paulston 2000, p. 9)

Bray (2004b, p. 10) laments that while the field welcomes scholars who compare almost anything, the fact that many of those scholars are inadequately systematic in their comparisons is problematic at best. Connell (2014) lists the practical uses and applicability of southern and postcolonial perspectives, suggesting that scholars are reflecting more on the coloniality and northern dominance of knowledge production. What this indirectly demonstrates is the need for scholars, too, to move out of their own perspectives and positional vantage points. As von Wright (1971, p. 40) states:

Shelf Life This research reflects an ongoing problem regarding qualitative and quantitative research in comparative education. Notwithstanding the fact that inductive and hypo-deductive methods treat the research problem from differing positions, there are also differences in terms of ‘standards’, variances in temporal priorities, and questionable

‘shelf life’ of observations, ways of knowing, and verifiability in terms of audience. What may be justifiable in the 1950s, for instance, must be judged by the standards of the approach at that point-in-time (Husén 1979, p. 372). Uniformity in method may be considered self-evident for a particular geolocational space and moment, but it loses its veracity and vitality over time due to having little ‘value’ or meaning depending upon context, relevance, and ‘useability’. Case study analysis is a good example of a common comparative education methodology used to explore intensive, qualitative observations, but as research has evolved over a 70-year period, isolated comparative case study analysis tends to be just that: isolated and stand-alone with very little applicability across cultures, timelines, or contexts. Anecdotal evidence suggests that comparative education research will likely become increasingly collaborative, both from differing worldview perspectives and a combination of mixed methods or approaches in an attempt to improve research ‘shelf-life’ and veracity.

Sustainability

Research generated over the years reflects an essential balance of interpretation and constructive tension in scrutinising 'evidence'. This can be a double-edged sword. Notwithstanding the differing approaches to researching educational phenomena for purposes of schooling or society, the variety of methods employed suggests an ever-expanding cosmos of both experimentation and operationalism. The review of qualitative research reflects the lion's share of approaches undertaken over the 70-year period, but evidence suggests that this is more a reflection of the self-interest of the researcher(s) in question than of a need to seek the amelioration of an educational issue. Qualitative methodologies provide 'voice' where voice may be suppressed or restricted,


while quantitative research tends to rely increasingly on secondary data sources. The challenge of ensuring the quality of data is never-ending and, like other disciplines, fields, and area studies, comparative education research requires elements of discretion in discerning 'facts' from 'fiction', identifying discrepancies and distortions, and building a sound combination of research methodologies and design. Specialising its craft may require re-examining principles and practices (Eckstein 1983), improving the alignment between educational pursuits and employment prospects, and justifying the need to advance area studies by identifying existing research strengths. For those inclined to continue research pursuits on the basis of personal whims and privileged freedoms, conducting research for research's sake, there is cause for pause regarding the viability of researching for advancement's sake. Increasingly, institutions are challenging what research is considered 'useful', as research is perceived as a 'performance-based' commodity and/or variable that is measurable and must deliver tangible evidence of impact or outcomes.

Scope

Institutional research has clearly expanded beyond the nation-state, yet the controls on educational pursuits are creating barriers and discouraging research generation and progress. The communality of scholarly networks has precipitated an emerging debate about knowledge knowing no boundaries, but the advent of technologies is stratifying how information and knowledge are accessed and disseminated, the equality and equity of perspective (voice), and the opportunity for further engagement, development, and action. When investigating scope, a comparative researcher must be cognizant of the limitations of research. No longer should comparative education confine itself to studying


existing systems of education and simply describing what they are and what they are not. The area study must go further: refining the development and analysis of education in varying cultural and temporal settings, exploring experimental modes of inquiry, proposing prescriptions for educational problems based on reliable data (variables), and employing appropriate and relevant 'ethical' approaches. The pressures for educational change increasingly fluctuate with a 'waxing' and 'waning' effect, but it is not just international comparisons that need attention. Subcultural or multicultural studies require a comparative and cross-cultural perspective in an effort to develop effective solutions to the educational issues or challenges ahead, including the need to educate for increased uncertainty.

Mutuality

Comparative education continues to promote and encourage a cross-fertilisation of ideas among research workers, fellow educators, and other academic disciplines and fields of study. Multilingual (Mollis & Marginson 1999) and non-linguistic approaches (Radnofsky 1996) should continue to increase with greater levels of 'equity', conveying 'voice' at international levels, but comparativists should not lose sight of the realities of knowledge consumption and what drives it vis-à-vis knowledge advancement for the sake of knowledge creation. Teaching and learning freedoms include research freedoms as well. Becher and Trowler (2001, p. 11) argue that most academics in the world continue to believe that they are viewed as independent professionals. This seems to suggest that the major shifts in the academic landscape involve significant environmental changes to the professoriate, the digitalisation of current research to measure 'so-called' activity, and the reassessment of what constitutes 'quality' or 'good' research. The structural forces at play in deciding epistemological


character and methodological standards suggest a mindset that often negates new approaches and perspectives while, at the same time, promoting systematically sound analysis for increased 'standards'. The common thread is the field's theoretical and practical significance: contributing to the professional training of educators, informing policy and practice, and expanding sets of analytic categories and modes for examining the realities of education and society (Arnove, Kelly, & Altbach 1982, p. 3). Edding was one of the first to argue that cooperation between educational and economic researchers will be soundly based if both parties are careful to avoid straying too far from the essential core of education and/or trespassing on each other's academic territory (Edding 1965, p. 454). Bray and Manzon (2014) refer to the professoriate's quest for distinction, which may intensify a climate of collegial competition, especially among those who hold similar positions or stances. As Holmes once stated, 'within diversity there is unity' (Holmes 1985, p. 344), and as long as comparativists are committed to the provision of education as a human right in ways that will benefit society, competition will be replaced with greater opportunities to collaborate and cooperate. Danger lurks when the identification of trends is seen as superficial, when practical approaches to seeking solutions are viewed as another form of 'cherry picking', and when there is a general lack of historical and cultural understanding of how systems and societies developed. The transplantation of ideas, policies, and practices cannot and should not be implemented without the tools of both qualitative and quantitative studies. Both provide balance in their own right.

Research Rigour

Comparative education research is often approached incorrectly. In an attempt to improve research quality, a systematic environmental scan and analysis helps determine

in advance what data may be relevant, what methods may be most appropriate in revealing them, and how to set forth findings (King 1965, p. 147). Preconceived notions of anticipated conclusions should be avoided at all costs, as the data will likely reflect their message in due course and without bias. Hence, taking stock of what the area study is and is to become is critical to establishing appropriately measurable relationships. The vitality and image of comparative education hang in the balance by virtue of its being a model for implementing its work (Eckstein 1970) and recognising the limits of our domain (Edwards 1970, p. 240). Those who have attempted to define the training of comparative education researchers as cross-disciplinary have produced '…doctoral dissertations written by "mules" who somehow cannot compensate for the loss of depth in each discipline by the breadth of insights gained from more general observations' (Bereday 1967b, p. 180). Yet, as long as jobs and tenure are still secured in traditional departments, the research generated in comparative education will be written with both eyes focused on other area studies (Koehl 1977). Bereday's 'purist' views suggest feral approaches to the subject, yet many scholars contend that interdisciplinary action is the only way forward for the area study to find legitimacy and usefulness in the foreseeable future (Carey 1966). If comparative educators choose to give their major scholarly allegiance to a variety of non-education fields – such as economics, sociology, anthropology, and history – then the area study must inevitably seem eclectic. The unifying notion is the object of education and a generally comparative approach (Lawson 1975, p. 347), with emphasis placed on sound scholarship and purposeful effort (ibid., p. 353). Systematic analysis and review are critical. Bray and Qin (2001) introduced a comparison of comparisons, Cowen (2000) emphasised the plural of comparative education(s), and Fairbrother (2005) replaced


‘comparative’ with ‘experimentation’ and ‘dispositions’, and Marginson and Mollis (2001, p. 54) distinguished between universalism (sameness) and ultra-relativism (difference). Parkyn emphasised the importance in the study of methodology in comparative education (Parkyn 1977, p. 90), in which methodologism is equated with self-consciousness about methodology (Petty 1973, p. 389). Collaborative research has the impetus to significantly strengthen the validity of cross-cultural studies and, by incorporating the agendas, priorities, and interpretation of insiders, research findings have the potential to be more meaningful and helpful for policymakers (Crossley & Vulliamy 1997). Thus, efforts to increase the explanatory power of data and their impact on educational policy-making and governance can be achieved by linking data from different assessments, by undertaking secondary analyses of data sets and by making findings more accessible and useable for policy-makers and others (Sellar 2015, p. 772). Most of the critical reviews of comparative education literature underline the methodological weaknesses of the field by contrasting them with the higher degree of sophistication which is the practice of other comparative social sciences (Schriewer 1988, p. 27).

EVIDENCE

Knowledge is not static, nor are data. The pure, unadulterated, and naked truth is not an absolute but a manifestation of social reality constructed at a particular time and place. It is therefore up to the researcher to recognise this before using data in ways that discover underlying truths without introducing misrepresentation, bias, or personal preference. Primary data are often considered the most reliable source of data for comparative education, although their imperfections and irregularities are often disregarded when considering the validity of a researcher's conclusions.


The integrity of the research must therefore meet strict 'explicit' standards in the criteria for determining the admissibility of data and in applying appropriate measurements to identify 'truths'. Secondary data refer to accounts from other known sources and, as a result, are less reliable. Distortions of 'truth' by the channels of communication through which they pass may lead to imperfections in perception. Imprecise channels of communication require due diligence in scrutinising and testing the 'cleaning' of the data. When using secondary data, there is a greater need to err on the side of caution in interpreting and reporting research. Connell (2015) argues that the Global North has aggregated data to the extent that information (e.g. data) is assembled in ways that ensure that greater wealth and power are secured from the colonised world. This suggests that the perceived values and effects of globalisation are often contested, classified as uneven, and value laden when it comes to researching education. Patterns of imitation, difference, domination, and subordination in education policy and practice (Marginson & Mollis 2001, p. 612) continue to contort what constitutes evidence: how it is measured, assessed, and interpolated. In their design, virtually all comparative education studies approximate an epistemological stance in which social facts function – in relationship with some aspect of the educational system – as dependent or independent variables. The complexity of circumstance and the distinction between social, mental, and physical worlds require researchers to describe reality and seek truth. As Broadfoot states:

Even assuming that policy makers are themselves not tainted by corruption and self-interest, the complexity of modern societies is such that it is hard to deny the legitimacy of these rival agendas. (Broadfoot 1995, p. 299)

Obtaining more research data does not necessarily result in greater substantiation and legitimacy; however, there is an increasing


need to demonstrate variables that reflect greater veracity, and the approaches used to determine how such variables were acquired, verified, and interpolated. The quality of variables cannot be overemphasised, but the quality and reliability of appropriate measuring instruments are equally important: for research to be considered robust, it must produce consistent results when the characteristics being measured have not changed.

IMPACT

Measurement is just that – a measurement. Its value is limited to what is being measured, how, and why. Existing structures and human agency are used as allocated resources to measure those things that are tangible, yet for the unexacting science of education, peer review is a far more 'valid' indicator than metrics (Moodie 2017) because of impact. Like knowledge and data, impact is spatial-temporal and, as such, geo-locational impacts differ in levels of significance, and a perceived sense of impact also differs over time. As Campbell states:

The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor. (Campbell 2011, p. 34)

Connell (2015, p. 14) concurs that the knowledge production process forces research into a mould of competitive individualism, which distorts and trivialises knowledge creation. Therefore, peer review must equally require rigorous scrutiny, as its impact has just as much influence on process as it does on tangible outcomes or deliverables. As Boyer points out:

The scholarship of discovery, at its best, contributes not only to the stock of human knowledge but also to the intellectual climate of a college or university. Not just the outcomes, but the process,

and especially the passion, give meaning to the effort. (Boyer 1990, p. 17)

Explosive and enormous masses of large-scale data relating to educational systems, policy, and practice have accumulated over the last 70 years, yet the study of impact has all but remained in its infancy, with dismal results. Growing tensions between powerful 'localising' and 'globalising' forces increasingly mean that local issues cannot be understood without reference to global contexts. Therefore, a comprehensive understanding of both the local and the global is needed (Crossley & Holmes 2001, p. 396). Moreover, the resources allocated to measure what can be measured are not equal, hence the need to exercise caution in evaluating research 'impact'. When PISA, PIRLS, and TIMSS came into existence, it was not anticipated that other nation-states would follow the lead of OECD member countries in measuring and assessing educational achievement at international levels. The process by which international assessment exercises take place denotes an empirical process that may have led to greater cooperation, integration, complementarity, and convergence within a cross-national geographical space, but disparities and misinformation continue to exist. The question of comparability rests on whether the concepts under comparison correspond, how the correspondence of measurements is assessed, how concepts are linguistically expressed and resolved (Raivola 1985, pp. 368–369), and whether there is an even playing field. The introduction of research bibliometrics includes software programs that aim to measure scholarly impact. Bibliometrics create a form of rankings that essentially undermines the essence of quality research. Notwithstanding the intensiveness of their usage, the research process is damaged at all levels: production, circulation, dissemination, and advancement. The measurement of impact gets lost in the process of measuring impact. The artefacts of the


process include how to involve educational stakeholders or audiences, how to measure change between theory and practice, and how and what to measure in terms of added value. At present, further research is necessary to develop impact measurements in comparative education.

COMMENTARY

The articles analysed for this study represent only a fraction of the comprehensive range of material dedicated to qualitative and quantitative studies in comparative education; nevertheless, the challenges to approaches and experiences set forth here serve to demonstrate discursive and complicated techniques that reflect a general lack of substantive engagement between qualitative and quantitative approaches. Hence, a more balanced approach to using such methods is in order, to build more critical resilience and useability in comparative education research. There are basically four sides in this 'war of paradigms':

First, there are the 'irredentists' who believe that no educational solution is viable unless autonomously invented by local cultural authorities. Second, there are the 'single solution specialists' who have an answer already ready – educational technology, vouchers, modular learning, management information systems, distance teaching, decentralisation, etc. Third, there are the 'conspiritists' who believe that empirical research with universal standards of excellence violates natural complexity and is politically unacceptable because it places education research institutions on the periphery at a disadvantage. … Fourth and last are the 'modelers' who believe in absolute interpretations of social science. (Heyneman 1993, p. 516)

The trend towards increased theoretical, epistemological, and methodological pluralism points to a crossroads for comparative education as an area study. Echoing Heyneman’s war of paradigms, there is concern that ‘streams of consciousness’ may lead to what Bernstein (1991) identified as fragmenting,


flabby, polemical, defensive, and fallibilistic pluralism. Fragmenting pluralism refers to a small group of scholars who may share their own biases together but who may no longer experience the need to talk outside their own group. Flabby pluralism refers to the process of borrowing from different orientations, leading to superficial poaching and questionable plagiarism while undermining everyone in the process. Polemical pluralism refers to one's 'streams of consciousness' becoming an ideological weapon, advancing one's own orientation to the detriment of others. Defensive pluralism, Bernstein's notion of tokenism, is another illustration of the possibilities, in which comparativists pay lip service to others and remain complimentary and compliant in scholarly gatherings, but become increasingly convinced that there is nothing to learn from 'stand-alone' research (Bernstein 1991). In disciplines like philosophy with multiple subfields (including education) and specialties such as comparative education, scholars must want to listen to, acknowledge, and stay in touch with the theoretical works of academics and the 'best' work of practitioners in other subfields in the interests of advancing comparative education. Bernstein calls this behaviour engaged fallibilistic pluralism. It places new responsibilities on every comparativist, as it falls upon each individual scholar to recognise themselves as both part of the problem and part of the solution (Bernstein 1991, p. 335). The greater the 'purity' within theoretical, epistemological, and methodological positions and perspectives, the greater the recognition and respect between scholars. Pluralistic ecologies of practice ensure that diversity is maintained and sustainable. The taxonomy of comparative education by induction suggests that there are new research trajectories, 'streams of consciousness', or practices. Over the past 70 years, the classification of philosophical and methodological approaches has expanded to include:


• positivist-empiricism: functional and grounded theory approaches
• hermeneutical-qualitative: discourse, textual, narrative, and conversational analysis (including some poststructuralist approaches)
• critical-specialisms: feminist, postcolonial, and indigenous approaches.

These take different forms and positions about the nature of theory, epistemology, and methodology, hence the need to research them in a holistic fashion. These approaches take different views of the purpose of science: (1) a technical interest in knowledge for the sake of controlling things in the world (positivist-empiricism); (2) a practical interest in understanding things (hermeneutical-qualitative); and (3) an emancipatory interest in discovering how understandings and actions produce untoward outcomes, and how transforming those understandings and actions may produce outcomes which are less untoward, for the sake of creating more reasonable, productive, sustainable, just, and democratic research (critical-specialisms). Such differentiation falls in line with Habermas's 'knowledge-constitutive interests', applicable to different kinds of research. It also adds depth and further complexity to the purposes of comparative education research and, hence, its advancement. In the absence of an agreed-upon methodology among theoreticians and practitioners, comparativists will continue to collect data and provide interpretative studies (Holmes 1977, p. 129). The future should be seen as building up a corpus of knowledge which can be used and interpreted in the light of established and emerging paradigms (ibid.).

In another concern, the financial difficulties of universities in many nation-states and the relative paucity of research funding present ongoing problems for comparative education. Since most comparative educators are located within faculties of education, where these circumstances are particularly serious, the area study is in difficult straits (Kelly & Altbach 1981, p. 24). The next period of comparative education scholarship will involve an engagement with global currents of twenty-first-century life, a rigorous blending of quantitative and qualitative methodologies in well-justified comparisons, and a commitment to the quest for more general insights about the key building blocks of education (Broadfoot 2003, p. 276). Documents, like research, mature over time. It is important to recognise good efforts despite concerns about the propensity to rely more heavily on quantitative information in large-scale studies and the loss of quality in qualitative observations or interpretations of the data. Seth Spaulding (1992) recognised this early when reviewing UNESCO's first World Education Report: however remarkable the results were, even with limited budgets, the quality of the research still required 'maturity'. Perhaps comparativists should recognise this so that we can live with both the field's operational and experimental trajectories.

Acknowledgement

I wish to acknowledge the support of my colleagues Stephen Kemmis, Helen McAllister, Patricia Donald, and Eleanor Colla, with whom I discussed and sought advice on early drafts of this chapter.

REFERENCES

Adams, Don (1990). 'Analysis with Theory is Incomplete'. Comparative Education Review, Vol 34, No 3 (August 1990): 381–385.
Alba, Alicia (1999). 'Curriculum and Society: Rethinking the Link'. International Review of Education, Vol 45, No 5/6: 479–490.
Altbach, Philip G. (1991). 'Trends in Comparative Education'. Comparative Education Review, Vol 35, No 3 (August 1991): 491–507.
Anderson, C. Arnold (1977). 'Comparative Education over a Quarter Century: Maturity and Challenges'. Comparative Education Review, Vol 21, No 2/3, The State of the Art (June–October 1977): 405–416.
Arnove, Robert F. (1980). 'Comparative Education and World-Systems Analysis'. Comparative Education Review, Vol 24, No 1: 48–62.
Arnove, Robert F. (2001). 'Comparative and International Education Society (CIES) Facing the Twenty-First Century: Challenges and Contributions'. Comparative Education Review, Vol 45, No 4: 477–503.
Arnove, Robert F., Gail P. Kelly, & Philip G. Altbach (1982). Comparative Education. New York: Macmillan.
Barber, Benjamin R. (1972). 'Science, Salience and Comparative Education: Some Reflections on Social Scientific Inquiry'. Comparative Education Review, Vol 16, No 3 (October 1972): 424–436.
Barber, Benjamin R. (1973). 'Reply to Petty'. Comparative Education Review, Vol 17, No 3 (October 1973): 393–395.
Becher, Tony & Paul R. Trowler (2001). Academic Tribes and Territories: Intellectual Enquiry and the Culture of Disciplines (2nd edition). Buckingham: The Society for Research into Higher Education and Open University Press.
Benson, J. Kenneth (1990). 'The Underdevelopment of Theory and the Problem of Practice'. Comparative Education Review, Vol 34, No 3 (August 1990): 385–392.
Bereday, G. Z. F. (1957). 'Some Methods of Teaching Comparative Education'. Comparative Education Review, Vol 1, No 1 (June 1957): 13–15.
Bereday, G. Z. F. (1967). 'Reflections on Comparative Methodology in Education, 1964–1966'. Comparative Education, Vol 3, No 3 (June 1967): 169–187.
Bernstein, Richard J. (1991). 'Pragmatism, Pluralism, and the Healing of Wounds'. The New Constellation (pp. 323–340). Cambridge: Polity Press.
Berstecher, Dieter & Bernhard Dieckmann (1969). 'On the Role of Comparisons in Educational Research'. Comparative Education Review, Vol 13, No 1 (February 1969): 96–103.
Bone, Louis W. (1960). 'Sociological Framework for Comparative Study of Educational Systems'. Comparative Education Review, Vol 4, No 2, Special Issue (October 1960): 121–126.
Boyer, Ernest L. (1990). Scholarship Reconsidered: Priorities of the Professoriate. The Carnegie Foundation for the Advancement of Teaching, Special Report. Princeton, NJ.
Bray, Mark (2002). 'Comparative Education in East Asia: Growth, Development and Contributions to the Global Field'. Current Issues in Comparative Education, Vol 4, No 2: 70–80.
Bray, Mark (2003). 'Comparative Education in an Era of Globalisation: Evolution, Missions and Roles'. Policy Futures in Education, Vol 1, No 2: 209–224.
Bray, Mark (2004a). 'Other Fields and Ours: Academic Discourse and the Nature of Comparative Education'. Comparative and International Education Review, Vol 3: 61–85.
Bray, Mark (2004b). 'Comparative Education and Allied Fields: A Perspective from the World Council of Comparative Education Societies'. Keynote address for the Australian and New Zealand Comparative and International Education Society (3 December 2004), Melbourne.
Bray, Mark & Gui Qin (2001). 'Comparative Education in Greater China: Contexts, Characteristics, Contrasts and Contributions'. Comparative Education, Vol 37, No 4: 451–473.
Bray, Mark & Maria Manzon (2014). 'The Institutionalization of Comparative Education in Asia and the Pacific: Roles and Contributions of Comparative Education Societies and the WCCES'. Asia Pacific Journal of Education, Vol 34, No 2: 228–248.
Brembeck, Cole S. (1975). 'The Future of Comparative and International Education'. Comparative Education Review, Vol 19, No 3 (October 1975): 369–374.
Brickman, William W. (1960). 'A Historical Introduction to Comparative Education'. Comparative Education Review, Vol 3, No 3 (February 1960): 6–13.
Brickman, William W. (1977). 'Comparative and International Education Society: An Historical Analysis'. Comparative Education Review, Vol 21, No 2/3 (June/October 1977): 396–404.

Broadfoot, Patricia (1977). 'The Comparative Contribution – A Research Perspective'. Comparative Education, Vol 13, No 2 (June 1977): 133–137.
Broadfoot, Patricia (1995). 'Notes and Comments'. Comparative Education Review, Vol 31, No 3 (August 1995): 299–301.
Broadfoot, Patricia (2003). 'Post-Comparative Education?'. Comparative Education, Vol 39, No 3: 275–277.
Butts, Freeman (1973). 'New Futures for Comparative Education'. Comparative Education Review, Vol 17, No 3 (October 1973).
Campbell, Donald T. (2011). 'Assessing the Impact of Planned Social Change'. Journal of MultiDisciplinary Evaluation, Vol 7, No 15 (February 2011): 3–43.
Carey, Robert D. (1966). 'Conceptual Tools for Research in Comparative Education'. Comparative Education Review, Vol 10, No 3: 418–425.
Carney, Stephen (2009). 'Negotiating Policy in an Age of Globalization: Exploring Educational "Policyscapes" in Denmark, Nepal, and China'. Comparative Education Review, Vol 53, No 1 (February 2009): 63–88.
Chen, Shu-Ching (1994). 'Research Trends in Mainland Chinese Comparative Education'. Comparative Education Review, Vol 38, No 2 (May 1994): 233–252.
Clayton, A. Stafford (1972). 'Valuation in Comparative Education'. Comparative Education Review, Vol 16, No 3: 412–423.
Clayton, Thomas (1998). 'Beyond Mystification: Reconnecting World-System Theory for Comparative Education'. Comparative Education Review, Vol 42, No 4 (November 1998): 479–496.
Connell, Raewyn (2014). 'Using Southern Theory: Decolonizing Social Thought in Theory, Research and Application'. Planning Theory, Vol 13, No 2: 210–223.
Connell, Raewyn (2015). 'Social Science on a World Scale: Connecting the Pages'. Sociologies in Dialogue, Vol 1, No 1 (July–December 2015): 1–16.
Cormack, Margaret L. (1973). 'Is Comparative Education Serving Cultural Revolution?'. Comparative Education Review, Vol 17, No 3 (October 1973): 302–306.
Cowen, Robert (1981). 'Sociological Analysis and Comparative Education'. International Review of Education, Vol XXVII: 385–395.
Cowen, Robert (1990). 'The National and International Impact of Comparative Education Infrastructures'. In W. D. Halls (Ed.), Comparative Education: Contemporary Issues and Trends (pp. 321–352). London: Jessica Kingsley.
Cowen, Robert (2000). 'Comparing Futures or Comparing Pasts?'. Comparative Education, Vol 36, No 3: 333–342.
Crossley, Michael (2000). 'Bridging Cultures and Traditions in the Reconceptualisation of Comparative and International Education'. Comparative Education, Vol 36, No 3, Special Number (23) (August 2000): 319–332.
Crossley, Michael & Keith Holmes (2001). 'Challenges for Educational Research: International Development, Partnerships and Capacity Building in Small States'. Oxford Review of Education, Vol 27, No 3: 395–409.
Crossley, Michael & Peter Jarvis (2000). 'Introduction: Continuity, Challenge and Change in Comparative and International Education'. Comparative Education, Vol 36, No 3, Special Number (23) (August 2000): 261–265.
Crossley, Michael & G. Vulliamy (1997). Qualitative Educational Research in Developing Countries: Current Perspectives. New York: Garland.
Dale, Roger (2005). 'Globalisation, Knowledge Economy and Comparative Education'. Comparative Education, Vol 41, No 2 (May 2005): 117–149.
Denman, Brian D. (2017). 'Post-Worldview? A Dialogic Meta-Narrative Analysis of North–South, South–South, and Southern Theory'. International Journal of Comparative Education and Development, Vol 19, No 2/3: 65–77.
Eckstein, Max A. (1970). 'On Teaching a "Scientific" Comparative Education'. Comparative Education Review, Vol 14, No 3 (October 1970): 279–282.
Eckstein, Max A. (1975). 'Comparative Education: The State of the Field'. Review of Research in Education, Vol 3: 77–84.
Eckstein, Max A. (1977). 'Comparative Study of Educational Achievement'. Comparative Education Review, Vol 21, No 2/3 (June–October 1977): 345–357.

Eckstein, Max A. (1983). 'The Comparative Mind'. Comparative Education Review, Vol 27, No 3 (October 1983): 311–322.
Eckstein, Max A. & Harold J. Noah (1985). 'Dependency Theory in Comparative Education: The New Simplicitude'. Prospects, Vol XV, No 2: 213–225.
Eckstein, Max A. & Harold J. Noah (1992). Examinations: Comparative and International Studies. Vol 12. Oxford: Pergamon Press.
Edding, Friedrich (1965). 'The Use of Economics in Comparing Educational Systems'. International Review of Education, Vol 11, No 4, General Education in a Changing World: 453–465.
Edwards, Reginald (1970). 'The Dimensions of Comparison, and of Comparative Education'. Comparative Education Review, Vol 14, No 3 (October 1970): 81–92.
Edwards, Reginald (1973). 'Between the Micrometer and the Dividing Rod: Methodologies in Comparative Education'. In Reginald Edwards, Brian Holmes, & John van de Graaff (Eds.), Relevant Methods in Comparative Education: A Report of a Meeting of International Experts (International Studies in Education, No 33, pp. 81–92). Hamburg: UNESCO Institute for Education.
Edwards, Reginald, Brian Holmes, & John van de Graaff (Eds.) (1973). Relevant Methods in Comparative Education: A Report of a Meeting of International Experts. International Studies in Education, No 33. Hamburg: UNESCO Institute for Education.
Epstein, Erwin H. (1991). 'Editorial'. Comparative Education Review, Vol 35, No 3 (August 1991): 401–405.
Epstein, Erwin H. (1994). 'Comparative and International Education: Overview and Historical Development'. In T. Husén & T. Postlethwaite (Eds.), The International Encyclopedia of Education (Vol 2, pp. 918–923). Oxford: Elsevier Science.
Epstein, Irving (1995). 'Comparative Education in North America: The Search for Other through the Escape from Self?' Compare, Vol 25, No 1: 5–16.
Fairbrother, Gregory P. (2005). 'Comparison to What End? Maximizing the Potential of Comparative Education Research'. Comparative Education, Vol 41, No 1: 5–24.
Farrell, Joseph P. (1970). 'Some New Analytic Techniques for Comparative Educators: A Review'. Comparative Education Review, Vol 14, No 3 (October 1970): 269–278.
Farrell, Joseph P. (1979). 'The Necessity of Comparisons in the Study of Education: The Salience of Science and the Problem of Comparability'. Comparative Education Review, Vol 23, No 1 (February 1979): 3–16.
Fernig, L. (1959). 'The Global Approach to Comparative Education'. International Review of Education, Vol 5, No 3: 343–355.
Fletcher, Laadan (1974). 'Comparative Education: A Question of Identity'. Comparative Education Review, Vol 18, No 3 (October 1974): 348–353.
Foshay, Arthur W. (1963). 'The Use of Empirical Methods in Comparative Education: A Pilot Study to Extend the Scope'. International Review of Education, Vol 9, No 3: 257–268.
Gezi, Kalil I. (Ed.) (1971). Education in Comparative and International Perspectives. New York: Holt, Rinehart and Winston.
Gibbons, Michael, et al. (1994). The New Production of Knowledge: The Dynamics of Science and Research in Contemporary Societies. London: Sage.
Grant, Nigel (2000). 'Tasks for Comparative Education in the New Millennium'. Comparative Education, Vol 36, No 3: 307–317.
Groux, Dominique (2013). 'Comparative Education: Inventory and Perspectives from an "AFDECE" Point of View'. International Perspectives on Education and Society, Vol 2: 57–64.
Hackett, Peter (1988). 'Aesthetics as a Dimension for Comparative Study'. Comparative Education, Vol 32, No 4 (November 1988): 389–399.
Halls, W. D. (1967). 'Comparative Education: Explorations'. Comparative Education, Vol 3, No 3 (June 1967): 189–193.
Hans, Nicholas (1959). 'The Historical Approach to Comparative Education'. International Review of Education, Vol 5, No 3: 299–309.
Hawkins, John N. & Val D. Rust (2001). 'Shifting Perspectives on Comparative Research: A View from the USA'. Comparative Education, Vol 37, No 4: 501–506.
Henry, Michael M. (1973). 'Methodology in Comparative Education: An Annotated Bibliography'. Comparative Education Review, Vol 17, No 2 (June 1973): 231–244.
Heyneman, Stephen (1993). 'Educational Quality and the Crisis of Education Research'. International Review of Education, Vol 39, No 6, Education, Democracy, and Development (November 1993): 511–517.
Hilker, Franz (1964). 'What Can the Comparative Method Contribute to Education?' Comparative Education Review, Vol 7, No 3 (February 1964): 223–225.
Hochleitner, R. Díez (1959). 'Utilización de la Educación Comparada en el Planeamiento Integral de la Educación'. International Review of Education, Vol 5, No 3, Comparative Education (1959): 356–366.
Hoffman, Diane M. (1999). 'Culture and Comparative Education: Toward Decentering and Recentering the Discourse'. Comparative Education Review, Vol 43, No 4 (November 1999): 464–488.
Holmes, Brian (1972). 'Comparative Education as a Scientific Study'. British Journal of Educational Studies, Vol 20, No 2 (June 1972): 205–219.
Holmes, Brian (1977). 'The Positivist Debate in Comparative Education – An Anglo-Saxon Perspective'. Comparative Education, Vol 13, No 2 (June 1977): 115–132.
Holmes, Brian (1981). Comparative Education: Some Considerations of Method. London: Allen & Unwin.
Holmes, Brian (1984). 'Paradigm Shifts in Comparative Education'. Comparative Education Review, Vol 28, No 4 (November 1984): 584–604.
Holmes, Brian (1985). 'Trends in Comparative Education'. Prospects, Vol 15, No 3: 324–346.
Hopkins, Richard L. (1973). 'Prescriptions for Cultural Revolutions: A Reassessment of the Limits of Comparative Education Research'. Comparative Education Review, Vol 17, No 3 (October 1973): 299–301.
Husén, Torsten (1979). 'An International Research Venture in Retrospect: The IEA Surveys'. Comparative Education Review, Vol 23, No 3: 371–385.
Husén, Torsten (1983). 'The International Context of Educational Research'. Oxford Review of Education, Vol 9, No 1: 21–29.
Husén, Torsten (1996). 'Lessons from the IEA Studies'. International Journal of Educational Research, Vol 25, No 3: 207–218.
Jones, Phillip E. (1971). Comparative Education: Purpose and Method. Brisbane: University of Queensland Press.
Kazamias, Andreas M. (1961). 'Some Old and New Approaches to Methodology in Comparative Education'. Comparative Education Review, Vol 5, No 2: 90–96.
Kazamias, Andreas M. (1964). 'Editorial and Correspondence'. Comparative Education Review, Vol 7, No 3: 217–222.
Kazamias, Andreas M. (2001). 'Re-inventing the Historical in Comparative Education: Reflections on a Protean Episteme by a Contemporary Player'. Comparative Education, Vol 37, No 4: 439–449. http://doi.org/10.1080/03050060120091247
Kelly, Gail P. (1987). 'Comparative Education and the Problem of Change: An Agenda for the 1980s'. Comparative Education Review, Vol 31, No 4: 477–489.
Kelly, Gail P. (1992). 'Debates and Trends in Comparative Education'. In Robert F. Arnove, Philip G. Altbach, & Gail P. Kelly (Eds.), Emergent Issues in Education: Comparative Perspectives (pp. 13–22). New York: State University of New York Press.
Kelly, Gail P. & Philip G. Altbach (1981). 'Comparative Education: A Field in Transition'. In P. G. Altbach, G. P. Kelly, & D. H. Kelly (Eds.), International Bibliography of Comparative Education. New York: Praeger.
Kienitz, Werner (1971). 'On the Marxist Approach to Comparative Education in the German Democratic Republic'. Comparative Education, Vol 7, No 1: 21–31.
King, Edmund (1965). 'The Purpose of Comparative Education'. Comparative Education, Vol 1, No 3: 147–159.
King, Edmund (1975). 'Analytical Frameworks in Comparative Studies of Education'. Comparative Education, Vol 11, No 1 (March 1975): 85–103.
King, Edmund (1977). 'Comparative Studies: An Evolving Commitment, a Fresh Realism'. Comparative Education, Vol 13, No 2: 101–108.
King, Edmund (1990). 'Observations from Outside and Decisions Inside'. Comparative Education Review, Vol 34, No 3 (August 1990): 392–395.

King, Edmund (2000). 'A Century of Evolution in Comparative Studies'. Comparative Education, Vol 36, No 3: 267–277.
Koehl, Robert (1977). 'The Comparative Study of Education: Prescription and Practice'. Comparative Education Review, Vol 21, No 2/3: 177–194.
Kuhn, Thomas S. (1970). 'Logic of Discovery or Psychology of Research'. In I. Lakatos & A. Musgrave (Eds.), Criticism and the Growth of Knowledge (pp. 1–23). Cambridge: Cambridge University Press.
Larsen, Marianne, Suzanne Majhanovich, & Vandra Masemann (2007). 'Comparative Education in Canadian Universities'. Comparative and International Education/Éducation Comparée et Internationale, Vol 36, No 3: 15–31.
Laska, J. A. (1973). 'The Future of Comparative Education: Three Basic Questions'. Comparative Education Review, Vol 17, No 3: 295–298.
Lawson, Robert F. (1975). 'Free-Form Comparative Education'. Comparative Education Review, Vol 19, No 3: 345–353.
Marginson, Simon & Marcela Mollis (2001). 'The Door Opens and the Tiger Leaps: Theories and Reflexivities of Comparative Education for a Global Millennium'. Comparative Education Review, Vol 45, No 4: 581–615.
Mårtensson, Pär, Uno Fors, Sven-Bertil Wallin, Udo Zander, & Gunnar H. Nilsson (2016). 'Evaluating Research: A Multidisciplinary Approach to Assessing Research Practice and Quality'. Research Policy, Vol 45: 593–603.
Masemann, Vandra L. (1982). 'Critical Ethnography in the Study of Comparative Education'. Comparative Education Review, Vol 26, No 1: 1–15.
Masemann, Vandra L. (1990). 'Ways of Knowing: Implications for Comparative Education'. Comparative Education Review, Vol 34, No 4: 465–473.
Mason, Mark (2007). 'There's No Such Thing as a (Substantively Distinct) Field of Comparative Education'. Paper presented at the World Congress of Comparative Education Societies (WCCES) (September 2007), Sarajevo, Bosnia & Herzegovina.
McClellan, James E. (1957). 'An Educational Philosopher Looks at Comparative Education'. Comparative Education Review, Vol 1, No 1: 8–9.

Mehta, Sonia & Peter Ninnes (2003). 'Postmodernism Debates and Comparative Education: A Critical Discourse Analysis'. Comparative Education Review, Vol 47, No 2: 238–255.
Mitri, Tarek (1997). 'Interreligious and Intercultural Dialogue in the Mediterranean Area during a Period of Globalization'. Prospects, Vol 27, No 1: 123–127.
Mitter, Wolfgang (1997). 'Challenges to Comparative Education: Between Retrospect and Expectation'. International Review of Education, Vol 43, No 5: 401–412.
Moehlman, Arthur H. (1963). Comparative Educational Systems. Washington, DC: Center for Applied Research in Education.
Mollis, Marcela & Simon Marginson (1999). 'Comparing National Education Systems in the Global Era'. Australian Universities' Review, Vol 42 (1999–2000): 53–63.
Moodie, Gavin (2017). 'Unintended Consequences: The Use of Metrics in Higher Education'. Academic Matters, Winter 2017 issue. Retrieved 3 January 2018, https://academicmatters.ca/2017/11/unintended-consequences-the-use-of-metrics-in-higher-education/
Nash, Paul (1977). 'Introduction: The State of the Art. Twenty Years of Comparative Education'. Comparative Education Review, Vol 21, No 2/3 (June/October 1977): 151–152.
Ninnes, Peter & Greg Burnett (2003). 'Comparative Education Research: Poststructuralist Possibilities'. Comparative Education, Vol 39, No 3: 279–297.
Ninnes, Peter & Sonia Mehta (Eds.) (2004). Postfoundational Ideas and Applications for Critical Times. New York: RoutledgeFalmer.
Noah, Harold J. (1974). 'Fast-Fish and Loose-Fish in Comparative Education'. Comparative Education Review, Vol 18, No 3: 341–347.
Ortiz de Montellano, Bernard R. (2001). 'Multicultural Science: Who Benefits?'. Science Education, Vol 85, No 1: 77–79.
Parkyn, G. W. (1977). 'Comparative Education Research and Development Education'. Comparative Education, Vol 13, No 2: 87–93.
Paulston, Rolland G. (1976). 'Ethnicity and Educational Change: A Priority for Comparative Education'. Comparative Education Review, Vol 20, No 3: 269–277.
Paulston, Rolland G. (1990a). 'Toward a Reflective Comparative Education?' Comparative Education Review, Vol 34, No 2 (May 1990): 248–255.
Paulston, Rolland G. (1990b). 'From Paradigm Wars to Disputatious Community'. Comparative Education Review, Vol 34, No 3 (August 1990): 395–400.
Paulston, Rolland G. (1994). 'Comparative and International Education: Paradigms and Theories'. In T. Husén & T. Postlethwaite (Eds.), The International Encyclopedia of Education (Vol 2, pp. 923–932). Oxford: Elsevier Science.
Paulston, Rolland G. (1997). 'Mapping Visual Culture in Comparative Education Discourse'. Compare: A Journal of Comparative and International Education, Vol 27, No 2: 117–152.
Paulston, Rolland G. (2000). 'Imagining Comparative Education: Past, Present, Future'. Compare: A Journal of Comparative and International Education, Vol 30, No 3: 353–367.
Pennar, Jaan, Ivan Ivanovich Bakalo, & George Z. F. Bereday (1971). Modernization and Diversity in Soviet Education: With Special Reference to Nationality Groups. New York: Praeger.
Petty, Michael F. (1973). 'Comment on Barber's "Science, Salience, and Comparative Education"'. Comparative Education Review, Vol 17, No 3: 389–392.
Popper, Karl (1990). A World of Propensities. Bristol: Thoemmes.
Postlethwaite, T. Neville (1987). 'Comparative Educational Achievement Research: Can It Be Improved?' Comparative Education Review, Vol 31, No 1: 150–158.
Psacharopoulos, George (1990). 'Comparative Education: From Theory to Practice, or Are You A:\neo.* or B:\*.ist?' Comparative Education Review, Vol 34, No 3: 369–380.
Radnofsky, Mary L. (1996). 'Qualitative Models: Visually Representing Complex Data in an Image/Text Balance'. Qualitative Inquiry, Vol 2, No 4: 385–410.
Raivola, Reijo (1985). 'What is Comparison? Methodological and Philosophical Considerations'. Comparative Education Review, Vol 29, No 3: 362–374.
Ross, Heidi (1992). 'The Tunnel at the End of the Light: Research and Teaching on Gender and Education'. Comparative Education Review, Vol 36, No 3: 343–354.

Rosselló, Pedro (1963). 'Concerning the Structure of Comparative Education'. Comparative Education Review, Vol 7, No 2: 103–107.
Rust, Val D., A. Soumaré, O. Pescador, & M. Shibuya (1999). 'Research Strategies in Comparative Education'. Comparative Education Review, Vol 43, No 1: 86–109.
Samonte, Quirico S. (1963). 'Some Problems of Comparison and the Development of Theoretical Models in Education'. Comparative Education Review, Vol 6, No 3: 177–181.
Schlager, Edella (1999). 'A Comparison of Frameworks, Theories, and Models of Policy Processes'. Theories of the Policy Process, Vol 1: 233–260.
Schriewer, Jürgen (1988). 'The Method of Comparison and the Need for Externalization: Methodological and Sociological Conceptions'. In J. Schriewer & B. Holmes (Eds.), Theories and Methods in Comparative Education (pp. 25–83). Frankfurt am Main: Peter Lang.
Sellar, Sam (2015). 'Data Infrastructure: A Review of Expanding Accountability Systems and Large-Scale Assessments in Education'. Discourse: Studies in the Cultural Politics of Education, Vol 36, No 5: 765–777.
Siffin, W. J. (1969). 'The Social Sciences, Comparative Education, the Future, and All That'. Comparative Education Review, Vol 13, No 3: 252–259.
Steiner-Khamsi, Gita (2006). 'The Economics of Policy Borrowing and Lending: A Study of Late Adopters'. Oxford Review of Education, Vol 32, No 5: 665–678.
Stromquist, Nelly P. (1995). 'Romancing the State: Gender and Power in Education'. Comparative Education Review, Vol 39, No 4: 423–454.
Taba, Hilda (1963). 'Cultural Orientation in Comparative Education'. Comparative Education Review, Vol 6, No 3: 171–176.
Theisen, Gary & Don Adams (1990). 'Comparative Education Research'. In R. Murray Thomas (Ed.), International Comparative Education: Practices, Issues and Prospects (pp. 277–303). Oxford: Pergamon.
Thomas, R. Murray (1990). 'The Nature of Comparative Education'. In R. Murray Thomas (Ed.), International Comparative Education: Practices, Issues and Prospects (pp. 1–21). Oxford: Pergamon.

von Wright, Georg Henrik (1971). Explanation and Understanding. London: Routledge and Kegan Paul.
Watson, Keith (1998). 'Memories, Models and Mapping: The Impact of Geopolitical Changes on Comparative Studies in Education'. Compare: A Journal of Comparative and International Education, Vol 28, No 1: 5–31.
Welch, Anthony R. (1985). 'The Functionalist Tradition and Comparative Education'. Comparative Education, Vol 21, No 1: 5–19.
Welch, Anthony R. (1998). 'The Cult of Efficiency in Education: Comparative Reflections on the Reality and the Rhetoric'. Comparative Education, Vol 34, No 2: 157–175.
Welch, Anthony R. (2003). 'Technocracy, Uncertainty, and Ethics: Comparative Education in an Era of Postmodernity and Globalization'. In Robert F. Arnove & Carlos Alberto Torres (Eds.), Comparative Education: The Dialectic of the Global and Local (pp. 24–51). Lanham, MD: Rowman & Littlefield.
Wilson, D. N. (1994). 'Comparative and International Education: Fraternal or Siamese Twins? A Preliminary Genealogy of Our Own Twin Fields'. Comparative Education Review, Vol 38, No 4: 449–486.
Wiseman, Alexander W. & Cheryl Matherly (2009). 'The Professionalization of Comparative and International Education: Promises and Problems'. Research in Comparative and International Education, Vol 4, No 4: 334–355.
Woody, Thomas (1955). 'Futures in Comparative Education'. In William W. Brickman (Ed.), The Teaching of Comparative Education: Proceedings of the Second Annual Conference on Comparative Education (pp. 88–96). New York: School of Education, New York University.

3
Enduring Issues in Education: Comparative Perspectives
Wing On Lee

INTRODUCTION

UNESCO has been working on 'Education for All' (EFA) for over a quarter of a century, and this agenda includes tackling many fundamental educational issues, such as literacy and numeracy, human rights, equity in terms of gender, race and socio-economically disadvantaged groups, and the assurance of quality of education on the basis of universal basic education. It is interesting to note that when UNESCO announced the post-2015 agenda, setting out the achievement targets for 2030, similar issues continued to be raised. Under the general framework of Education for Sustainable Development, according to UNESCO's (2014) Position Paper on Education Post-2015 ('Position Paper' hereafter), the new agenda is quite similar to that before 2015, namely: equitable access to quality education for all, quality education and learning at all levels, and equity with particular attention to marginalized groups and gender equality; but the new

additional goals are knowledge and skills for sustainable development, global citizenship and equity in the world of work. This shows that despite a new agenda on sustainable development, the old agenda remains to be achieved over the next 15 years; it is still an unfinished agenda, despite the tremendous efforts that have been made collaboratively through UNESCO and individually by different countries. The Post-2015 Position Paper began with this acknowledgement:

…the EFA and Millennium Development Goal (MDG) education agendas will remain unfinished by 2015 and the continued relevance and importance of the EFA agenda are recognized. (UNESCO, 2014, para. 1)

That is, the issues of access, equity and quality in education continue to be the 'enduring issues' in education. In more concrete terms, the Position Paper acknowledged that, since 2000, 57 million children of primary school age and 69 million children of lower secondary school age are


still out of school, of whom girls remain the majority. Worldwide, some 774 million adults (aged 15 and over) are reported to be unable to read and write, of whom two-thirds are women. Low literacy skills are also a concern in many high-income countries: the European Commission notes that in Europe about 20% of adults lack the literacy skills they need to function fully in modern society (i.e. functional illiteracy). In addition, while lifelong learning and informal learning are becoming significantly emphasized worldwide, it has been found that, economically and socially, the gap between the 'haves' and 'have-nots' and/or the 'have-more' and 'have-less' is not closing, and is indeed getting bigger. The impact of cultural capital will become more pronounced as informal education becomes more decisive for people's futures: those who 'have' enjoy more resources and better networks for accessing informal learning, which will differentiate them even more sharply from the 'have-nots' and 'have-less'. The purpose of this chapter is to delineate in what way the promises of education are being unfulfilled, and to attempt to explain the driving forces that may prevent education from bringing about those EFA hopes for human society. The chapter hopes to enhance awareness of the undercurrents in society that may make use of the new educational agenda to perpetuate old and existing inequalities, making education reforms just new wine in old bottles. This chapter will draw upon global and country data to demonstrate that these issues are indeed worldwide issues, although some countries may do better than others in tackling them.

EDUCATION FOR ALL – AN UNFINISHED AGENDA

'Education for All' (EFA) has become a significant global agenda for educational development since the 1990 World Conference on


Education for All in Jomtien, Thailand (WCEFA, 1990). The conference set the EFA goals to be achieved by 2000. Ten years after these international goals were set, the World Education Forum was held in Dakar in 2000 to review the implementation of the EFA goals. The Forum reaffirmed and extended the Jomtien commitment towards realizing the EFA goals. While acknowledging general improvements towards these goals across the world, the Forum also acknowledged that the goal of universal primary education had not been attained. This led to the Dakar Framework for Action in 2000 (UNESCO, 2000), which set out six EFA goals to be achieved by 2015, namely:

1 Expand early childhood care and education
2 Provide free and compulsory primary education for all
3 Promote learning and life skills for young people and adults
4 Increase adult literacy by 50 percent
5 Achieve gender parity by 2005, gender equality by 2015
6 Improve the quality of education. (p. 8)

The EFA goals contributed to the further development of eight Millennium Development Goals (MDGs), following the Millennium Summit and the United Nations Millennium Declaration in 2000 (UNESCO, 2010a). EFA and gender equity in primary and secondary education were set as Millennium Development Goals (MDGs) to be achieved by 2015:

• Ensuring that by 2015, all children, particularly girls, children in difficult circumstances and from ethnic minorities, have access to free primary education.
• Ensuring that the learning needs of all young people and adults are met through equitable access to appropriate learning and life-skills programmes.
• Achieving 50% improvement in levels of adult literacy by 2015, especially for women, and equitable access to basic and continuing education for all adults.
• Eliminating gender disparities in primary and secondary education by 2005, and achieving gender equality in education by 2015, with a focus on ensuring girls' full and equal access to and achievement in basic education of good quality.
• Improving all aspects of the quality of education and ensuring excellence of all so that recognized and measurable learning outcomes are achieved by all, especially in literacy, numeracy and essential life skills. (UNESCO & UNICEF, 2013, p. 6)

It is easily identifiable that many of the EFA goals and MDGs are interrelated, aiming to achieve universal primary education by means of eradicating poverty and inequity, and by improving health conditions, sustainable environments and the quality of education. All these are simple and clear-cut goals that we, living in the 21st century, should expect to be taken for granted as basic conditions for human rights. As noted by the report entitled Making Education a Priority in the Post-2015 Development Agenda ('Priority Report' hereafter):

There is a consensus on the necessity for goals like the education MDGs and EFA, and the role these have played in shaping and advancing the education agenda. There is wide recognition that these goals have provided strategic direction to educational planning and budgeting; are an important measure to monitor progress; and have encouraged focused and sustained support from development partners. (UNESCO & UNICEF, 2013, p. 7)

In 2003, a World Bank report entitled Achieving Universal Primary Education by 2015: A Chance for Every Child was published (Bruns, Mingat & Rakotomalala, 2003). The report reviewed the progress of educational provision across the world in the 1990s. It acknowledged the overall improvements in educational access over the decade, which provided some encouraging evidence of political will to improve education (Bruns et al., 2003), but it also noted that the world remained far from attaining the EFA goal of universal primary school completion, and that this goal would not be reached without a significant acceleration of current progress.

Notwithstanding the worldwide recognition and support the EFA goals have gained, achieving education for all remains a global challenge. Despite their legitimacy and necessity as basic human conditions to be provided by any society in order for people to live with dignity, these goals have unfortunately not yet been satisfactorily achieved, as deplored in the Priority Report (UNESCO & UNICEF, 2013, p. 7):

[The] millennium educational goals have miserably failed in terms of effective, equitable and meaningful education; there are different quality educational institutions for poor and the rich, women are yet socialized to remain inferior to men, patriarchy yet continues and girls are [a] liability, married off in childhood by their parents, there is widespread ignorance about environmental change and global warming. There are yet over populated countries and the problem of unemployment persists.

TRACKING AND REVIEWING THE EFA AGENDA

Progress and Achievements

The World Declaration on Education for All, published in 1990 (WCEFA, 1990), seemed to provide optimistic and celebratory hopes for a brave new world of education for all to emerge, especially with so many countries agreeing upon the agenda and committing tremendous action and effort to it, both nationally and internationally. To be fair, much progress has been made towards these goals, such as:

• The number of countries achieving a primary net enrolment ratio of more than 97% grew from 37 to 55 between 1999 and 2010;
• In 2010, the global primary completion rate reached 90%, compared with 81% in 1999;
• Girls and boys have similar chances of completing primary education in all regions except sub-Saharan Africa and Western Asia;


• Globally, the youth literacy rate reached 90% in 2010, an increase of 6% since 1990;
• Gender gaps in youth literacy rates are also narrowing: globally, there were 95 literate young women for every 100 young men in 2010, compared with 90 women in 1990. (UNESCO & UNICEF, 2013, p. 44)
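These literacy figures are, in effect, a gender parity index (GPI), UNESCO's standard measure defined as the female rate divided by the male rate. A brief worked example using only the numbers above (the 0.97–1.03 parity band is UNESCO's usual convention, not a figure taken from the Priority Report):

$$\mathrm{GPI} = \frac{\text{female youth literacy rate}}{\text{male youth literacy rate}}, \qquad \mathrm{GPI}_{2010} = \frac{95}{100} = 0.95, \qquad \mathrm{GPI}_{1990} = \frac{90}{100} = 0.90$$

Both values fall short of the 0.97–1.03 band conventionally taken to indicate parity, though the movement between 1990 and 2010 is clearly towards it.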

This picture is quite different from the deplorable educational situation at the time the 1990 EFA Declaration was announced:

• More than 100 million children, including at least 60 million girls, have no access to primary schooling;
• More than 960 million adults, two-thirds of whom are women, are illiterate, and functional illiteracy is a significant problem in all countries, industrialized and developing;
• More than one-third of the world's adults have no access to the printed knowledge, new skills and technologies that could improve the quality of their lives and help them shape, and adapt to, social and cultural change; and
• More than 100 million children and countless adults fail to complete basic education programs; millions more satisfy the attendance requirements but do not acquire essential knowledge and skills. (WCEFA, 1990, p. 1)

At the same time, the world is still facing problems such as notable debt burdens, economic stagnation and decline, widening economic disparities among and within nations, war, occupation, civil strife, violent crime, the preventable deaths of millions of children and widespread environmental degradation. These problems have constrained efforts to meet basic learning needs, while the lack of basic education among a significant proportion of the population prevents societies from addressing such problems with strength and purpose (Lee, 2004, pp. 2–3).

Concerns

The year 2015 unveiled many of the underlying problems that had hindered the attainment of these human rights goals, revealed when UNESCO conducted reviews of progress towards the existing goals in order to plan for the new goals to be achieved over the next 15 years (i.e. by 2030). For example, UNESCO and the United Nations Children's Fund (UNICEF) jointly released a review report entitled Making Education a Priority in the Post-2015 Development Agenda: Report of the Global Thematic Consultation on Education in the Post-2015 Development Agenda (UNESCO & UNICEF, 2013). In addition, the Global Education Monitoring Reports were published to review and comment on levels of achievement of the EFA agenda; these reports were independently commissioned by a group of governments, multilateral agencies and private foundations, and facilitated and supported by UNESCO (e.g. see UNESCO, 2017). The series of reviews led UNESCO to admit in its Position Paper on Education Post-2015 (UNESCO, 2014) that the EFA goals were still an unfinished agenda:

Efforts towards achieving Education for All (EFA) since the year 2000 have yielded unprecedented progress. However, the EFA and Millennium Development Goal (MDG) education agendas will remain unfinished by 2015 and the continued relevance and importance of the EFA agenda are recognized. There is a strong need for a new and forward-looking education agenda that completes unfinished business while going beyond the current goals in terms of depth and scope. (para. 1)

Evaluation

The Priority Report, Making Education a Priority in the Post-2015 Development Agenda (UNESCO & UNICEF, 2013, p. 5), begins with blunt statements on the achievement gaps in the EFA development agenda, pointing out that:

• In regard to the EFA agenda, progress towards getting all children into school is too slow.
• Children's education prospects are still at risk: among children who attend school, 25% drop out before completing primary education.
• There is still an obvious gender gap in access to education, as 39 million of the 57 million out-of-school children are girls.
• 49% of these 57 million out-of-school children will probably never set foot in a classroom.
• Regionally, there has been noteworthy progress: South and West Asia reduced the number of out-of-school children by two-thirds between 1999 and 2011. However, more than half of all out-of-school children are clustered in sub-Saharan Africa.

Referring to the specific EFA goals, the Priority Report made the analyses shown in Table 3.1. In sum, the table demonstrates a clear regional disparity in attaining, or not attaining, the EFA goals, and it is apparent that low- and lower-middle-income countries have difficulties in attaining them. In these countries, the under-five mortality rate remains high and progress in early childhood care and education has been slow. The adult illiterate population was only moderately reduced, from 881 million to 775 million – and a global illiterate population of 775 million is a sobering figure to accept. Gender equity is still a concern, considering that 68 of the 138 countries with available data have not yet achieved gender parity in primary education. Despite general improvements in educational access, completion remains problematic: the figure of 250 million children not reaching Grade 4 is a concern to be addressed.

Table 3.1  Progress evaluation of the attainment of the EFA goals

EFA 1 Early childhood care and education
The MDG target for child mortality was unlikely to be met, as under-5 mortality remains high, at 123 per 1,000 in sub-Saharan Africa and 88 per 1,000 in South and West Asia. The figures are also high at 29% for all children aged 5 or under and 40% in low-income countries. Gross enrolment ratios in pre-primary education are 49% in East Asia and the Pacific and 68% in Latin America and the Caribbean. Progress has been slowest in low-income countries, with low rates and little improvement seen in sub-Saharan Africa (from 12% to 17%) and the Arab States (from 16% to 19%). Of the countries with available data in 2010, the 25 countries with an ECCE index score between 0.80 and 0.95 are mostly middle-income countries in Central Asia, Central and Eastern Europe, and Latin America and the Caribbean. The remaining 42 countries, with an index score below 0.80, are mostly low- and lower-middle-income countries, the majority of them located in sub-Saharan Africa.

EFA 2 Universal primary education
On current progress, the target for universal primary education will likely be missed, even though the number of out-of-school children of primary school age was reduced from 108 million in 1999 to 61 million in 2010. More than half of this improvement took place in sub-Saharan Africa. Reduction was rapid from 1999, but the rate began slowing in 2004, and progress has stalled since 2008. The MDG Report 2012 (United Nations, 2012) made the same observation, that 2004 marked the beginning of slower progress. The number of countries with a primary net enrolment ratio of more than 97% increased from 37 to 55 (out of 124) between 1999 and 2010. However, drop-out remains a problem in low-income countries, where an average of 59% of children starting school reached the last grade in 2009. The problem is particularly acute for children who start late, and the drop-out rate is highest during the first few years of schooling.

EFA 3 Youth and adult learning needs
Despite a global increase in the number of children enrolling in secondary school, the lower secondary gross enrolment ratio (GER) was just 52% in low-income countries in 2010. Although the number of out-of-school adolescents of lower secondary school age was reduced from 101 million in 1999 to 71 million in 2010, it has stagnated since 2007. Three out of four out-of-school adolescents live in South and West Asia and sub-Saharan Africa.

EFA 4 Improving adult literacy
Most countries were expected to miss EFA goal 4, some by a large margin. There were still 775 million adults who could not read or write in 2010, about two-thirds of them women. Globally, the adult literacy rate has increased over the past two decades, from 76% in 1985–1994 to 84% in 2005–2010. Partly because the world's population has grown, the number of illiterate adults has decreased only modestly, from 881 million to 775 million.

EFA 5 Assessing gender parity and equality in education
Convergence in enrolment between boys and girls has been one of the successes of EFA since 2000. However, of the 138 countries with data available, 68 have not achieved gender parity in primary education, and girls are disadvantaged in 60 of them. The Arab States and sub-Saharan Africa have yet to achieve parity, while South and West Asia reached parity in 2010. The number of countries with severe gender disparities halved between 1999 and 2010, from 33 countries to 17. At the pre-primary level, gender parity has been achieved on average, although this reflects low enrolment rates for both boys and girls. At the secondary level, 97 countries have not reached gender parity, and in 43 of them girls are disadvantaged. There are large regional disparities at the tertiary level, with 10 boys for every 6 girls studying at this level in sub-Saharan Africa, compared to 8 boys for every 10 girls in North America and Western Europe. The 2012 MDG Report found similar results. Gender parity at the tertiary level hides both large regional disparities and gender differences in areas of study. The 2010 MDG Report noted that women were over-represented in the humanities and social sciences and significantly under-represented in science, technology and, especially, engineering. Completion rates also tend to be lower among women than men.

EFA 6 The quality of education
Of around 650 million children of primary school age, 250 million either do not reach Grade 4 or, if they do, fail to attain minimum learning standards. Pupil–teacher ratios at the primary level improved globally between 1999 and 2010, especially in East Asia and Latin America, but they worsened in sub-Saharan Africa and South and West Asia. Of the 100 countries with data at the primary level, in 33 less than 75% of teachers were trained to a national standard. Even those who have received training are not always well prepared to teach in early grades.

Source: Adapted from UNESCO & UNICEF, 2013, Appendix I, pp. 43–45
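The EFA 4 row also illustrates why a sharply rising literacy rate can coexist with an only modestly falling illiterate count. A back-of-the-envelope check using the rates and counts in Table 3.1 (the implied adult population sizes are our inference, not figures reported by UNESCO & UNICEF):

$$\frac{881 \text{ million}}{1 - 0.76} \approx 3.7 \text{ billion adults (1985–1994)}, \qquad \frac{775 \text{ million}}{1 - 0.84} \approx 4.8 \text{ billion adults (2005–2010)}$$

The implied adult population grew by roughly 30% over the period, so even though the illiteracy rate fell by a third (from 24% to 16%), the absolute number of illiterate adults fell by only about 12%.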


In addition, the finding that in 33 countries less than 75% of teachers were trained to national standards paints a pessimistic picture, making us wonder why some countries allow a majority of under-trained teachers to teach in the classroom. The Global Education Monitoring Report 2017/18 (UNESCO, 2017) provided a similar picture with more updated data:

• In 2010–2015, completion rates were 87% for primary, 69% for lower secondary and 45% for upper secondary education.
• About 387 million children of primary school age, or 56%, did not reach the minimum proficiency level in reading.
• Less than one in five countries guaranteed 12 years of free and compulsory education.

PERENNIAL OBSTACLES HINDERING PROGRESS IN EDUCATIONAL EQUITY

Hindering Factors

The Priority Report attributed the failure to achieve the EFA goals to two major factors. First, the EFA targets were treated as technical targets rather than as part of a holistic and integrated conception of educational improvement. Second, priorities were not given equal weighting in implementation; for example, access to primary school was given higher priority than other aspects of education, and access was given priority at the expense of quality and equity (UNESCO & UNICEF, 2013). The Report goes on to highlight some of the major shortcomings in the implementation process that have hindered the achievement of the various EFA goals, as follows:

1 A narrow vision of education access: progress on early childhood care and education has been too slow.
2 Lack of a focus on quality.
3 Gender equality is not a reality.
4 Underinvestment in education.

It is worthwhile taking a closer look at these possible factors.

A narrow vision

The six EFA goals were not treated evenly or, as aforementioned, in a holistic and integrated manner. It has been observed that EFA Goal 2 (i.e. universal completion of primary education) was given overwhelming priority among all the EFA goals, and this has in effect de-prioritized the other goals, such as early childhood care and education and secondary, tertiary and vocational education. The overall concern with primary enrolment has also led to neglect of the completion rate, which is why there has been no clear drop in the illiterate population, only a slight reduction, from 881 million to 775 million.

Lack of focus on quality

Another side effect of the focused attention on enrolment, in addition to the neglect of the completion rate, is the failure to give equal attention to quality learning outcomes. The fact that 250 million children are not reaching Grade 4 is a warning sign that expanded access to schooling, in terms of primary enrolment alone, has produced a misleading picture of improvement. In essence, without a guarantee of quality learning outcomes, the world is still inhabited by a large population without a level of education sufficient for their survival.

Gender inequality

The Priority Report cited surveys showing that in 55 developing countries, girls are more likely to be out of school at lower secondary age than boys, regardless of the wealth or location of the household. In addition, almost two-thirds of the 775 million illiterate adults are women. There are also hidden inequalities in tertiary education, as women are over-represented in the humanities and social sciences and significantly under-represented in engineering, science and technology.


Underinvestment in education

The lack of political will to invest in education has been found to be a significant factor in the failure to achieve the EFA goals. In 2010, the EFA Global Monitoring Report estimated that an additional US$16 billion per year would be needed to provide basic education for all children, youth and adults by 2015 (UNESCO, 2010b). However, a more recent estimate found that the funding gap had increased to US$26 billion. This funding gap offers a clear explanation of why there was such a shortfall in attaining the EFA goals, and why some countries would allow a majority of their own teachers to teach without proper training. The setting of the EFA goals is a major means of achieving educational equity, or alleviating educational inequity, by enhancing access to education; the failure to attain the EFA goals thus shows that such educational equity has not yet been achieved. Whenever there is inequity, it is always the disadvantaged socio-economic status (SES) groups that suffer. As the above review shows, it is low- and lower-middle-income countries that have difficulties in providing equal access to education for all.

Cultural capital

Educational inequity can still persist even after access to education for all is achieved – when cultural capital functions to differentiate those who can afford to pay for the kinds of school activities that bring about high performance. For example, poor children cannot afford to participate in many co-curricular activities. Project learning and critical thinking are culture- and class-bound, and the academic language used in school favours the language and argumentation styles of middle-class and educated families. The increased emphasis on experiential learning through study abroad and exchange requires extra resources that cannot be afforded by children from poorer families. The growing emphasis on the


mastery of English as an international language adds difficulties for those children who do not have the resources to acquire another language in addition to their mother tongue. The growing importance of ICT in learning also significantly advantages children from families that can provide extra resources to purchase both hardware and software. Recent education reforms in many countries that emphasize co-curricular activities and experiential learning will require additional resources, and this may create difficulties for children from deprived families. This is a kind of 'new poverty' that perpetuates deprivation for the disadvantaged in a different way. The above review suggests that educational inequity remains the crux of the problem in universalizing primary education. The gap between the rich and the poor continues to perpetuate the education divides in society (Lee, 2007; Majhanovich, 2014).

TURNING POINT: ENHANCING QUALITY IN EDUCATION TO ACHIEVE EDUCATIONAL EQUITY

The above discussion shows some perennial problems in educational investment to achieve universal primary education. Oftentimes educational resources are limited, and policymakers have to choose between investing in the few high achievers in the education system and spreading resources thinly to expand access to education for everyone (but without sufficient money and human resources to track attendance and guarantee completion). However, this seems to be a vicious circle of thinking that may require finding solutions from another perspective. Given that resources are limited, how should they be distributed among the populace so as to benefit the country in terms of its overall competitiveness? Policymakers often face such a dilemma. However, this dilemma, from the beginning, assumes a zero-sum


game both in terms of the distribution of talents in a society and the distribution of resources. More importantly, the dilemma is generated by treating the two concepts as mutually exclusive and dichotomized. The OECD (2012), in its analysis of the impact of achievement gaps, reported that economies incur significant recurring economic losses when citizens are unable to perform to their optimum capacity. Economic modelling that correlates cognitive skills with economic growth reveals (with certain caveats) that minor improvements in the skills of a nation's workforce can have a major impact on that country's future progress. In recent publications, educators and policymakers appear to have taken a more realistic approach, recognizing that with a limited timeframe and resources they have to prioritize education reform to make it effective. Although it is no secret that educators work with limited resources, the high-performing education systems have been more deliberate in reporting how they calculate their costs, and have found ways to use their resources more efficiently by prioritizing teacher quality and development, among other fundamental and important agendas in education. It has also been suggested that the glass ceiling and limitations of current education systems and frameworks can only be broken by high-quality educators and policymakers who have the willingness and ability to drive change for the betterment of their educational systems and economies (OECD, 2012; Tucker, 2011). 'Equity and quality' is an emerging concept that focuses on how investment in the quality of education may bring about a general elevation of equity. Schleicher (2014) points out that the PISA findings show that equity and quality in education are not mutually exclusive concepts. Investing in high-quality childhood education and initial schooling, particularly for children from socio-economically disadvantaged backgrounds, is an efficient strategy to ensure that children start strong

in their education careers so that first skills beget future skills. The OECD published a report in 2012 entitled Equity and Quality in Education: Supporting Disadvantaged Students and Schools (OECD, 2012). The Report firmly states: The evidence is conclusive: equity in education pays off. The highest performing education systems [in PISA studies] across OECD countries are those that ‘combine high quality and equity’. In such education systems, the vast majority of students can attain high level skills and knowledge that depend on their ability and drive, more than on their socio-economic background. (p. 14)
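The 'economic modelling' referred to above is typically a cross-country growth regression. A stylized form of such a model (an illustration of the general approach, not the report's exact specification) is:

\[ g_c = \beta_0 + \beta_1 S_c + \beta_2 \ln(y_{c,0}) + \gamma' X_c + \varepsilon_c \]

where g_c is country c's average annual growth rate of GDP per capita, S_c is a measure of the cognitive skills of its workforce (for example, mean test scores), y_{c,0} is initial GDP per capita (a convergence term), X_c is a vector of control variables, and \varepsilon_c is an error term. The claim that minor improvements in skills can have a major impact corresponds to a large and statistically significant estimate of \beta_1.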

The PISA 2009 Report also states that:

PISA suggests that maximizing overall performance and securing similar levels of performance among students from different socio-economic backgrounds can be achieved simultaneously. These results suggest that quality and equity need not be considered as competing policy objectives. … These [high-performing] countries display high student performance and, at the same time, a below-average impact of economic, social and cultural status on student performance. (as cited in OECD, 2010, p. 57)

Schleicher (2014) points out that Shanghai, South Korea, Canada and Japan are the education systems that have achieved the highest scores in reading and, at the same time, have been least affected by students' home background, as compared to the OECD average. It is interesting to note that, as shown in Table 3.2, the high-performing education systems in PISA 2009 have Gini indexes that reveal considerable disparities between the rich and the poor. Yet despite this notable gap, the high average scores in PISA 2009 show that the average quality of education in these countries is among the highest in the world. On this issue, the Human Development Report 2013 (UNDP, 2013) concludes that these education systems have provided a very high quality of education that benefits the whole population regardless of the socio-economic conditions of the students:


Table 3.2  Gini index of high-performing education systems in PISA, 2007

HDI rank (2007)   Country          Richest 10% to poorest 10%   Gini index
4                 Canada           9.4                          32.6
12                Finland          5.6                          26.9
23                Singapore        17.7                         42.5
24                Hong Kong SAR    17.8                         43.4
26                South Korea      7.8                          31.6
92                China            13.2                         41.5

Note: The Gini index lies between 0 and 100. A value of 0 represents absolute equality and 100 absolute inequality
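For reference, one standard textbook definition of the Gini index (a common formulation, not drawn from the chapter's sources) is half the relative mean absolute difference of incomes, scaled to 0–100:

\[ G = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} |x_i - x_j|}{2 n^2 \bar{x}} \times 100 \]

where x_i is the income of individual i, n is the population size and \bar{x} is the mean income; G equals 0 when all incomes are equal and approaches 100 as income concentrates in a single individual.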

In the most recent PISA, conducted in 63 countries and territories in 2009, many countries showed impressive strides in quality of learning outcomes. Students from Shanghai, China, outperformed students from 62 countries in Reading, Mathematics and Science skills. They were followed by students from the Republic of Korea, Finland and Hong Kong (SAR) in reading; Singapore, Hong Kong, China (SAR) and the Republic of Korea in mathematics; and Finland, Hong Kong, China (SAR) and Singapore in science. … Investments by some countries in education quality will likely bring future payoffs in a more knowledge-driven globalized world. (UNDP, 2013, p. 33)

The IEA International Civic and Citizenship Study (ICCS) 2009 Report (Schulz, Ainley, Fraillon, Kerr, & Losito, 2009) similarly found that Hong Kong students' performance was comparatively little influenced by their family background:

• On average, across ICCS countries, parental occupational status accounted for 10% of the variance in scores on the civic knowledge scale. However, there were considerable differences in this percentage across countries, ranging from 0.5% (Hong Kong SAR) to 20%. (p. 81)
• Although the size of the difference between students with or without an immigrant background varied across countries, in every system except Hong Kong SAR, the pattern was for students without such a background to score higher than students from immigrant families. (p. 76)

Analyzing the student performance in the various PISA studies in relation to their Economic, Social and Cultural Status (ESCS), Ho's (2013) conclusion about Hong Kong is strikingly similar to the findings of the ICCS 2009:

Hong Kong being in the top three on the graph with gentle gradient indicates that Hong Kong's 15-year-olds perform well in reading, mathematics and science, and the impact of ESCS is modest. We can argue tentatively that Hong Kong is providing education opportunity with relatively high quality and high equity regardless of their ESCS. (p. 34)

On the issue of equity and quality in education, the Grattan Report (Jensen, 2012) makes the following observations:

• High-performing education systems in East Asia have successfully increased performance while maintaining, and often increasing, equity. Compared to Australia and most OECD countries, a child from a poorer background in these systems is less likely to drop out or fall behind.
• There is less of a gap between high- and low-performing students in South Korea, Shanghai and Hong Kong compared to many other OECD education systems.
• Low-performing students are also better prepared for their future. The bottom 10% of math students in Shanghai perform at a level that is 21 months ahead of the bottom 10% of students in Australia. This gap rises to 24 months in the UK, 25 months across the average of the OECD, and 28 months in the USA.
• Increasing performance and equity has been achieved with high and increasing participation. For example, 30 years ago about 40% of young South Koreans (aged 25–34) had finished secondary education. Now the figure is 98%, 10 percentage points above the OECD average. (p. 10)


What is more, the Grattan Report argues that the overall high quality of student learning outcomes reflects the equity of the system. The high-performing education systems thus add an interesting dimension to the equity–quality debate. Does equity lead to quality in education, or does quality enhance equity? These high-performing education systems seem to show that education quality can offset economic inequalities, becoming an equalizer that enhances equity. In a report looking at the PISA results, the Asia Society (2012) defines high-performing education systems as those that achieve both equity and quality:

The highest performing education systems are those that combine quality with equity. Equity in education means that personal or social circumstances, such as gender, ethnic origin or family background, are not obstacles to achieving educational potential (definition of fairness) and that all individuals reach at least a basic minimum level of skills (definition of inclusion). In these education systems, the vast majority of students have the opportunity to attain high-level skills, regardless of their own personal and socio-economic circumstances. Within the Asia-Pacific region, for example, South Korea, Shanghai-China, and Japan are examples of Asian education systems that have climbed the ladder to the top in both quality and equity indicators. In North America, Canada is among such countries as well. The United States is above the OECD mean in reading performance but below the mean with regard to equity. (Asia Society, 2012, p. 6)

CONCLUSION

Efforts to improve educational provision globally and across countries are all based on a belief that education can function as a driver of change, enabling individuals to strive for a better state of being, whether economically, socially or culturally. From a macroscopic perspective, we believe that education is a means for the eradication of illiteracy and poverty, the promotion of equity, and an overall improvement of the economic productivity of the country, as stipulated in an Asian Development Bank document entitled Framework and Criteria for the Appraisal and Socioeconomic Justification of Education Projects (ADB, 1994, p. 5):

• Education can play a direct role in poverty reduction by enhancing the marketable skills of the economically disadvantaged and vulnerable groups and by expanding their ability to take advantage of income generation possibilities and available social services.
• Education plays a key role in promoting the interests of women and increasing their diversified impact and contribution to national development goals. Women must have equal access to and participation in educational activities.
• Through its impact on employment opportunities and earning potential, education alters the value placed on children and the willingness of parents to invest more in each child's development.
• Education contributes directly and indirectly to a higher level of socio-cultural and economic development that provides sufficient resources to address effectively environmental issues.

However, the promise of education has not been fulfilled. A number of evaluation reports, such as the aforementioned UNESCO Post-2015 Position Paper (UNESCO, 2014), the 2013 Priority Paper (UNESCO & UNICEF, 2013) and the Global Education Monitoring Report 2017/18 (UNESCO, 2017), have made similar observations that the same problems persist, despite the tremendous efforts expended by governments and international agencies. As highlighted by the Priority Paper (UNESCO & UNICEF, 2013, p. 5) in regard to the EFA agenda, progress towards getting all children into school is too slow:

• Children's education prospects are still at risk: among children who attend school, 25% drop out before completing primary schooling;
• There is still an obvious gender gap in access to education, as 31 million of the 57 million out-of-school children are girls;
• 49% of these 57 million out-of-school children will probably never set foot in a classroom; and
• Regionally speaking, there has been noteworthy progress: South and West Asia reduced the number of out-of-school children by two-thirds between 1999 and 2011. However, more than half of the world's out-of-school children are concentrated in sub-Saharan Africa.

And as highlighted in the 2017 Global Education Monitoring Report (UNESCO, 2017, p. 118):

• In 2010–2015, completion rates were 83% for primary, 69% for lower secondary and 45% for upper secondary education;
• About 387 million children of primary school age, or 56%, did not reach the minimum proficiency level in reading; and
• Less than one in five countries guaranteed 12 years of free and compulsory education.

The problem may be due to some fundamental issues, such as the embeddedness of inequality and inequity in societies, which is hard to eradicate. As widely reported, the income gap between rich and poor is generally rising in modern societies, even as the universalization of education expands across the world. Gini coefficients in many countries remain high even though their educational provision is basically universal. And, as noted above, even when everyone can go to school, new pedagogical strategies that seem to liberate learners require more family resources to support them: computer equipment and wifi subscriptions, training in co-curricular activities such as music, sport and the arts, and support for children to participate in exchange programmes, study abroad programmes, immersion programmes and internships. The increased introduction of experiential learning in school likewise requires considerable additional resources from home if children are to benefit from these new provisions. At the lower end, where income is a barrier to access to schooling, governments need to remove school charges so that children from poor families will not be deprived of educational opportunities. On this, the Education Counts report specifies that primary school fees, a major barrier to educational access, were still collected in 89 of the 103 countries surveyed (UNESCO, 2010a). At the higher end, where the goals of universalizing primary education are basically achieved, governments have to be aware of the emergence of new forms of poverty; many development analysts have already identified the emergence of urban poverty, for example. The many educational reforms reviewed above suggest new educational emphases on middle-class culture and the need for significant extra resources for individuals to meet the new educational targets. The disadvantages imposed upon relatively poor families make the poverty cycle difficult to break: even when equal educational opportunity in terms of access is achieved, the reforms and additional demands for performance keep the disadvantaged disadvantaged. Attempts to eradicate inequalities through education may thus end up perpetuating poverty. Governments need to consider measures to help the relatively poor meet the new demands of educational reforms, which requires new resources to be made available.

The recent discourse on 'equity and quality' seems to offer a turning point in resolving these issues. The analyses of factors of success in the OECD's various PISA studies over the past 15 years provide insights that can bring a different perspective to the problem. Their major finding is that the high-performing countries in the various PISA exercises have not resolved the income-gap issue, as indicated by their high Gini indexes. What they have observed, however, is that even under such circumstances, if governments make efforts to improve the quality of the education provided, such as elevating and standardizing the quality of teaching and school environments, students from disadvantaged families, such as low-income and immigrant families, will have an equal chance to excel in the education system. One illuminating finding from the PISA studies is the identification of a 'resilience index': in high-performing countries, the percentage of resilient students is also high, i.e. students who, despite coming from unfavourable socio-economic backgrounds, still attain high achievement in the PISA tests. As noted above, both the Grattan Report (Jensen, 2012) and the Asia Society Report (2012) came to the same observation: high-performing education systems are those able to combine equity and quality or, to put it another way, to achieve equity by raising the general quality of schooling. In a paper discussing directions for the post-2015 education agenda, Sayed and Ahmed (2015, p. 335) point out that: 'The analysis thus far suggests that the articulation of equitable and inclusive quality as a goal cements the turn towards prioritization of quality and frames its pursuit within a social justice perspective, consistent with the emphasis on education as a human right and as a public good. This is potentially a huge quality agenda.' Indeed, the Priority Paper (UNESCO & UNICEF, 2013) has also taken up 'equity as quality' as a new line of thinking and a direction for achieving the EFA goals towards 2030. It is hoped that the new target, with its focus on educational quality, will bring about a new phase that enhances the achievement of EFA for most countries in the world.

REFERENCES

ADB (Asian Development Bank). (1994). Framework and Criteria for the Appraisal and Socioeconomic Justification of Education Projects. Manila: Asian Development Bank.
Asia Society. (2012). Equity and quality in education: Supporting disadvantaged students and schools. Retrieved from https://asiasociety.org/education/equity-and-quality-education
Bruns, B., Mingat, A., and Rakotomalala, R. (2003). Achieving Universal Primary Education by 2015: A Chance for Every Child. Washington, DC: The World Bank.
Ho, E. S. C. (2013). Overall quality and equality of Hong Kong basic education system from PISA 2000+ to PISA 2006. In E. S. C. Ho (Ed.), Multilevel Analysis of the PISA Data: Insight for Policy and Practice (pp. 17–36). Hong Kong, China: Hong Kong Institute of Educational Research, The Chinese University of Hong Kong.
Jensen, B. (2012). Catching up: Learning from the best school systems in East Asia. Grattan Institute. Retrieved from grattan.edu.au/wp-content/uploads/2014/04/130_report_learning_from_the_best_detail.pdf
Lee, W. O. (2004). Equity and Access of Education: Themes, Tensions, and Policies. Manila: Asian Development Bank / Hong Kong: Comparative Education Research Centre, University of Hong Kong.
Lee, W. O. (2007). 'Education policy and planning that empowers: Eradication of poverty and inequalities in education', Journal of Education for International Understanding, Vol. 3, pp. 93–105.
Majhanovich, S. (2014). 'Neo-liberalism, language policy and practice issues in the Asia-Pacific region', Asia Pacific Journal of Education, Vol. 34, No. 2, pp. 168–183.
OECD. (2007). PISA 2006: Science Competencies for Tomorrow's World: Volume 1: Analysis. Paris: PISA, OECD Publishing. dx.doi.org/10.1787/9789264040014-en
OECD. (2010). PISA 2009 Results: What Students Know and Can Do. Student Performance in Reading, Mathematics and Science, Vol. I. Paris: PISA, OECD Publishing. dx.doi.org/10.1787/9789264091450-en
OECD. (2012). Equity and Quality in Education: Supporting Disadvantaged Students and Schools. Paris: OECD Publishing.
Sayed, Y. and Ahmed, R. (2015). 'Education quality, and teaching and learning in the post-2015 education agenda', International Journal of Educational Development, No. 40, pp. 330–338.
Schleicher, A. (2014). Equity, Excellence and Inclusiveness in Education: Policy Lessons from Around the World. Paris: OECD.
Schulz, W., Ainley, J., Fraillon, J., Kerr, D., and Losito, B. (2009). ICCS 2009 International Report: Civic Knowledge, Attitudes, and Engagement among Lower Secondary School Students in 38 Countries. Amsterdam: IEA. Retrieved from www.iea.nl/fileadmin/user_upload/Publications/Electronic_versions/ICCS_2009_International_Report.pdf
Tucker, M. (2011). Standing on the Shoulders of Giants: An American Agenda for Education Reform. Washington, DC: National Center on Education and the Economy. Retrieved from ncee.org/wp-content/uploads/2011/05/Standing-on-the-Shoulders-of-Giants-An-American-Agenda-for-Education-Reform.pdf
UNESCO. (2000). The Dakar Framework for Action. Education for All: Meeting Our Collective Commitments. Adopted by the World Education Forum, Dakar, Senegal, 26–28 April 2000. Paris: UNESCO.
UNESCO. (2010a). Education Counts: Towards the Millennium Development Goals. Paris: UNESCO.
UNESCO. (2010b). EFA Global Monitoring Report 2010: Reaching the Marginalized. Paris: UNESCO.
UNESCO. (2014). Position Paper on Education Post-2015. Paris: UNESCO.
UNESCO. (2017). Accountability in Education: Meeting Our Commitments. Global Education Monitoring Report 2017/18. Paris: UNESCO.
UNESCO & UNICEF. (2013). Making Education a Priority in the Post-2015 Development Agenda: Report of the Global Thematic Consultation on Education in the Post-2015 Development Agenda. Paris: UNESCO & UNICEF.
United Nations. (2012). The Millennium Development Goals Report 2012. New York: United Nations. Available at: www.un.org/millenniumgoals/pdf/MDG%20Report%202012.pdf
United Nations Development Programme (UNDP). (2013). The Rise of the South: Human Progress in a Diverse World. Human Development Report 2013. New York: UNDP. Retrieved from: http://hdr.undp.org/sites/default/files/reports/14/hdr2013_en_complete.pdf
WCEFA (World Conference on Education for All). (1990). World Declaration on Education for All. New York: WCEFA Inter-Agency Commission.

4 Riddled with Gaping Wounds: A Methodological Critique of Comparative and International Studies in Education: Views of a Professor

Rui Yang

INTRODUCTION

In September 2017, I received an invitation from an editor of the Shanghai-based scholarly newspaper Social Sciences Weekly (社会科学报), published in the Chinese language, to write an article celebrating the 200th anniversary of Marc-Antoine Jullien's (1816/17) Esquisse et Vues Préliminaires d'un ouvrage sur l'Éducation Comparée [Sketch and Preliminary Views for a Work on Comparative Education] as an epochal moment in the establishment of Comparative Education as a scientific field of academic study (Epstein, 2017; Jullien, 1816/1964). The invitation immediately reminded me of a recent visit to the Comparative Education Research Center in the Faculty of Education at the University of Hong Kong by Professor Ruth Hayhoe, a distinguished scholar in comparative higher education and one of the most highly regarded experts on Chinese education in the world. She was President of the Comparative and International Education Society (CIES) during 1999–2000 and was named a CIES Honorary Fellow in 2011. During her visit she introduced the latest version of a Comparative Education textbook, compiled for teacher education students by her and a group of her colleagues at the Ontario Institute for Studies in Education at the University of Toronto. During her seminar I commented that the first chapter of the book is a typical Western version of how Comparative Education was established and developed as a field of study, starting formally with European scholars (Bickmore, Hayhoe, Manion, Mundy & Read, 2017). Indeed, while ancient peoples in some other countries, especially those with age-old histories, had long observed and recorded education other than their own, it is only the Western version that has been widely recognized, in both the West and the non-West, as the formal beginning of Comparative Education as an academic discipline. By the same token, how Comparative Education is taught and researched is also judged by the standards practiced in Western societies. While such practices have their own problems, those problems cause further problems in the wide non-Western world. Even renowned Western scholars in the field with good knowledge of and respect for the educational traditions of other civilizations can still be much influenced by, and therefore confined to, their own cultural positionality. This is why the introductory chapter of Comparative and International Education: Issues for Teachers contains few elements from non-Western societies (Bickmore et al., 2017). From the point of view of global readers, combining different educational traditions is necessary for Comparative Education to have real meaning.

In my view, Comparative Education has not been able to contribute significantly to educational reforms and development, nor to a theoretical understanding of education. Focusing on methodology, this chapter offers a critique of the field. Methodology is used here as the lens through which we view, undertake, and translate our research (Walter, 2014). The chapter cites examples from East Asia for illustration, owing to the author's theoretical specialization and professional background. For an academic in East Asia embroiled in Western discourses, the contemporary education system involves cultural masquerades due to an intrusion of Western influences. As an Asian scholar doing Comparative Education, it has often been a painful experience to survive global scholarly relations, with only accidental and transient thrilling moments. Taking the errors of the past as starting points for new directions, this chapter attempts to capture the perceived realities of comparative education research through a non-Western eye. Instead of refuting theories that have been developed in the West, it intends to apply and re-theorize them based on East Asian experience and, if possible, to critique hegemonic notions of westernization and globalization. It first charts in broad outline the status quo of comparative education research. Then, based on the contemporary knowledge turn and citing examples from East Asia, it suggests how Comparative Education could be better developed by resolving substantial differences and even conflicts between Western and non-Western approaches to scholarship.

LINGERING EUROCENTRISM AND ITS MANIFESTATIONS

Comparative and international studies in education have underachieved. They have rarely delivered what they promise. The field is indeed in a state of crisis: Paolone (2016) made such an assessment based on the activities of the Italian Comparative Education Society, and after years of studying the disciplinary development of Comparative Education, Wolhuter, Karras, and Calogiannakis (2015) attribute its lack of achievement to challenges created by new global societal contextual factors. There are also some chronic and fundamental reasons. Comparative Education is just a single spot on a leopard from which we infer what the whole animal looks like. The modern social sciences originated in the nineteenth century and are of Occidental descent (Wallerstein, 1997), emerging especially in the period after the French Revolution, when political conflict, rapid urbanization and social turmoil convulsed European societies. Intellectuals such as Auguste Comte (1798–1857), who wanted to find regularities, even laws, in social life that resembled Newtonian physics, sought to explain both the bewildering chaos and the new possibilities around them. The social science we teach and learn in universities today emerged in response to European problems and is based almost exclusively on the nineteenth-century European experience. Because it is the product of that specific time and space, it becomes weak when either time or space changes. The highly institutionalized social sciences have always sought to transcend history, without much success. With globalization, both time and space have changed dramatically. However, the heartland of the social sciences in the West has largely turned an epistemologically blind eye to these changes. Conventional social studies face far more fundamental challenges in a context of globalization. Social science more often than not fails to interpret what is going on, especially in non-Western societies. The misjudgment about Donald Trump's election victory is a clear indicator of some fundamental problems that run deep in contemporary mainstream social research. If the supposedly best social scientists could be so wrong about a society they are familiar with, one can logically question how poorly they might observe societies that are strikingly different from theirs culturally, politically and economically.

Similarly, Comparative Education has been dominated by Western scholarly discourses both in the West and in non-Western societies. As globalization comes to town and penetrates the deepest crevices of human endeavor, comparative and international studies in education face serious challenges from within and without. This has been a chronic epistemological issue for the field. Established in universities throughout the world, Comparative Education has a number of interrelated weaknesses. It is outside the scope of this chapter to go into their details. Instead, this chapter focuses selectively on the most fundamental issue: Eurocentrism. Social science has been Eurocentric throughout its institutional history (Wallerstein, 1997). Comparative Education claims to combat provincialism and ethnocentrism (Phillips & Schweisfurth, 2008), with respect for others, and prides itself on being concerned not to be Eurocentric (Takayama, Sriprakash & Connell, 2017).

As a field of research, it is defined by cross-cultural pursuits. It is thus disappointing to find it still impressively parochial (Welch, 2003). A great proportion of its researchers and their works have contributed to Eurocentrism in one way or another, privileging the Western perspective over others, particularly in the study of non-Western societies, which is often epitomized by Western bias and a lack of intimate knowledge of and respect for those societies. Western scholars and their works dominate scholarship on comparative and international education. Although the ways in which cultures have coexisted are apparent (Gundara, 2000), non-Western educational traditions have been largely neglected. Non-Western researchers continue to look to their Western counterparts for theory, methodology and choice of subject matter.

As a European affair related directly to the Enlightenment, Eurocentrism is 'a unique set of beliefs, and is uniquely powerful, because it is the intellectual and scholarly rationale for one of the most powerful social interests of the European elite' (Blaut, 1993, p. 10). It was justified by two interrelated premises: that the West, in being Modern, was the avant-garde of progress, and that the history of the West should be the fate of all humanity. Because of modernity's historical imbrication with the West, one cannot speak of it without implicating Eurocentrism, and vice versa. The ideologies of Euro-modernity have permeated the social fabric of non-Western societies since European expansion in modern times. Western civilization is the highest achiever in conquering the external world, with an instituted divorce between science on the one hand and philosophy and the humanities on the other: the separation of the quest for the true from the quest for the good and the beautiful (Wallerstein, 1995). As the basis of modern university systems, this conceptual split has enabled the modern world to put forward the bizarre notion of value-neutrality, which has been greatly influential in the social sciences. It sustains Eurocentrism and has led to widespread universalism in the modern social sciences, including Comparative Education. Universalism holds that there exist scientific truths that are valid across all of time and space, and that what happened in Europe in the sixteenth to nineteenth centuries represents a pattern applicable everywhere. Only in the last twenty years or so has the legitimacy of this divorce been challenged for the first time in a significant way.

The term 'Eurocentrism' is often understood as the tendency of 'the West' to view the world and evaluate it through its own cultural norms, mores, and standards. Unlike typical ethnocentrisms, Eurocentrism can transcend its own cultural particularity and manifest itself in the non-West. Indeed, many non-Western comparative education researchers observe their own societies with a Eurocentric perspective because their training and education have been Western in nature (Grigorenko, 2007; O'Sullivan, 1999; UNESCO, 1998). Ethnocentrisms are most often unintended, unconscious, unavoidable, and symptoms of a benign innocence and ignorance for both Western and non-Western researchers; Eurocentrism appears whenever the West is admired as the standard. The belief in the superiority of Western civilization, by contrast, has been and remains self-conscious and calculated on the part of its proponents. It is an effort that has received tremendous institutional support – political, economic, and intellectual – and that has steadily been consolidated since the Renaissance. The notion of Western superiority persists even if it is not readily acknowledged, and it is widely apparent in comparative and international studies in education. This was the norm in the epoch of classical colonialism but, significantly, it is still the case now. Although West/non-West relations today are no longer ordered by overtly Eurocentric justifications, they continue to be predicated on the Western belief that the West is morally superior and has the right to act on such a basis.


Discourses of globalization have prioritized Western perspectives. The criteria for scientific knowledge are largely defined as knowledge that is based on 'Western' experiences (Appadurai, 2006) and 'Western' research methods (Yang, 2006), and that is disseminated in the English language (Ng, 2012). Scholarship from rich Western countries is often taken as the norm for academic knowledge, while scholarship from other regions is often dismissed. 'Global' knowledge is constituted only of research and ideas developed in Western countries and defined particularly according to Anglo-American paradigms. While some in the West, especially those who criticize Western hegemony, are aware of this, it is hard for them to understand profoundly its impact on non-Western societies in terms of both scale and depth. Meanwhile, non-Western researchers attempt desperately to become as Western as they possibly can, although few of them truly grasp the essence of Western civilization. Built upon Western social life, Western discourse can never truly and fully express the experiences of the peoples of the wide non-Western world. As a result, the extraordinarily rich historical and cultural experiences of the non-Western world have not been expressed in the literature on Comparative Education. At the same time, the educational experiences accumulated by non-Western societies have the potential to contribute greatly to the theorization of educational policy and practice. Scholars from both Western and non-Western backgrounds need to understand Western theories and the societies they study thoroughly enough to achieve this goal. Such profound understanding of both has been lacking on each side.

Eurocentrism has epistemological implications. With a lingering attachment to ideas, methodologies, and divisions that marked its birth in the nineteenth century, social science needs to interrogate Western universalism and methodological positivism. It has to stop its ridiculous aping of the natural sciences, with the reductionist separation of disciplines and rationalist assumptions in the search for hidden truths and social laws. Time is up for the canon of modernist social science. The demand for the 'opening' of the social sciences (Wallerstein et al., 1996) reflects perceived changes in the very basis of what the social sciences are: how they relate to the natural sciences; how they relate to each other; how they are taught; and what research means ontologically and epistemologically (Burawoy, 2007). For comparative and international studies in education, the intense controversy swirling around social science research methodologies and approaches has not led to any substantial improvement of the field. Indeed, this lingering dispute has pitted many researchers and students in the field against each other. There is an abundant literature on both the contestation and the affirmation of comparative education theory. In studies of comparative and international education, it is not rare to find colonialism, a belief in objectivity and universalism without contextual bonds, abstract interpretations of educational experiences as if they took place separately from time, space and nature, and state-centered ways of interpreting reality. Many researchers claim that they conduct 'scientific' research, yet they have little understanding of social research methods and their weaknesses. This is widespread across Western and non-Western societies alike, including among some well-known scholars. The knowledge they have acquired and the training they have received are not sufficient for them to reflect deeply upon such inadequacies (Zhao, 2015). From the outset, social science has been greatly influenced by natural science. Like most studies in the social sciences, comparative education research relies heavily on Weber's notion of 'ideal types', especially in comparing education in different nations. Such a cornerstone has been seriously problematized by contemporary scholars for its essentializing tendency to focus on extreme phenomena and to overlook the connections between them, connections which are a salient feature of the globalization era.

Abstracted empiricism is another manifestation of Eurocentrism. The term was coined by C. Wright Mills (1959), who was concerned that sociology would become saturated in information but lacking in ideas, especially with the dawn of high-speed computers. It is the practice of gathering sociological data for its own sake, without developing a theoretical framework that would give the data meaning and value. This is also seen in Comparative Education, where we have more and more empirical works whose authors do not always demonstrate a substantial understanding of the realities. They also lack a strong sense of the problematic, especially a sense of history. Although they often present rich empirical data, they fail to come to terms with those data. With many seemingly correct details, the general picture they paint is often inaccurate and even misleading. Such studies attend to concepts, approaches, and theoretical frameworks only on the surface, failing to delve deeply into the socio-historical meanings behind them. As a matter of fact, they essentially insert empirical data (usually from non-Western societies) into abstract concepts, approaches, and theoretical frameworks (often from Western societies). Such works might look neat on the surface, but they fail to get to the root of the matter. This is a particular problem in studies of non-Western societies by both domestic and foreign researchers. It is also evident in studies of developed societies, although there the problems may appear less prominent.

The greatest challenge that Eurocentrism poses lies most immediately and imposingly in the mundane: not in reified theoretical constructs but in the lived experiences of the everyday. It is lived out and lived with, felt before it is thought out. Indeed, it needs to be noted that Eurocentrism is foremost a political problem; it is the inspiration for and, simultaneously, the outcome of the colonial arrangements that have structured the world for the better part of the past centuries. It also exists within societies, taking the form of discrimination, particularly in countries of immigration. Shaun Harper, for example, has called out racism in the American academy. According to him, white supremacists have marched on American campuses, and American higher education has been historically, and continues to be, dominated by 'white power' – architecturally, compositionally, curricularly and editorially (Lederman, 2017). Many comparative and international education researchers who explore the hegemonic discursive formations of globalization to uncover processes of inclusion and exclusion, and the subjugation of knowledges, are themselves Eurocentric. It is therefore important to explore how globalization is constructed in hegemonic discourses in order to analyze whose perspectives are included and whose are excluded (Robertson, 2011).

NEW REALITIES OF KNOWLEDGE

Non-Western societies have been struggling to achieve their long-desired integration of traditional and Western cultural heritages ever since their early encounters with the West. They are confronted with a difficult choice between the dominant Western knowledge on the one hand and their strong indigenous traditions on the other, with constant tensions between the two. The fundamental assumptions of their indigenous knowledges have rarely been presented as established sets of beliefs and as coherent processes or methods of learning and teaching. After absorbing the Western system for centuries, non-Western societies have been institutionally westernized, with 'academic colonization' in their social inquiry (Hwang, 2016). Most social science research has been designed to accumulate empirical data under the guidance of Western theoretical models. This harsh fact needs to be acknowledged in comparative and international studies in education. Although the search for a more relevant social science is picking up in intensity and sincerity (Lee, 2000), too many researchers too often choose to complain about such unequal relations in world scholarship instead of addressing them. They are far more critical than constructive: while they are good at exposing the evils of the existing system, their suggestions of practical reform for the future are not as valuable as their indictment of the past (Higginson, 1979).

As a large-scale and yet fragmented process (Appadurai, 1996; García-Canclini, 1989; Hannerz, 1987; Martín-Barbero, 1993; Pieterse, 1994), globalization has come to town, penetrating the deepest crevices of human endeavor. Many incongruous facets of human existence have been forced together into a giant tumbler, 'giving rise to contradictory but also generative responses' (Odora Hoppers, 2009, p. 601). In this sense, globalization is new, bringing all peoples into direct contact at all times for the first time in human history. 'Previously excluded and excised "objects" are now occupying intimate spaces with those who had believed that their subject position was ordained by God' (ibid., p. 609). Knowledge of and respect for others have become a basic condition for the sustainability of any society, giving more weight to, and placing further demands on, comparative social studies. This has significant epistemological implications. In such a dynamic episode, Western values are no longer seen as the only authority. The intellectual traditions of those excluded and epistemologically disenfranchised gain attention, acquire agency, and demand a new synthesis. There is an urgent need for an 'integrative paradigm shift' (ibid., p. 602). As the moral and intellectual ground for co-existence and co-determination fast expands, questions about them are being asked at the most penetrating levels. Marked by uncertainty, globalization is also an opportune moment to develop new and different intellectual and academic discourses.


Even the most enlightened initiatives that take account of universal principles still revolve around the nation, and knowledge within the official school curriculum is selectively developed based on a narrow understanding of the nation. Nevertheless, it is increasingly possible to devise an intercultural understanding of knowledge from across cultures and civilizations so as to obviate a clash of civilizations (Gundara, 2014). Despite great difficulties, especially for non-Western societies, the possibility of reimagining and redesigning education in the context of globalization is real. The processes of the globalization of education need to be reconsidered and reanalyzed. Change in the imaginaries and enactments of globalization can be initiated from every network node, as much in the United States as in East Asia – the networks are non-hierarchical and rhizomatic (Deleuze & Guattari, 1988; Lundberg, 2013). Human society has had Eurocentric (Bernal, 1987), Indo-centric (Chaudhuri, 1990) and Sino-centric (Hamashita, 1988) memories, histories, and understandings of the past, which may not be a sufficient basis for future cross-cultural education and engagement. The attempt is not to replace one type of centrism with another, which would reinforce centric intellectual tunnel visions, but to develop a more holistic and non-centric formulation of issues about the substance of intercultural and civic education (Gundara, 2014). Some propose recognizing 'bonding' within a group and using this as a basis for bridging or linking with other groups on a sustained basis (Putnam, Feldstein, & Cohen, 2003, pp. 280–281). Gundara (2014), however, criticizes this as a prerogative of Eurocentric notions of the modern world system and of essentially nineteenth-century constructions as articulated by Wallerstein (1974, 2006). He calls on educators to pool civilizational knowledge in ways that do not polarize peoples but help to develop a syncretism that recognizes difference and diversity while also allowing for the nurturing and development of points of mutuality and similarity between beliefs and values. Odora Hoppers (2009) stresses re-strengthening core values from different traditions of knowledge and living. In her eyes, the assumption of the superiority of the West and its patronizing obsession with facilitating the entry of traditional societies into the 'developed' world is brought under sharp scrutiny; Western modernization progress and thought are only a temporary epoch in human history. She proposes re-engagement with the more holistic, integrated conceptualizations of sustainable life held by cultures that have not been down the path of westernization. It is a rapprochement of modern and older cultures, including modern culture's older roots, where each complements the other and opens up the possibility of a viable future for humankind (Fatnowna & Pickett, 2002; Huntington, 1996).

Both notions, 'syncretism' and 'rapprochement', are efforts to draw insights from other traditions and cultures around the world and make them part of the global discourse. They aim to tackle transcultural relations that are complex, processual, and dynamic. According to Kraidy (2002), the local reception of global discourses and practices is necessarily a site of cultural mixture. Schools and universities are a space where intercultural and international communication practices are continuously negotiated in interactions of different power, vividly demonstrating how non-Western contexts encode Western representations. Hybridity is seen as a by-product of the transcultural dynamics between tradition and modernity, as illustrated by Appadurai's (1996) notion of 'disjuncture', Martín-Barbero's (1993) reformulation of the concept of 'mediations', and García-Canclini's (1990) 'cultural reconversion'. As a site for conceptualizing global/local articulations, it emerges as a privileged characterization of cultural globalization (Fukuyama, 1992; Huntington, 1996). When the outside/inside distinction fails in a context of globalization, there is an intense search for ways to discuss, construct, and institute initiatives at local and global levels.


This is a process of engaging with colonialism in a manner that produces a program for its dislocation (Prakash, 1995), made possible not only by permitting subalterns direct space for engaging with the structures and manifestations of colonialism, but also by inserting into the discursive arena totally different meanings and registers from other traditions. Where the above notions fall short is in how they perceive the formation of the contemporary discourse of the West and the non-West: that is, how the West and the non-West are constituted and how relations between Western and non-Western societies come to be represented. While their intention is well taken, their approaches would not be effective; indeed, they are intellectually inappropriate and practically misleading. Educational development in non-Western societies needs to be located within a coordinate system that includes the past, the present, the indigenous and the foreign/Western, aiming at building knowledge systems of their own that can provide their people with a spiritual homeland. Because the West came to the rest of the world with enormous prestige, the global knowledge landscape has been reshaped with Western knowledge at the center as the only legitimate knowledge worldwide. The intellectual scenarios in non-Western societies have become highly complex. In contrast, the West has been the only party to maintain its own conventional knowledge systemically, without fundamental external influences. While it has been dominant on a global scale, it needs to learn from other civilizations to survive globalization. For non-Western societies, Western learning has become the most important part of their modern knowledge systems; without Western knowledge, neither national nor individual development would be possible. Although the penetration of Western knowledge into every corner of our societies has been profound, it is shocking to see how much comparative education research remains bogged down in the quagmire of a dichotomy between the West and the Rest. In this sense, a coordinate system that incorporates the past, the present, the traditional, and the Western is entailed.


Many non-Western societies built their modern education systems on Western experience after their political independence. What underlie their systems are core Western values that had been absent from their traditional cultures, and these Western values are not always compatible with their traditions. The coexistence of different and even conflicting values has proven a great challenge for such societies, leading to serious divides between their formal school curricula and their socio-economic realities. The tensions between the different value systems have greatly constrained the functioning of their educational systems, with consequent poor effectiveness. The complexity is that the worldwide spread of Western influence has already become a precondition for nation-building in non-Western societies. It is no longer constructive simply to complain about this as over-westernization; the realistic approach is to find new ways to incorporate the West without losing their cultural identities. Indeed, very few societies, if any, could afford to find their way without significantly incorporating Western knowledge and values. While unfair and unethical for many, such a situation has been created by historical facts, and it is thus more sensible to find ways to address it rather than simply to reject it. The institutionalization of Western learning in non-Western societies began long ago, as part of the profound social transformations that have accompanied the spread of Western learning in modern times. The shift from traditional learning to Western knowledge is both ideological and institutional; the two dimensions can be viewed respectively as the mind and the body of the shift. While their traditional minds could and should never be entirely transformed according to Western experiences, their modern education systems have been gradually and fundamentally institutionalized on the basis of Western practices. There have been tensions and mismatches between the already-transformed body and the under-transformed mind.


Their contemporary academic systems are Western transplants with little space for indigenous intellectual traditions. The transformation is much more than a paradigm shift or a change of approaches: it was fundamental, comprehensive, and multiple in both layer and dimension. Non-Western societies are pressured to understand both Western and traditional knowledge thoroughly in order to reconcile the two. Yet, built upon Western experience, their education produces people with little knowledge of the traditions that continue to influence their societies. Therefore, even when many educated non-Westerners are determined to achieve the integration, they are not well equipped to do so.

It is thus theoretically inappropriate and practically unconstructive to try to draw a dividing line between traditional and Western knowledge. Some scholars have long acknowledged this. For instance, Western theories and methods in the social sciences have long been the basis of university curricula, including in the study of the Chinese classics. By the 1930s, scholars studying Chinese literature all agreed that a thorough knowledge of both Chinese and Western literature was necessary to achieve innovation in literary research. As Fu Sinian (1896–1950) observed in 1919, 'If you are to research Chinese literature, yet never understand foreign literature, or if you are to document the history of Chinese literature yet have never read any of the history of foreign literature, you will never ever grasp the truth' (Fu, 2003, p. 1492). Most recently, Bowden (2009) observes that while East and West have had their share of skirmishes and still have their differences, they have also influenced each other and borrowed heavily from one another in the marketplace of ideas. This dimension of East–West relations is something that is overlooked, even denied, when many speak of the history and ongoing relations between the peoples of the East and those of the West. He criticizes the fact that significant elements of the Western canon of political thought have denied both the contribution and the capacity of the East – and others – to add anything of value to the history of ideas catalogue. He highlights the common intellectual ground and the inevitable and unavoidable borrowing and exchange of ideas between the East, the West, and other traditions of thought.

Therefore, for comparative education researchers, the West is always present, either explicitly or implicitly, as the backdrop. Deep knowledge of both what is researched and the West is always required; a shortage of either will lead to failed studies in comparative and international education. This goes far beyond the much-documented rhetoric that 'context matters' (Crossley, 2002). For many non-Western scholars, unfortunately, their societies have rarely been able to have thorough knowledge of both. Even the most developed nations, such as Japan and Singapore, are still struggling with it. For a country as powerful as China, with its rich historical heritage and remarkable economic development over recent decades, this has continued to be the greatest cultural challenge. Liang Shuming (1893–1988) remarked in 1921 that 'Chinese people will never gain a clear understanding if they only remain within the structures of Chinese society; if only they first look to others and then at themselves, then they will immediately understand' (Liang, 1921/1990, p. 50). Today, China's leading scholars complain about Chinese intellectuals' lack of understanding of the West, on the one hand, and their even poorer knowledge of their own culture and society, on the other (Zhao, 2016). For comparative education researchers from a Western background, it is imperative to be aware that the modern rise of Europe was a result of borrowing ideas from other civilizations: early interactions between Europe, the Middle East and Asia were part of the development of the Renaissance and contributed to the scientific and secular knowledge of the Enlightenment. As Europeans in the last two centuries have unquestionably sat on top of the world, Western scholars have been much less motivated to truly learn from others. With little knowledge of others, many of them are poorly situated to conduct research on non-Western societies.

EXAMPLES FROM EAST ASIA

Let me now cite examples from East Asia to illustrate specifically what I have argued in the previous sections of this chapter. My first example is an analysis of the literature on East Asian higher education. I searched for the literature in late 2015 using IngentaConnect via the electronic library at the University of Hong Kong.1 The key words used for searching were East Asia and higher education, with no time limit set for the search. Altogether, fifty-five items popped up on my computer screen, the earliest published in 1995 and the latest in 2015. They covered a wide range of topics, and those on education spanned various levels of schooling. Each item was checked carefully. Three of them were not available at the time, and seventeen were excluded because of their irrelevance. While many irrelevant items were included in the list, some works that, to my knowledge, fall squarely within the field of East Asian higher education were excluded; this might be due to the coverage of the search engine and/or the indexing status of different academic journals. Four items were book reviews that cannot be regarded as research works. Thirty-one publications were finally included in the following analysis. They were published in 1995 (1), 1998 (1), 2001 (1), 2002 (1), 2003 (1), 2005 (2), 2006 (1), 2007 (6), 2008 (1), 2009 (1), 2011 (4), 2012 (1), 2013 (4), 2014 (4), and 2015 (2). Among them, some works were only remotely related to East Asia, and a number claimed to research East Asia yet looked at only one society. Altogether, fourteen were considered to be directly on East Asia.


The theoretical frameworks employed by the thirty-one publications fall into four categories: pure description of East Asian practices (2), using foreign/Western theories to analyze East Asia (15), using East Asian examples to confirm foreign/Western theories (13), and trying to challenge foreign/Western theories (1). This scenario echoes the findings of a similar survey of the literature in 2013 which used Hong Kong and education policy as key words (Yang, 2013a). In that survey, only one piece of work among seventy-three publications tried to challenge existing Western theory. Interestingly, the author of that work was originally from Australia, with decades of academic working experience in Hong Kong. In the current survey, the author is from Germany, working at a Japanese university. That both are Westerners working in East Asia is not purely accidental. While most Western scholars observing from the West tend to be Euro-centered, East Asian local researchers have had a strong colonial mindset, which is most evident in formerly colonized societies such as Hong Kong. It is also strong in societies, such as China and Japan, where, although there was no political colonization, there has been a colonization of the mind. A typical research work produced by East Asian researchers consists of Western theories illustrated with local examples. The 'academic colonization' defined by Hwang (2016) remains prevalent throughout East Asia.2
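The tallying behind these figures is elementary; the following sketch (illustrative only, with the counts hard-coded from the results reported above rather than computed from the underlying records) reproduces the category shares:

from collections import Counter

# Framework categories of the 31 retained publications, as reported above
frameworks = Counter({
    "pure description of East Asian practices": 2,
    "using foreign/Western theories to analyze East Asia": 15,
    "using East Asian examples to confirm foreign/Western theories": 13,
    "trying to challenge foreign/Western theories": 1,
})

total = sum(frameworks.values())
assert total == 31  # matches the number of publications retained

for category, n in frameworks.items():
    print(f"{category}: {n} ({n / total:.1%})")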

74

THE SAGE HANDBOOK OF COMPARATIVE STUDIES IN EDUCATION

English and those mainly writing in local languages. Major East Asian societies, including the Chinese mainland, Japan, Korea, and Taiwan, have all developed solid research systems operating in local languages. For both international and local researchers, it is no longer feasible to continue to ignore the increasingly large bodies of literature these systems produce. Indeed, such locally produced literature is often of considerable breadth, depth, and quality. Anyone who aims to truly understand East Asia needs to pay serious attention to it. The fact that local literature, especially that on local history and culture in native languages, has been incorporated so little is a serious issue for research on East Asian higher education. Higher education is deeply rooted in culture, and universities are, after all, cultural institutions. They are most profoundly influenced by the cultural conditions of their societies locally, regionally, and globally. East Asian higher education development is fundamentally about the relations between Western and East Asian cultural values. Whether East Asian societies can fulfil their long-desired integration of the two value systems is the true meaning of, and biggest challenge for, East Asian higher education development. Within such a cultural context, researchers have to understand their own cultures and societies well. Yet, understanding their own cultures and histories has been extremely difficult for the generations born after the mid-twentieth century, owing to the dramatic historical changes in these societies. Without understanding their own histories, cultures, and societies, it is simply not possible for them to build a truly locally based perspective. It would also be beyond their capacity to challenge the often-inappropriate Western perspective in observing East Asian societies. The above example reveals how the politics of representation and authenticity are placed at the core, and how much the intellectual mind of East Asian comparative higher education researchers has been dominated by the global
West. In a context of Western dominance in contemporary social inquiry in East Asia, the fact that locally based researchers adopt a Western lens to observe their own society is unsurprising, although quite unfortunate. This includes a large number of researchers within East Asia and many studying and working overseas. It calls for a plurality of system models to render transparent the possible analytical schemas and to analyze each system from more than one vantage point, as phenomena significant from several different vantage points take on an added importance and facilitate generic global analysis (Marginson, 2014). The second example concerns how to observe East Asia's higher education reality. As modern universities in East Asia are Western transplants, forging their identity is inevitably an arduous task. East Asia's strikingly different cultural roots and heritages have led to continuous conflict between indigenous and Western higher education values. While the institutionalization of modern universities in East Asia has been based on Western values, an often informal yet powerful system supported by traditional culture has never stopped exerting influence on higher education. The two systems often do not support each other. Instead, constant tensions between them reduce the effectiveness of university operation. Although there have been strong attempts to indigenize the Western idea of a university (Yang, 2013b), little has been achieved. While there is growing pride in the idea that East Asian universities are not willing to assume that Western models define excellence, few – both within and outside the region – have been able to theorize their differences from Western universities. Combining East Asian and Western ideas of a university is a major issue of fundamental importance. Based on the findings of my research project, supported by the Hong Kong Research Grants Council during 2013–2017, I find that fundamental values underlying the university have begun to take root in East Asia, contributing to a narrowing
of the conventional gap between Western and East Asian ideas of a university. East Asian higher education and academic elites express their optimism openly and firmly, based on their confidence in the cultural support they receive. This culture is a combination of the traditional and the Western. It has become normal practice in East Asian universities to 'have traditional values for conducting oneself and Western values for conducting business' (Yang, 2017). Nearly all the respondents included Western knowledge in their talks. This needs to be understood in the context of a contemporary East Asian society and culture that have been profoundly influenced by Western values, as a consequence of the westernization of world education (Grigorenko, 2007; Latouche, 1996; UNESCO, 1998). As Western knowledge has become part of East Asia's contemporary knowledge system, it is already impossible for East Asians to talk about higher education without mentioning the West. This is most evident in everyday teaching, research, and administration in East Asian universities, as well as in the speeches delivered by their presidents and in institutional development plans. With an understanding of both traditional and Western knowledges among the elites, East Asia's very best universities show promise in integrating both traditions in their day-to-day operation. Such bi-culturality, or even multi-culturality, stands in stark contrast to the still largely mono-cultural operating environment of universities in the West. However, the literature on East Asian higher education has turned a blind eye to such extraordinary phenomena. This is a substantial misjudgment, as being able to learn from other cultures has become crucially important for the sustainable development of any society in an era of globalization (Cheng, 2007). This also explains why comparative and international studies have not contributed effectively to higher education reform and development in East Asia. Considering the fact that East Asian societies have been forced to learn from the West since the nineteenth century,
their remarkable achievement in higher education is profound and comprehensive, with significance at various levels. However, it has been observed only superficially by researchers both within and outside the region. Limited by the powerful influence of Western theorization of higher education, they have not been able to capture the most essential meaning of East Asia's experiment, which could have furnished their studies with abundant material and insight.

CONCLUDING COMMENTS

In an increasingly technologized world, we are becoming more and more specialized. We tend to pay much attention to details but isolate them and often ignore the whole. This is generally the case in contemporary social research, and more so in comparative and international education. It is manifested in individual research as well as in national policy-making. However, we are also moving toward a global civilization with many local cultures. Globalization pushes the problems of the whole into everybody's face. The global whole is now homing in on us (Schäfer, 2001). Our challenges are profoundly complex, demanding well-thought-out responses. We are pressured to bring the whole back. This explains why the physicist Gell-Mann (1997) is concerned with the many unsustainable policies and trends of the present. He fears that intelligent life on earth is not taking good care of its future, and he wants us to look at the whole because the combined effects of global history are threatening the whole. He calls for an assessment of the global state of affairs. According to him, human welfare and that of the planet require sustainability of the whole. It is vitally important that we supplement our specialized studies with serious attempts to take a crude look at the whole. Like other social science disciplines, Comparative Education has become an
island unto itself in a high-tech environment, savvy about the particular, critical about the global, and ignorant about humanity (Schäfer, 2001, p. 314). Without a thorough understanding of the new reality of contemporary knowledge production and politics, it has instead continued to locate what it studies in opposing positions, something that is both fallacious and intellectually misleading. Comparative and international education research has not been able to achieve its aims morally, theoretically, and practically. If the current state of affairs continues, comparative and international education researchers will never be able to understand the whole and are almost doomed to fail to achieve what they have long intended. Meanwhile, a comparative and international perspective has never been more important in an unprecedentedly connected world. Hallak (1991, p. 1) once remarked that 'comparative studies – carefully designed, conducted and used – are more than ever necessary for the improvement of educational policy and decision-making.' This is even more so today. With new changes in geopolitical relations and increasingly intensified globalization, the very conceptualization of problems in comparative research needs fundamental change (Crossley & Watson, 2003). Over two decades ago, Wallerstein (1997, p. 22) stressed that 'If social science is to make any progress in the twenty-first century, it must overcome the Eurocentric heritage which has distorted its analyses and its capacity to deal with the problems of the contemporary world.' Likening Eurocentrism to a hydra-headed monster with many avatars, he claims that 'It will not be easy to slaughter the dragon swiftly. Indeed, if we are not careful, in the guise of trying to fight it, we may in fact criticize Eurocentrism using Eurocentric premises and thereby reinforce its hold on the community of scholars' (ibid., p. 94). The situation of Comparative Education has largely remained so. It continues to be preoccupied with Western intellectual traditions as the benchmark, to the exclusion of
others. An opening of the Western mind to these assumed-to-be alien traditions of sociocultural thought reveals that the purportedly competing and incompatible traditions of thought might in fact have considerably more in common than what sets them apart, thus opening the way for an authentic inter-civilizational dialogue that focuses more on cooperation and less on clashes (Bowden, 2009). Fortunately, over the past few decades we have begun to see a wave of decolonization in the academy across a wide array of disciplines, including education, with growing self-conscious rethinking and reorientation.

Notes

1 It needs to be noted that this survey has clear limitations. Its aim is not to provide a comprehensive survey of the existing English literature on East Asian higher education. Rather, it attempts to offer an example of some current literature to illustrate who is observing East Asian higher education developments and from what angle.
2 It is important to point out that, with the recent remarkable social development of major East Asian societies, a small number of (usually the best) local researchers have started to become more confident. At the same time, and quite unfortunately, there have been some signs of dangerous academic nationalism.

REFERENCES

Appadurai, A. (1996). Modernity at large: Cultural dimensions of globalization. Minneapolis, MN: University of Minnesota Press.
Appadurai, A. (2006). The right to research. Globalization, Societies and Education, 4, 167–177.
Bernal, M. (1987). Black Athena. London: Free Association Press.
Bickmore, K., Hayhoe, R., Manion, C., Mundy, K., & Read, R. (Eds.) (2017). Comparative and international education: Issues for teachers. Toronto: Canadian Scholars Press.
Blaut, J. M. (1993). The colonizer's model of the world: Geographical diffusionism and Eurocentric history. New York: Guilford Press.
Bowden, B. (2009). The ebb and flow of peoples, ideas and innovations in the river of inter-civilizational relations: Toward a global history of political thought. In T. Shogimen & C. J. Nederman (Eds.), Western political thought in dialogue with Asia (pp. 87–107). Lanham, MD: Lexington Books.
Burawoy, M. (2007). Open the social sciences: To whom and for what? Portuguese Journal of Social Sciences, 6, 137–146.
Chaudhuri, K. N. (1990). Asia before Europe: Economy and civilization of the Indian Ocean from the rise of Islam to 1750. Cambridge: Cambridge University Press.
Cheng, C. Y. (2007). Philosophical globalization as reciprocal valuation and mutual integration: Comments on the papers of Tang Yijie and Roger Ames. In D. H. Zhao (Ed.), Dialogue of philosophies, religions and civilizations in the era of globalization (pp. 65–76). Washington, DC: The Council for Research in Values and Philosophy.
Crossley, M. (2002). Comparative and international education: Contemporary challenges, reconceptualization and new directions for the field. Current Issues in Comparative Education, 4, 81–86.
Crossley, M., & Watson, K. (2003). Comparative and international research in education: Globalization, context and difference. London: RoutledgeFalmer.
Deleuze, G., & Guattari, F. (1988). A thousand plateaus: Capitalism and schizophrenia. London: Athlone.
Epstein, E. H. (2017). Is Marc-Antoine Jullien de Paris the 'father' of comparative education? Compare: A Journal of Comparative and International Education, 47, 317–331.
Fatnowna, S., & Pickett, H. (2002). The place of indigenous knowledge systems in the post-postmodern integrative paradigm shift. In C. A. Odora Hoppers (Ed.), Indigenous knowledge and the integration of knowledge systems: Towards a philosophy of articulation (pp. 257–285). Claremont, CA: New Africa Books.
Fu, S. (2003). Review of Wang Guowei's Song-Yuan xiqu shi. In Z. S. Ouyang (Ed.), Fu Sinian quanji [Collections of Fu Sinian] (pp. 1492–1494). Changsha: Hunan Education Publishing House.
Fukuyama, F. (1992). The end of history and the last man. New York: Avon.
García-Canclini, N. (1989). Culturas híbridas: Estrategias para entrar y salir de la modernidad [Hybrid cultures: Strategies for entering and leaving modernity]. Mexico City, Mexico: Grijalbo.
García-Canclini, N. (1990). Cultural reconversion (H. Staver, Trans.). In G. Yúdice, J. Franco, & J. Flores (Eds.), On edge: The crisis of Latin American culture (pp. 29–44). Minneapolis, MN: University of Minnesota Press.
Gell-Mann, M. (1997). The simple and the complex. In D. S. Alberts & T. J. Czerwinski (Eds.), Complexity, global politics, and national security (pp. 3–28). Washington, DC: National Defense University.
Grigorenko, E. L. (2007). Hitting, missing, and in between: A typology of the impact of western education on the non-western world. Comparative Education, 43, 165–186.
Gundara, J. S. (2000). Issues of discrimination in European education systems. Comparative Education, 36, 223–234.
Gundara, J. S. (2014). Global and civilizational knowledge: Eurocentrism, intercultural education and civic engagements. Intercultural Education, 25, 114–127.
Hallak, J. (1991). Educational policies in a comparative perspective: Suggestions for a research agenda. Paris: UNESCO International Institute for Educational Planning (IIEP).
Hamashita, T. (1988). The tribute trade system and modern Asia. Memoirs of the Research Department of the Toyo Bunko, 46, 7–24.
Hannerz, U. (1987). The world in creolization. Africa, 54, 546–559.
Higginson, J. H. (Ed.) (1979). Selections from Michael Sadler: Studies in world citizenship. Liverpool: Dejall & Meyorre.
Huntington, S. P. (1996). The clash of civilizations and the remaking of world order. New York: Simon & Schuster.
Hwang, K. K. (2016). From cultural rehabilitation to cultural renaissance. In C. P. Chou & J. Spangler (Eds.), Chinese education models in a global age (pp. 87–101). Singapore: Springer.
Jullien, M.-A. (1816/1964). Jullien's plan for comparative education (S. Fraser, Trans.). New York: Teachers College Bureau of Publications.
Kraidy, M. M. (2002). Hybridity in cultural globalization. Communication Theory, 12, 316–339.
Latouche, S. (1996). The westernization of the world: The significance, scope, and limits of the drive towards global uniformity. Cambridge: Polity Press.
Lee, S. H. (2000). The rise of East Asia and East Asian social science's quest for self-identity. Journal of World-System Research, 6, 768–783.
Lederman, D. (2017). Higher education's 'White Power'. Inside Higher Ed, 10 November, www.insidehighered.com/news/2017/11/10/head-higher-ed-research-group-calls-out-dominance-white-power (accessed 20 December 2017).
Liang, M. (1921/1990). Substance of Chinese culture. In S. B. Zhang (Ed.), Liang Shuming quanji [Collections of Liang Shuming] (pp. 3–16). Jinan: Shandong People's Press.
Lundberg, A. (2013). Get connected! Collaborative adventures in networked spaces of learning. Proceedings of the Second International Conference of Emerging Paradigms in Business and Social Sciences (pp. 1–27). Dubai: Middlesex University.
Marginson, S. (2014). Academic freedom: A global comparative approach. Frontiers of Education in China, 9, 24–41.
Martín-Barbero, J. (1993). Communication, culture and hegemony: From the media to mediations. London: Sage.
Mills, C. W. (1959). The sociological imagination. New York: Oxford University Press.
Ng, S. W. (2012). Rethinking the mission of internationalization of higher education in the Asia-Pacific region. Compare: A Journal of Comparative and International Education, 42, 439–459.
Odora Hoppers, C. A. (2009). Education, culture and society in a globalizing world: Implications for comparative and international education. Compare: A Journal of Comparative and International Education, 39, 601–614.
O'Sullivan, E. (1999). Transformative learning. Toronto: University of Toronto Press.
Paolone, A. R. (2016). Comparative education in an age of crisis: Challenges and opportunities in contemporary Italy. In A. W. Wiseman
& E. Anderson (Eds.), Annual review of comparative and international education 2015 (pp. 117–125). Bingley, UK: Emerald Publishing.
Phillips, D., & Schweisfurth, M. (2008). Comparative and international education: An introduction to theory, method and practice. London: Continuum.
Pieterse, J. N. (1994). Globalization as hybridization. International Sociology, 9, 161–184.
Prakash, G. (1995). Introduction: After colonialism. In G. Prakash (Ed.), After colonialism: Imperial histories and post-colonial displacements (pp. 3–20). Princeton, NJ: Princeton University Press.
Putnam, R. D., Feldstein, L. M., & Cohen, D. (2003). Better together: Restoring the American community. New York: Simon & Schuster.
Robertson, S. L. (2011). The new spatial politics of (re)bordering and (re)ordering the state-education-citizen relation. International Review of Education, 57, 277–297.
Schäfer, W. (2001). Global civilization and local cultures: A crude look at the whole. International Sociology, 16, 301–319.
Takayama, K., Sriprakash, A., & Connell, R. (2017). Toward a postcolonial comparative and international education. Comparative Education Review, 61, S1–S24.
UNESCO (1998). World education report 1998. Paris: UNESCO Publishing.
Wallerstein, I. (1974). The modern world-system: Capitalist agriculture and the origins of the European world-economy in the sixteenth century. New York: Academic Press.
Wallerstein, I. (1995). 'Capitalist civilization', Wei Lun lecture series II, Chinese University Bulletin, 23; reproduced in Historical capitalism, with capitalist civilization. London: Verso.
Wallerstein, I. (1997). Eurocentrism and its avatars: The dilemmas of social science. Sociological Bulletin, 46, 21–39.
Wallerstein, I. (2006). European universalism: The rhetoric of power. New York: The New Press.
Wallerstein, I. et al. (1996). Open the social sciences: Report of the Gulbenkian commission on the restructuring of the social sciences. Stanford, CA: Stanford University Press.
Walter, M. (2014). Social research methods. Melbourne: Oxford University Press.
Welch, A. R. (2003). The discourse of discourse analysis: A response to Ninnes and Burnett. Comparative Education, 39, 303–306.
Wolhuter, C., Karras, K., & Calogiannakis, P. (2015). The crisis in world education and comparative education. Paper presented at the Annual International Conference of the Bulgarian Comparative Education Society, Sofia, Bulgaria, 10–13 June.
Yang, R. (2006). What counts as 'scholarship'? Problematizing education policy research in China. Globalization, Societies and Education, 4, 207–221.
Yang, R. (2013a). Doing comparative education research in Chinese societies: Personal reflections. Paper presented at the XVI World Council
of Comparative Education Societies World Congress 2013, Buenos Aires, 25 June.
Yang, R. (2013b). Indigenizing the western concept of the university: Chinese experience. Asia Pacific Education Review, 14, 85–92.
Yang, R. (2017). The cultural mission of China's elite universities: Examples from Peking and Tsinghua. Studies in Higher Education, 42, 1825–1838.
Zhao, D. X. (2015). The Confucian-legalist state: A new theory of Chinese history. New York: Oxford University Press.
Zhao, L. W. (2016). China sociology yearbook 2011–2014. Beijing: China Social Sciences Press.


PART II

Measurement Methods in Comparative Education Research


5
Challenges in International Large-Scale Educational Surveys
Fons J. R. van de Vijver, Nina Jude and Susanne Kuger

CHALLENGES IN INTERNATIONAL LARGE-SCALE EDUCATIONAL SURVEYS

International large-scale assessments (ILSAs) are coming of age. We focus in the present chapter on surveys that assess educational achievement as well as its context(s); context measures are devoted to information about the backgrounds of students (such as age and gender), formal and perceived conditions of schooling/education (an example of the latter would be perceived teaching style) and, increasingly, motivation, interest, social-emotional skills, and personality. 'Non-cognitive variables' has become the common term for the latter. Participants in these studies are usually one or more of the following groups: students, parents/caregivers, teachers, and school principals. The most prominent examples of ILSAs are: International Early Learning Study (IELS), Programme d'Analyse des Systèmes Educatifs de la CONFEMEN (the standing committee of ministers of education of francophone African countries; PASEC),
Programme for the International Assessment of Adult Competencies (PIAAC), Progress in International Reading Literacy Study (PIRLS), Programme for International Student Assessment (PISA) and PISA for Development, the Southern and Eastern Africa Consortium for Monitoring Educational Quality (SACMEQ), Teacher Education and Development Study in Mathematics (TEDS-M), and Trends in International Mathematics and Science Study (TIMSS). Our chapter is relevant to a broader set of surveys, including national surveys of educational progress (such as the National Assessment of Educational Progress, NAEP, in the US) and international surveys that do not assess educational achievement but schooling aspects relevant for teaching and learning, such as the Teaching and Learning International Survey (TALIS). Data obtained in these studies tend to serve multiple aims. In addition to study-specific aims, there are usually two overarching aims: the obtained information should be valuable and meaningful from a policy perspective
and the data should be internationally comparable, complying with scientific criteria to establish this comparability. The present chapter describes the interplay of science and policy to achieve these goals. We first focus on a theoretical framework to describe comparability issues in ILSAs. We then describe conceptual frameworks, which are project-specific documents that describe the theoretical underpinning of the constructs of the study and their links. This is followed by a more detailed description of operational aspects of these projects, including measures taken to ensure data comparability and relevance for participating countries.1 We have extensive experience with all aspects of such projects, notably PISA and TALIS.

COMPARABILITY IN ILSAS: GENERAL FRAMEWORK

Bias challenges the cross-cultural comparability of scores (van de Vijver & Leung,
1997). This comparability is an important aspect of educational surveys. One of the aims of such surveys is to inform educational policy in participating countries by providing opportunities to compare aspects assessed in ILSAs, such as educational achievement, teacher characteristics, and non-cognitive student characteristics, across countries. As a consequence, the usefulness of surveys for countries would be seriously challenged if constructs and/or scores were not comparable. Comparability is tested in the last stages of an ILSA, after data have been collected. Yet, comparability is at the forefront in all stages of a project; thus, indicators2 are included and formulated in a manner that minimizes the likelihood of bias. Bias is defined as nuisance (i.e., unintended, non-target) factors that challenge the validity of instruments applied in cross-cultural studies (van de Vijver & Leung, 1997) (see Box 5.1). If an indicator is biased, score differences on the indicator do not have the same meaning within and across countries.

Box 5.1 Types of bias, equivalence and invariance

Bias: Nuisance (i.e., unintended, non-target) factors that challenge the validity of instruments applied in cross-cultural studies.
Construct bias: The construct that is the target of the assessment has a different meaning across cultures.
Method bias: Incomparability due to differences in sampling, administration of the test instruments, or mode of administration.
Item bias (also labeled differential item functioning): The most specific type of bias; the meaning of an item is different in at least one of the countries.
Equivalence, invariance: Comparability of data.
Configural invariance: Indicators measuring a construct cover facets of this construct in all cultures studied.
Metric invariance: Indicators that represent a construct have the same factor loadings on the latent variable in structural equation modeling, or the same discrimination parameters in item response theory, in all countries.
Full-score equivalence or scalar invariance: Indicators have the same intercepts (i.e., point of origin) and the same metric across cultures, and scores can be compared across cultures.
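These levels can be stated compactly in the multigroup confirmatory factor analysis notation commonly used in the invariance literature. This is a generic sketch, not the operational scaling model of any particular ILSA:

\[ x_{ig} = \tau_g + \Lambda_g \xi_{ig} + \delta_{ig}, \]

where \(x_{ig}\) is the vector of observed indicators for person \(i\) in country \(g\), \(\tau_g\) the vector of intercepts, \(\Lambda_g\) the matrix of factor loadings, \(\xi_{ig}\) the latent construct(s), and \(\delta_{ig}\) the residuals. Configural invariance requires only that \(\Lambda_g\) shows the same pattern of zero and non-zero loadings in every country; metric invariance additionally requires \(\Lambda_g = \Lambda\) for all \(g\); scalar (full-score) invariance further requires \(\tau_g = \tau\), which is what licenses comparisons of observed and latent means across countries.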

An important example (and a major threat to cross-cultural studies that are based on self-reports) is response styles (Paulhus, 1991); such a style refers to a tendency not to express one's true view but to prefer a certain way of expressing oneself. For example, it has been documented that East Asian students tend to avoid the extremes of Likert scales, whereas Central and South American students tend to use extremes more frequently (Buckley, 2009; Harzing, 2006; He et al., 2017). Now, such a tendency would not be a problem if some indicators asked for motivation and some for a lack of motivation, as in an instrument in which half of the items refer to motivation and the other half to a lack thereof. However, many measures used in ILSAs, such as PISA, TIMSS, and PIRLS, employ construct measures in which all indicators are formulated in the direction of the construct to minimize cognitive load and speed up the administration process. As a consequence, the tendency to prefer a limited range of options in Likert scales (the middle or the extremes) is confounded with actual motivation scores; country differences in motivation scores cannot then be taken at face value. The example illustrates that bias, if not appropriately taken into account, can be misinterpreted as real cross-cultural differences.
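To make the confound concrete, the following is a minimal simulation sketch, not an analysis of real ILSA data; the response-style mechanism and all parameter values are invented for the illustration. Two countries share an identical true motivation distribution, but one prefers the extremes of a five-point, all-positively-worded scale:

import numpy as np

rng = np.random.default_rng(1)
n = 100_000  # respondents per country

def likert(latent, extreme_tendency):
    # Map latent motivation onto a 1-5 Likert answer; with probability
    # extreme_tendency, push the answer toward the nearest scale end,
    # mimicking an extreme response style unrelated to the trait.
    base = np.clip(np.round(3 + latent), 1, 5)
    flip = rng.random(base.shape) < extreme_tendency
    out = base.copy()
    out[flip & (base > 3)] = 5
    out[flip & (base < 3)] = 1
    mid = flip & (base == 3)
    out[mid] = rng.choice([1, 5], size=int(mid.sum()))
    return out

# Identical true motivation in both countries (most students somewhat agree).
true_a = rng.normal(0.8, 1.0, n)
true_b = rng.normal(0.8, 1.0, n)
mean_a = likert(true_a, extreme_tendency=0.05).mean()  # country A avoids extremes
mean_b = likert(true_b, extreme_tendency=0.35).mean()  # country B prefers extremes
print(f"observed means: A = {mean_a:.2f}, B = {mean_b:.2f}")

The observed country means differ although the underlying trait distributions are identical; balanced item sets (half positively, half negatively worded) or explicit response-style corrections are meant to guard against exactly this artifact.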

When we discuss bias in this chapter, we integrate various concepts that have been proposed in the literature. Examples are non-target factors, auxiliary traits, non-target variance, and construct-irrelevant variance (e.g., American Education Research Association, National Council on Measurement in Education & American Psychological Association, 2014; Haladyna & Downing, 2004; Messick, 1995). The bias that is discussed here does not refer to fairness and equity, but rather to a lack of identical meaning of scores across cultures. Three types of bias can be distinguished, each with its own origin: the construct, the assessment method, or the items (Hambleton, Merenda, & Spielberger, 2005; van de Vijver & Leung, 1997). Construct bias indicates that the construct that is the target of the assessment has a different meaning across cultures. It has been argued, for example, that cultures differ in the origin (and hence, indicators) of well-being. In individualistic cultures, well-being is viewed more as a personal characteristic that can be measured by the momentary or long-term evaluation of one's life (Kitayama, Markus, & Kurokawa, 2000). In collectivistic cultures, however, well-being is more associated with embeddedness and belonging; as a consequence, indicators of well-being would need to be more related to satisfaction with relationships in collectivistic cultures. If such a construct were measured in a large-scale survey, the choice of indicators would be very important: Will individualistic or collectivistic indicators, or a combination, be chosen? So, the problem is not so much the (ir)relevance of well-being as a universal construct; the challenge lies more in the choice of indicators. Construct bias is usually not a problem in educational ILSAs, for pragmatic reasons. Large-scale educational surveys do not tend to emphasize constructs whose indicators can be expected to differ considerably across countries but tend to focus on domains that are relevant for all countries and that allow for meaningful country comparisons. Thus, if topics that were originally identified as universally relevant for learning and teaching turn out to be relevant for only a subset of the participating countries' educational policies, they might be rejected for inclusion in the study. Recently, some ILSAs, like PIAAC and PISA, have offered so-called national option questionnaires, from which countries can choose from a variety of topics that might support their current policy needs (Jude & Kuger, 2018). This issue of relevance for all countries will become more pressing as more low- and middle-income countries are involved in ILSAs (Serpell, 2007). Method bias means that there is incomparability due to differences in sampling, administration of the test instruments, or mode of
administration. Sampling issues are unlikely in large-scale educational surveys, as much attention is paid to the sampling frame and probability sampling is the norm. Coverage could become an issue if the school-going population is systematically different across countries, due to systemic factors such as differential school dropout (for coverage of sampling rates in different countries see, for example, OECD, 2017). In particular, when low- and middle-income countries are involved, coverage could become an issue. Recently, PISA for Development was launched (Ward & Zoido, 2015; Willms & Tramonte, 2015). The project focuses on low- and middle-income countries so as to make PISA more inclusive; educational performance, background characteristics of students, parents/guardians, and schools, and their associations are examined. The project creates unique challenges, such as the need to adapt measures (e.g., the measure of appliances in the household had to be adapted) and the need to reach (15-year-old) adolescents who are no longer school-going. The most important source of method bias in background questionnaires in ILSAs comes from response styles. Below we describe procedures to deal with response styles in more detail. Item bias (also labeled differential item functioning), the most specific type of bias, means that the meaning of an item is different in at least one of the countries. A more precise, psychometric definition specifies that an item is biased if persons with the same trait, but coming from different cultures, are not equally likely to endorse the item (Holland & Wainer, 1993; van de Vijver & Leung, 1997). The main reasons for item bias in the background questionnaires of educational surveys are linguistic (e.g., poor translation) and cultural (e.g., differential applicability of item contents across cultures). What do we mean by this comparability, also referred to as equivalence and invariance? Three levels of equivalence, the more
common term for this comparability, have been proposed (Milfont & Fischer, 2010; van de Vijver & Leung, 1997). The three levels have testable, statistical definitions and refer to the question of whether constructs, measurement units (such as the units of a Likert or frequency scale), or full scores can be compared. Configural invariance means that indicators measuring a construct cover facets of this construct in all cultures studied. In statistical terms, indicators of a construct exhibit the same configuration of salient and non-salient factor loadings across cultures. If a measure reaches configural equivalence, it indicates that the scale measuring the construct refers to the same concept and has the same elements across cultures (Segeritz & Pant, 2013). Establishing configural invariance is often a first step in the statistical analysis of cross-cultural data to ensure comparability. If confirmed, configural invariance implies that the construct(s) measured can be meaningfully applied in all participating countries. Metric invariance means that indicators that represent a construct have the same factor loadings on the latent variable in structural equation modeling, or the same discrimination parameters in item response theory, in all countries. This type of invariance means that if a Likert scale has been used in a cross-cultural study, the scale units can be compared across the countries (e.g., the distance between 'completely disagree' and 'disagree' is identical across countries). Though applied less frequently, invariance issues are also relevant for frequency response scales, for example with indicators of the type 'How often do you…?'. Equivalence analyses of scales in ILSAs show that metric invariance is a common finding, which supports the assumption that correlations of constructs can be compared across countries, but means cannot. Returning to our example about response styles, metric invariance of a motivation scale would indicate that motivation scores, such as country averages, cannot
be compared across countries but that correlations between motivation and achievement can. As noted, metric invariance is a common finding in ILSAs; however, for comparative purposes, scalar invariance is often needed. Without scalar invariance, league tables are invalid. The highest level of equivalence is called full-score equivalence or scalar invariance. This level of equivalence is confirmed if indicators have the same intercepts (i.e., point of origin) and the same metric across cultures. Observed and latent scores can be validly compared across cultures only in the case of scalar invariance (Matsumoto & van de Vijver, 2011). Full-score equivalence is often considered the aim of comparisons in ILSAs; yet, it is important to note that full-score equivalence is often established in cognitive test scores only and is hardly ever found in ILSAs' questionnaire data. The main reason is the restrictive nature of the requirements, notably the identity of intercepts. Empirical studies have shown that intercept differences, indicative of item bias, are difficult to avoid in comparisons involving many countries; for example, slight differences in the connotations of words in different languages can create item bias (for other treatments of measurement invariance, see Dimitrov, 2010 and Nagengast & Marsh, 2014). The statistical procedures used to establish the existing degree of comparability differ somewhat across projects and constructs, mainly depending on which type of data are analyzed (achievement versus context scales). Achievement items are frequently analyzed using item response theory, whereas questionnaire scales are usually analyzed using confirmatory factor analysis based on linear modeling. In recent years there has been a growing tendency to analyze Likert data using categorical models, which often amounts to the use of some form of item response theory (e.g., Glas & Hendrawan, 2005; Liu, Wilson, & Paek, 2008; Buchholz & Jude, 2017). In all analyses of ILSA data, equivalence plays a crucial role, which is well documented in the
technical reports. This attention is already present in the analysis of Field Trial data. Expert groups discuss equivalence issues extensively, and decisions about which items to retain and which to skip after the Field Trial are based on various considerations (see Kuger, Klieme, Jude, & Kaplan, 2016), with equivalence and other psychometric characteristics, such as internal consistency, playing a pivotal role. The importance of equivalence tests could easily be underrated by readers of project summaries for policy makers and the public, as these usually do not pay much attention to equivalence issues (considered too technical). When the statistical procedures to examine invariance are implemented, it is rather common in ILSAs to find that scalar invariance is not supported. Various ways have been used to deal with this issue. The first and most popular is to skip items that are biased, so that the remaining items provide a score that is more comparable across countries. The approach is not without limitations. It is often unclear from a theoretical perspective why specific items would need to be removed, and in many instances so many items are biased that a mechanical removal of biased items yields scales that provide a poor rendering of the underlying construct. Another approach is to interpret the bias as a source of country-specific information that requires further examination (Poortinga & Van der Flier, 1988); this approach is never employed in ILSAs. Yet another approach is to relax the statistical criteria used to evaluate the fit of models (Buchholz & Jude, 2017); there is some preliminary evidence that current criteria for good fit in multigroup confirmatory factor analytic models are too strict (e.g., Rutkowski & Svetina, 2014, 2017). The difficulty of adequately dealing with these invariance challenges in ILSAs has led to work on new designs that set out to overcome limitations of existing measures, such as response styles. We review such design adaptations below.
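As a sketch of the first approach, a screening step must first flag biased items. The following is a minimal illustration of one classical screen, the Mantel-Haenszel procedure for a single dichotomous item, run on simulated data; operational ILSAs use more elaborate IRT-based procedures, and the data, names, and threshold below are illustrative only.

import numpy as np

def mantel_haenszel_dif(responses, group, strata):
    # Mantel-Haenszel common odds ratio for one dichotomous item,
    # comparing a reference group (0) with a focal group (1) within
    # strata of a matching variable (e.g., the rest score on the test).
    responses, group, strata = map(np.asarray, (responses, group, strata))
    num = den = 0.0
    for s in np.unique(strata):
        sel = strata == s
        a = np.sum((group[sel] == 0) & (responses[sel] == 1))  # reference, correct
        b = np.sum((group[sel] == 0) & (responses[sel] == 0))  # reference, incorrect
        c = np.sum((group[sel] == 1) & (responses[sel] == 1))  # focal, correct
        d = np.sum((group[sel] == 1) & (responses[sel] == 0))  # focal, incorrect
        t = a + b + c + d
        if t > 0:
            num += a * d / t
            den += b * c / t
    alpha_mh = num / den
    delta = -2.35 * np.log(alpha_mh)  # ETS delta metric; |delta| >= 1.5 is often flagged
    return alpha_mh, delta

# Simulated example: at equal ability, the item is harder for the focal group.
rng = np.random.default_rng(0)
n = 5_000
group = rng.integers(0, 2, n)
ability = rng.normal(0.0, 1.0, n)
strata = np.digitize(ability, [-1.0, 0.0, 1.0])  # coarse ability strata
p_correct = 1 / (1 + np.exp(-(ability - 0.5 * group)))
responses = rng.binomial(1, p_correct)
print(mantel_haenszel_dif(responses, group, strata))

An item flagged this way would be a candidate for removal (or for separate, country-specific interpretation), subject to the construct-coverage caveats discussed above.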

IMPLEMENTATION OF A COMPARABILITY FRAMEWORK

All ILSAs nowadays base their instrument development and study implementation on detailed frameworks (for example, Mullis, Martin, Kennedy, Trong, & Sainsbury, 2009, describe the PIRLS 2011 framework). A framework is defined and implemented at different levels. At the policy level, the framework mandates the particular assessment activity and ideally defines the goals and stakeholders involved (Clarke, 2012). International comparability is usually achieved at the policy level, because most countries implement ILSAs for the same reasons, need similar information for policy making, and education systems world-wide have similar stakeholders. At the measurement level, the framework describes what is being assessed (Kellaghan, Greaney, & Murray, 2009), including knowledge areas as well as context indicators. The framework can also describe expected associations between the constructs measured. A comparability framework is usually developed from scratch for each wave of a study or builds on frameworks from earlier waves. New frameworks need to incorporate recent developments in educational contexts and policies (Jude, 2016). Thus, expert groups for each cycle of a study use existing frameworks as a point of departure and build upon them to write a new version. The process entails a review of the current policy interests to be addressed, with feedback from stakeholders involved in education in all participating countries. Based on this feedback, the framework specifies relevant context indicators that can relate to models of educational effectiveness to be addressed in the study, including dependent and predictor variables. Separate frameworks are developed for the cognitive domains (e.g., reading literacy) and the questionnaires (see, for example, Mullis & Martin, 2017). Yet, a certain integration and coordination are required, as context questionnaires need to be able to contribute to the analysis of
achievement results. The development of the conceptual framework is a complex process, as it needs to integrate a theoretical underpinning of the study with the interests of specific stakeholders (e.g., a strong policy interest in a specific aspect of teaching or mathematics instruction). Conceptual frameworks are usually developed in an iterative fashion, in which an expert group is responsible for the formulation of a sound scientific basis of the study, followed by various feedback rounds by all stakeholders. A first version of the conceptual framework is usually available well before the pilot testing, but the final version may only be available well after the Field Trial. Important strategic decisions are specified in the framework, including:

• The definition of the construct(s) to be measured: Can the construct be defined and adequately operationalized in all participating countries? For example, is sense of school belonging a concept that can be meaningfully applied and measured in all participating countries?
• A specification of the measurement approach: Which format should the test employ? Should a standard self-report be used to assess interest in science, or should a forced-choice method be used to minimize effects of social desirability and other response styles?
• A specification of the method of analysis to determine the extent of comparability of the results: Which statistical procedures (e.g., structural equation modeling, using a linear or a categorical model, or item response theory) will be used to establish comparability, and what will be done if comparability is not fully supported, as typically observed in large-scale surveys?

In most cases, conceptual frameworks do not elaborate on operational aspects of a study, such as the specific item development process or translation procedures. Yet, these frameworks play a crucial role in projects as they ensure that all stakeholders, who are involved in the iterative process, agree on strategic choices made, which have major implications for the resulting data and their comparability (see also OECD, 2015).

Challenges in International Large-Scale Educational Surveys

Detailed information on technical issues is documented in the technical reports of the studies (including the scaling approaches; see, for example, Martin, Mullis, & Hooper, 2016; OECD, 2014) and, for PISA, in a recent publication by Lietz, Cresswell, Rust and Adams (2017). To keep ILSAs manageable and yet retrieve relevant and comparable information, projects standardize their frameworks and assessments to a maximum degree and adhere to an etic methodology, which means that the same questions are asked in all participating countries, with minimal need to accommodate complex translation and cultural issues. The pilot testing, sometimes combined with cognitive interviewing (Willis, 2004), and a Field Trial, combined with extensive feedback from participating countries (usually from National Program Managers), are used to develop and fine-tune indicators so as to ensure that item sets do not require extensive adaptations, which would challenge comparability. Frameworks thus define a priori what aspects of outcomes, student learning domains, and learning contexts should be considered in the assessment and how they should be measured (thereby influencing the scope for later interpretation). A more emic approach, i.e., studying learning in its contexts while explicitly avoiding any prior theoretical, global mindset, is perceived to yield less comparable information, particularly because too many different stakeholders (e.g., policy makers, school administration, parents, students, teachers, principals and researchers) could try to influence data analyses and interpretation. Thus, the involvement of countries in the framework development, as mentioned above, serves as a safeguard to ensure the up-to-date nature of educational topics or curriculum approaches in different nations, which are then harmonized through the frameworks to deliver comparable indicators. The focus on etic measures and a framework that is relevant in all countries can be easily misconstrued as endorsing a viewpoint that
education is only about universal issues of teaching and learning. Rather, the etic perspective is more a consequence of the need to make international comparisons possible; by definition, culture-specific educational aspects cannot and do not need to be compared across countries. So, the focus on etic aspects is pragmatically and not ideologically based. An important aspect of all studies is to find a common overarching theoretical approach to motivate the inclusion of constructs in the framework. Most school achievement studies (e.g., PIRLS, PISA, and TIMSS) therefore refer to the overarching idea of educational effectiveness, a generic term for the factors that make a school a good school, and set up frameworks that aim at covering all aspects that facilitate countries' comparison of the respective effectiveness of their educational systems (Creemers, 2006; Dronkers & Avram, 2009; Reynolds et al., 2014; Scheerens & Bosker, 1997). Such frameworks include constructs that have been shown to relate to effectiveness in education across multiple cultural settings and across time. This approach may lead to the inclusion of constructs that are seemingly irrelevant in a particular country (which could be the case when a certain measure is not implemented at all), but which are very important in other contexts and could have a powerful impact when implemented. One example is tracking in secondary schools, which is a highly powerful factor in some educational systems, while non-existent in others; other examples are home-schooling, charter schools, remote schooling, boarding schools, and private additional tutoring such as cram schools. The inclusion of such relevant constructs that do not apply to or exist in all countries still requires a common understanding of their meaning. A typical example of non-equivalence in meaning is 'public schools', which are vastly different in England and Wales from what they are in most other countries in the world. These examples underline how important it is to set up frameworks that
strictly adhere to the idea of international comparability on the conceptual and operational levels. Depending on the frameworks and their intended policy targets, differences can be demonstrated when comparing existing international large-scale surveys mainly contributed to by OECD countries:

• TIMSS follows a curricular approach, thus resulting in a grade-based sample and tests for mathematics and science every four years since 1995.
• PISA implements a competence-based approach, assessing reading, mathematical and science literacy every three years since 2000 in an age-based sample, including a new innovative domain for each cycle.
• TALIS assesses information on professional development, teaching beliefs and practices, as well as assessment in schools, through questionnaires for teachers and principals on a five-year basis since 2008. It is likely that the next TALIS wave, which would be due in 2023 if the five-year schedule were maintained, will instead take place in 2024. The study will then be integrated with the PISA cycle scheduled for 2024. This integration would mean that TALIS (with its emphasis on teachers and principals) and PISA (with more emphasis on students, although data from parents, teachers and principals are also considered) involve the same schools.

Klieme (2016) highlights the following five areas of difference between PISA and TIMSS: (1) the curriculum approach of TIMSS, which is supposed to be valid across countries and thus includes indicators focusing more strongly on knowledge, versus the life-skill approach of PISA, reflected in indicators embedded in real-world problems; (2) the grade-based selection of whole classes (TIMSS) versus the age-based selection of 15-year-old students in schools and different grade levels (PISA); (3) the participation of different OECD and non-OECD countries in the two studies; (4) the mode of assessment on paper (TIMSS) and computer (PISA) (Alexander et al., 2017); and (5) the scaling approaches for both the cognitive tests and the questionnaires, which differ between the two studies, with TIMSS using a one-parameter model and PISA using a Generalized Partial Credit Model and a more comprehensive approach for scaling the trend across cycles.
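For readers less familiar with these scaling models, the contrast can be sketched in generic IRT notation (textbook forms, not the exact operational specifications of either study). A one-parameter (Rasch) model for a dichotomous item \(i\) specifies

\[ P(X_{ij} = 1 \mid \theta_j) = \frac{\exp(\theta_j - b_i)}{1 + \exp(\theta_j - b_i)}, \]

with a single difficulty parameter \(b_i\) and person ability \(\theta_j\). The Generalized Partial Credit Model, for an item with response categories \(k = 0, \ldots, m_i\), adds an item-specific discrimination \(a_i\) and step parameters \(b_{iv}\):

\[ P(X_{ij} = k \mid \theta_j) = \frac{\exp\left(\sum_{v=1}^{k} a_i(\theta_j - b_{iv})\right)}{\sum_{c=0}^{m_i} \exp\left(\sum_{v=1}^{c} a_i(\theta_j - b_{iv})\right)}, \]

where the empty sum for \(k = 0\) is defined as zero.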

More detailed information about school effectiveness can be gained in studies that incorporate video analyses. As part of TIMSS 1995, a video study with more than 200 randomly selected eighth-grade mathematics lessons in three countries was conducted, focusing on actual classroom instruction. As an extension, the TIMSS 1999 Video Study focused on eighth-grade mathematics and science teaching in seven countries. The study involved videotaping and analyzing teaching practices in more than 1,000 classrooms (Jacobs, Hollingsworth, & Givvin, 2007). To date, these two video studies are the largest of their kind in the field of educational assessment. The TALIS Video Study that is currently in preparation (see Praetorius et al., this volume) will explore classroom teaching and methodologies to capture real teaching practices across the participating countries. Thus, this new approach aims at combining teacher-based surveys with student assessments (OECD, 2016b). While the large-scale assessments mentioned above originated from initiatives in OECD countries, other international studies set out to accommodate the policy needs of non-OECD countries. One example is the Southern and Eastern Africa Consortium for Monitoring Educational Quality (SACMEQ), which has aimed specifically to assess the conditions of schooling and the performance levels of learners and teachers in its member countries since 1995 (ACER, 2015). Its purpose equals that of the OECD studies regarding information for policy decisions, but it also focuses on capacity building for educational researchers in those countries to implement evaluations and analyses of the results (Murimba, 2005). A similar approach can be found in the recent initiative of the OECD, which launched the so-called 'PISA for Development' in 2014, an ILSA for low- and
middle-income countries. In parallel to the PISA assessment, PISA for Development is currently being implemented in eight participating countries, mainly in South America and Africa. It aims to provide policy insight for teaching and learning, along with capacity building in ILSAs for national purposes. Furthermore, the long-term aim is to enhance the regular PISA study by including those countries that successfully implemented PISA for Development (OECD, 2016a) in future PISA cycles. However, it remains to be seen whether the aims of PISA as a study originating in the OECD member states, and the respective resulting frameworks, can fulfil the intended goals of educational monitoring and capacity building for countries that might need information on different or more country-specific context indicators.

CONSEQUENCES FOR STATE-OF-THE-ART EDUCATIONAL ILSAS

Challenges in international assessments can arise at different levels that should all be covered in the respective frameworks, including challenges at the policy level, challenges in research methodology, and challenges in the interpretation of results. Each ILSA wave is a hugely complex undertaking, involving many stakeholders and participating countries, and spanning several years. Quality assurance is key from the very beginning to the very end (Jude & Kuger, 2018). In the beginning of a project, the emphasis is often on the development and refinement of the conceptual framework, whereas operational aspects tend to become more important in later stages. It is our experience that project time lines are rather rigid and that milestones, such as the dates of data transfer to the group of reporting analysts and countries or the release of the technical reports, planned many years earlier, are reached as scheduled. In this section we describe the challenges of each phase and provide some more details on
typical measures that are introduced to ensure international comparability and acceptance in most ILSAs.

Policy

The link between ILSAs and policy requires closer scrutiny, as it can be addressed from different angles. The first is the influence of policy makers on ongoing projects. Participating countries have an important say in each wave. Projects such as PISA and TALIS have a governing board: a formal meeting of representatives of the OECD's secretariat (usually the managers and analysts involved in the project), representatives of the participating countries (usually managers of ministries of education responsible for the project), and representatives of the contractors (educational researchers and analysts of the institutes and companies responsible for designing and conducting the study). The participating countries take the final decisions about all aspects of the study in meetings of the board. Contractors and secretariat prepare proposals and draft decisions (usually after extensive discussions with each other and with expert groups). During these meetings the scientific and policy aspects of the projects are both considered. For example, in the early stages of the PISA and TALIS projects, there is a priority rating exercise in which countries rank a number of topics that could be part of the new cycle. These ranks reflect the relevance of a topic for specific countries. A topic like migration may be deemed relevant by only some countries, which could imply that its overall assessment gets a low priority rating. A solution that is sometimes chosen, in cases where constructs are very high on the agenda of only some countries, is to turn the assessment of the construct into a national option, which means that countries can opt in or out (but all countries opting in administer the same measure). These priority ratings play an important role in the remainder of the
project and help to decide which constructs will (and will not) be assessed. It is important to note that the dialogue between science and policy is at the heart of ILSAs and that the final, implemented measures reflect both considerations. The second angle is an evaluation of the policy impact of ILSAs. Not surprisingly, large-scale evaluations have come under scrutiny. We focus here on the PISA project, as it is the most widely discussed study. In our view, three types of concerns have been expressed. In our presentation of these concerns, we mainly refer to sources in the literature. However, it is important to add that these issues are also discussed while the projects are carried out (such as in meetings with expert groups) and that some of these concerns are also addressed in reports, notably the technical project reports. The first type of concern is technical. We list a snapshot of the technical concerns that have been raised: PISA is a correlational study and cannot claim causality (e.g., Fernandez-Cano, 2016) (interestingly, this concern is emphatically supported by consortia and expert groups; we deal with this issue in more detail below); the reliability, validity, and comparability of the constructs or statistical analyses reported are inadequate (e.g., Goldstein, 2004); countries differ in eligible student populations (e.g., special needs students may be in scope or out of scope); there is a lack of convergence across projects on how to measure pivotal concepts such as the opportunity to learn (Suter, 2017); and 'PISA is assessing knowledge and skills for PISA—and that alone' (Dohn, 2007, p. 10; see also Zeidler & Sadler, 2011), which addresses the concern that PISA is often described as assessing life skills where the indicator contents refer more to an academic context than to real-life situations. The so-called innovative domains in PISA are covered by additional instruments that address, among other topics, financial literacy, global competence, and creativity. These content areas typically have a clearer link with everyday life than some of
It is our experience in these projects (and our reading of the literature) that some of these issues are seriously addressed (e.g., equivalence analyses and the avoidance of causal language in reports are taken very seriously), whereas the link between indicator contents and daily life is much more difficult to improve, as adjustments in that direction are hard to reconcile with the principle that the same indicators are used in all countries. A second type of concern deals with what is done with PISA scores vis-à-vis policy (Suter, 2011). PISA has been highly influential in informing educational policy (Grek, 2009). Rank orders of countries on core variables (the so-called 'league tables') are widely publicized in the media even though technical experts systematically warn against the straightforward use of such rankings, given the absence of full-score equivalence of the scores on which these league tables are based. It has also been argued that educational reform has been based on PISA findings where the link between scores and reform was difficult to understand (Gür, Çelik, & Özoğlu, 2012, describe such an example in Turkey). In short, policy implications and findings are sometimes hard to reconcile (Teltemann & Klieme, 2016). The third type of concern is a consequence of the combination of the novelty and impact of PISA. In retrospect, the rise of PISA and the rise of globalization are related. PISA and TIMSS were the first studies that compared educational systems in a systematic manner, and this formula turned out to be highly successful (Grek, 2009; Rautalin & Alasuutari, 2009). However, with the desire to compare educational output came the wish to use identical indicators in all participating countries (i.e., stimuli that can explain learning in all countries). This 'one-size-fits-all' philosophy has drawbacks. Educational needs in low- and middle-income countries are not necessarily the same as those in more affluent countries.


It could be argued that probability sampling in countries that are very different in educational expenditure, dropout rates, access to higher education, and labor market prospects of adolescents will yield representative, yet incomparable samples, and that educational needs will vary accordingly. The question can be asked 'whether we are really comparing like with like' (Forestier & Adamson, 2017, p. 364). One approach to dealing with this shortcoming of ILSAs is to organize relevant topics in thematic clusters of policy interest and to implement several measures per cluster. Thus, the context questionnaires can hold modules or options with several indicators describing similar or parallel processes in different cultural regions or subsets of countries. Data analyses can then use, say, one indicator for school accountability for a certain subset of countries while another indicator is more meaningful in another subset (a minimal sketch of this idea follows below). The PISA 2015 framework started a classification of topics into policy-relevant areas, and consequently developed additional measures for areas of high relevance (Kuger et al., 2016). PISA 2018 offered several additional questionnaires as national options, allowing countries to assess context information from different stakeholders and to address additional topics. However, international comparisons can only add to national educational monitoring (and cannot substitute for it). For policy makers, globalization in education has probably never been associated with a one-size-fits-all approach, but projects like PISA tend to emphasize commonalities and to downplay country-specific aspects.
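As a minimal sketch of this modular idea (the indicator names, country assignments, and values below are invented; operational ILSA data processing is far more involved), an analyst could map each subset of countries to the indicator that describes the parallel process there and carry the construct forward under a single name:

```python
import pandas as pd

# Hypothetical module data: 'acc_inspectorate' is an accountability indicator
# fielded in one subset of countries, 'acc_central_exam' in another subset.
df = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "acc_inspectorate": [3.0, 2.5, None, None],
    "acc_central_exam": [None, None, 4.0, 3.5],
})

# Map each country to the indicator that is meaningful in that subset.
module_map = {"A": "acc_inspectorate", "B": "acc_central_exam"}
df["accountability"] = df.apply(lambda row: row[module_map[row["country"]]], axis=1)
print(df[["country", "accountability"]])
```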

PREPARATION OF FRAMEWORKS

We start with the process that should lead to international consensus on the framework and operational procedures. The procedures might vary in line with the purposes of different studies, but there is great overlap in


general steps and milestones of preparation and operation. As mentioned above, important questions in the preparation of the framework are, for example: Which outcomes are relevant and possible to assess in this kind of study? Which aspects of the learning contexts are relevant and possible to assess? Who are the most knowledgeable and reliable sources of information in the field? There are marked differences between studies in how they answer these questions (see above). Most studies with a repeated assessment design (e.g., PIRLS, PISA, and TIMSS) have the additional goal of reporting on trends at system level; whether a construct or indicator has been used before therefore plays an important role in the discussion about maintaining or changing it. Core aspects of measures often remain unchanged over cycles to enable trend analyses. Repeated measurement of constructs and comparability of assessments over time are therefore often a priority (Kuger et al., 2016), although there is growing appreciation of the need to update indicator contents, and of the opportunities that modern statistical procedures such as item response theory provide to deal with instruments that are not entirely identical across waves. In addition, there are often political interests involved – policy makers have preferences that vary across time and countries. That is why these decisions are typically negotiated among the many parties and expert groups, from a number of different countries and disciplines, involved in framework development. These experts represent different cultural regions of the world and work in various disciplines, such as developmental psychology, cognitive psychology, problem solving, didactics, education sciences, economics, and sociology. Different expert groups discuss the frameworks for the cognitive test domains and the context assessment and consult each other; for example, the mathematics expert group defines a framework specifying an assessment of mathematics


skills, competencies, or school achievement; the expert groups may receive consultation from experts in problem solving, reading, or general school didactics (for further details, see Jude, 2016). The framework is then used to develop assessment materials (i.e., cognitive tests and background questionnaires). In addition to trend indicators, developers often create new material to cover innovative cognitive domains or new demands in reporting topics, and to adapt the material to new assessment formats (e.g., when a study moves to computer-based assessment) as the respective frameworks are updated. These novel features always undergo empirical testing in a Field Trial. Field Trial instruments contain more items – in some cases many more – than can eventually be used, and usually include some methodological studies to ensure comparability. Field Trial data are used to examine psychometric properties, establish which constructs can be measured, determine which items in scales can be skipped without loss of good psychometric properties, and evaluate the performance of the instruments (e.g., descriptives and internal consistencies) in each country. Even though this information would be highly informative for future rounds of the ILSA, results of the Field Trials usually remain unpublished. One notable exception is the publicly available documentation of the PISA Field Trial questionnaires (Kuger et al., 2016; accessible at http://daqs.fachportal-paedagogik.de/search/show/survey/177). With such a procedure, the questionnaire design balances countries' interests and needs, theoretical research knowledge, and study operations. Other studies handle countries' needs differently. Thus, the SACMEQ studies are implemented in vastly different educational systems. Eventually the international consortium adapted assessments to national particularities to such a degree that the result was several individual assessments with only very little overlap across countries.
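To illustrate one routine Field Trial check mentioned above – internal consistency per country – here is a minimal sketch with simulated data (real Field Trial pipelines also inspect item distributions, missingness, and cross-country equivalence):

```python
import numpy as np
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a set of scale items (rows = students)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Simulated Field Trial responses: four items of one questionnaire scale.
rng = np.random.default_rng(1)
trait = rng.normal(size=500)
items = pd.DataFrame({f"item{i}": trait + rng.normal(0, 1, 500) for i in range(1, 5)})
country = np.repeat(["A", "B"], 250)

# Descriptives and internal consistency, computed per country.
for c in ("A", "B"):
    sub = items[country == c]
    print(c, "means:", sub.mean().round(2).tolist(),
          "alpha:", round(cronbach_alpha(sub), 2))
```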

INSTRUMENT PREPARATION AND ADAPTATION

Following the framework development, experts in the relevant fields are asked to prepare the assessment instruments, and countries and international experts must come to an agreement as to which instruments are most suitable to assess the domains and constructs defined in the framework(s) in the most appropriate way for the target sample in all participating countries. The degree of country involvement in this step varies from study to study and depends on the countries' capabilities and study goals. For studies with an emphasis on trend reporting, a great share of the indicators for any assessment cycle is chosen from previous cycles. Importantly, existing indicators are sometimes outdated (e.g., questions on participants' computer use), so that new assessment material needs to be developed. All newly developed material must fulfill several criteria. It must be politically and culturally appropriate for presentation to participants in all regions (which means that the materials must respect local religious, dietary, and cultural norms and values, and comply with local laws) and must be presented in a way that can be translated into all relevant target languages and formats (e.g., left-to-right, right-to-left, and top-to-bottom writing directions). To achieve this goal, item developers take several measures. They involve experts from different regions of the world in the very first steps of indicator development and then conduct in-depth focus group discussions in different countries to make sure that members of the target sample understand a draft measure similarly in all cultural regions. For several questions in the context questionnaires there can be options to add national indicators, which may be used to add more information on topics that are of particular relevance for some countries only, or that may help to better explain country differences (e.g., indicators on migration


background, cultural wealth, or particular educational pathways). On the language side, great progress has been made in recent decades. ILSAs almost never use simple translation and back-translation procedures (Brislin, 1973), but employ more sophisticated procedures that are better able to integrate the perspectives of the multiple countries involved. The concept of 'translatability assessment' is gaining salience as a way of addressing the question of whether linguistic and cultural features can be retained when rendering an indicator in another language (Conway, Acquadro, & Patrick, 2014; Dept, Ferrari, & Halleux, 2017). One important linguistic question that is rarely addressed in research is the effect of translating answering scales (typically ranging from 'totally agree' to 'totally disagree' with various intermediate anchors), which might influence scaling and invariance, as mentioned above (cf. De Jonge, Veenhoven, & Kalmijn, 2017; Weijters, Cabooter, & Schillewaert, 2010; Yang, Harkness, Chin, & Villar, 2010). After instrument development, one of the most important steps in balancing international comparability with countries' individual educational systems consists of (1) instrument adaptation and (2) translation. These two steps ensure that the tests and questionnaires yield internationally comparable data, but still accommodate national, culture-specific particularities and are presented in a language known to participants (see, for example, the International Test Commission (ITC) guidelines for translating and adapting tests; International Journal of Testing, 2018). The following paragraphs provide a brief overview of some of the procedures involved. While ILSAs intend to assess internationally comparable data and provide comparisons between educational systems, many cultural differences can be found in the national realizations of how schooling


and education are enacted in local settings. One prominent example in PISA would be the question about parental support at home, featuring an item on whether parents and students share the main meal. The term 'main meal' would be adapted to each country's particulars, such as breakfast or dinner. This step of adapting the international instruments to local conditions, cultural norms, and meaningful examples is called localization of materials (Upsing, Gissler, Goldhammer, Rölke, & Ferrari, 2011) and precedes instrument translation. During this phase of instrument preparation, all countries suggest adapted versions of the assessment instruments and engage in finding the best possible solution with the international consortium as well as an additional native speaker who is an expert on the education system in the target country. The goal of this negotiation process is to ascertain that a student who reads an item in cultural region A understands a concept maximally similar to the one understood by a student in cultural region B. After approval of these adaptations, the instruments are translated into the target language. This process again consists of several steps: translation into the target language, back-translation into the language of the source version, renewed translation into the target language, and a subsequent debate that resolves any inconsistencies. These steps again are negotiated and agreed upon between the countries' national management teams, the international consortium, and independent native speakers who are experts in either the testing domain or the local educational system (Dept, Ferrari, & Halleux, 2017). As a result, all test and questionnaire materials for each study are implemented in the different target languages and are culturally appropriate for different regions of the world (for an example, see the localized and translated questionnaire material for PISA 2015 at http://daqs.fachportal-paedagogik.de/search/show/survey/177?language=en).


DESIGN CONSIDERATIONS AND ITEM FORMATS TO INCREASE COMPARABILITY

Different design-based approaches have been developed to enhance the comparability of constructs and scores. These approaches try to reduce or eliminate cross-cultural differences in scores that are influenced by non-target constructs, notably response styles. We describe three approaches that have been used frequently in cross-cultural comparisons. The first is within-subject standardization (Fischer, 2004). Computationally, the procedure is straightforward: each of an individual's responses on a scale is corrected by subtracting that individual's global mean from each score; after this procedure, every individual has the same mean of zero. The procedure is frequently employed in cross-cultural value research (e.g., Schwartz, 1992). The adequacy of the approach hinges on what economists call a 'fixed pie' argument. In the context of values assessment, the argument says that persons cannot differ in their endorsement of all values together; they can only differ in their endorsement of specific values. For example, one person can score high on modernity whereas another scores low, but on a comprehensive test of values the global mean should be the same for all persons. This type of score correction is never used in educational ILSAs, because the fixed pie argument almost never holds for the non-cognitive characteristics assessed in ILSAs. For example, scales of motivation typically have all indicators formulated in the same direction, and the fixed pie argument (holding that each person should have the same mean score) is then clearly invalid.
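In code, the procedure amounts to a one-line row-centering operation; a minimal sketch with invented Likert responses:

```python
import numpy as np

# Hypothetical Likert responses (rows = respondents, columns = value items).
scores = np.array([[5, 4, 4, 2],
                   [3, 2, 2, 1],
                   [4, 4, 3, 3]], dtype=float)

# Within-subject standardization: subtract each respondent's own mean, so
# every row mean becomes zero and only relative endorsement remains.
ipsatized = scores - scores.mean(axis=1, keepdims=True)
print(ipsatized.mean(axis=1))  # -> [0. 0. 0.]
```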

An alternative to within-subject standardization, somewhat akin to the fixed pie argument, is the use of forced choice. Students make a number of choices between two or more alternatives and the total score is derived from the relative preferences; the total number of choices is identical across respondents, but relevant individual differences can be derived from preferences for certain types of choices. In the PISA 2012 Field Trial, learning strategies were assessed using Likert-type scales and also using forced choices, in which students had to indicate their preference for a learning style, thereby making comparisons between three styles (Kyllonen & Bertling, 2014). The differences in correlations with achievement at the country level were remarkable: for the Likert scores a negative correlation was found, whereas a positive (and expected) correlation was found for the forced choice format. It may be noted that forced choice items are easy to implement and do not require much more testing time, so the advantages of forced choice come at a relatively low cost, assuming that an adequate set of indicators can be generated.
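A minimal sketch of such scoring (the strategy labels and choices are invented, and operational scoring can rely on more elaborate models): every student answers the same fixed set of pairwise comparisons, and the score is simply how often each alternative is preferred:

```python
# One hypothetical student's pairwise forced choices among three learning
# strategies; each tuple is (pair presented, alternative the student picked).
choices = [(("memorization", "elaboration"), "elaboration"),
           (("memorization", "control"), "control"),
           (("elaboration", "control"), "control")]

# Every student answers the same pairs, so totals are comparable across
# respondents; only relative preferences differentiate them.
score = {s: 0 for s in ("memorization", "elaboration", "control")}
for _pair, picked in choices:
    score[picked] += 1
print(score)  # {'memorization': 0, 'elaboration': 1, 'control': 2}
```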


The third procedure involves so-called 'anchoring vignettes' (King & Wand, 2007): short descriptions of hypothetical persons – their attitudes, behaviors, or other characteristics relevant to the domain of interest – that participants are asked to rate (Salomon, Tandon, & Murray, 2004; see also van de Vijver & He, 2016). The anchoring vignette approach requires vignette equivalence, the assumption that respondents in all cultures interpret the vignettes in the same, intended way; that is, anchoring vignettes are assumed not to show any form of bias. This assumption has not been extensively tested in a cross-cultural framework and may indeed be hard to achieve in ILSAs; a recent cross-cultural study involving students from 16 countries found strong evidence that anchoring vignettes did not show full-score equivalence (He et al., 2017), which would threaten their adequacy as the basis of a score-correction mechanism (see also Marksteiner, Kuger, & Klieme, 2018). In the PISA 2012 Field Trial, anchoring vignettes were used in the Teacher Support scale (Kyllonen & Bertling, 2014). The correlation with achievement was +.03 at the individual level and -.45 at the country level; after correction, these values were +.13 and +.29, respectively. Obviously, the positive correlations based on the anchoring vignettes are more intuitive. Despite these promising results, it is unlikely that anchoring vignettes will be implemented at a large scale in ILSAs in the future. First, vignette equivalence will be a challenge in ILSAs. Second, anchoring vignettes increase the assessment time and reading load of survey instruments, which forms an operational challenge, given that reading skills vary considerably within and across countries in ILSAs and that reading load differs across languages (items are systematically shorter in some languages than in others). Third, league tables based on anchoring vignettes often look very different from league tables based on conventionally scored Likert scales. There is no systematic evidence that the validity of scores increases or decreases when anchoring vignettes are used. He et al. (2017) did not find a systematic change in correlations of scores based on anchoring vignettes with other scales, including correlations with raw scores or with scales also based on anchoring vignettes. So, the issue of the impact of anchoring vignettes on validity is unresolved, although it is clear that the strong increase in validity that might have been expected has not been confirmed.
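To illustrate the mechanics, here is a minimal sketch of the nonparametric recoding logic described by King and Wand (2007); it assumes the respondent rates the vignettes in the intended order and ignores the tie and order-violation cases that the full method must handle (all ratings are invented):

```python
def vignette_recode(self_rating, vignette_ratings):
    """Recode a self-rating relative to the respondent's own ratings of
    vignettes ordered from low to high: the result is the position of the
    self-rating within that respondent's vignette ratings (1 to 2J+1)."""
    rank = 1
    for v in vignette_ratings:
        if self_rating > v:
            rank += 2   # strictly above this vignette
        elif self_rating == v:
            rank += 1   # tied with this vignette
            break
        else:
            break       # below this vignette
    return rank

# Two hypothetical students give the same raw self-rating (3) but anchor the
# vignettes differently, so their corrected scores differ.
print(vignette_recode(3, [1, 2, 4]))  # 5: above two vignettes, below the third
print(vignette_recode(3, [3, 4, 5]))  # 2: tied with the lowest vignette
```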

STUDY IMPLEMENTATION AND ANALYSES

Just as for the different stages of study preparation, study managers intend to avoid as much bias as possible during the study implementation phase. Several quality assurance steps and measures are available to standardize study implementation, data collection, and analyses to a maximum degree. This section explains some of the most frequent steps and their consequences for preparing the international data sets:

(a) Standardization of test administration: With the transition from paper-based to computer-based assessment in recent ILSAs, standardization of the assessment has become much easier. The layout of instruments, the navigation between test indicators, and the assessment time can be monitored or even managed centrally across all countries. Moreover, test-taker behavior can be monitored retrospectively by analyzing the logfiles (unobtrusive measures of response latencies collected while the questionnaires are filled out), identifying cases with issues in engagement or test administration (Christoph, Goldhammer, Zylka, & Hartig, 2015; Goldhammer, Martens, Christoph, & Lüdtke, 2016); a minimal sketch of such a screen follows after this list.

(b) Data entry, cleaning, and post-processing: Using computer-based assessment platforms, processing of the data becomes easier, as data are exported from the assessment platform into standardized data management systems. Data are usually cleaned and processed in the countries' national centers, which includes coding of open-ended responses by national experts. Data quality is then checked by the country together with the international contractors and compared against technical standards, for example by analyzing missing values and sample sizes (post-processing). During the Field Trial, all procedures are tested, including data cleaning and the identification of any issues with the assessment procedures to be corrected in the main survey. Also, indicators are already scaled for the Field Trial and results are discussed with the expert groups. All measures are used to derive recommendations on how to revise the assessment instruments, both tests and questionnaires, for the main survey.
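As an illustration of the logfile screening mentioned under (a), a minimal sketch (threshold and data invented; the procedures in the cited studies are more sophisticated):

```python
import pandas as pd

# Hypothetical logfile extract: per-item response times in seconds.
log = pd.DataFrame({
    "student": ["s1", "s1", "s2", "s2", "s3", "s3"],
    "item": ["q1", "q2", "q1", "q2", "q1", "q2"],
    "rt_sec": [35.0, 42.0, 1.2, 0.9, 28.0, 3.0],
})

# A simple screening rule: responses faster than a threshold are treated as
# rapid guessing; students with many flagged responses are reviewed for
# disengagement. The 5-second threshold is invented for illustration.
log["rapid"] = log["rt_sec"] < 5.0
share_rapid = log.groupby("student")["rapid"].mean()
print(share_rapid[share_rapid > 0.5])  # here only s2 would be reviewed
```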

The analysis of Field Trial data is the most crucial part of an ILSA, as it is designed to improve the measurement quality of all assessment instruments. Therefore, data need to be analyzed per country and across all countries. Based on the theoretical framework, scales are identified and analysis procedures are specified in the so-called analysis plan. Analyses start with descriptives to examine distributions and missing values; latent constructs are then evaluated. Depending on the analysis plan for the specific study and also on the measure – whether it is a trend indicator or a newly developed scale – different exploratory and confirmatory procedures are applied, ranging from exploratory factor analysis to scaling based on item response theory, to ensure high reliability of measures. Statistical analyses address equivalence issues, as described before. In addition, the validity of measures is analyzed by studying their relationships to similar or divergent measures, student test results, or other important outcomes (e.g., school climate, student interest, or teacher collaboration in school). These procedures are usually described in advance and tailored to the needs of the framework, as the selected measures for the main survey need to represent the constructs highlighted in the framework.
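A minimal sketch of such a convergent-validity check per country (all variable names and values are invented):

```python
import pandas as pd

# Hypothetical Field Trial file: a newly developed scale score, a conceptually
# related scale, and the achievement estimate, per student and country.
df = pd.DataFrame({
    "country": ["A"] * 4 + ["B"] * 4,
    "new_scale": [1.2, 0.8, -0.3, 0.5, 0.9, -0.1, 0.4, 1.1],
    "interest": [1.0, 0.6, -0.5, 0.7, 1.1, 0.0, 0.2, 0.9],
    "achievement": [520, 500, 460, 495, 515, 470, 480, 510],
})

# The new scale should correlate positively with the similar scale and with
# achievement in every country; outlying countries are flagged for review.
for c, sub in df.groupby("country"):
    print(c,
          "r(new, interest):", round(sub["new_scale"].corr(sub["interest"]), 2),
          "r(new, achievement):", round(sub["new_scale"].corr(sub["achievement"]), 2))
```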

CONCLUSION

We have described in this chapter how educational ILSAs are an example of a remarkably fruitful cooperation between scientists and policy makers. We have highlighted the challenges of these projects. Many challenges are operational; running projects with so many stakeholders, which have to be completed in a limited amount of time and with limited resources, is a huge task. In addition, ILSAs have to face challenges in cross-cultural and cross-disciplinary communication. We highlighted the importance of coherent frameworks for all stages of an ILSA project. It is our impression that, over time, quality standards in content selection as well as data analysis have been developed further and have become stricter, reacting to insights from educational research as well as to the needs of educational policy makers when using ILSA data for reporting and international comparison. Over the years, procedures for item writing, obtaining feedback on item quality in Field Trials, and statistical methods for addressing invariance of data across countries have been

optimized and professionalized, greatly adding to the credibility of these studies. Despite all the quality assurance that is built into the survey process, it is important to acknowledge that educational ILSAs have their limitations, such as their focus on self-reports and on what is similar across curricula in different countries, and that these characteristics create a bias towards Western-based curricula. Future approaches should not only take into account matching perspectives from different stakeholders in the educational process, but might also account for longitudinal measurement of context indicators as well as knowledge gained from more qualitative approaches. In this endeavor, the need to balance trend measurement over time against adapting measures to innovations becomes clear. Still, despite these limitations, educational ILSAs have been remarkably successful in comparing student performances and curricula.

Notes

1  We use the term 'country' in this chapter in a loose sense in that entities participating in ILSAs are often but not always nation states.
2  The term 'indicator' is used in this chapter to refer to items, subscales, and scales. The latter terms are only used for their more specific contents.

REFERENCES

ACER (January 2015). The Southern and Eastern Africa Consortium for Monitoring Educational Quality. Assessment GEMs No. 8. Melbourne, Australia: ACER. Alexander, R., Lüdtke, O., Köller, O., Kröhne, U., Goldhammer, F., & Heine, J. H. (2017). Herausforderungen bei der Schätzung von Trends in Schulleistungsstudien. Eine Skalierung der deutschen PISA-Daten. Diagnostica, 63, 148–165. American Educational Research Association, National Council on Measurement in Education & American Psychological Association


(2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association. Brislin, R. (1973). Questionnaire wording and translation. In R. Brislin, W. Lonner, & R. Thorndike (Eds.), Cross-cultural research methods (pp. 32–58). New York: Wiley. Buchholz, J., & Jude, N. (2017). Scaling procedures and construct validation of context questionnaire data. In OECD, PISA 2015 technical report (pp. 289–344). Paris: OECD. Buckley, J. (2009). Cross-national response styles in international educational assessments: Evidence from PISA 2006. NCES Conference on the Program for International Student Assessment: What we can learn from PISA, Washington, DC. Christoph, G., Goldhammer, F., Zylka, J., & Hartig, J. (2015). Adolescents' computer performance: The role of self-concept and motivational aspects. Computers & Education, 81, 1–12. Clarke, M. (2012). What matters most for student assessment systems: A framework paper. Systems Approach for Better Education Results (SABER) student assessment working paper, no. 1. Washington, DC: World Bank. Conway, K., Acquadro, C., & Patrick, D. L. (2014). Usefulness of translatability assessment: Results from a retrospective study. Quality of Life Research, 23, 1199–1210. Creemers, B. P. (2006). The importance and perspectives of international studies in educational effectiveness. Educational Research and Evaluation, 12, 499–511. De Jonge, T., Veenhoven, R., & Kalmijn, W. (2017). Diversity in survey questions on the same topic. New York: Springer. Dept, S., Ferrari, A., & Halleux, B. (2017). Translation and cultural appropriateness of survey material in large-scale assessments. In P. Lietz, J. C. Cresswell, K. F. Rust, & R. J. Adams (Eds.), Implementation of large-scale education assessments (pp. 168–192). New York: Wiley. Dimitrov, D. M. (2010). Testing for factorial invariance in the context of construct validation. Measurement and Evaluation in Counseling and Development, 43, 121–149. Dohn, N. B. (2007). Knowledge and skills for PISA: Assessing the assessment. Journal of Philosophy of Education, 41, 1–16.


Dronkers, J., & Avram, S. (2009). Choice and effectiveness of private and public schools in seven countries: A reanalysis of three PISA data sets. Zeitschrift für Pädagogik, 55, 895–909. Fernandez-Cano, A. (2016). A methodological critique of the PISA evaluations. RELIEVE-Revista Electrónica de Investigación y Evaluación Educativa, 22(1). Fischer, R. (2004). Standardization to account for cross-cultural response bias: A classification of score adjustment procedures and review of research in JCCP. Journal of Cross-Cultural Psychology, 35, 263–282. Forestier, K., & Adamson, B. (2017). A critique of PISA and what Jullien's plan might offer. Compare: A Journal of Comparative and International Education, 47, 359–373. Glas, C. A., & Hendrawan, I. (2005). Testing linear models for ability parameters in item response models. Multivariate Behavioral Research, 40, 25–51. Goldhammer, F., Martens, Th., Christoph, G., & Lüdtke, O. (2016). Test-taking engagement in PIAAC. OECD Education Working Papers, No. 133. Paris: OECD Publishing. Retrieved from http://dx.doi.org/10.1787/5jlzfl6fhxs2-en Goldstein, H. (2004). International comparisons of student attainment: Some issues arising from the PISA study. Assessment in Education: Principles, Policy & Practice, 11, 319–330. Grek, S. (2009). Governing by numbers: The PISA 'effect' in Europe. Journal of Education Policy, 24, 23–37. Gür, B. S., Çelik, Z., & Özoğlu, M. (2012). Policy options for Turkey: A critique of the interpretation and utilization of PISA results in Turkey. Journal of Education Policy, 27, 1–21. Haladyna, T. M., & Downing, S. M. (2004). Construct-irrelevant variance in high-stakes testing. Educational Measurement: Issues and Practice, 23, 17–27. Hambleton, R. K., Merenda, P. F., & Spielberger, C. D. (Eds.) (2005). Adapting educational and psychological tests for cross-cultural assessment. Hillsdale, NJ: Erlbaum. Harzing, A. W. K. (2006). Response styles in cross-national survey research: A 26-country study. International Journal of Cross-Cultural Management, 6, 243–266.


He, J., & van de Vijver, F. (2016). Response styles in factual items: Personal, contextual and cultural correlates. International Journal of Psychology, 51, 445–452. He, J., van de Vijver, F. J. R., Fetvadjiev, V. H., Dominguez-Espinosa, A., Adams, B. G., Alonso-Arbiol, I., Aydinli-Karakulak, A., Buzea, C., Dimitrova, R., Fortin Morales, A., Hapunda, G., Ma, S., Sargautyte, R., Schachner, R. K., Sim, S., Suryani, A., Zeinoun, P., & Zhang, R. (2017). On enhancing the cross-cultural comparability of Likert-scale personality and value measures: A comparison of common procedures. European Journal of Personality, 31, 642–657. Holland, P. W., & Wainer, H. (Eds.) (1993). Differential item functioning. Hillsdale, NJ: Lawrence Erlbaum Associates. International Journal of Testing (2018). ITC guidelines for translating and adapting tests (second edition). International Journal of Testing, 18(2), 101–134. Jacobs, J. K., Hollingsworth, H., & Givvin, K. B. (2007). Video-based research made 'easy': Methodological lessons learned from the TIMSS video studies. Field Methods, 19, 284–299. Jude, N. (2016). The assessment of learning contexts in PISA. In S. Kuger, E. Klieme, N. Jude, & D. Kaplan (Eds.), Assessing contexts of learning: An international perspective (pp. 39–51). New York: Springer International. Jude, N., & Kuger, S. (2018). Questionnaire development and design for international large-scale assessments (ILSAs): Current practice, challenges, and recommendations. Commissioned Papers on International Large-Scale Assessments (ILSAs) for the National Academy of Education. Retrieved from http://naeducation.org/wp-content/uploads/2018/04/Jude-and-Kuger-2018-FINAL.pdf Kellaghan, T., Greaney, V., & Murray, T. S. (2009). Using the results of a national assessment of educational achievement. National assessments of educational achievement (Vol. 5). Washington, DC: World Bank. King, G., & Wand, J. (2007). Comparing incomparable survey responses: Evaluating and selecting anchoring vignettes. Political Analysis, 15, 46–66. Kitayama, S., Markus, H. R., & Kurokawa, M. (2000). Culture, emotion, and well-being:

Good feelings in Japan and the United States. Cognition & Emotion, 14, 93–124. Klieme, E. (2016). TIMSS 2015 and PISA 2015. How are they related on the country level? Retrieved from www.dipf.de/de/publikationen/pdf-publikationen/Klieme_TIMSS2015andPISA2015.pdf Kuger, S., Klieme, E., Jude, N., & Kaplan, D. (Eds.) (2016). Assessing contexts of learning world-wide: An international perspective. New York: Springer International. Kyllonen, P. C., & Bertling, J. P. (2014). Innovative questionnaire assessment methods to increase cross-country comparability. In L. Rutkowski, M. von Davier, & D. Rutkowski (Eds.), Handbook of international large-scale assessment: Background, technical issues, and methods of data analysis (pp. 277–286). Boca Raton, FL: CRC Press. Lietz, P., Cresswell, J. C., Rust, K. F., & Adams, R. J. (Eds.) (2017). Implementation of large-scale education assessments. New York: Wiley. Liu, O. L., Wilson, M., & Paek, I. (2008). A multidimensional Rasch analysis of gender differences in PISA mathematics. Journal of Applied Measurement, 9, 18–35. Marksteiner, T., Kuger, S., & Klieme, E. (2018, online first). The potential of anchoring vignettes to increase intercultural comparability of non-cognitive factors. Assessment in Education: Principles, Policy & Practice. Martin, M. O., Mullis, I. V. S., & Hooper, M. (Eds.) (2016). Methods and procedures in TIMSS 2015. Retrieved from http://timssandpirls.bc.edu/publications/timss/2015methods.html Matsumoto, D., & Van de Vijver, F. J. R. (Eds.) (2011). Cross-cultural research methods in psychology. New York: Cambridge University Press. Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons' responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741–749. Milfont, T. L., & Fischer, R. (2010). Testing measurement invariance across groups: Applications in cross-cultural research. International Journal of Psychological Research, 3, 111–121.


Mullis, I. V., & Martin, M. O. (Eds.) (2017). TIMSS 2019 assessment frameworks. Retrieved from Boston College, TIMSS & PIRLS International Study Center website: http://timssandpirls.bc.edu/timss2019/frameworks/ Mullis, I. V., Martin, M. O., Kennedy, A. M., Trong, K. L., & Sainsbury, M. (2009). PIRLS 2011 assessment framework. Amsterdam: International Association for the Evaluation of Educational Achievement. Murimba, S. (2005). Evaluating students' achievements: The Southern and Eastern African Consortium for Monitoring Educational Quality (SACMEQ). Mission, approach and projects. Prospects, XXXV(1). Nagengast, B., & Marsh, H. (2014). Motivation and engagement in science around the globe: Testing measurement invariance with multigroup structural equation models across 57 countries using PISA 2006. In L. Rutkowski, M. von Davier, & D. Rutkowski (Eds.), Handbook of international large-scale assessment: Background, technical issues, and methods of data analysis (pp. 317–345). Boca Raton, FL: CRC Press. OECD (2014). PISA 2012 technical report. Paris: OECD. OECD (2015). PISA 2018 technical standards. Paris: OECD. Retrieved from www.oecd.org/pisa/pisaproducts/PISA-2018-TechnicalStandards.pdf OECD (2016a). TALIS 2018 video study and global video library on teaching practices. Paris: OECD. Retrieved from www.oecd.org/edu/school/TALIS-2018-video-study-brochure-ENG.pdf OECD (2016b). PISA for Development brief – 2016/07 (July). Paris: OECD. Retrieved from www.oecd.org/pisa/aboutpisa/PISA-FORDEV-EN-1.pdf OECD (2017). PISA 2015 technical report. Paris: OECD. Paulhus, D. L. (1991). Measurement and control of response biases. In J. Robinson, P. Shaver, & L. Wrightsman (Eds.), Measures of personality and social psychological attitudes (Vol. 1, pp. 17–59). San Diego, CA: Academic Press. Poortinga, Y. H., & van der Flier, H. (1988). The meaning of item bias in ability tests. In S. H. Irvine & J. W. Berry (Eds.), Human abilities in


cultural context (pp. 166–183). Cambridge: Cambridge University Press. Rautalin, M., & Alasuutari, P. (2009). The uses of the national PISA results by Finnish officials in central government. Journal of Education Policy, 24, 539–556. Reynolds, D., Sammons, P., De Fraine, B., Van Damme, J., Townsend, T., Teddlie, C., & Stringfield, S. (2014). Educational effectiveness research (EER): A state-of-the-art review. School Effectiveness and School Improvement, 25, 197–230. Rutkowski, L., & Svetina, D. (2014). Assessing the hypothesis of measurement invariance in the context of large-scale international surveys. Educational and Psychological Measurement, 74, 31–57. Rutkowski, L., & Svetina, D. (2017). Measurement invariance in international surveys: Categorical indicators and fit measure performance. Applied Measurement in Education, 30, 39–51. Salomon, J. A., Tandon, A., & Murray, C. J. (2004). Comparability of self rated health: Cross sectional multi-country survey using anchoring vignettes. British Medical Journal, 328(7434), 258–261. Scheerens, J., & Bosker, R. (1997). The foundations of educational effectiveness. Oxford: Pergamon. Schwartz, S. H. (1992). Universals in the content and structure of values: Theoretical advances and empirical tests in 20 countries. In M. Zanna (Ed.), Advances in Experimental Social Psychology, 25, 1–65. Segeritz, M., & Pant, H. A. (2013). Do they feel the same way about math? Testing measurement invariance of the PISA 'Students' Approaches to Learning' instrument across immigrant groups within Germany. Educational and Psychological Measurement, 73, 601–630. Serpell, R. (2007). Bridging between orthodox western higher educational practices and an African sociocultural context. Comparative Education, 43, 23–51. Suter, L. E. (2011). International comparative studies in education. In W. S. Bainbridge (Ed.), Science, technology, engineering, and mathematics (STEM) education: Leadership in science and technology: A reference handbook (pp. 842–850). Thousand Oaks, CA: Sage.


Suter, L. E. (2017). How international studies contributed to educational theory and methods through measurement of opportunity to learn mathematics. Research in Comparative and International Education, 12, 174–197. Teltemann, J., & Klieme, E. (2016). The impact of international testing projects on policy and practice. In G. Brown & L. Harris (Eds.), Handbook of human and social conditions in assessment (pp. 369–386). New York: Routledge. Upsing, B., Gissler, G., Goldhammer, F., Rölke, H., & Ferrari, A. (2011). Localisation in international large-scale assessments of competencies: Challenges and solutions. The International Journal of Localisation, 10, 44–57. Van de Vijver, F. J. R., & He, J. (2016). Bias assessment and prevention in non-cognitive outcome measures in PISA questionnaires. In S. Kuger, E. Klieme, N. Jude, & D. Kaplan (Eds.), Assessing contexts of learning worldwide: An international perspective (pp. 229– 253). New York: Springer International. van de Vijver, F. J. R., & Leung, K. (1997). Methods and data analysis for cross-cultural research. Newbury Park, CA: Sage. Ward, M., & Zoido, P. (2015). PISA for development. ZEP: Zeitschrift für Internationale Bildungsforschung und Entwicklungspädagogik, 38(4), 21–25.

Weijters, B., Cabooter, E., & Schillewaert, N. (2010). The effect of rating scale format on response styles: The number of response categories and response category labels. International Journal of Research in Marketing, 27, 236–247. Willis, G. B. (2004). Cognitive interviewing: A tool for improving questionnaire design. Thousand Oaks, CA: Sage. Willms, J. D., & Tramonte, L. (2015). Towards the development of contextual questionnaires for the PISA for development study. OECD Working Papers. Paris: OECD. Yang, Y., Harkness, J. A., Chin, T. Y., & Villar, A. (2010). Response styles and culture. In J. A. Harkness, M. Braun, B. Edwards, T. P. Johnson, L. Lyberg, P. P. Mohler, & T. W. Smith (Eds.), Survey methods in multinational, multiregional, and multicultural contexts (pp.  203–223). Hoboken, NJ: Wiley. Zeidler, D. L., & Sadler, T. D. (2011). An inclusive view of scientific literacy: Core issues and future directions. In C. L. Östman, D. A. Roberts, P. Wickman, G. Erickson, & A. MacKinnon (Eds.), Exploring the landscape of scientific literacy (pp. 176–192). New York: Routledge/Taylor and Francis Group.

6 Non-Cognitive Attributes: Measurement and Meaning

Mary Ainley and John Ainley

The last decade of the 20th century witnessed a rapid increase in research attention to attitude and value constructs: interest (Hidi, 1990), self-regulation (Pintrich & De Groot, 1990), self-concept (Marsh, 1993), self-efficacy (Bandura, 1997; Pajares, 1996), and theories such as Eccles’ Expectancy-Value theory (Eccles & Wigfield, 2002), and Pekrun’s (2000) Control-Value theory of achievement emotion. Interest was identified as an important resource for learning (Renninger, Hidi, & Krapp, 1992), and positive correlations between interest and learning were widely reported (Schiefele, 1996). The inclusion of attitude and value constructs in International Large-Scale Assessments (ILSAs) in education was a response to the prominence of the motivation, value and self-belief constructs in education and educational psychology. The expert reference groups associated with ILSAs such as PISA, PIRLS and TIMSS drew on this research literature and included a range of questions concerned with non-cognitive constructs in their student questionnaires.

Inevitably this has led to comparisons between countries, exposing some unexpected associations between non-cognitive variables and achievement. One of these unexpected findings is referred to as the 'attitude–achievement paradox': positive associations between non-cognitive variables and achievement at the student level become negative associations in between-country comparisons of means. This paradox has been documented across domains and replicated across assessment cohorts (see e.g., Kennedy & Trong, 2006; Kyllonen & Bertling, 2014). This chapter commences with a consideration of some assumptions behind the inclusion of non-cognitive constructs in ILSAs, and we then explore how the PISA expert reference groups have described the place of non-cognitive constructs in their literacy frameworks. Similar analyses could be applied to other ILSAs, such as TIMSS and PIRLS. We then focus on the attitude–achievement paradox and consider research findings from both PISA and TIMSS that


offer some insights for disentangling strands of the paradox. We use attitude and attitudinal rather than non-cognitive to reflect the content of the attitude–achievement paradox. Non-cognitive acknowledges that more than just cognition is involved in students' achievement (Schiepe-Tiska, Roczen, Müller, Prenzel, & Osborne, 2016). Attitude and value constructs have featured in all PISA surveys, and recently the scope of non-cognitive constructs in PISA has been broadened to include aspects of personality, health, well-being and social competencies (Bertling, Marksteiner, & Kyllonen, 2016).

INCLUSION OF ATTITUDINAL CONSTRUCTS IN ILSAS

There are two key assumptions concerning the place of attitudes in the assessment of achievement. First, based on existing evidence linking attitudinal constructs with students' achievement, ILSA designers assume a positive link between students' attitudes and achievement. Attitudinal constructs are analysed as they are expected to predict achievement and assumed to promote achievement. The cross-sectional character of most ILSA data sets and the underlying correlational nature of most analyses provide information about associations rather than causes. Despite official disclaimers, such associations often are interpreted as if the predictive factor influences or promotes achievement. A second key assumption is that attitudinal constructs represent important outcomes of learning. This is most often discussed in terms of students' willingness to use mathematics, reading, or science competencies to deal with real-world issues and problems. It is also suggested that attitudinal outcomes may foreshadow future learning, especially the likelihood of participation in lifelong learning.

PREDICTING AND PROMOTING ACHIEVEMENT

Factors that Predict Achievement

Research in the 1990s typically reported that factors such as motivation in the form of interest and enjoyment, task value and self-beliefs such as self-efficacy and self-concept were associated with achievement (Eccles, Wigfield, & Schiefele, 1998; Pajares, 1996; Schiefele, 2009). In this climate, ILSAs afforded opportunities to confirm these findings across countries, genders and school systems as well as in relation to socioeconomic status differences. Comparing results over waves from different years might also identify patterns of change and stability across cohorts and across domains. One consistent finding across ILSAs has been a positive association between interest and enjoyment of a domain and student performance. Simultaneously, cross-country comparisons generally have indicated negative associations. Countries where students score highest on interest and enjoyment items are often countries with the lowest achievement scores. Countries with the highest achievement scores generally score in the lower ranges on the interest and enjoyment items.
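The statistical shape of this pattern – an ecological reversal – is easy to reproduce in a few lines of simulation. The sketch below is ours and purely illustrative (all parameters are invented): the attitude–achievement correlation is positive within every country, yet the correlation between country means is negative.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical country parameters: countries with higher mean achievement
# report lower mean attitude scores (all numbers invented).
mean_ach = np.array([420.0, 470.0, 520.0, 570.0])
mean_att = np.array([0.6, 0.2, -0.2, -0.6])

within_r = []
for mu_ach, mu_att in zip(mean_ach, mean_att):
    att = rng.normal(mu_att, 1.0, 2000)
    # Within each country, attitude relates positively to achievement.
    ach = mu_ach + 30.0 * att + rng.normal(0.0, 60.0, 2000)
    within_r.append(np.corrcoef(att, ach)[0, 1])

print("within-country r:", np.round(within_r, 2))                # all positive
print("country-mean r:", np.corrcoef(mean_att, mean_ach)[0, 1])  # negative
```

What produces the real-world version of this reversal is the substantive question addressed in the remainder of this chapter.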

Factors that Promote Achievement

Establishing that motivation and achievement are associated does not clarify the processes through which this occurs. Conclusions from PISA 2000 in relation to reading achievement suggested:

Cognitive and non-cognitive components of reading engagement go hand in hand. … Cognition and motivation, proficiency and engagement in reading have an entangled relationship. Cause cannot be disentangled from effect. Proficient readers are more engaged in reading and, as a result, acquire more knowledge and skills. Students with poor reading habits often find reading material too difficult … and develop a negative attitude towards reading. (OECD, 2002, p. 121)


Williams, Williams, Kastberg, and Jocelyn (2005) suggested that this pattern of relations between engagement and competence describes a reciprocal relation. Factors such as interest and enjoyment of reading directly impact reading activity, which influences reading literacy, which in turn impacts students’ interest and enjoyment of reading. However, the cross-sectional and correlational nature of findings from ILSAs places limitations on conclusions concerning the association between attitudinal factors and achievement. Reciprocal relations between interest and enjoyment and reading competence are assessed best with longitudinal data. Effect follows cause and then becomes the cause in the sequence of achievement behavior that follows. In addition, attitudes may influence achievement in other ways. Is it that motivation sets limits on achievement? Does motivation mediate the effects of factors such as gender and socio-economic status on achievement? Are there cultural factors that moderate how motivation influences achievement? Each form of influence has different implications for educational practice. While longitudinal studies are needed to confirm the processes whereby attitudes contribute to achievement, the findings from ILSAs and some of the questions raised can be addressed through multivariate secondary analyses of these data sets.

AS OUTCOMES OF LEARNING AND ACHIEVEMENT

A consistent emphasis in statements of the objectives of ILSAs is concern over the application of student competencies in real-world contexts. For example, the mathematics expert group for PISA 2012 argued that students who feel positive towards mathematics, who are interested in and enjoy mathematics, who feel confident using mathematics for specific tasks (self-efficacy), feel confident about the


general domain (self-concept), and are not anxious about mathematics, are more likely to learn and to use mathematical skills. Hence, not only are attitudes concerning mathematics expected to influence achievement, but development of positive attitudes to mathematical contents and to their own mathematical competencies is an important outcome of students’ mathematics education (OECD, 2013). Evidence of the importance of attitudinal factors for the application of competencies in work, in general life contexts, and as lifelong learners also requires input from research that goes beyond the cross-sectional character of the main ILSAs. Some countries have based longitudinal studies on PISA samples and this allows some exploration of the longerterm outcomes of the attitudes, values and achievements of students. In Australia, the Longitudinal Surveys of Australian Youth (LSAY, Australian Government, 2016) built a series of cohorts on the PISA cycles from 2003, 2006, 2009 and 2015. Following their participation in PISA, typically about 14,000 students were surveyed through annual computer-assisted telephone interviews until they reached 25 years of age. These annual surveys explore educational, occupational and social transitions of young people, and how achievement, aspirations, attitudes and skills are associated with these transitions (Australian Government, 2016). Parker and colleagues (2014), using latent path modelling of data from the 2003 cohort, investigated how self-efficacy and self-concept were associated with longer-term outcomes: overall achievement at the end of high school, entry to university and undertaking post-school STEM studies. Homel and Ryan (2014) used longitudinal data from the 2009 cohort to identify the effects of aspirations at age 15 for high school completion and for studying at university, beyond the effects of school achievement or family background. Ainley and Ainley (2015) used data based on the 2006 cohort to investigate influences of motivation and science achievement at age 15 on uptake of science studies in the final year of high school.


In Canada, the Youth in Transition Survey (YITS) followed 30,000 Canadian students who participated in PISA 2000 through a series of interviews every two years through to 2010. After allowing for the influences of other student characteristics, higher achievement in PISA predicted high-school completion and participation in post-secondary education (OECD, 2010a). Early labor-market outcomes, such as earnings at age 21, also appeared to be associated with PISA competencies at age 15. The Canadian follow-up also involved a reassessment of reading at age 24, thereby providing information about the development of reading competence between ages 15 and 24 (OECD, 2012). In Denmark, a longitudinal study based on the PISA 2000 cohort (Jensen & Andersen, 2006) indicated that four years later, in addition to the direct and indirect effects of social background, the educational status of participants was largely determined by their reading achievement and academic self-image at age 15. Similar longitudinal studies have been conducted in Switzerland (see Bertschy, Cattaneo, & Wolter, 2008). This sample of longitudinal studies based on PISA cohorts provides evidence that attitudes towards key schooling domains have implications for students' longer-term educational outcomes.

ATTITUDES AND THE DOMAIN LITERACIES IN PISA

In this section we review the nature of achievement as assessed in ILSAs by exploring the definitions of literacy guiding the design and construction of the PISA surveys from 2000 to 2015. We examine the place of attitudes in the frameworks informing these definitions and focus on achievement defined as literacy in the domains of reading, mathematics and science. In the educational research literature on achievement motivation, reasons for actions and competence beliefs are distinguished (Pintrich, Marx, & Boyle, 1993). Pintrich et al.

proposed that these separate motivation components interact in achievement settings. Reasons for doing a task (e.g., interest, value) function as drivers activating students' competence beliefs: beliefs about ability to perform the task at hand (e.g., self-efficacy) and beliefs about capacity to achieve in the domain (e.g., self-concept). Another important distinction concerns the relation between motivation and engagement. It has been suggested that, 'Motivation is about energy and direction, the reasons for behavior, why we do what we do. Engagement describes energy in action, the connection between person and activity' (Frydenberg, Ainley, & Russell, 2006, p. 1). Self-report scales invite students to indicate how they perceive their motivation for reading, mathematics or science in terms of reasons for actions (interest, enjoyment and values) and competence beliefs (self-efficacy and self-concept). Engagement with learning and achievement, whether in reading, mathematics or science, requires that reasons for learning and competence beliefs are translated into appropriate domain-related activities. These perspectives inform our examination of the definitions and models of literacy for the three PISA literacies: reading, mathematics and science.

READING

In PISA 2000 and 2009, the major domain of assessment was reading literacy, with subsidiary domains of mathematics and science. The first PISA survey defined literacy in terms of the cognitive processes underpinning achievement in each of the three domains but did not include reference to any relation between cognitive and attitudinal dimensions. For example:

Reading literacy is defined in PISA as the ability to understand, use and reflect on written texts in order to achieve one's goals, to develop one's knowledge and potential, and to participate effectively in society. (OECD, 2001, p. 21)


The accompanying student questionnaire included attitudinal and behavioural variables, such as time spent on reading, the range and types of books read, and interest in and enjoyment of reading. The types of materials students reported reading were used to create reader profiles 'defined by the frequency and diversity of material read'. For example, magazines were the most frequent reading material for the least diversified reader profile, while the most diversified reader profile extended to frequent reading of 'more demanding and longer texts (namely books)' (OECD, 2002, p. 110). Cross-country comparisons indicated differences in the percentage of students grouped in each of the four profiles. For example, the smallest percentages in the low diversity profile were found in the Northern European countries of Finland, Norway and Sweden, while countries like Belgium, France, Greece and Luxembourg had more than a third of students exhibiting this profile. Countries such as Japan, Belgium, Finland and Norway had fewer than 20% of students in the high diversity profile. In contrast, more than one-third of students in Australia, New Zealand and the UK exhibited this profile. In addition, attitudinal items were combined into an index of 'engagement in reading' with the overall finding that:

On the one hand 'engaged readers' regularly read different kinds of print; on the other hand, they have developed positive attitudes towards reading, and their interest in

It is important to note that this engagement index includes not only aspects such as interest, enjoyment and valuing of reading, but also reading activities. The student questionnaire included a wider set of attitudinal items than has been referred to above and when student-level and countrylevel results were considered, a number of these scales showed biases consistent with the attitude–achievement paradox. Interest in


reading, instrumental motivation, and use of control strategies, along with five other scales, were listed as 'student attributes that cannot be directly compared across cultures'. Self-concept in reading and self-efficacy were 'student attributes that can be compared across cultures' (OECD, 2003, p. 39, bold in original). Multivariate analyses identified how motivation and learning strategies together contribute to reading literacy performance. Interest in reading, instrumental motivation and self-efficacy predicted use of control strategies, which in turn predicted reading performance. In addition to the positive effects of interest in reading and self-efficacy on performance, use of control strategies mediated the effects of the three predictors on performance. While cross-country comparisons of the associations between performance and specific attitudinal scales indicated the attitude–achievement paradox, when combined into multivariate models of the structural relations between the scales and performance, the pattern of relations across countries was similar. The second PISA cycle in which reading achievement was the major domain, in 2009, generally confirmed the 2000 findings. Overall, when students enjoyed reading, they performed significantly better than students who did not. For OECD countries, enjoyment of reading accounted for 18% of the variation in performance. However, significant numbers of students reported that 'they do not read for enjoyment at all'. The latter tended to be over-represented at the lower proficiency levels, and there were differences between countries as to whether students who do not read for enjoyment were concentrated at lower proficiency levels. For example, for Israel, Belgium, Qatar and Brazil, the gradient of enjoyment across the reading proficiency levels was very 'gentle', which is indicative of a weak association between enjoyment and performance. Other countries, such as Australia, Estonia, the Czech Republic and Finland, had a steeper gradient across proficiency levels, which is indicative of a

108

THE SAGE HANDBOOK OF COMPARATIVE STUDIES IN EDUCATION

The framework for PISA 2009 employed an expanded model of the relation between attitudes and reading proficiency. 'Students who are highly engaged in a wide range of reading activities and who adopt particular strategies to aid them in their learning are more likely than other students to be effective learners and to perform well at school' (OECD, 2010b, p. 26). Although the PISA 2009 reading assessment was a snapshot in time, the survey framework informing item construction proposed an engagement–performance cycle, in effect a reciprocal relation between engagement with reading and reading achievement. 'As students read more they become better readers; and when they read well and expect good performance in reading, they tend to read more and enjoy reading' (OECD, 2010b, p. 27). Analyses of the relations between reading habits, strategies and performance assessed a model in which reading habits and approaches to learning mediated the association of both gender and socio-economic background with reading literacy performance. Enjoyment of reading, which was strongly associated with the other attitude-to-reading indices, was used to index reading habits, and awareness of summarizing strategies was used to index learning approaches. Reading habits and learning strategies functioned as mediators, and results indicated country differences in the degree to which these mediation processes accounted for the size of gender differences in reading performance. For example, Finland had a relatively large gender gap in reading proficiency with an above-average contribution of enjoyment of reading and summarizing strategies as mediators. On the other hand, Montenegro had a similar-sized gender difference in reading proficiency but a lower than average mediating contribution for enjoyment of reading and summarizing strategies. However, 'in all countries that took part in PISA 2009, students who perform well in reading tend to be those students who have a deep understanding of which learning strategies are most effective in attaining different learning goals while also reading a wide variety of materials for their own enjoyment' (OECD, 2010b, p. 97).
In sum, the contributions of attitudinal constructs such as enjoyment of reading, and of approaches-to-learning constructs such as summarizing strategies, were significant and positive. Multivariate modelling of the relations between attitudes and reading literacy performance yielded patterns of association that were consistent across countries. Across both waves (PISA 2000 and PISA 2009), the focus on engagement as habits of reading behaviour suggests productive ways of understanding how students across all countries interact with and learn from reading materials.
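The mediation logic of the PISA 2009 analyses can be illustrated with a two-step regression sketch. The simulated data and effect sizes below are invented for illustration only; the published analyses use the actual PISA scales and survey weights.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000

# Simulated students: girls report higher enjoyment of reading and greater
# awareness of summarizing strategies, and both variables raise performance.
girl = rng.integers(0, 2, n)
enjoyment = 0.4 * girl + rng.normal(size=n)
strategies = 0.3 * girl + rng.normal(size=n)
reading = 480 + 20 * enjoyment + 15 * strategies + 5 * girl + rng.normal(0, 40, n)
df = pd.DataFrame(dict(girl=girl, enjoyment=enjoyment,
                       strategies=strategies, reading=reading))

total = smf.ols("reading ~ girl", df).fit()                            # total gender gap
direct = smf.ols("reading ~ girl + enjoyment + strategies", df).fit()  # gap net of mediators
print("total gender gap: ", round(total.params["girl"], 1))
print("direct gender gap:", round(direct.params["girl"], 1))
print("share mediated:   ",
      round(1 - direct.params["girl"] / total.params["girl"], 2))
```

In this framing, the country differences reported above correspond to differences in how much of the total gender gap is absorbed once the mediators are controlled.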

MATHEMATICS

For PISA 2003, the major domain was mathematics, with science and reading as subsidiary domains. Each domain was defined by three dimensions: content or structure of knowledge, cognitive processes, and situations or contexts for the application of skills:

• The content or structure of knowledge that students need to acquire in each assessment area (e.g., familiarity with mathematical concepts);
• the processes that need to be performed (e.g., pursuing a certain mathematical argument); and
• the situations in which students encounter mathematical problems and relevant knowledge and skills are applied (e.g., making decisions in relation to one's personal life, or understanding world affairs). (OECD, 2004, p. 25)

Again, attitudinal constructs were not included in this generic definition of mathematical literacy. However, it was argued that since attitudinal constructs represent characteristics known to be associated with how students approach learning, four attitudinal factors were included in the student questionnaire.

The first factor, engagement with mathematics, included interest in and enjoyment of mathematics, instrumental motivation, attitudes towards school and a sense of belonging at school. The second factor consisted of students' self-beliefs in relation to mathematics, measured as self-efficacy and self-concept in mathematics. A third factor consisted of an emotional dimension, mathematics-related anxiety. The final factor consisted of learning strategies, or self-regulatory strategies, as used in mathematics (OECD, 2004, pp. 115–116).

The model operates on the basis that students' interest in mathematics and low levels of anxiety are drivers which initiate investment in learning activity, with the adoption of particular strategies … the model then seeks to predict students' performance in mathematics from students' interest in mathematics, their absence of anxiety in mathematics and frequency with which students report the use of control strategies. (OECD, 2004, p. 147)

The coefficients in the path model confirmed the predicted associations. In addition, a small positive direct path linked interest and enjoyment with performance, and a larger negative direct path linked anxiety and performance (see Figure 3.12 in OECD, 2004, p. 147). There is also an implication that these attitudinal factors influence achievement. However, the strength of this implied influence was qualified by the caveat that arrows in the model ‘indicate a suggested effect, rather than a demonstrated causal link’. From PISA 2003 data it was reported that within countries students with higher interest and enjoyment scores generally had higher mathematics performance. However, cross-country comparisons demonstrated the attitude–achievement paradox. Countries with overall higher interest and enjoyment scores, such as Brazil, Indonesia and Tunisia, tended to have lower performance scores, while students in the highest-performing countries, such as Finland, Japan and Korea, reported the lowest interest in and enjoyment of mathematics (OECD, 2004). The meaning of this pattern was given some attention but largely left unresolved.

Interest and performance might be mutually reinforcing, or both might be affected by other factors such as socio-economic circumstances and school context. Indeed … the relationship between intrinsic motivation and student performance in mathematics diminishes considerably or even becomes negligible in most countries when other learner characteristics are accounted for. However, whatever the nature of this relationship, a positive disposition towards mathematics remains an educational goal in its own right. (OECD, 2004, pp. 119–121)
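The within/between reversal at the heart of this paradox is easy to reproduce with simulated data: a positive student-level association inside every country can coexist with a negative association between country means. The parameters below are arbitrary and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n_countries, n_students = 20, 500

# Country means: higher-achieving countries report lower interest on average.
c_ach = rng.normal(500, 50, n_countries)
c_int = -0.01 * (c_ach - 500) + rng.normal(0, 0.2, n_countries)

within_rs, int_means = [], []
for c in range(n_countries):
    ach = c_ach[c] + rng.normal(0, 80, n_students)
    # within each country, interest rises with achievement
    interest = c_int[c] + 0.002 * (ach - c_ach[c]) + rng.normal(0, 0.5, n_students)
    within_rs.append(np.corrcoef(ach, interest)[0, 1])
    int_means.append(interest.mean())

print(f"mean within-country r = {np.mean(within_rs):.2f}")                    # positive
print(f"between-country r     = {np.corrcoef(c_ach, int_means)[0, 1]:.2f}")  # negative
```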

The definition of mathematics literacy in PISA 2012 featured similar components of content/structure, processes and situation. The associated framework gave prominence to the development of a positive disposition towards mathematics as a valued educational outcome as well as a contributor to mathematics achievement (OECD, 2013). Development of the framework for the 2012 mathematics assessment shows a shift in emphasis from specific attitudinal constructs to broader dispositions. It was argued that understanding how students regulate their behaviour when learning mathematics requires distinguishing intrinsic and extrinsic motivation as general dispositions, as well as considering students' intentions. The underlying model draws on the theory of planned behaviour (Ajzen, 1991), which proposed that attitudes and values predict behavioural intentions, and that intentions are the most direct influence on actual behaviour. Based on research by Lipnevich, MacCann, Krumm, Burrus, and Roberts (2011), questionnaire items reflected short-term intentions to engage with mathematics studies and longer-term intentions concerning career and future life directions. Four motivation scales were developed: interest in and enjoyment of mathematics, motivation to learn mathematics, short-term intentions, and long-term intentions. It was argued that increasing the number of items, including embedded items as used in the PISA 2006 assessments,1 would 'unnecessarily overburden students' (OECD, 2013, p. 185). The framework also recognized a need to maximize the comparability of items across PISA cycles.
Of relevance for this analysis is the inclusion in the student questionnaire of broader attitudinal outcomes, including attitudes towards school, truancy, and sense of belonging at school, and of domain-specific attitudinal outcomes, including interest in and enjoyment of mathematics, instrumental motivation, self-efficacy, self-concept, mathematics anxiety, and the learning strategies of control and elaboration (OECD, 2013). Again, there were wide differences between countries in students' endorsement of being interested in and enjoying mathematics. For example, in countries like Indonesia, Malaysia, Kazakhstan, Thailand and Albania, approximately 70% of students reported enjoying mathematics, but only about 30% of students in countries like Croatia, Austria, Hungary, Finland and Belgium did so (OECD, 2013). Overall, girls and students from lower socio-economic groups were more likely to report lower levels of interest in and enjoyment of mathematics. Cross-country comparisons indicated differences in the extent of these variations. For example, 'Gender differences in intrinsic motivation to learn mathematics are especially wide in Switzerland, Liechtenstein, Luxembourg and Germany' (OECD, 2013, p. 67). Consistent with the pattern already described, there was an overall tendency for students reporting lower levels of interest in and enjoyment of mathematics to score lower on the achievement assessment than students who reported higher levels. As with previous PISA results, this positive student-level, or within-country, association between intrinsic motivation and performance was not consistent with between-country comparisons.

SCIENCE

In PISA 2006 science was the major domain, and the framework documents suggest strong interest in the contribution of attitudes to science achievement.
In particular, there was concern that competency on assessments of scientific knowledge and skills did not necessarily translate into 'the ability to apply scientific knowledge in life situations' (Bybee & McCrae, 2011). It was recognized that applying scientific knowledge to personal decisions and behaviour also involves attitudes, interests, beliefs and values. Hence, it was argued that attitudinal factors should be included in the definition of science literacy. 'Willingness to engage' was added to the definition as a fourth component, and this was welcomed as 'a large shift in what is important … from purely cognitive aspects to include affective aspects' (Fensham, 2007, p. 227). The four components in the definition of scientific literacy were stated as:

• 'Scientific knowledge and use of that knowledge to identify questions, acquire new knowledge, explain scientific phenomena, and draw evidence-based conclusions about science-related issues;
• Understanding of the characteristic features of science as a form of human knowledge and enquiry;
• Awareness of how science and technology shape our material, intellectual, and cultural environments; and
• Willingness to engage in science-related issues, and with the ideas of science, as a constructive, concerned, and reflective citizen' (Bybee, 2009, p. 5; OECD, 2007, pp. 34–35).

‘Willingness to engage’ was operationalized by items covering four areas: support for scientific enquiry, self-belief as science learners, interest in science, and responsibility towards resources and environments. The PISA 2006 student questionnaire contained the largest number of attitudinal items of any of the PISA surveys on the rationale that inclusion of ‘willingness to engage’ was part of the definition of science literacy.2 The attitudinal scales included expressions of intentions to use science ‘I will use science in many ways when I am an adult’ (personal value of science), and ‘I would like to work on science projects as an adult’ (future-oriented motivation to learn

NON-COGNITIVE ATTRIBUTES: MEASUREMENT AND MEANING

Other scales expressed positive connections with learning science, such as 'I enjoy acquiring new knowledge in science' (enjoyment of science), and participation in science activities, such as 'read science magazines or read science articles in newspapers' (participation in science-related activities). The attitude–achievement paradox was again evident in the results. At the student level, interest in science was positively associated with achievement, but in the country-level comparison of means the association was negative. Countries with higher mean interest in science scores were likely to have lower mean science achievement scores. The student questionnaire was designed to allow investigation of how attitudes and interests influence responses to scientific decisions and behaviour. Whereas in PISA 2000 measures of time spent in reading for pleasure, reading a diversity of material, motivation and interest in reading were combined into an engagement with reading index, the attitudinal constructs in PISA 2006 were not developed into a specific index of 'willingness to engage'. Nevertheless, independent researchers exploring beyond the published summaries of the cross-country findings have examined some of the associations among the set of 'willingness to engage' constructs and with science achievement. Ainley and Ainley (2011a) modelled associations among the interest in science constructs to assess whether students who are interested in science seek further participation and opportunities to re-engage, and whether these associations were similar across different cultural contexts. Participation in science-related activities and future-oriented motivation to learn science represented willingness to engage and re-engage with current and future science activities. Four countries – Colombia, the USA, Estonia and Sweden – were selected because of their location as extreme representatives of the four quadrants defined by the intersection of two broad bipolar dimensions derived from the World Values Surveys by Inglehart and colleagues (Inglehart & Baker, 2000; Inglehart & Welzel, 2005).
One dimension consists of orientation to authority, whereby values emphasizing obedience to traditional and religious authorities are contrasted with secular-rational values. The second dimension relates to rising affluence in industrialized societies and contrasts the dominance of survival values with self-expression values, which give prominence to individual autonomy, subjective well-being and personal quality of life. Despite some differences in model coefficients, the same model showed acceptable fit for all four countries. An extension of this analysis (Ainley & Ainley, 2011b) provided a stronger test of willingness to engage by modelling the relation between the same interest in science variables and scores on the embedded interest scale. The latter consisted of items embedded in the science knowledge assessment, whereby students indicated whether they were interested in finding out more about the topic in the knowledge assessment task just completed. A similar set of predictive paths was observed: a direct path linking personal value of science with embedded interest, and indirect paths mediated through enjoyment of science and general interest in learning science. Again, there were differences between the four countries in the size of the coefficients in the model, but the same pattern applied. These findings describe one of the ways that multivariate modelling of the attitudinal variables can be used to describe students' 'willingness to engage' with science. In PISA 2015, as with PISA 2012, which focused on mathematics, there was a shift in thinking concerning the definition of literacy and the place of attitudinal constructs. The science literacy definition did not include 'willingness to engage' as a component. Thus, scientific literacy in PISA 2015 is defined by the three competencies to:

• explain phenomena scientifically
• evaluate and design scientific enquiry
• interpret data and evidence scientifically. (OECD, 2016a, p. 19)

However, the framework within which these competencies were embedded posited a model consisting of four broad factors: contexts, knowledge, competencies and attitudes. Personal, local/national and global contexts are the settings for enactment of the three competencies quoted above. In addition, students' content, procedural and epistemic knowledge is drawn on in the enactment of specific competencies. The final factor consists of students' attitudes in the form of interest in science, valuing scientific approaches to enquiry, and environmental awareness. The model proposed that contexts require individuals to display competencies, and that how an individual does this is influenced by attitudes and knowledge (OECD, 2016a). The student questionnaire addressed science engagement (participation in science-related activities and science career expectations), motivation (enjoyment of science, interest in science topics and instrumental motivation for learning science) and self-beliefs (science self-efficacy) (OECD, 2016b, p. 111). In most countries, science career expectations were positively related to science achievement and enjoyment of science, as well as to aspects of students' social and economic background (OECD, 2016b, p. 131). Participation in science-related activities was higher among boys than girls and was associated with higher levels of intrinsic motivation to learn science, greater enjoyment of learning science, stronger interest in broad science topics and higher science self-efficacy. Consistent with the attitude–achievement paradox, within countries science achievement was positively associated with enjoyment of, interest in, and (to a lesser extent) instrumental motivation to learn science (OECD, 2016b, p. 133). Between countries, mean levels of participation in science activities, enjoyment of science and instrumental motivation were negatively associated with average science achievement scores (OECD, 2016b, p. 136). PISA 2006 results indicated that the attitude–achievement paradox did not apply for science self-efficacy (Van de gaer & Adams, 2010).
PISA 2015 results indicated that within countries self-efficacy was moderately related to science achievement in high-performing countries but not in low-performing countries. At the same time, country-level science self-efficacy means were not associated with science achievement means. However, when score changes between 2006 and 2015 were investigated, associations between changes in self-efficacy means and changes in science achievement means were reported (OECD, 2016b, pp. 140–141). Focusing on indicators of change may be a fruitful way to address the attitude–achievement paradox. But pursuing this hypothesis depends on sufficient common items across PISA cycles for each major domain.
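The change-score idea can be sketched in a few lines: country-level levels of self-efficacy may show little association with achievement while changes across cycles track each other. The figures below are invented solely to illustrate the computation; the reported results are in OECD (2016b).

```python
import pandas as pd

# Invented country-level means for two PISA cycles (six hypothetical countries).
df = pd.DataFrame({
    "eff_2006": [0.08, -0.15, -0.08, 0.03, -0.19, -0.17],
    "eff_2015": [0.20, -0.20,  0.10, 0.10, -0.25,  0.05],
    "sci_2006": [495, 510, 530, 470, 520, 500],
    "sci_2015": [505, 505, 545, 475, 515, 520],
}, index=list("ABCDEF"))

df["d_eff"] = df["eff_2015"] - df["eff_2006"]  # change in self-efficacy mean
df["d_sci"] = df["sci_2015"] - df["sci_2006"]  # change in achievement mean

# Levels can be essentially unrelated while changes correlate strongly:
print("levels r :", round(df["eff_2015"].corr(df["sci_2015"]), 2))
print("changes r:", round(df["d_eff"].corr(df["d_sci"]), 2))
```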

ATTITUDES AND ACHIEVEMENT: DISENTANGLING STRANDS OF THE PARADOX

The prominence of literature reporting positive associations between motivation, self-competence beliefs and achievement across schooling domains was an important factor in the inclusion of attitudinal constructs in ILSAs. Although it is well known that many of these findings were drawn from Western educational literatures, it was nevertheless surprising that, across a range of ILSAs, positive associations were confirmed at the student level while, at the country level, associations between attitudes and achievement were negative. Countries with higher achievement scores generally have relatively lower scores on the attitude scales; countries with lower achievement scores have relatively higher scores on the attitude scales. According to Van de gaer, Grisay, Schulz, and Gebhardt (2012), this attitude–achievement paradox has been observed in PISA, TIMSS and PIRLS and has 'been found consistently across different subjects, grades, and cohorts' (Van de gaer et al., 2012, p. 1206). Large-scale survey databases such as those generated by TIMSS and PISA provide researchers with opportunities to explore how countries differ in patterns of achievement, attitudes and values.
We will use some of the findings of recent secondary analyses to explore both measurement and substantive cultural factors that might be contributing to this widely documented attitude–achievement paradox.

RESPONSE BIAS AND RESPONSE STYLES

Given the strong measurement properties of the scales used in ILSAs, it is critical to determine whether cross-country differences represent real differences in attitudinal constructs or reflect measurement and/or cultural factors that are not part of the substantive meaning of the constructs. In the comparative literature, response bias factors are well documented (see, e.g., Baumgartner & Steenkamp, 2001). In PISA surveys, the attitudinal items generally take the form of four-point response scales anchored as 'strongly agree', 'agree', 'disagree' and 'strongly disagree', and in TIMSS as 'agree a lot', 'agree a little', 'disagree a little' and 'disagree a lot'. Common response styles may take the form of acquiescence (ARS) or disacquiescence (DARS), that is, choosing a high proportion of 'strongly agree' or 'strongly disagree' responses, respectively. An extreme response style (ERS) occurs when students consistently choose an extreme end of the scale, whether 'strongly agree' or 'strongly disagree'. A fourth response style is noncontingent responding (NCR), referring to cases where it appears students have responded carelessly or at random. Inspection of the format of the student questionnaire attitudinal items reveals that what a student sees first after reading an item is the 'strongly agree' option.3 This physical layout of the items suggests that there is likely to be a higher level of ARS than DARS.
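Count-based versions of these indices, in the spirit of Baumgartner and Steenkamp (2001), can be sketched as follows. The exact operationalizations used in the studies discussed below differ in detail, and the functions and data here are hypothetical.

```python
import numpy as np

def response_styles(responses):
    """Crude count-based response-style indices for 4-point items,
    coded 1 = 'strongly disagree' ... 4 = 'strongly agree'."""
    r = np.asarray(responses)
    return {
        "ARS":  float(np.mean(r == 4)),               # share of 'strongly agree'
        "DARS": float(np.mean(r == 1)),               # share of 'strongly disagree'
        "ERS":  float(np.mean((r == 1) | (r == 4))),  # share of extreme categories
    }

def ncr_proxy(item_pairs):
    """Very rough NCR proxy: mean absolute disagreement between pairs of
    items with near-identical content; random responding inflates it."""
    return float(np.mean([abs(a - b) for a, b in item_pairs]))

student = [4, 4, 3, 4, 1, 4, 4, 2, 4, 4]
print(response_styles(student))             # high ARS and ERS for this student
print(ncr_proxy([(4, 4), (3, 4), (1, 4)]))  # inconsistency on similar items
```

The studies reviewed below differ precisely in how such crude counts are refined, which is one reason their country-level results only partly agree.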

This expectation was confirmed by Buckley (2009), who explored the incidence of each of the four types of response style using the PISA 2006 data set. Buckley reported a 'positive net acquiescence response style' across the 57 countries in PISA 2006. Jordan, Tunisia and Qatar showed relatively high ARS, while Japan was reported to be an outlier with a relatively large DARS. Acquiescence (ARS) is sometimes referred to as compliance and, given the layout of the student questionnaires, it is not clear whether the high rate of selection of the 'strongly agree' response should be interpreted as acquiescence, compliance, or social desirability. Indices of these response styles are not independent; for example, the calculation of the extreme response style (ERS) consists of the proportion of extreme responses – on a four-point scale, categories 1 and 4 or, in terms of the PISA scale anchors, 'strongly agree' and 'strongly disagree'. Calculated in this way, Qatar, Jordan and Tunisia showed the highest ERS scores. Estonia, Qatar, Jordan and Tunisia had the highest scores on the index of noncontingent responding (NCR). Buckley (2009) highlighted the response style in the Taipei results, where there was evidence of strong ERS coupled with very low NCR, indicating relative consistency of response to similar content items. Other countries with a very low NCR index were Japan, Australia and Macao (Buckley, 2009, p. 12). To explore the potential effects of response styles on the relation between attitudes and achievement, Buckley (2009) estimated the effects of three response style indicators4 (ARS, DARS and NCR) on two PISA 2006 measures – value of science and enjoyment of science. Only ARS and DARS showed significant effects on value of science and enjoyment of science scores. Controlling for ARS reduced scores on both scales, while controlling for DARS increased scores on both scales, with the effect for enjoyment of science twice the size of the effect for value of science. These findings were then applied to determine whether ARS and DARS explained some of the negative country-level association between attitudes and achievement.
Scatterplots representing adjusted value of science and enjoyment of science scores plotted against science achievement yielded nonlinear relationships between attitude and achievement. For value of science, this was a curvilinear relation with higher achievement at both ends of the value scale; for enjoyment of science, 'the original direction of the relationship is preserved, but with a great deal more nonlinearity' (Buckley, 2009, p. 16). Following Baumgartner and Steenkamp (2001), Buckley's (2009) analyses were based on a heterogeneous set of five items from the 41 attitude items. In addition, Buckley reported using several more complex methods for estimating response style but concluded that all had some difficulties. 'While further research is needed to determine the most appropriate method of detecting response style difficulties in datasets like PISA, it seems clear that the problem should not be ignored' (Buckley, 2009, p. 29). Lu and Bolt (2015) considered response style in relation to the attitude–achievement paradox by exploring how extreme response styles (ERS) contribute to the between-country effect. While less well-educated students are more likely to select an extreme response, there is an overall tendency for agreement responses (ARS) to the attitudinal scales. Hence, it may be that the ERS tendency seen in lower-achieving countries contributes to the negative attitude–achievement association observed between countries. While Buckley's (2009) analyses were applied to a small heterogeneous subset of attitude items, Lu and Bolt (2015) adopted an approach that treated ERS and substantive attitudes as separate dimensions and used estimates of ERS to correct for this factor. 'In short, the model emphasizes the simultaneous influence of both the substantive trait and response style on a respondent's selection of extreme versus less extreme categories' (Lu & Bolt, 2015, p. 5). The model included estimates of ERS, multiple attitude scales and achievement, both at student level and at country level, 'to simultaneously study the covariance structure of the attitudinal scales, achievement, and ERS both within and between countries' (Lu & Bolt, 2015, p. 5).
at country level ‘to simultaneously study the covariance structure of the attitudinal scales, achievement, and ERS both within and between countries’ (Lu & Bolt, 2015, p. 5). Evidence of differential rates of ERS across countries were identified using randomly selected sets of 200 students from each country participating in PISA 2006. Thailand had the lowest mean ERS and Tunisia the highest. To illustrate the effects of ERS, corrected scores were calculated for the environmental responsibility scale (ENV5). Thailand and Tunisia had equal uncorrected scores and, after applying the correction for ERS, the corrected mean scores (μENV) were -.67 and .06 respectively (Lu & Bolt, 2015, p. 13). Low mean scores on attitudinal scales represent positive attitudes. Hence, the uncorrected mean score for Thailand on this scale reflects a more negative attitude than the ERS corrected mean, the reverse for Tunisia. For many countries, there was very little difference in their corrected ENV mean. Lu and Bolt (2015) reported correlations between Buckley’s mean scores and their own: ‘At the country level, we find our estimates of μENJ, μVAL, and μERS to correlate at levels of .446, .679, and .756, respectively, with Buckley’s (2009). Thus, while there is some consistency in how countries are identified with respect to ERS, there are substantial differences from Buckley’s approach in terms of the bias correction’ (Lu & Bolt, 2015, p. 13). When considered as a potential explanation for the attitude–achievement paradox, Lu and Bolt (2015) report there was little correlation between ERS and achievement. Hence, corrected results indicated very little change in the associations between the attitudinal scales and achievement. However, within-country attitude–achievement associations were larger when the ERS correction was applied. Lu and Bolt concluded that ‘despite detectable country differences, most variability in ERS occurs within, as opposed to between, countries’. A different procedure was used by Khorramdel, von Davier, Bertling, Roberts,

NON-COGNITIVE ATTRIBUTES: MEASUREMENT AND MEANING

Using PISA 2012 field trial data, Khorramdel et al. explored the feasibility of using Item Response Theory (IRT) procedures to measure and correct for ERS and midpoint response style (MRS) in the associations between personality dimension scores and achievement scores in mathematics and problem solving. As with Lu and Bolt's (2015) analyses, the basic strategy involved separating response style variation from variation in the attitudinal constructs, in this case the personality dimensions of perseverance and openness. ERS yielded a 'medium negative correlation with the cognitive domains (-.54 and -.59), while the MRS measure shows a medium positive correlation (.61 and .63)' (Khorramdel et al., 2017, p. 85). Comparing the associations between the achievement scores and the personality scores, corrected and uncorrected for response style bias, indicated lower negative correlations between the personality domains and the achievement scores for both mathematics and problem solving once the correction was applied. Differences in the ways response style biases are calculated, and in the attitudinal constructs on which researchers focus, make comparisons across these studies difficult. While future developments using these types of analysis hold promise for examining the attitude–achievement paradox, this type of research is in its infancy. We now turn to explore ways that the content of cultures might disentangle some of the strands of the paradox.

MACRO-CULTURAL VALUES

Given the variability in economic development and cultural practices across the countries participating in ILSAs, it is to be expected that levels of achievement, patterns of self-belief and the associations between them might differ. Disentangling the strands of the paradox requires understanding the cultural factors that might influence students' responses to attitude items.
that might influence students’ responses to attitude items. For example, science and mathematics have a global significance in relation to economic, industrial, and technological developments that have occurred in the last two decades. Just as there are substantial differences in socio-economic status within countries and nations, there are substantial differences between countries in access to and adoption of these developments and this has consequences for children’s access to learning opportunities. One way that culture may influence responses to attitudinal items in ILSAs is through macro-cultural values that are part of students’ cultural knowledge and cultural opportunities. Countries can be located in quadrants defined by the intersection of the two major values dimensions derived from the World Values Surveys by Inglehart and colleagues (Inglehart & Baker, 2000; Inglehart & Welzel, 2005): traditional versus secular-rational attitudes to authority, and survival versus self-expression values. Countries representing the more extreme values in the combination, defined as traditional/survival values (i.e., Brazil, Colombia, Mexico, Chile and Turkey), have the highest mean ratings on each of the interest in science constructs. On the other hand, countries representing the more extreme secular-rational/self-expression combination of values (i.e., Denmark, Germany, Japan, Norway and Sweden) have the lowest mean scores on each construct (see also Ainley & Ainley, 2011a). Consistent with the attitude–achievement paradox, this was the reverse of the ordering of science achievement scores. The influence of the macro-cultural context has also been investigated by analysis of the interactive effects of socio-economic status and specific attitudinal constructs. Tucker-Drob, Cheung, and Briley (2014, p. 2048) hypothesized that ‘socioeconomic context and individual science interest would interact to predict individual science achievement, and that this interaction would be evident at both the intranational and

116

THE SAGE HANDBOOK OF COMPARATIVE STUDIES IN EDUCATION

international levels’. Just as students from higher socio-economic backgrounds within a country who are interested in science are likely to have more options for science-relevant learning experiences than students from lower socio-economic backgrounds, access to science-relevant learning experiences may be contingent on a country’s level of wealth. Therefore, within-countries science interest will interact with the socio-economic level of both family and school. Simultaneously, between-countries, national prosperity interacts with science interest to predict science achievement. Using PISA 2006 data, TuckerDrob et al. (2014) tested a model of achievement that included the main effects of science interest, family socio-economic status (SES), national prosperity (logGDP), and two interaction terms: family SES X science interest, and logGDP X science interest. In addition to the expected main effects, both interaction terms indicated small but significant effects. Tucker et al. investigated the character of these interactions by comparing high and low family SES groups and high and low logGDP groups. High and low were defined as two standard deviations above and below the mean, respectively. The outcome was the same for both interaction terms. There were strong effects for both the high family SES and high logGDP groups and minimal effects for the low family SES and low logGDP groups. Hence, in the same way that family SES moderates the association of science interest on achievement, at the country level prosperity, as indexed by gross domestic product (GDP), moderates the association between science interest and achievement. In sum, part of the explanation of the attitude–achievement paradox lies in the ways that macro-cultural values provide a context within which students develop their attitudes in relation to schooling domains. The examples described here locate these cultural context effects in the degree to which science and technology have been adopted and are embedded in the culture, more particularly, students’ access to science-relevant learning opportunities.

FUTURE IDENTITIES

Not unrelated to their access to domain-related learning opportunities are students' perceptions of the role of science and mathematics in their personal future. Schreiner and Sjøberg (2007) suggest there may be a disconnect between students' attitudes towards school science and mathematics and their perception of whether science and mathematics are attractive directions for their future. While science is experienced as interesting in school, it is not necessarily rated highly in students' perceptions of their choices for an interesting and valuable future occupation. Conclusions from the ROSE project (Relevance of Science Education; Sjøberg & Schreiner, 2010) suggest that while students report finding school science interesting, they do not necessarily see it as an important part of their emerging identity (Schreiner & Sjøberg, 2007; Sjøberg & Schreiner, 2010). The ROSE project is an international project sampling 15-year-old students' attitudes and values towards science and technology. Data sets have been collected from countries with differing levels of technological and economic development; Austria, Botswana, Estonia, Ghana, Japan, Lesotho, Norway, Spain, Sweden, Turkey and Zimbabwe are just some of the countries participating, and the full list can be sourced from Sjøberg and Schreiner (2010). A central premise of the ROSE project is that positive attitudes to science are an important educational outcome, and that the character of 15-year-old students' attitudes and values in relation to school science, especially the observed association with science achievement, needs to be interpreted in the context of the issues that they assess to be 'interesting, important and meaningful for their lives and for society' (Schreiner & Sjøberg, 2007, p. 232). Schreiner and Sjøberg (2007) proposed that the social context in which students' lives and experience of schooling are embedded makes salient particular options for the future. These options inform and shape students' identity development.
shape students’ identity development. Hence, identity issues reflecting students’ developmental context underlie specific patterns of response to attitude and value items. Across all countries sampled in ROSE, students reported that ‘school science is interesting’, as indicated by all country means being located above the middle of the four-point rating scale. This is similar to the PISA 2006 findings, which indicated that ‘students generally enjoy learning science, with, for example, an average of 63% of students having reported that they were both interested in learning about science and had fun doing so’ (OECD, 2007, p. 140). However, in their analyses, the ROSE researchers showed large cross-country differences in responses to items such as ‘I like school science better than most other subjects’ and ‘I would like to become a scientist’. Using the Human Development Index (HDI6) to rate countries’ level of development, the authors reported that students from more developed countries expressed more negative views to these items while students from less developed countries expressed more positive views. Simultaneously, for an item ‘Working with something I find important and meaningful’ the mean scores for all countries, irrespective of level of development, were high and positive. Schreiner and Sjøberg (2007) suggest that students from all backgrounds perceive school science to be ‘somewhat’ interesting but that perceptions of whether science and technology provide them a meaningful choice for their educational and employment future are linked to their country’s level of development. Students from countries rated high on the HDI were less likely than students from countries with lower rankings to report expecting science to be part of their ‘important and meaningful’ future. In sum, findings from the ROSE project (Schreiner & Sjøberg, 2007; Sjøberg & Schreiner, 2010) identify the way that the broader level of development of the cultural context influences how students perceive

117

science and technology in their personal world; the identities through which they perceive their opportunities and through which they define their personal futures.

SELF-REPORT FRAMES OF REFERENCE

Another strand in disentangling the paradox considers the frames of reference students use when answering attitudinal items. There is a broad literature identifying that self-concept within academic domains is based on social comparison processes (Bong & Skaalvik, 2003; Marsh, Trautwein, Lüdtke, & Köller, 2008). This literature has demonstrated variations in the frames of reference students use when making judgements about their own abilities. Students' frames of reference may be based on perceptions of classmates' achievement, or on more generalized perceptions of the achievement of their school or their country. Positive effects on self-concept result when students perceive their achievement to be higher than that of their reference group, and negative effects occur when the comparison is with a high-achieving reference group. This effect is well known and is often described as the big-fish–little-pond effect (BFLPE) (see, e.g., Nagengast & Marsh, 2012). Hence, when the association between self-concept and achievement is measured in waves of PISA data, there is the possibility of both within-country and between-country variations in students' frames of reference. Within-country practices, such as tracking and selective schools, increase variability in frames of reference, and students' knowledge of differences between schools and between countries in overall levels of achievement provides different frames of reference for perceptions of abilities. Using PISA 2006 data, Van de gaer et al. (2012) confirmed that these effects were operating at both the school and country levels.
Consistent with the overall PISA 2006 findings, there was a positive association at the student level between self-concept in science and science achievement. However, at the between-school and between-country levels the association was negative. As predicted by Van de gaer et al., where countries had a higher proportion of selective schools the between-school effect was larger. Similarly, the negative association at the between-country level may be related to differences in the 'educational benchmarks, standards and norms' operating across countries. The potential for variable frames of reference when responding to self-concept items was also described by Van de gaer and colleagues (Van de gaer & Adams, 2010; Van de gaer et al., 2012) as a response style factor differentiating self-concept and self-efficacy. Across several PISA waves, self-concept scores have consistently shown the attitude–achievement paradox effect, while self-efficacy scores are positively associated with achievement at both the student level and the country level (see also Kyllonen, Burrus, Roberts, & Van de gaer, 2010; Van de gaer & Adams, 2010), and these differential effects are attributed to the concreteness of the frame of reference. Self-concept items require students to generate their own frame of reference, whereas self-efficacy items supply a concrete frame of reference. To illustrate, the self-efficacy items in PISA 2006 asked students 'How easy do you think it would be for you to perform the following tasks on your own?' The set of specific tasks included 'Explain why earthquakes occur more frequently in some areas than in others' and 'Interpret the scientific information provided on the labeling of food items'. The four response categories were 'I could do this easily', 'I could do this with a bit of effort', 'I would struggle to do this on my own' and 'I couldn't do this'. Hence, in line with Bandura's (1997) description of self-efficacy, students are giving a self-report in relation to very specific science tasks.
On the other hand, self-concept in science refers to students' self-evaluation in relation to their ability and interest in the broad domain of science. In this case, the reference standard is relatively open and students draw on their own frame of reference when they respond to self-report items. For example, the self-concept items in PISA 2006 asked students how much they agreed ('strongly agree', 'agree', 'disagree' and 'strongly disagree') with statements such as 'I learn <school science> topics quickly' and 'I can easily understand new ideas in <school science>' (Van de gaer et al., 2012, p. 1125). However, the wording of self-efficacy and self-concept items for separate waves of PISA needs careful consideration in country comparisons. Associations with achievement and frame of reference effects are sensitive to the wording of the items. The PISA 2000 report indicated that academic self-concept, with its reference to 'most school subjects', like interest in reading and interest in mathematics, could not be directly compared across countries. On the other hand, self-concept in reading (e.g., 'I get good marks in <test language>'), mathematical self-concept (e.g., 'I get good marks in mathematics') and self-efficacy (e.g., 'I'm certain I can understand the most difficult material presented in readings') could be compared across countries (OECD, 2003, p. 39). For PISA 2003, 2006, 2012 and 2015, the self-efficacy items included clearly specified tasks and demonstrated a different pattern of results from the more general self-concept items. PISA 2009, which like PISA 2000 had reading as the major domain, did not include self-concept or self-efficacy in the index of engagement. In short, as these examples of self-concept and self-efficacy items demonstrate, the degree to which scale items leave it open for students to answer in terms of their own frame of reference is likely to influence whether the results of cross-country comparisons are amenable to clear interpretation.
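The BFLPE logic described above corresponds to a contextual multilevel model in which a student's own achievement and the school's mean achievement carry opposite signs. A minimal simulation sketch, with invented coefficients, is shown below.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n_schools, per_school = 100, 50

# Simulate the big-fish-little-pond pattern: self-concept rises with a
# student's own achievement but falls with the school's mean achievement.
frames = []
for s in range(n_schools):
    m = rng.normal(500, 60)
    ach = m + rng.normal(0, 70, per_school)
    sc = 0.010 * ach - 0.008 * m + rng.normal(0, 0.8, per_school)
    frames.append(pd.DataFrame(dict(school=s, ach=ach, school_mean=m,
                                    selfconcept=sc)))
df = pd.concat(frames, ignore_index=True)

# Contextual model with a random intercept per school: a positive coefficient
# on own achievement alongside a negative one on school-mean achievement is
# the BFLPE signature.
mlm = smf.mixedlm("selfconcept ~ ach + school_mean", df,
                  groups=df["school"]).fit()
print(mlm.params[["ach", "school_mean"]].round(4))
```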

CULTURAL FRAMES OF REFERENCE: FACE, DIGNITY AND MODESTY

Comparisons of self-concept and self-efficacy measures in the context of ILSAs acknowledge that self-concept items allow students to draw on their own frame of reference, which might relate to factors such as the selectivity of school systems, tracking within a school (Marsh et al., 2008), and broader differences in achievement across countries (Shen & Tam, 2008). However, as Wang (2015, p. 247) has pointed out from analyses of TIMSS 2007 data on mathematics and self-concept, 'what matters most is how students perceive, instead of, objective and forced relative standing in class'. Recent analyses have attempted to define more closely some of the cultural factors that contribute to the frames of reference informing students' judgements of self-concept. Probably the cultural factor most frequently invoked to explain these differences is the contrast between collectivist and individualistic cultures. Following Triandis (Triandis, McCusker, & Hui, 1990), Heine, Lehman, Peng, and Greenholtz (2002, p. 903) described cultural syndromes as 'patterns of shared attitudes, beliefs, or values that are organized around a theme and largely shared by members of an identifiable group'. Individualistic and collectivist cultures are distinguished by their focus on the self as an individual or as a member of a relevant group. According to Heine (2004), the tendency for individuals from Western countries to have a strong positive self-evaluation has been widely reported as a positive distortion which reflects a motivational bias. Heine (2004; Heine & Hamamura, 2007) argues that Westerners tend to seek downward social comparison to enhance their self-regard and positive self-views, while East Asians are more concerned about maintaining face in relation to their reference group, which may require incorporating negative information about the self, and so tend to seek upward comparisons. Hence, cultural factors that function as the frames of reference for the differential levels of self-evaluation might explain the attitude–achievement paradox.
Using three waves of TIMSS data – 1995, 1999 and 2003 – for both mathematics and science, Shen and Tam (2008) reported finding evidence consistent with the attitude–achievement paradox across three cohorts for both domains. They suggested that an explanation for the paradox can be found in the different academic standards and expectations across countries, especially in domains such as mathematics and science.

Students in high-performing countries such as Japan and South Korea usually have relatively low levels of enjoyment/liking the two subjects, low levels of self-evaluation, perceive the two subjects as being hard, and they say that they do not learn the subject matter quickly. On the other hand, students in low-performing countries such as South Africa and Morocco are likely to say that they enjoy learning the two subjects/they like them, they do quite well, they perceive the two subjects as being easy, and they learn the subject matter quickly. (Shen & Tam, 2008, p. 97)

Shen and Tam’s (2008) analysis of TIMSS’s data refers to three self-perception variables: liking, usually doing well, and ease of learning in the domain. The usually doing well item is interpreted as ‘self-efficacy or selfperceived competence’. However, in terms of the analyses from PISA, this type of item would be considered more like self-concept than self-efficacy, the latter being a selfevaluation in relation to a specific mathematics/science task rather than mathematics/science in general. A more recent examination of this issue (Chiu, 2017) characterized these frames of reference as a ‘face culture modesty bias and a dignity culture enhancement bias’ (Chiu, 2017, p. 267). Face cultures tend to have a modesty bias because face is conferred by others rather than depending on self-assessment. Hence, they are more likely to have a self-perception that does not exceed what others are likely to attribute to them. Upward comparison is most common.

On the other hand, in dignity cultures, students rely on their own judgement, and so self-perceptions are at the core of self-concept. Social comparison processes consistent with such modesty and dignity biases have been demonstrated experimentally. For example, White and Lehman (2005) reported that in social comparison tasks, Asian Canadian students made more upward social comparisons than did European Canadian students. As in some of the research discussed earlier, comparison of associations between self-concept and achievement, and between self-efficacy and achievement, is central to disentangling this strand of the paradox. For example, Chiu (2017) analyzed PISA 2012 data using a multi-level design consisting of three levels: region, school and student. A group of East Asian regions described as having 'well-funded and economically equal education systems', namely Japan, Singapore, South Korea, Taiwan, Hong Kong, Macau and Shanghai, was compared with the US and with the overall results from the 65 countries participating in PISA 2012. For the purposes of this discussion, there were several important findings. First, for the East Asian regions, self-efficacy scores were higher than self-concept scores, while for the US self-efficacy scores were lower than self-concept scores, supporting previous demonstrations that self-concept scores differ between dignity and face cultures (see Chiu & Klassen, 2010). Second, for all regions there was a positive association between both self-concept and self-efficacy and mathematics achievement, but when the size of the regression coefficients was considered, consistent with the face hypothesis, the self-efficacy coefficients for the East Asian regions were significantly higher than the corresponding self-concept coefficients. These findings led Chiu to suggest that self-efficacy may be the more informative indicator in relation to students' achievement. However, the wording of self-efficacy items needs careful consideration. The eight self-efficacy items in PISA 2006 and PISA 2015, as described above, included clearly specified tasks, and this minimizes the effects of different self-report frames of reference.
Responses to items can be affected by language and cultural differences, resulting in different understandings. In addition, the ways in which the items relate to the latent construct of self-efficacy can differ across cultures. In PISA 2015, student responses to these eight items were combined using the IRT generalized partial credit model to measure the latent construct of self-efficacy (OECD, 2017). In addition to checking the internal consistency of the scale in each OECD country, the item and person parameters were compared across countries to ensure the measurement model was a sound fit in each country (He & van de Vijver, 2016; OECD, 2017). Kyllonen and Bertling (2014) report some innovative approaches to measuring non-cognitive attributes. For example, to increase cross-country comparability, some of the questionnaires incorporated anchoring vignettes with descriptions of behaviour representing variations in the construct being measured. Kyllonen and colleagues have reported some success with these anchoring vignettes in the PISA 2012 field trial (Bertling et al., 2016). He and van de Vijver (2016) also argue that non-statistical strategies need to be used to augment the statistical procedures for investigating the equivalence of items that are translated and adapted. They outline approaches such as overclaiming and forced-choice items, as well as anchoring vignettes, which could be developed further to either minimize or correct for response bias. These new directions for item development are reviewed in other chapters in this volume.
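One simple scheme in the anchoring-vignette literature is a nonparametric recode: each student's self-rating is re-expressed relative to that student's own ratings of vignettes ordered from low to high on the construct, which absorbs individual and cultural differences in scale use. The function and data below are hypothetical and ignore complications such as ties and vignette order violations.

```python
def vignette_recode(self_rating, vignette_ratings):
    """Rank of the self-rating among a student's own vignette ratings,
    ordered from low to high on the construct: 0 = below the lowest
    vignette, len(vignette_ratings) = above the highest."""
    return sum(self_rating > v for v in vignette_ratings)

# Two students give the same raw self-rating (3 on a 4-point scale) but use
# the scale differently, as revealed by their ratings of the same vignettes.
print(vignette_recode(3, [1, 2, 4]))  # 2: self placed between 2nd and 3rd vignette
print(vignette_recode(3, [2, 3, 4]))  # 1: a more lenient rater, lower recoded value
```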

CONCLUSIONS

Large-scale assessments play an important role in education policy and education planning in many countries.
They have become increasingly used as tools for monitoring the effectiveness of educational systems, and a body of literature on methods and approaches has emerged (Rutkowski, von Davier, & Rutkowski, 2014). We have based our review of the place of attitudinal constructs in ILSAs around some of the research attempting to disentangle the widely reported attitude–achievement paradox. Our consideration of this literature suggests that simple comparisons between countries are likely to be invalid, inappropriate and misleading. As noted by He and van de Vijver, comparing measures of achievement 'does not automatically help students improve their learning nor show how teaching and learning can be improved' (2016, p. 231). Identifying how learning contexts, processes, attitudes and other non-cognitive factors are associated with achievement can inform policy and practice in more productive ways. Garcia (2014) outlines the non-cognitive skills that support the development of cognitive skills and that should be pursued by schools through instruction and social interaction: 'patterns of thought, feelings and behavior … socio-emotional or behavioral characteristics that are not fixed traits of personality' (Garcia, 2014, p. 6). She cites evidence to support this proposition, providing several examples, including evidence from the Consortium on Chicago School Research on the importance of a student-centred learning climate for school improvement (Bryk, Sebring, Allensworth, Easton, & Luppescu, 2010). Understanding relations between attitudinal variables and achievement can be especially informative for future practice. However, these are not necessarily simple linear causal relations, and other factors need to be considered in their interpretation. Understanding these relations depends on well-developed theory to articulate the models that form the basis of any investigations, and on sound measurement of the constructs at the heart of those models.

Notes

1  See the PISA 2006 science framework in the next section.
2  As referred to earlier, this larger set of attitudinal constructs was seen by the PISA 2012 mathematics group as something that would 'unnecessarily overburden students' (OECD, 2013, p. 185).
3  Reversed items have been used in only a few of the PISA reading domain scales.
4  ERS was not included in the analysis because of its direct relation with both ARS and DARS.
5  Lu and Bolt (2015) refer to the environmental responsibility scale by the code ENV.
6  The HDI (Human Development Index) is based on life expectancy, education and per capita income. Countries participating in ROSE ranged from Uganda, the lowest on the HDI, to Norway, the highest.

REFERENCES

Ainley, M., & Ainley, J. (2011a). Interest in science: Part of the complex structure of student motivation in science. International Journal of Science Education, 33(1), 51–71.
Ainley, M., & Ainley, J. (2011b). Student engagement with science in early adolescence: The contribution of enjoyment to students' continuing interest in learning about science. Contemporary Educational Psychology, 36(1), 4–12. doi:10.1016/j.cedpsych.2010.08.001
Ainley, M., & Ainley, J. (2015). Early science learning experiences: Triggered and maintained interest. In K. A. Renninger, M. Nieswandt, & S. Hidi (Eds.), Interest in mathematics and science learning (pp. 17–32). Washington, DC: AERA Publications.
Ajzen, I. (1991). The theory of planned behavior. Organizational Behavior and Human Decision Processes, 50(2), 179–211. doi:10.1016/0749-5978(91)90020-T
Australian Government. (2016). A review of the longitudinal surveys of Australian youth. Canberra: Australian Government. Retrieved from www.lsay.edu.au/publications/2844.html
Bandura, A. (1997). Self-efficacy: The exercise of control. New York: Freeman.
Baumgartner, H., & Steenkamp, J.-B. E. M. (2001). Response styles in marketing research: A cross-national investigation. Journal of Marketing Research, 38, 143–156.
Bertling, J., Marksteiner, M., & Kyllonen, P. C. (2016). General noncognitive outcomes. In S. Kuger, E. Klieme, N. Jude, & D. Kaplan (Eds.), Assessing contexts of learning: An international perspective (pp. 255–281). Cham, Switzerland: Springer International.
Bertschy, K., Cattaneo, M. A., & Wolter, S. C. (2008). What happened to the PISA 2000 participants five years later? IZA (Institute for the Study of Labor) Discussion Paper Series No. 3323.
Bong, M., & Skaalvik, E. M. (2003). Academic self-concept and self-efficacy: How different are they really? Educational Psychology Review, 15(1), 1–40.
Bryk, A. S., Sebring, P. B., Allensworth, E., Easton, J. Q., & Luppescu, S. (2010). Organising schools for improvement: Lessons from Chicago. Chicago, IL: University of Chicago Press.
Buckley, J. (2009). Cross-national response styles in international educational assessments: Evidence from PISA 2006. Department of Humanities and Social Sciences in the Professions, New York University. Retrieved from https://edsurveys.rti.org/pisa/documents/buckley_pisaresponsestyle.pdf
Bybee, R. (2009). PISA's 2006 measurement of scientific literacy: An insider's perspective for the US. Paper presented at the Science Forum and Science Expert Group PISA 2006 Science, Washington, DC. Retrieved from files.eric.ed.gov/fulltext/EJ864610.pdf
Bybee, R., & McCrae, B. (2011). Scientific literacy and student attitudes: Perspectives from PISA 2006 science. International Journal of Science Education, 33(1), 7–26. doi:10.1080/09500693.2010.518644
Chiu, M. M. (2017). Self-concept, self-efficacy, and mathematics achievement: Students in 65 regions including the US and Asia. In J.-W. Son, T. Watanabe, & J.-J. Lo (Eds.), What matters? Research trends in international comparative studies in mathematical education (pp. 267–288). Cham, Switzerland: Springer International.
Chiu, M. M., & Klassen, R. M. (2010). Relations of mathematics self-concept and its calibration with mathematics achievement: Cultural differences among fifteen-year-olds in 34 countries. Learning and Instruction, 20(1), 2–17. doi:10.1016/j.learninstruc.2008.11.002
Eccles, J. S., & Wigfield, A. (2002). Motivational beliefs, values, and goals. Annual Review of Psychology, 53, 109–132.
Eccles, J. S., Wigfield, A., & Schiefele, U. (1998). Motivation to succeed. In N. Eisenberg (Ed.), Handbook of child psychology: Social, emotional and personality development (Vol. 3, pp. 1017–1096). New York: Wiley.
Fensham, P. (2007). Values in the measurement of students' science achievement in TIMSS and PISA. In D. Corrigan, J. Dillon, & R. Gunstone (Eds.), The re-emergence of values in science education (pp. 215–230). Rotterdam: Sense Publishers.
Frydenberg, E., Ainley, M., & Russell, J. (2006). Schooling issues digest: Student motivation and engagement. Canberra: Department of Education, Science and Training.
Garcia, E. (2014). The need to address noncognitive skills in the education policy agenda. EPI Briefing Paper No. 386. Washington, DC: Economic Policy Institute.
He, J., & van de Vijver, F. J. R. (2016). Bias assessment and prevention in noncognitive outcome measures in context assessments. In S. Kuger, E. Klieme, N. Jude, & D. Kaplan (Eds.), Assessing contexts of learning: An international perspective. Cham, Switzerland: Springer International.
Heine, S. J. (2004). Positive self-views: Understanding universals and variability across cultures. Journal of Cultural and Evolutionary Psychology, 2(1–2), 109–122.
Heine, S. J., Lehman, D. R., Peng, K., & Greenholtz, J. (2002). What's wrong with cross-cultural comparisons of subjective Likert scales? The reference-group effect. Journal of Personality and Social Psychology, 82(6), 903–918. doi:10.1037//0022-3514.82.6.903
Heine, S. J., & Hamamura, T. (2007). In search of East Asian self-enhancement. Personality and Social Psychology Review, 11(91), 4–27. doi:10.1177/1088868306294587
Hidi, S. (1990). Interest and its contribution as a mental resource for learning. Review of Educational Research, 60(3), 549–571.
Homel, J., & Ryan, C. (2014). Educational outcomes: The impact of aspirations and the role of student background characteristics (Longitudinal Surveys of Australian Youth Research Report No. 65). Adelaide: National Centre for Vocational Education. Retrieved from https://www.ncver.edu.au/__data/assets/file/0022/16780/education-outcomes2669.pdf
Inglehart, R., & Baker, W. E. (2000). Modernization, cultural change, and the persistence of traditional values. American Sociological Review, 65, 19–51.
Inglehart, R., & Welzel, C. (2005). Modernization, cultural change and democracy: The human development sequence. New York: Cambridge University Press.
Jensen, T. P., & Andersen, D. (2006). Participants in PISA 2000 – four years later. In J. Mejding & A. Roe (Eds.), Northern lights on PISA 2003: New dimensions of PISA analysis for the Nordic countries. Copenhagen: Nordic Council of Ministers.
Kennedy, A., & Trong, K. (2006). A comparison of fourth-graders' academic self-concept and attitudes toward reading, mathematics and science in PIRLS and TIMSS countries. In The Second IEA International Research Conference: Proceedings of the IRC-2006 (Vol. 2, pp. 49–60). Amsterdam: International Association for the Evaluation of Educational Achievement. Held in November in Washington, DC. Retrieved from http://pub.iea.nl/fileadmin/user_upload/Publications/Electronic_versions/IRC2006_Proceedings_Vol2.pdf
Khorramdel, L., von Davier, M., Bertling, J. P., Roberts, R., & Kyllonen, P. C. (2017). Recent IRT approaches to test and correct for response styles in PISA background questionnaire data: A feasibility study. Psychological Test and Assessment Modeling, 59(1), 71–92.
Kyllonen, P. C., & Bertling, J. (2014). Innovative questionnaire assessment methods to increase cross-country comparability. In L. Rutkowski, M. von Davier, & D. Rutkowski (Eds.), Handbook of international large-scale assessment: Background, technical issues, and methods of data analysis (pp. 277–285). Boca Raton, FL: CRC Press.
Kyllonen, P. C., Burrus, J., Roberts, R., & Van de gaer, E. (2010). Cross-cultural comparative questionnaire issues. Paper for the PISA 2012 Questionnaire Expert Group Meeting, Hong Kong, February 2010. Melbourne: Australian Council for Educational Research. Retrieved from https://www.acer.org/files/kyllonen_burrus_lietz_roberts_vandegaer_pisaqeg2010.pdf
Lipnevich, A. A., MacCann, C., Krumm, S., Burrus, J., & Roberts, R. (2011). Mathematics attitudes and mathematics outcomes of US and Belarusian middle school students. Journal of Educational Psychology, 103(1), 105–188.
Lu, Y., & Bolt, D. M. (2015). Examining the attitude–achievement paradox in PISA using a multilevel multidimensional IRT model for extreme response style. Large-scale Assessments in Education, 3(2). doi:10.1186/s40536-015-0012-0
Marsh, H. W. (1993). Academic self-concept: Theory, measurement and research. In J. Suls (Ed.), Psychological perspectives on the self (Vol. 4, pp. 59–98). Hillsdale, NJ: Erlbaum.
Marsh, H. W., Trautwein, U., Lüdtke, O., & Köller, O. (2008). Social comparison and Big-Fish–Little-Pond effects on self-concept and other self-belief constructs: Role of generalized and specific others. Journal of Educational Psychology, 100(3), 510–524. doi:10.1037/0022-0663.100.3.510
Nagengast, B., & Marsh, H. W. (2012). Big fish in little ponds aspire more: Mediation and generalizability of school-average ability effects on self-concept and career aspirations in science. Journal of Educational Psychology, 104(4), 1033–1053. doi:10.1037/a0027697
OECD. (2001). Knowledge and skills for life: First results from the OECD Programme for International Student Assessment (PISA) 2000. Paris: OECD Publishing.
OECD. (2002). Reading for change: Performance and engagement across countries: Results from PISA 2000. Paris: OECD Publishing.
OECD. (2003). Learners for life: Student approaches to learning: Results from PISA 2000. Paris: OECD Publishing.
OECD. (2004). Learning for tomorrow's world: First results from PISA 2003. Paris: OECD Publishing.
OECD. (2007). PISA 2006: Science competencies for tomorrow's world (Vol. 1: Analysis). Paris: OECD Publishing.
OECD. (2010a). Pathways to success: How knowledge and skills at age 15 shape future lives in Canada. Paris: OECD Publishing.
OECD. (2010b). PISA 2009 results: Learning to learn. Student engagement, strategies and practices (Vol. 3). Paris: OECD Publishing.
OECD. (2012). Learning beyond fifteen: Ten years after PISA. Paris: OECD Publishing.
OECD. (2013). PISA 2012 assessment and analytical framework: Mathematics, reading, science, problem solving and financial literacy. Paris: OECD Publishing.
OECD. (2016a). PISA 2015 assessment and analytical framework: Science, reading, mathematic and financial literacy. Paris: OECD Publishing.
OECD. (2016b). PISA 2015 results: Excellence and equity in education (Vol. 1). Paris: OECD Publishing.
OECD. (2017). Procedures and construct validation of context questionnaire data. In PISA 2015 technical report. Paris: OECD Publishing. Retrieved from www.oecd.org/pisa/sitedocument/PISA-2015-Technical-Report-Chapter-16-Procedures-and-Construct-Validation-of-Context-Questionnaire-Data.pdf
Pajares, F. (1996). Self-efficacy beliefs in academic settings. Review of Educational Research, 66(4), 543–578.
Parker, P. D., Marsh, H. W., Ciarrochi, J., Marshall, S., & Abduljabbar, A. S. (2014). Juxtaposing math self-efficacy and self-concept as predictors of long-term achievement outcomes. Educational Psychology, 34(1), 29–48. doi:10.1080/01443410.2013.797339
Pekrun, R. (2000). A social-cognitive, control-value theory of achievement emotions. In J. Heckhausen (Ed.), Motivational psychology of human development (pp. 143–163). Oxford: Elsevier Science.
Pintrich, P. R., & De Groot, E. V. (1990). Motivational and self-regulated learning components of classroom academic performance. Journal of Educational Psychology, 82, 33–40.
Pintrich, P. R., Marx, R. W., & Boyle, R. A. (1993). Beyond cold conceptual change: The role of motivational beliefs and classroom contextual factors in the process of conceptual change. Review of Educational Research, 63(2), 167–199.
Renninger, K. A., Hidi, S., & Krapp, A. (Eds.). (1992). The role of interest in learning and development. Hillsdale, NJ: Lawrence Erlbaum Associates.
Russell, J., Ainley, M., & Frydenberg, E. (2005). Schooling issues digest: Student motivation and engagement. Canberra: Department of Education, Science and Training.
Rutkowski, L., von Davier, M., & Rutkowski, D. (2014). Handbook of international large-scale assessment: Background, technical issues, and methods of data analysis. Boca Raton, FL: CRC Press.
Schiefele, U. (1996). Topic interest, text representation, and quality of experience. Contemporary Educational Psychology, 21(1), 3–18.
Schiefele, U. (2009). Situational and individual interest. In K. Wentzel & A. Wigfield (Eds.), Handbook of motivation at school (pp. 196–222). New York: Routledge.
Schiepe-Tiska, A., Roczen, N., Müller, K., Prenzel, M., & Osborne, J. (2016). Science-related outcomes: Attitudes, motivation, value beliefs, strategies. In S. Kuger, E. Klieme, N. Jude, & D. Kaplan (Eds.), Assessing contexts of learning: Methodology of educational measurement and assessment (pp. 301–329). Cham, Switzerland: Springer International.
Schreiner, C., & Sjøberg, S. (2007). Science education and youth's identity construction – two incompatible projects? In D. Corrigan, J. Dillon, & R. Gunstone (Eds.), The re-emergence of values in science education (pp. 231–248). Rotterdam: Sense Publishers.
Shen, B. C., & Tam, H. P. (2008). The paradoxical relationship between student achievement and self-perception: A cross-national analysis based on three waves of TIMSS data. Educational Research and Evaluation, 14(1), 87–100. doi:10.1080/13803610801896653
Sjøberg, S., & Schreiner, C. (2010). The ROSE project: An overview and key findings. Retrieved from https://roseproject.no/network/countries/norway/eng/nor-Sjoberg-Schreiner-overview-2010.pdf
Triandis, H. C., McCusker, C., & Hui, C. H. (1990). Multimethod probes of individualism and collectivism. Journal of Personality and Social Psychology, 59(5), 1006–1020.
Tucker-Drob, E. M., Cheung, A. K., & Briley, D. A. (2014). Gross domestic product, science interest, and science achievement: A person × nation interaction. Psychological Science, 25(11), 2047–2057. doi:10.1177/0956797614548726
Van de gaer, E., & Adams, R. (2010). The modeling of response style bias: An answer to the attitude–achievement paradox? Paper presented at the AERA Conference, Denver, CO. Retrieved from https://acer.org/files/vandegaer_paper_aera2010.pdf
Van de gaer, E., Grisay, A., Schulz, W., & Gebhardt, E. (2012). The reference group effect: An explanation of the paradoxical relationship between academic achievement and self-confidence across countries. Journal of Cross-Cultural Psychology, 43(8), 1205–1228. doi:10.1177/0022022111428083
Wang, Z. (2015). Examining big-fish–little-pond effects across 49 countries: A multilevel latent variable modelling approach. Educational Psychology, 35(2), 228–251. doi:10.1080/01443410.2013.827155
White, K., & Lehman, D. R. (2005). Culture and social comparison seeking: The role of self-motives. Personality and Social Psychology Bulletin, 31(2), 232–242. doi:10.1177/0146167204271326
Williams, T., Williams, K., Kastberg, K., & Jocelyn, L. (2005). Achievement and affect in OECD nations. Oxford Review of Education, 31(4), 517–545. doi:10.1080/03054980500355427

7
Methodological Challenges to Measuring Heterogeneous Populations Internationally

Leslie Rutkowski and David Rutkowski

The first large-scale survey of international educational achievement, the Pilot Twelve-Country Study (Foshay, Thorndike, Hotyat, Pidgeon, & Walker, 1962), included participants from mostly European countries (excepting Israel and the US), with reasonably well-developed economies.1 Of course, Poland and Yugoslavia were under communist rule during data collection in 1960, much of Western Europe was still recovering from the Second World War, and Israel – as a modern Jewish state – was in its infancy. Yet each country shared meaningful geographic or cultural commonalities with some or all of its fellow participants, making them a much more homogeneous group of participants than we currently observe in modern international assessment. In contrast, the 2015 cycle of the Programme for International Student Assessment (PISA) featured 72 system-level participants from all continents, excluding Antarctica. Among these participants, a range of sub-national systems took part, including Singapore, four Chinese provinces and municipalities, Massachusetts, and the Autonomous Region of the Azores (OECD, n.d., retrieved December 3). And national-level participants spanned a range of income brackets, 90 languages or dialects, and a variety of cultures. Further, 15 countries plus Puerto Rico completed a paper-based assessment, while the remaining participants were administered a computer-based version of PISA.

Such a heterogeneous collection of participating educational systems poses challenges in terms of deciding what should be measured and how to measure it in a comparable way. To that end, carefully developed assessment frameworks for PISA and similar studies (e.g., the Trends in International Mathematics and Science Study [TIMSS]) guide this process. From developing an internationally recognized definition of the content domains (International Association for the Evaluation of Educational Achievement, 2013; OECD, 2016b) to the complex process of instrument translation (OECD, 2014b), cross-cultural considerations loom large.

But regardless of the care with which a study such as PISA or TIMSS is designed, a cross-cultural measurement paradox exists: 'The larger the cross-cultural distance between groups, the more likely cross-cultural differences will be observed, but the more likely these differences may be influenced by uncontrolled variables' (van de Vijver & Matsumoto, 2011, p. 3). In the context of international large-scale assessments (ILSAs), this issue is particularly germane, given recent and forecasted growth in the number of study participants, especially in PISA. For 2018, 80 educational systems have committed to participation, with new additions that include Brunei Darussalam, a relatively wealthy newcomer to international assessments; the Philippines, which has not participated in an international study since TIMSS in 2003; and Belarus, a first-time participant with a per capita GDP that is on par with Thailand and South Africa (World Bank, 2017b).

In the current chapter, the issue of cultural heterogeneity in international assessments serves as a backdrop against which we consider several methodological challenges to measuring such diverse populations. In addition, we highlight some recent operational advances in international assessment for dealing with cross-cultural differences and propose ways that ILSAs might continue to account for system-level heterogeneity in the measures and methods used to analyze them.

SOME GROWING DIFFERENCES

As mentioned above, ILSAs have grown and changed significantly over time. The clearest instantiation – and the one that we focus on in this chapter – is PISA. It is important to note, however, that studies such as TIMSS and the Progress in International Reading Literacy Study (PIRLS) have also changed in meaningful ways. Table 7.1 gives some indication of the way in which PISA, a triennial study of 15-year-olds' achievement in math, science, and reading, has evolved over its nearly 20-year existence.

Table 7.1  Number of OECD and partner countries since 2000, with average per capita GDP in 2015 USD

Year   OECD countries   Average GDP (2015 USD)   Partner countries   Average GDP (2015 USD)
2015   34               $36,810                  37                  $15,149
2012   34               $41,819                  31                  $22,952
2009   34               $40,767                  40                  $17,856
2006   28               $39,836                  27                  $16,763
2003   30               $33,354                  11                  $18,212
2000   28               $27,965                  15                  $13,556

Notably, the total number of countries has grown steadily, from 43 in 2000 to 71 in 2015. And with the number of OECD countries relatively stable over this time span, the majority of growth has come from what are termed partner countries – participating educational systems that are not members of the OECD. A clear pattern that can be discerned from these data is that the income gap – measured here as the difference in average per capita gross domestic product (GDP) between OECD and non-OECD countries – has continued to widen over the course of the study. Figure 7.1 illustrates the growth in income differences, which can be at least partly attributed to the addition of relatively low-income countries in recent cycles. And given that additions to the 2018 cycle are primarily lower-income countries, this gap will only grow.

Figure 7.1  Difference in per capita GDP in 2015-adjusted US dollars, estimated as GDP_OECD – GDP_Partner (dollar amounts in thousands)

Important in the context of PISA is that lower-income countries will necessarily have less developed infrastructure in general, less familiarity with standardized tests, and – based on previous experience with middle-income countries in PISA – poorer expected performance (OECD, 2016d).
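For readers who want to reproduce the quantity plotted in Figure 7.1, the minimal sketch below computes the simple difference between the two average-GDP columns of Table 7.1 (all values in 2015 USD; nothing here is invented beyond the loop itself):

```python
# Per-cycle average per capita GDP, taken directly from Table 7.1.
oecd_gdp = {2000: 27965, 2003: 33354, 2006: 39836,
            2009: 40767, 2012: 41819, 2015: 36810}
partner_gdp = {2000: 13556, 2003: 18212, 2006: 16763,
               2009: 17856, 2012: 22952, 2015: 15149}

# Figure 7.1 plots GDP_OECD - GDP_Partner for each PISA cycle.
for year in sorted(oecd_gdp):
    gap = oecd_gdp[year] - partner_gdp[year]
    print(f"{year}: GDP_OECD - GDP_Partner = ${gap:,}")
```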

PISA for Development (PISA-D) is a further indicator of the changing face of international assessment. Started in 2013, PISA-D is designed with the needs of middle- and low-income countries in mind. According to the OECD:

    PISA for Development aims to increase middle- and low-income countries' use of PISA assessments for monitoring progress towards nationally-set targets for improvement, for the analysis of factors associated with student learning outcomes, particularly for poor and marginalised populations, for institutional capacity-building and for monitoring international educational targets in the Education 2030 framework being developed within the UN's thematic consultations. (OECD, n.d., retrieved February 24)

The study includes seven low-income countries: Cambodia, Ecuador, Guatemala, Honduras, Paraguay, Senegal, and Zambia. Panama, a middle-income country, is also a participant. The economic and political differences between these countries and other PISA participants are stark. As one example, Senegal had an annual GDP per capita of just $900 in 2015 (World Bank, 2017b). Further, the country has an estimated illiteracy rate of 40%; ranks 34th of 225 countries in infant mortality; and 46.7% of the population live below the poverty line (US Central Intelligence Agency, 2017).

Based on an extensive evaluation of the technical feasibility of using existing items to measure PISA-D participating countries, Adams and Cresswell (2016) concluded that 'item-by-country interactions … appear to be enormous for developing countries' (p. 8), suggesting that measurement differences could impact construct comparability across countries. Nevertheless, the OECD declined to develop new items for this context (Kennedy, 2016), instead selecting items from existing pools, leaving the issue of measurement differences unresolved.


Adams and Cresswell cautioned the OECD that careful analyses should be conducted during the field trial to examine the impact on scale validity. This last point is particularly important, given that PISA-D results should be reported 'in the context of the international PISA scales' (OECD, 2016d).

From a political perspective, it is also worth considering governmental transparency among PISA newcomers and PISA-D participants. The Philippines ranks 101st of the 175 countries compared, while Cambodia – one of the most corrupt countries in the world – is ranked 156th of 175, according to 2016 figures from Transparency International (2017). It is notable that in the case of relatively corrupt PISA countries (Kazakhstan, ranked 136; Albania, ranked 83 and second lowest in Europe after Macedonia; and Argentina, ranked 95), data irregularities resulted in qualifications for 2015 PISA results for those countries (OECD, 2016c). Of course, other countries with similarly poor transparency rankings were not reported to have data irregularities; it is notable, however, that no country with a strong ranking on the corruption index was singled out for these types of issues.

Given the aforementioned context, the remainder of the chapter is organized as follows. First, we lay out several challenges to measuring such a diverse amalgamation of participants. As part of this discussion, we present several examples from PISA or other international education studies. Next, we briefly discuss the state of the field in terms of the ways in which international studies like PISA are dealing with the inherent challenges that come from measuring dozens of heterogeneous populations. Finally, we offer several areas, albeit in need of in-depth research, where international surveys might consider modifications to current operational procedures to improve cross-cultural comparability.

MEASUREMENT CHALLENGES

Any study of the size and scope of PISA will necessarily bring with it challenges, some of which can be attributed to large cross-cultural differences in language, income, geography, culture, and religion. Because of or in spite of these differences, the nature of PISA – a self-reported survey and achievement test – must be developed in a way that renders it reasonably valid and comparable across the participating educational systems. Barriers to comparability include differences in the way a construct is understood, differences in the measurement properties of constructs – including response styles and other measurement differences – and the inherent challenge in writing questionnaire and achievement items that can be reasonably translated into 90 different languages and still retain their original meaning. Although there are certainly numerous areas that could be described here, we limit our focus to three key areas: defining, operationalizing, and measuring comparable constructs; instrument translation; and drawing representative samples.

Defining, Operationalizing, and Measuring Comparable Constructs

From a validity perspective (Kane, 1992; Messick, 1995), construct definitions that are theoretically grounded and well defined are fundamental to operationalizing and measuring concepts in dozens of heterogeneous populations. To that end, a perennial challenge exists: is it reasonable or possible to define a construct in such a way that it means the same thing across all populations of interest? For example, does instrumental motivation (Gardner & MacIntyre, 1991) – defined by the OECD as a motivation to learn because students see a benefit for their future studies or careers (OECD, 2014a) – mean the same thing across educational systems? The universe of possible futures for students across cultures will reasonably be quite different, depending on a given country's income levels, institutional infrastructure, and professional culture, among other dimensions.



This suggests that motivation to learn for future opportunities will likely mean very different things in North America or Western Europe in comparison to Tamil Nadu, India, Mauritius, or Uruguay. These differences introduce challenges to the mapping that is made from motivation items back to the motivation domain (and the valid inferences that one might wish to draw) (Rubin, 1988).

In a similar vein, it is important to consider the theoretical definition of achievement domains (e.g., math, science, and reading). Although achievement constructs might be less influenced by cultural differences than affective, behavioral, experiential, and socio-emotional domains, particularly with respect to theoretical definitions of what math, science, and reading are, there may be dimensions within these broad concepts that should be encompassed or excluded in a given country. For example, a country for which agriculture is a meaningful percentage of gross domestic product (e.g., Albania at 23%; World Bank, 2017a) might define mathematics as including principles that relate to crop yields, weather patterns, and global commodity prices. In contrast, OECD countries rely on agriculture for just 1.6% of economic production (World Bank, 2017b), suggesting that the addition of an agricultural dimension would be unwanted or not useful. When questions on an assessment do not 'look like' topics that are relevant to a population, issues around face validity emerge. We know from the testing literature that 'when a test looks appropriate for the performance of a situation in which examinees will be expected to perform, they tend to react positively', and such positive reactions are believed to produce higher engagement and thus achievement (Sackett, Schmitt, Ellingson, & Kabin, 2001, p. 316).

Related to the theoretical construct definition is the operational definition, which 'describes the procedures to follow to form measures of the [construct] that represent a concept' (Bollen, 1989, p. 181). As one example, direct observation of a task that is associated with an informational reading passage (e.g., navigating a bus schedule to arrive at a birthday party 15 minutes early), supported by think-aloud protocols, would certainly provide rich information about what a group of students know and can do. But the sheer size of PISA makes the cost and time burden too high to pursue such an operationalization of informational reading in a sample of around 500,000 students. As a result, in studies like PISA, these procedures involve administering a questionnaire to students, teachers, parents, school leaders, and country representatives and an achievement test to students.

In contrast to achievement measures, questionnaires are more vulnerable to response-style bias, such as socially desirable responding (choosing an answer that projects a favorable image of the respondent), acquiescent response style (a tendency to agree), and extreme response style (a tendency to use the endpoints of the scale, such as strongly agree or strongly disagree) (Johnson, Shavitt, & Holbrook, 2011). Response-style biases are systematic in nature; however, random biases are also an important issue when measuring students' backgrounds. Specifically, context questions with heavy reading burdens will be plagued by more measurement error in countries with low reading proficiency, as students guess at or skip these items. As one example, we estimated the Gunning fog index2 for a single PISA questionnaire item, IC013, from the optional Information and communication technology familiarity questionnaire, which asks students how they feel when using various digital media devices. Using a Gunning fog index calculator (Bond, n.d.), our estimate is 13.00, indicating that a reader needs 13 years of formal education to understand the text on a first reading. Given that PISA assesses 15-year-olds, this is an awfully high reading burden for the typical test taker. Notably, this applies only to the English version of the questionnaire and could differ depending on both the translation and language.
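To make the readability estimate concrete, the Gunning fog index combines average sentence length with the share of 'complex' words (three or more syllables): grade level = 0.4 × (words per sentence + 100 × complex words / words). The sketch below is a rough, self-contained approximation: the vowel-run syllable counter is a crude heuristic rather than the method used by the calculator cited above, and the example sentence is ours, not PISA item IC013.

```python
import re

def gunning_fog(text):
    """Rough Gunning fog estimate:
    0.4 * (avg sentence length + % of words with 3+ syllables)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)

    def syllables(word):
        # Crude heuristic: runs of vowels approximate syllables.
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    complex_words = [w for w in words if syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences)
                  + 100 * len(complex_words) / len(words))

print(round(gunning_fog(
    "Comparing students internationally requires carefully "
    "translated questionnaires. Otherwise, comparability suffers."), 2))
```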


Missing or measurement error-prone context questionnaire data have ramifications beyond the obvious. In particular, the complex methods used to estimate population and subpopulation achievement (Mislevy, Beaton, Kaplan, & Sheehan, 1992; von Davier, Gonzalez, & Mislevy, 2009) rely on completely observed data that have no measurement error. When these assumptions are untenable, group differences are biased and cross-cultural comparisons of, especially, subpopulation achievement estimates can be inaccurate (Rutkowski, 2011, 2014).

The final issue, from a construct validity perspective, emphasizes measuring comparable constructs. In a cross-country setting, this is only achieved after a careful analysis of the measurement properties of the items that underlie some theoretical construct or latent variable. In particular, the patterns and values of parameters that describe the relationship between observed indicators and the latent variable they represent must be equivalent for cross-cultural comparisons to be valid (Meredith, 1993; Millsap, 2011). For example, a comparable measurement model of well-being in school should strictly have an identical number of factors and items. Further, the patterns of relationships among the factors and items should be the same, and the parameters that describe these relationships should also be equivalent. When parameters are not equal, differences in levels of the construct cannot be disentangled from differences in measurement properties, making any comparison across countries inappropriate.

In practice, evidence abounds that, on the context questionnaire side of international assessments, there are meaningful cross-country measurement differences in important constructs, including socioeconomic status (D. Rutkowski & Rutkowski, 2013; L. Rutkowski & Rutkowski, 2017) and students' approaches to learning (Segeritz & Pant, 2013), among others. Similarly, in terms of measuring achievement in ILSAs, ample research has shown that cross-cultural measurement differences exist (Goldstein, 2004; Oliveri & von Davier, 2014; Rutkowski, Rutkowski, & Zhou, 2016) and, as one example, can be attributed to translation issues (Ercikan, 1998, 2003). We briefly provide an overview of the translation issue subsequently.
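As a stylized illustration of the point above – using hypothetical loadings and intercepts, not PISA estimates – the following simulation gives two countries identical latent distributions but shifts one country's item intercepts (as a response style might). A naive comparison of observed scale means then manufactures a group difference that does not exist at the latent level:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Identical latent distributions in both countries.
theta_a = rng.normal(0.0, 1.0, n)
theta_b = rng.normal(0.0, 1.0, n)

loadings = np.array([0.8, 0.7, 0.6])
intercepts_a = np.array([0.0, 0.0, 0.0])
# Country B's intercepts are shifted, e.g., by acquiescent responding.
intercepts_b = np.array([0.4, 0.3, 0.5])

def items(theta, loadings, intercepts):
    """Generate item responses from a one-factor model."""
    noise = rng.normal(0.0, 0.5, (theta.size, loadings.size))
    return intercepts + np.outer(theta, loadings) + noise

score_a = items(theta_a, loadings, intercepts_a).mean(axis=1)
score_b = items(theta_b, loadings, intercepts_b).mean(axis=1)

# Latent means are equal, yet observed means differ by about 0.4.
print(score_a.mean().round(2), score_b.mean().round(2))
```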


Instrument Translation

In an effort to contend with some of the thorny issues around translation, Hambleton and Zenisky (2011) developed a comprehensive and validated review form to offer sound guidance for judgmental reviewers who are evaluating the quality of questionnaire translations in research and practice. A set of 25 guiding questions helps to uncover possible translation issues during the developmental phase of the questionnaire. Issues around meaning and nuance; language structure, grammar, and complexity; and culturally sensitive topics that are introduced through translation are included in a straightforward checklist. Nevertheless, challenges, particularly in the context of growing heterogeneity in international assessments, remain at the fore.

Nationally and internationally valid measures depend, importantly, on well-translated instruments that retain their original meaning while also adhering to the cultural specifics of the language into which the instrument is translated. A critical first step is to write good questions that 'travel well' across good translations (Smith, 2003). And, as noted above, with 90 total languages and dialects in the last PISA cycle, this is no mean feat. For example, the Norwegian word koselig literally translates to English as cozy, which American English speakers often take to mean warm and comfortable, an adjective we often assign to objects like sweaters and small houses. But the Norwegian meaning conveys a broader cultural concept that imbues deep warmth, leans toward notions of emotional and physical well-being (Lomas, 2016), and can be used to describe many things, from a coffee cup to a person or even a positive professional collaboration! Clearly, something is lost in translation.

Beyond the actual translation are a host of nuanced cultural matters that have clear and meaningful impacts on the way in which respondents engage with questionnaires and respond to individual items.



As just one example, cultural differences in cognition (e.g., different concepts of self) give rise to very different response patterns between Asian and Western respondents (Schwarz, 2003). Another issue concerns a reliance on contextual information – of particular importance in a standardized setting where the test administrator is not in a position to clarify the meaning of questions (Schwarz, 2007). For example, the PISA 2015 school management questionnaire asks about the frequency of certain behaviors from the school leader. One such question stem is 'I pay attention to disruptive behavior in classrooms' (with options from did not occur to more than once a week). Although the cultural norms in some societies would imply that disruptive behavior would only come from students, in other societies teachers might also contribute to disruptions. Further, the response option did not occur might be selected because the school leader rarely goes into the classroom or because the classes are so well ordered that there is rarely an opportunity to observe disruptive behavior. An unsure school leader would need to rely on the context of the questionnaire to reasonably infer the intended meaning. These cultural issues lead some researchers to argue that functional equivalence is an alternative aim, as opposed to creating identical measures (Braun & Mohler, 2003); however, this approach poses methodological challenges for establishing measurement equivalence, as the variables that measure the underlying construct are not identical.

Drawing Representative Samples

As international studies of educational achievement continue to recruit participants from increasingly heterogeneous groups of countries, drawing representative samples of the population of interest is made more challenging. This can best be evinced through PISA-D Strand C, which aims to measure out-of-school populations in participating countries (Carr-Hill, 2015). Before delving into particular challenges around this study, we very briefly review a few key concepts in sampling as they pertain to PISA.

Assuming a finite population with some number of units (in this case, students), interest lies in observing a sample of units from the population and recording the value of some variable of interest (e.g., mathematics achievement). To select a sample of students, some sampling design is defined to guide the procedure. In general, PISA uses a two-stage stratified sampling design, where the first-stage units are schools that have 15-year-olds enrolled (OECD, 2014b). The population of eligible schools is partitioned into strata based on some criteria (e.g., geography, school management structure, or school size) and a suitable sample is drawn within each stratum. The second-stage sampling units are students within the sampled schools. From the complete list of 15-year-olds in a sampled school, a sample of students is drawn for participation (a simplified sketch of this two-stage logic follows below). Importantly, this approach relies on knowing the locations of all schools in the country where 15-year-olds are students. And although this is not an insurmountable challenge in countries with a well-organized administrative infrastructure, it is much more complicated in impoverished countries or in countries with large remote or undocumented populations.

Additional complexities are introduced in low-income countries when the target population is 15-year-olds who are not in school. Out-of-school children tend to be poorer and mainly in rural areas; however, all countries in PISA-D (except Paraguay, with no reported data) have slum populations that account for between 26% (Panama) and 55% (Cambodia) of the urban population in 2014 (World Bank, 2017c). According to an OECD working paper on the topic of measuring out-of-school populations, birth registration is just 14% in Zambia (Carr-Hill, 2015). In Cambodia, where the effects of the near-total decimation of schools during the Khmer Rouge reign (Clayton, 1998) still reverberate, nearly 40% of the country's rural poor 15-year-olds are not in school (Carr-Hill, 2015).
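As flagged above, here is a simplified sketch of the two-stage design with an invented school frame. It uses simple random sampling at both stages and made-up strata, whereas operational PISA selects schools with probability proportional to size within much finer strata:

```python
import random

random.seed(7)

# Hypothetical school frame: (school_id, stratum, enrolled 15-year-olds).
frame = [(f"school_{i:03d}", random.choice(["urban", "rural"]),
          random.randint(20, 300)) for i in range(200)]

def two_stage_sample(frame, schools_per_stratum=10, students_per_school=35):
    sample = []
    strata = {stratum for _, stratum, _ in frame}
    for stratum in sorted(strata):
        schools = [rec for rec in frame if rec[1] == stratum]
        # Stage 1: sample schools within the stratum (SRS here; PISA uses PPS).
        for school_id, _, n_enrolled in random.sample(schools, schools_per_stratum):
            # Stage 2: sample students from the school's enrollment list.
            n_drawn = min(students_per_school, n_enrolled)
            students = random.sample(range(n_enrolled), n_drawn)
            sample.extend((school_id, s) for s in students)
    return sample

print(len(two_stage_sample(frame)))  # total sampled students
```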


Taken as a whole, these conditions make it extremely difficult to count and locate 15-year-olds. To that end, the same OECD working paper concluded that PISA-D administrators would need to visit 10 households to find one out-of-school 15-year-old in the participating countries. Further issues surround administrator security in urban slums, political instability in some regions (e.g., the Casamance region of Senegal), and systemic, widespread gang problems (e.g., Honduras). In the case of Honduras, extortion is a common funding source for gangs, with students, teachers, and schools frequently victimized (Ordonez, 2015). At the extreme, one private school near the capital city of Tegucigalpa is believed to have temporarily closed due to extortion demands (Institutos privados, 2017). Clearly, sampling even enrolled students in this environment would be challenging.

Although the above examples are not comprehensive, they are representative of the types of methodological and logistical difficulties that can be attributed to developing and administering a comparable instrument to large numbers of increasingly heterogeneous countries. Certainly, PISA and similar studies go to reasonable lengths to ensure that instruments, sampling designs, analysis procedures, and reporting plans are carefully developed and reviewed by panels of international experts. With that in mind, in the next section, we briefly review the current operational mechanisms in place in PISA to ensure that the study is of high quality. Nevertheless, these and other issues necessitate a careful and measured interpretation of the results of PISA and similar studies. Further, there is much work to be done to improve international assessments in future rounds, particularly as more and more diverse countries join studies like PISA.

SOME WAYS THAT PISA ADDRESSES CROSS-CULTURAL DIFFERENCES

The complete details of the PISA development and administration process are well outside the scope of this chapter.


Interested readers are invited to consult the technical documentation for previous PISA cycles (e.g., OECD, 2012, 2014b, and similar) and the assessment and questionnaire framework documents (e.g., OECD, 2013, 2016b, and similar). In what follows, we briefly summarize some methods and accommodations in PISA that better account for cross-cultural differences.

An innovation in the 2009 PISA cycle was the option of including easier booklets in the assessment for educational systems with low expected performance. Countries that chose not to include easier booklets were administered the 'standard' test. Twenty educational systems, including two OECD-member countries, chose the easy-booklet option in 2009. The option was continued in the 2012 and 2015 cycles, but final numbers are not available in the technical report. This effort, incorporated only into the main domain of the assessment, was intended to better capture what students in low-performing countries know and can do and to provide a more satisfying test experience for students with low levels of content domain proficiency (OECD, 2012). Because the PISA 2015 technical documentation is not complete at the time of writing, the PISA 2012 booklet design is represented in Table 7.2. Content clusters are denoted by a letter and a number, where M, S, and R indicate math, science, and reading, respectively.

Table 7.2  PISA 2012 booklet design

Designation     Booklet ID   Cluster 1   Cluster 2   Cluster 3   Cluster 4   Standard booklet set   Easier booklet set
Standard only   1            M5          S3          M6A         S2          X
                2            S3          R3          M7A         R2          X
                3            R3          M6A         S1          M3          X
                4            M6A         M7A         R1          M4          X
                5            M7A         S1          M1          M5          X
                6            M1          M2          R2          M6A         X
                7            M2          S2          M3          M7A         X
Core            8            S2          R2          M4          S1          X                      X
                9            R2          M3          M5          R1          X                      X
                10           M3          M4          S3          M1          X                      X
                11           M4          M5          R3          M2          X                      X
                12           S1          R1          M2          S3          X                      X
                13           R1          M1          S2          R3          X                      X
Easy only       21           M5          S3          M6B         S2                                 X
                22           S3          R3          M7B         R2                                 X
                23           R3          M6B         S1          M3                                 X
                24           M6B         M7B         R1          M4                                 X
                25           M7B         S1          M1          M5                                 X
                26           M1          M2          R2          M6B                                X
                27           M2          S2          M3          M7B                                X

Regardless of a country's choice to participate in the standard or easier booklet administration, 13 booklets were administered in a rotated, random fashion so that each booklet would be administered an approximately equal number of times within a country. Note that, of nine total math clusters, M6A and M7A were the optional standard clusters, while M6B and M7B were the optional easier clusters. No other clusters were subject to this choice. And, as a reminder, no reading or science clusters featured the 'easy' option. This design resulted in booklets 1 to 7 corresponding to the standard option, while booklets 21 to 27 corresponded to the easy option. Further, booklets 8 to 13 represented the core booklets and were administered in all countries to ensure sufficient test material for linking. An important point to highlight is that core math clusters (i.e., M1, M2, M3, M4, and M5) appear in the standard, core, and easy booklets, but the core booklets contain no easy or standard clusters (i.e., M6A/B and M7A/B).
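As a simplified illustration of the rotation idea (our sketch, not the operational PISA assignment algorithm), booklets can be handed out cyclically from a random starting point within each sampled school, so that each booklet appears approximately equally often within a country:

```python
import random
from collections import Counter

STANDARD_BOOKLETS = list(range(1, 14))                    # booklets 1-13
EASY_BOOKLETS = list(range(8, 14)) + list(range(21, 28))  # booklets 8-13, 21-27

def assign_booklets(n_students, booklets):
    """Assign booklets cyclically from a random start within one school."""
    start = random.randrange(len(booklets))
    return [booklets[(start + i) % len(booklets)] for i in range(n_students)]

random.seed(1)
country = []
for school_size in [28, 35, 31, 40, 26]:   # hypothetical sampled schools
    country.extend(assign_booklets(school_size, STANDARD_BOOKLETS))

# Each of the 13 standard booklets appears roughly equally often.
print(Counter(country))
```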

As part of the development of PISA background or context questionnaires, countries have the possibility to include so-called national options, whereby individual educational systems can add limited sets of tailored items to the background questionnaires. A prime example of a frequently used national option is the inclusion of country-specific items in the wealth scale. According to the PISA technical report, this scale comprises eight international items asking students about their household possessions (a room of their own, a link to the internet, a DVD player, and the number of cellular phones, televisions, computers, cars, and bath/shower rooms in their house). There is also the possibility of up to three country-specific items, such as an iPhone or an iPad in Norway (OECD, 2014b). Educational systems also have the possibility of including other items and scales that are relevant for their needs. A key example is the race/ethnicity items that are added to the US student questionnaire (National Center for Educational Statistics, n.d.).

In addition to allowing for differences in the content of the assessment and the context questionnaire, recent PISA innovations also permit model-based allowances for cross-cultural differences. In PISA 2012 and previous cycles, models used to estimate population achievement adhered to an assumption wherein the parameters that described the relationship between math, science, and reading items and the underlying latent variable were identical across all participating countries. In other words, there should be no differential item functioning or item bias.


Where strong deviations from this assumption were found, the items received a 'dodgy' label and were discarded altogether or were not used for scoring individual countries where the item was problematic. In the 2015 cycle, operational procedures changed to include so-called item-by-country interactions (Glas & Jehangir, 2014; Oliveri & von Davier, 2011, 2014). A second significant change was the sample used to estimate item parameters. Previously, PISA used a random sample of 500 examinees from each OECD member country as the item calibration sample (OECD, 2014b). The common item parameter estimates (stemming from equal contributions from OECD member countries) were subsequently applied to the item responses from all participating systems (both OECD and non-OECD) in the scoring process. In the 2015 cycle, this practice changed such that all data (weighted for equal contribution) from all countries were used to estimate common item parameters in a first step, followed by admission of item-by-country interactions, as previously noted.

Similar to the scaling model for the achievement domains, changes were also made to the background scales in PISA 2015. In particular, a formal investigation of measurement equivalence was conducted on the student, school, and (optional) teacher and parent background scales. Where cross-country parameter differences were found, a process of relaxing equality constraints in limited cases was pursued (OECD, 2016a). This approach is commonly referred to as partial invariance (Byrne, Shavelson, & Muthén, 1989). Given the partial and draft status of the technical report, full details of the procedures and results are not publicly available at the time of writing.
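In generic IRT notation (our own sketch, not the exact PISA 2015 operational specification), an item-by-country interaction amounts to letting a flagged item's difficulty deviate from the international value in country c. For a two-parameter logistic item, for example:

```latex
% Probability that student j in country c answers item i correctly.
% b_i is the international difficulty; \delta_{ic} is a country-specific
% deviation, fixed to 0 wherever the item behaves comparably.
P(X_{ijc} = 1 \mid \theta_{jc}) =
  \frac{1}{1 + \exp\!\left[-a_i\left(\theta_{jc} - (b_i + \delta_{ic})\right)\right]}
```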


The above examples are, again, not comprehensive; however, they are a reasonable snapshot of some of the ways in which PISA is recognizing and adjusting to the changing composition of participating educational systems. As the study continues to grow, it is important that the OECD and independent researchers continue to consider reasonable ways for further accommodating cross-cultural differences in achievement and other domains. And where the limit of accommodations is reached, it is important to recognize and report on those limitations. We discuss some of these topics in the final section of this chapter.

WHAT CAN BE DONE TO IMPROVE ILSAS IN THE FUTURE?

As international studies, such as PISA, grow and change over time to include more and more diverse countries, there are several areas in which to better accommodate cross-cultural differences. In what follows, we describe two possibilities for strengthening measurement in increasingly heterogeneous populations, one of which is relevant for the background questionnaires and one of which is specific to the assessment.

One possibility for better measuring the context of highly diverse countries is to incorporate more flexibility into the background questionnaires. This can be achieved in several ways; here, we describe two. First, a questionnaire design that features a core set of questions and a larger country-specific set of questions would allow for highly customizable questionnaires while also maintaining some degree of comparability. As it is of fundamental interest in policy and research to make direct comparisons across at least groups of countries, it is important that key questions and scales are kept comparable internationally. For example, proxies for socioeconomic status figure prominently in nearly any conversation about equity in education. Similarly, given the large demographic changes taking place in Europe and elsewhere due to immigration, indicators of immigrant status are of critical interest for many countries. Clearly, these and related demographic indicators (e.g., sex) should stay constant across countries.



Beyond these topics, which should be consensually agreed upon by participants, there is great scope for otherwise tailored questionnaires that meet individual participating educational systems' needs. Of course, this places a greater operational burden on contractors that are responsible for data processing and ensuring sufficient data quality, making cost versus benefit a priority in these sorts of decisions. It is also necessary that countries that are interested in such an approach have the necessary infrastructure for developing and validating country-specific questionnaires. Given that many new PISA participants tend to be lower-income countries, this creates an unavoidable tension. One possibility here is for groups of countries that share regional, cultural, or other commonalities to collaborate and pool resources for developing better quality questionnaires.

Second, adding more country-specific questions to existing scales is a reasonable way forward. Again, socioeconomic status is a good example of an area that would benefit from such an approach. In the Nordic region, there is very little variability on international home-possession items, such as a student's own room and an internet connection, where 95% of students responded yes to these items (L. Rutkowski & Rutkowski, 2017). It could be useful, then, for this relatively wealthy region to identify objects that better differentiate between homes with more and fewer resources. Importantly, Rutkowski and Rutkowski found that adjusting models for the unique context of the country results in better model–data fit, while preserving cross-cultural comparability of the underlying scale.
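A quick arithmetic check makes the point: a binary possession item endorsed by 95% of students has Bernoulli variance p(1 − p) ≈ 0.05, roughly a fifth of the maximum 0.25 reached at a 50% endorsement rate, leaving it little room to distinguish wealthier from poorer homes.

```python
# Bernoulli variance p(1 - p) for home-possession indicators at
# several endorsement rates; near-universal items carry little signal.
for p in (0.50, 0.75, 0.95, 0.99):
    print(f"endorsement rate {p:.2f}: variance {p * (1 - p):.4f}")
```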

A final example of improving measurement in international assessments pertains especially to PISA-D. Given that a reasonable goal of participating in PISA-D is to be compared internationally, and in the context of main PISA, it is important to develop a framework for determining when a country is suited to the task of main participation. Psychometric and other considerations will necessarily be a part of such an endeavor, and such a framework could be modeled after conditions for membership in the European Union, where key economic, political, and institutional criteria must be met. This is an area where we are currently engaged in work with colleagues, particularly from the psychometric aspect. A key matter is the degree to which average proficiency can differ from the main PISA test specification (especially test difficulty) while still managing to adequately measure the country.

CONCLUSION

As the number of participating systems increases, so will the diversity of cultures and proficiencies. Embracing this diversity within a single assessment is not an easy or well-understood task. In their text on assessment, Thorndike and Thorndike-Christ (2009) fittingly conclude that 'a test developed on a national basis is always less valid than an equally well-developed test tailored specifically to the local objectives' (p. 164). Moving beyond the nation to the international context of over 90 educational systems introduces a set of complex and thorny issues around the validity of the assessment. Such concerns were most likely unimaginable to those who developed current practices in assessment. As such, we now require both a new technical and a new conceptual understanding of international assessment. Adding more complication to an already problematic situation is that, as PISA and other international assessments expand their reach, they are including systems that largely lack the resources to create a well-developed assessment. In this regard, PISA-D can be seen as a possible new way forward, representing an assessment that attempts to embrace aspects of diversity (e.g., economic) while maintaining the ultimate goal of comparing constructs that are believed to be equally understood and important across culturally diverse populations.


Given the vital goals for PISA-D, it is important to remind ourselves that most international assessments work under the assumption that what is being assessed is universally understood and, when aggregated, what is measured is universally important to all systems participating in the assessment. In other words, assessments can work to create a common understanding of what is important to educational systems (D. J. Rutkowski, 2007). If questions on the assessments and the resulting scores were understood differently between countries, serious concerns about, or violations of, validity and comparability would emerge. In this regard, under the current ILSA design, PISA-D is simply a stepping stone for countries to learn how to participate in ILSAs so that they can join PISA, which, according to the OECD, assesses the universal concepts important to the global community.

The assumption that universal constructs exist speaks to some of the tensions with the design-based solutions proposed in this chapter. For example, accepting cultural diversity in assessment pushes us to engage with a myriad of technical and conceptual puzzles that are sure to emerge as we embrace heterogeneity. Further, the policy and research community may have to accept that there are very few concepts or constructs that (1) can be assessed and (2) are equally accepted and understood in over 90 countries. We do not reject the possibility of homogeneity of certain constructs (e.g., L. Rutkowski & Rutkowski, 2016); however, there is growing evidence that assuming all assessed constructs are invariant across all cultures is problematic. Indeed, as noted in this chapter, some of the constructs (or latent traits) developed by PISA and other ILSAs are not comparable across, or at times even within, systems.


We argue that embracing heterogeneity does not require a wholesale abandonment of ILSAs but rather a re-envisioning of what it means to assess diverse populations. In many ways, PISA-D is an example of a regional model in that it brings together countries that are dealing with very difficult economic situations and attempts to assess what students in those countries need to do to be successful in a modern economy. Grouping countries in such a way is similar to the original PISA design, which was developed to assess the richest countries in the world and compare them on what the OECD believed students needed to know to participate in a global economy. Of course, economic grouping does not alleviate many of the cross-cultural issues discussed in this chapter. Nevertheless, we hope it provides the reader with some ideas of how regional designs can work within an international context.

In summary, international assessments have changed drastically over the course of the last 60 years. Growth in participants, changes in platforms, and increased areas of emphasis are just some of the areas that have most markedly changed. During this period, PISA and related studies have also taken on a prominent role in educational policy and research discussions internationally. As such, it is of increasing importance that a study such as PISA, initially developed by, of, and for OECD member states, continues to evolve and develop in ways that best serve OECD members and the highly diverse group of partner countries now and into the future.

Notes
1  Participants included Belgium, England, Finland, France, the Federal Republic of Germany, Israel, Poland, Scotland, Sweden, Switzerland, the United States, and Yugoslavia.
2  The Gunning fog index (Gunning, 1969) is a measure of readability for English writing. The value estimates the number of years of formal education a person needs to understand a text on the first reading.

REFERENCES

Adams, R. J., & Cresswell, J. (2016). PISA for Development technical strand A: Enhancements of PISA cognitive instruments (OECD Education Working Papers No. 126). Paris: OECD Publishing.
Bollen, K. (1989). Structural equations with latent variables. New York: Wiley.
Bond, S. (n.d.). Gunning fog index calculator. Retrieved February 27, 2017, from http://gunning-fog-index.com/contact.html
Braun, M., & Mohler, P. P. (2003). Background variables. In J. A. Harkness, F. J. R. van de Vijver, & P. P. Mohler (Eds.), Cross-cultural survey methods. Hoboken, NJ: Wiley.
Byrne, B. M., Shavelson, R. J., & Muthén, B. (1989). Testing for the equivalence of factor covariance and mean structures: The issue of partial measurement invariance. Psychological Bulletin, 105(3), 456–466. https://doi.org/10.1037/0033-2909.105.3.456
Carr-Hill, R. (2015). PISA for Development technical strand C: Incorporating out-of-school 15-year-olds in the assessment (OECD Education Working Papers No. 120). Paris: OECD Publishing.
Clayton, T. (1998). Building the new Cambodia: Educational destruction and construction under the Khmer Rouge, 1975–1979. History of Education Quarterly, 38(1), 1. https://doi.org/10.2307/369662
Ercikan, K. (1998). Translation effects in international assessments. International Journal of Educational Research, 29(6), 543–553. https://doi.org/10.1016/S0883-0355(98)00047-0
Ercikan, K. (2003). Are the English and French versions of the Third International Mathematics and Science Study administered in Canada comparable? Effects of adaptations. International Journal of Educational Policy, Research and Practice, 4, 55–75.
Foshay, A. W., Thorndike, R. L., Hotyat, F., Pidgeon, D. A., & Walker, D. A. (1962). Educational achievement of thirteen-year-olds in twelve countries. Hamburg, Germany: UNESCO Institute for Education. Retrieved from http://unesdoc.unesco.org/images/0013/001314/131437eo.pdf
Gardner, R. C., & MacIntyre, P. D. (1991). An instrumental motivation in language study: Who says it isn't effective? Studies in Second Language Acquisition, 13(1), 57–72. https://doi.org/10.1017/S0272263100009724
Glas, C., & Jehangir, K. (2014). Modeling country-specific differential item functioning. In L. Rutkowski, M. von Davier, & D. Rutkowski (Eds.), Handbook of international large-scale assessment: Background, technical issues, and methods of data analysis. Boca Raton, FL: Chapman & Hall/CRC Press.
Goldstein, H. (2004). International comparisons of student attainment: Some issues arising from the PISA study. Assessment in Education: Principles, Policy & Practice, 11(3), 319–330. https://doi.org/10.1080/0969594042000304618
Gunning, R. (1969). The fog index after twenty years. Journal of Business Communication, 6(2), 3–13. https://doi.org/10.1177/002194366900600202
Hambleton, R. K., & Zenisky, A. (2011). Translating and adapting tests for cross-cultural assessments. In D. Matsumoto & F. J. R. van de Vijver (Eds.), Cross-cultural research methods in psychology. New York: Cambridge University Press.
Institutos privados: 'Impuesto de guerra' martiriza a colegios desde hace 10 años [Private schools: 'War tax' has tormented schools for 10 years]. (2017, February 16). La Tribuna. Tegucigalpa, Honduras. Retrieved from www.latribuna.hn/2017/02/16/institutos-privados-impuesto-guerra-martiriza-colegios-desde-10-anos/
International Association for the Evaluation of Educational Achievement. (2013). TIMSS 2015 assessment frameworks. Boston, MA: TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College.
Johnson, T., Shavitt, S., & Holbrook, A. (2011). Survey response styles across cultures. In D. Matsumoto & F. J. R. van de Vijver (Eds.), Cross-cultural research methods in psychology. New York: Cambridge University Press.
Kane, M. T. (1992). An argument-based approach to validity. Psychological Bulletin, 112(3), 527–535. https://doi.org/10.1037/0033-2909.112.3.527
Kennedy, A. M. (2016, March). PISA for Development progress report. Presented at the PISA for Development International Advisory Group, Asuncion, Paraguay. Retrieved from www.slideshare.net/OECDEDU/presentations-from-the-3rd-pisa-for-development-international-advisory-group-meeting-held-in-asuncion-paraguay-from-30-march-to-1-april-2016-day-1
Lomas, T. (2016). Towards a positive cross-cultural lexicography: Enriching our emotional landscape through 216 'untranslatable' words pertaining to well-being. The Journal of Positive Psychology, 11(5), 546–558. https://doi.org/10.1080/17439760.2015.1127993
Meredith, W. (1993). Measurement invariance, factor analysis and factorial invariance. Psychometrika, 58(4), 525–543. https://doi.org/10.1007/BF02294825
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons' responses and performances as scientific inquiry into score meaning. American Psychologist, 50(9), 741–749. https://doi.org/10.1037/0003-066X.50.9.741
Millsap, R. E. (2011). Statistical approaches to measurement invariance. New York: Routledge.
Mislevy, R. J., Beaton, A. E., Kaplan, B., & Sheehan, K. M. (1992). Estimating population characteristics from sparse matrix samples of item responses. Journal of Educational Measurement, 29(2), 133–161.
National Center for Educational Statistics. (n.d.). PISA questionnaires. Retrieved from https://nces.ed.gov/surveys/pisa/questionnaire.asp
OECD. (n.d.). About PISA. Retrieved December 3, 2017, from www.oecd.org/pisa/aboutpisa/
OECD. (n.d.). About PISA for Development. Retrieved February 24, 2017, from www.oecd.org/pisa/aboutpisa/pisafordevelopment.htm
OECD. (2012). PISA 2009 technical report. Paris: OECD Publishing. Retrieved from www.oecd.org/edu/preschoolandschool/programmeforinternationalstudentassessmentpisa/pisa2009technicalreport.htm
OECD. (2013). PISA 2012 assessment and analytical framework: Mathematics, reading, science, problem solving, and financial literacy. Paris: OECD Publishing. Retrieved from http://dx.doi.org/10.1787/9789264190511-en
OECD. (2014a). Are grouping and selecting students for different schools related to students' motivation to learn? (PISA in Focus No. 39). Paris: OECD Publishing. Retrieved from www.oecd.org/pisa/pisaproducts/pisainfocus/pisa-in-focus-n39-(eng)-final.pdf
OECD. (2014b). PISA 2012 technical report. Paris: OECD Publishing.
OECD. (2016a). PISA 2012 main survey analysis plan for questionnaire data. Paris: OECD Publishing.
OECD. (2016b). PISA 2015 assessment and analytical framework. Paris: OECD Publishing. Retrieved from www.oecd-ilibrary.org/content/book/9789264255425-en
OECD. (2016c). PISA 2015 results: Excellence and equity in education (Vol. I). Paris: OECD Publishing.
OECD. (2016d). PISA for Development: Benefits for participating countries (PISA for Development Brief No. 2). Paris: OECD Publishing. Retrieved from www.oecd.org/pisa/aboutpisa/PISA-for-Development-Benefits-for-participating-countries-PISA-D-Brief2.pdf
Oliveri, M. E., & von Davier, M. (2011). Investigation of model fit and score scale comparability in international assessments. Psychological Test and Assessment Modeling, 53(3), 315–333.
Oliveri, M. E., & von Davier, M. (2014). Toward increasing fairness in score scale calibrations employed in international large-scale assessments. International Journal of Testing, 14(1), 1–21. https://doi.org/10.1080/15305058.2013.825265
Ordonez, C. A. Z. (2015, June). Effects of 'war tax' collection in Honduran society: Evaluating the social and economic cost (Master's thesis). US Army Command and General Staff College, Fort Leavenworth, KS.
Rubin, D. B. (1988). Discussion. In H. Wainer & H. Braun (Eds.), Test validity. Hillsdale, NJ: Lawrence Erlbaum Associates.
Rutkowski, D. J. (2007). Converging us softly: How intergovernmental organizations promote neoliberal educational policy. Critical Studies in Education, 48(2), 229–247.
Rutkowski, D., & Rutkowski, L. (2013). Measuring socioeconomic background in PISA: One size might not fit all. Research in Comparative and International Education, 8(3), 259–278.
Rutkowski, L. (2011). The impact of missing background data on subpopulation estimation. Journal of Educational Measurement, 48(3), 293–312. https://doi.org/10.1111/j.1745-3984.2011.00144.x
Rutkowski, L. (2014). Sensitivity of achievement estimation to conditioning model misclassification. Applied Measurement in Education, 27(2), 115–132. https://doi.org/10.1080/08957347.2014.880440
Rutkowski, L., & Rutkowski, D. (2016). The relation between students' perception of instructional quality and bullying victimization. In T. Nilsen & J.-E. Gustafsson (Eds.), Teacher quality, instructional quality and

140

THE SAGE HANDBOOK OF COMPARATIVE STUDIES IN EDUCATION

student outcome (pp. 103–120). Amsterdam: IEA and SpringerOpen. Rutkowski, L., & Rutkowski, D. (2017). Improving the comparability and local usefulness of international assessments: A look back and a way forward. Scandinavian Journal of Educational Research, 0(0), 1–14. https://doi.org/ 10.1080/00313831.2016.1261044 Rutkowski, L., Rutkowski, D., & Zhou, Y. (2016). Parameter estimation methods and the stability of achievement estimates and system rankings: Another look at the PISA model. International Journal of Testing, 16(1), 1–20. Sackett, P. R., Schmitt, N., Ellingson, J. E., & Kabin, M. B. (2001). High-stakes testing in employment, credentialing, and higher education: Prospects in a post-affirmative-action world. American Psychologist, 56(4), 302–318. https://doi.org/10.1037/0003-066X.56.4.302 Schwarz, N. (2003). Culture-sensitive context effects: A challenge for cross-cultural surveys. In J. A. Harkness, F. J. R. van de Vijver, & P. P. Mohler (Eds.), Cross-cultural survey methods. Hoboken, NJ: Wiley. Schwarz, N. (2007). Cognitive aspects of survey methodology. Applied Cognitive Psychology, 21(2), 277–287. https://doi.org/10.1002/ acp.1340 Segeritz, M., & Pant, H. A. (2013). Do they feel the same way about math?: Testing measurement invariance of the PISA ‘students’ approaches to learning’ instrument across immigrant groups within Germany. Educational and Psychological Measurement, 73(4), 601–630. https://doi. org/10.1177/0013164413481802 Smith, N. (2003). Developing comparable questions in cross-national surveys. In J. A.

Harkness, F. J. R. van de Vijver, & P. P. Mohler (Eds.), Cross-cultural survey methods. Hoboken, NJ: Wiley. Thorndike, R. M., & Thorndike-Christ, T. M. (2009). Measurement and evaluation in psychology and education (8th ed.). Boston, MA: Pearson. Transparency International. (2017, January 25). Corruption perceptions index. Retrieved from www.transparency.org/cpi2016 US Central Intelligence Agency. (2017, January 12). Senegal. Retrieved February 24, 2017, from www.cia.gov/library/publications/theworld-factbook/geos/sg.html van de Vijver, F. J. R., & Matsumoto, D. (2011). Introduction to the methodological issues associated with cross-cultural research. In D. Matsumoto & F. J. R. van de Vijver (Eds.), Cross-cultural research methods in psychology. New York: Cambridge University Press. von Davier, M., Gonzalez, E., & Mislevy, R. J. (2009). What are plausible values and why are they useful? IERI Monograph Series, 2, 9–36. World Bank. (2017a, January 2). Agriculture, value added (% of GDP). Retrieved February 22, 2017, from http://data.worldbank.org/ indicator/NY.GDP.PCAP.CD World Bank. (2017b, January 2). GDP per capita (current US$). Retrieved February 22, 2017, from http://data.worldbank.org/indicator/NY.GDP.PCAP.CD World Bank. (2017c, January 2). Population living in slums (% of urban population). Retrieved February 22, 2017, from http:// data.worldbank.org/indicator/EN.POP.SLUM. UR.ZS

8 The Participation of Latin American Countries in International Assessments: Assessment Capacity, Validity, and Fairness Guillermo Solano-Flores

Since 1995, Latin American countries have participated regularly in a variety of international assessment programs. The results from these comparisons have become an important influence in the public life of these countries. There is no doubt that this participation has informed education reform efforts, stimulated research on academic achievement in the region, and contributed to raising public awareness of the importance of education in the context of a global economy (e.g., Swaffield & Thomas, 2016) and of the role that social inequity plays in student achievement (Villar & Zoido, 2016). At the same time, there is growing concern about over-reliance on scores from international assessments as a source that informs policy (Baird et al., 2016; Carnoy, 2015). Adding to the difficulty of making sound decisions based on results from international assessments is the increasing political influence of country rankings, as reflected

by the occasional resistance of some governments to disseminate assessment results, and the overemphasis on country rankings in the media (see Ferrer & Arregui, 2003; Ravela, 2002; Reimers, 2003). These challenges are not exclusive to Latin America (see Figazzolo, 2009; Green Sairasky, 2015; Sjøberg, 2016; Tatto, 2006). However, they may be exacerbated by social and historical circumstances common to many Latin American countries. Among others, these circumstances include centralized governments, restricted school governance, a strong political influence of teacher unions, and a relatively short history of large-scale assessment (see Martínez-Rizo, 2016; Murillo et al., 2002; Zegarra & Ravina, 2003). Under these circumstances, scant attention has been paid to the ways in which participation in these comparisons influences and is influenced by a country's assessment


capacity. A clear example of this interaction is a shortage of experts in the field of educational measurement, which can pose a challenge to successfully participating in international assessments and to making sound interpretations of the results of those assessments (Gilmore, 2005; Kamens & McNeely, 2010). In this chapter, assessment capacity is understood as the ability of a country to successfully design, perform, and sustain, with its own human and technical resources, assessment activities that are sensitive to its own needs and social context. These activities range from developing technically sound instruments to creating effective assessment systems to making appropriate interpretations of assessment results – both from national and international assessment programs. This chapter submits three main ideas. First, while international assessments can provide countries with valuable information for education reform, the extent to which a given country can benefit from participating in these assessments depends largely on its assessment capacity. Second, a country's assessment capacity influences not only the fidelity with which it implements the procedures established by international assessment organizing agencies but also its ability to perform additional activities that can contribute to addressing issues that are relevant to its educational goals. Third, participating in international assessments can be an excellent opportunity for a country to develop its assessment capacity, provided that its efforts go beyond meeting international assessment organizing agencies' requirements and support the development of a strong research agenda and of human resources in the field of assessment. The chapter pays special attention to validity and fairness, especially as they refer to the testing of culturally and linguistically diverse student populations. Because tests are cultural products (see Cole, 1999), culture and language are important influences through the entire assessment process – from the

inception of an assessment instrument to the interpretation of scores. International assessment programs have developed mechanisms intended to address culture and language as extraneous factors that might threaten the validity of interpretations of test scores. However, while necessary, these mechanisms may not be able to address the specific social and cultural context of a country. Ensuring that student performance is not affected by cultural and linguistic factors may require a country to take unique actions and to target specific segments of its student populations. This is particularly important for Latin American countries, characterized by their cultural and linguistic diversity, and by tremendous social inequalities. The chapter is organized in four sections. The first section offers a conceptual framework of assessment capacity and discusses some aspects of assessment capacity that relate to validity and fairness. The next three sections discuss different levels of participation of Latin American countries in international test comparisons: Participation, Successful Participation, and Optimal Participation. The section, Participation, examines the patterns of participation of Latin American countries in four major international assessment programs: TIMSS (Trends in International Mathematics and Science Study), LLECE (Laboratorio Latinoamericano de Evaluación de la Calidad de la Educación),1 PIRLS (Progress in International Reading Literacy Study), and PISA (Programme for International Student Assessment). The section also examines how countries with more resources have more opportunities to participate in international assessments. The section, Successful Participation, examines how views of capacity in international assessment programs focus mainly on the institutions, infrastructure, and social conditions that are relevant to meeting the requirements established by the organizing agencies and implementing their procedures with fidelity. The section cautions that successfully participating in an international


assessment does not necessarily bring with it long-term benefits concerning assessment capacity. The section, Optimal Participation, examines the kinds of actions that countries need to take if they are to benefit from their participation in international assessments in ways that are sensitive to their own contexts. Addressing these challenges does not necessarily imply higher costs, but certainly entails higher levels of organization. It is not the intent of this chapter to evaluate Latin American countries’ assessment capacities. A great deal of the information on the national contexts that shape assessment capacity and the characteristics of the institutions involved in large-scale assessment activities may not be accessible to external observers. Rather, the intent is to provide a series of considerations for researchers, practitioners, and decision makers to reflect on how their activities can contribute to ensuring optimal participation in international assessments.

CONCEPTUAL FRAMEWORK ON ASSESSMENT CAPACITY

The term capacity is rarely defined and its use varies across authors, disciplinary fields, and types of system (e.g., financial capacity, emergency response capacity). The concept entails the conditions, abilities, and resources that make it possible for a society or an organization to accomplish a goal successfully (see Capacity Development Group, 2007; Clarke, 2012). Assessment capacity is understood here as the set of conditions (e.g., social stability, institutional organization), resources (e.g., financial, human) and skills (e.g., developing assessment instruments, collecting and analyzing data) that make a country self-sufficient in meeting its assessment needs by effectively and systematically performing assessment activities according


to national long-term goals (see Solano-Flores, 2008; Solano-Flores & Milbourn, 2016). This definition, which pays special attention to self-sufficiency and a country's own social context, contrasts with typical perspectives on capacity, such as that in the PISA for Development Framework (Ward & Zoido, 2015), which focuses on the conditions and actions needed to properly participate in PISA. This section examines three aspects of assessment capacity – national context, practices, and evidence of validity and fairness. It is important to mention here that some of the activities discussed regarding validity and fairness are not common practice. The reason is twofold. First, while necessary, available procedures created to examine validity and fairness in testing (e.g., item bias detection) are not always sufficient to address the complex challenges of testing culturally, linguistically, or economically heterogeneous populations. Second, those procedures are not always used or are difficult to use due to time or budget constraints. In addition, in some cases, fairness is viewed as an attribute of tests to be addressed only at the end of the process of test development. These limitations, which may take place in national assessment programs, are certainly likely to be observed in international comparisons. How effectively Latin American countries address their own cultural and linguistic diversity will substantially influence their ability to detect and address issues of culture and language in international assessments, in which, by nature, examinees are from multiple national, cultural, and linguistic backgrounds.

National Context

The national context consists of the set of conditions that shape the assessment activities that take place in a country and the impact that those activities have on that country's life. These conditions shape its ability to develop successful assessment programs and


perform assessment activities that are sensitive to national circumstances and congruent with national educational goals. Examples of these conditions are:

• Institutions. The existence of organizations in charge of assessment activities and programs; the infrastructure (e.g., facilities, software, equipment) and the financial resources needed to generate and administer assessment instruments, and to analyze and interpret assessment data systematically.
• Professionals. The availability of qualified professionals in the field of assessment and related areas in numbers commensurate with the magnitude of the assessment programs.
• Procedures. The existence of procedures that allow proper coordination of the assessment activities with key components of the educational system.
• Sustainability. The ability to sustain long-term assessment efforts regardless of fluctuations in the country's financial and political circumstances.
• Normativity. The existence of key normative documents, such as sampling frameworks, assessment frameworks, and sets of item specifications, which make it possible to determine systematically, and respectively, the characteristics of the student samples used in assessment programs, the specification and sampling of the content of tests, and the types of items (and their characteristics) to be included in tests.

Institutional Practices

Practices are the actions taken by institutions in charge of national assessment programs at different stages of the process of test development, review, use, analysis, and interpretation. Assessment practices relevant to culture and language are:

• Attention to cultural and linguistic diversity. While the majority of assessment systems have or use documents that guide the process of test development (e.g., assessment frameworks, item specifications, standards), rarely do these documents address issues of culture and language thoroughly. A population sampling framework can contribute to ensuring the fair inclusion of culturally and linguistically diverse students. Ideally, this document should identify the main groups of students – defined by factors such as socio-economic status, locale, ethnicity, and native language – that need to be considered in determining the samples of students with which pilot versions of a test are to be tried out. Unfortunately, in the absence of normative documents, or when these documents are not thoroughly developed, it is extremely difficult to pay attention to issues of culture and language.
• Procedures. The formalization and systematic use of procedures are critical to addressing culture and language. These procedures include the establishment of assessment project timelines that are sensitive to the notion that there is a minimum time needed to successfully perform actions concerning cultural and linguistic diversity. Under tight timelines, it is difficult to successfully complete activities such as test translation, adaptation, localization, or review, or to perform differential item functioning analyses and revise test instruments.
• Test development teams. Experts in the fields of culture and language (e.g., cultural anthropology and linguistics) need to work along with item writers, experts in educational measurement, educators, and content experts throughout the entire process of test development and test review to properly address a country's cultural and linguistic diversity.

Evidence of Validity and Fairness

In addition to well-known procedures for examining bias (such as the use of differential item functioning analysis techniques, discussed later on in this chapter), other approaches can provide evidence on validity and fairness if they are used to examine groups of special interest, such as indigenous populations and students of low socio-economic status. These approaches include:

• Analysis of response processes. Current thinking in assessment pays attention to response processes as critical to validation (see Kane & Mislevy, 2017). The analysis of verbal protocols


(e.g., students 'talk aloud' as they read and solve test items; see Leighton, 2017; Yang & Embretson, 2007) allow examination of the extent to which cultural and linguistic factors can influence the ways in which students interpret test items and respond to them.
• Analyses of score variation based on generalizability theory (a theory of measurement error; see Shavelson & Webb, 1991). These analyses allow examination of the extent to which different sources (e.g., item format, language of test administration) and their interaction contribute to measurement error, and how the magnitude of this interaction varies across student groups (see Solano-Flores & Li, 2013); a minimal computational sketch is given after this list.
• Data disaggregation in the analysis of the technical properties of items. There is a growing interest in data disaggregation as key to ensuring equity through fine-grained data for different population subgroups (see Rubin et al., 2018). While the educational measurement community is used to reporting information on student academic achievement separately for different groups of interest (e.g., by gender, language background, or socio-economic status), rarely are the data on these sub-populations disaggregated to analyze the reliability of tests and other technical properties for each group. Analyzing test technical properties after disaggregating by group may reveal that the generalizations made about academic achievement based on the scores on a given test may be appropriate for one group but not for others.
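As an illustration of the second approach, the sketch below estimates variance components for a crossed persons × items design using the standard expected-mean-squares equations (Shavelson & Webb, 1991). It is a minimal example under simplifying assumptions – a complete, balanced score matrix and a single facet – and the data it uses are hypothetical. Running it separately for each student group of interest would show whether the relative magnitude of error sources differs across groups.

```python
import numpy as np

def g_study_p_by_i(scores):
    """Variance components for a crossed persons x items (p x i) design.

    scores: 2-D array, rows = persons, columns = items (no missing data).
    Returns estimated variance components for persons (universe scores),
    items (difficulty), and the residual (p x i interaction with error).
    """
    n_p, n_i = scores.shape
    grand = scores.mean()
    p_means = scores.mean(axis=1)
    i_means = scores.mean(axis=0)

    # Mean squares from the one-facet ANOVA decomposition
    ms_p = n_i * np.sum((p_means - grand) ** 2) / (n_p - 1)
    ms_i = n_p * np.sum((i_means - grand) ** 2) / (n_i - 1)
    resid = scores - p_means[:, None] - i_means[None, :] + grand
    ms_pi = np.sum(resid ** 2) / ((n_p - 1) * (n_i - 1))

    # Expected-mean-squares solutions (negative estimates truncated at 0)
    return {
        "person": max((ms_p - ms_pi) / n_i, 0.0),  # universe-score variance
        "item": max((ms_i - ms_pi) / n_p, 0.0),    # item-difficulty variance
        "pi,e": ms_pi,                             # interaction/error
    }

# Hypothetical scores for one student group: 200 students x 20 items
rng = np.random.default_rng(0)
scores = rng.integers(0, 2, size=(200, 20)).astype(float)
print(g_study_p_by_i(scores))
```

Comparing the resulting components across, say, indigenous and non-indigenous samples would indicate whether sources such as item format contribute differently to measurement error for each group.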

PARTICIPATION

The participation of a country in international assessments can be an indicator of its assessment capacity. At the same time, provided that proper actions are taken, participation in international assessments can contribute to increasing a country's assessment capacity. Figure 8.1(a) shows three main stages of a country's participation in an international assessment program. First, the country is expected to take actions and meet conditions needed to ensure that the samples of students included in the test, the translation


of the test, its administration, and the scoring of student responses meet the design, specifications, and criteria established by the organizing agency. Second, the organizing agency reports the results. Third, the country is assumed to use the results to inform education reform decisions. This section focuses on the first stage, which entails a commitment to comply with a set of requirements established by an international assessment's organizing agency. For example, a document posted on the web by the OECD (2018a) lists the following requirements for participating countries:

i take responsibility for drawing a representative sample of schools and students in compliance with the internationally agreed target population definitions and sampling procedures. The field trial included a sample of approximately 1,500 students and the main study a sample of approximately 6,000 students;
ii have the authority and resources to recruit schools to participate and to administer the assessment;
iii have the capacity to deal with issues of translation, preparing and spiralling of assessment booklets;
iv have the capacity to process returned booklets and score open-ended test items; and
v contribute to the international overhead costs. (OECD, 2018a)

Because each country is responsible for the costs and implementation of these activities, the pattern of participation of Latin American countries in international assessment programs may be a reflection of differences concerning financial and human resources and institutional organization. Table 8.1 shows the number of times (participations) that each of 21 Latin American countries has participated in four international assessment programs – TIMSS, LLECE, PISA, and PIRLS – between 1995 and 2018. The table also shows per capita gross national income as a gross indicator of wealth for each country – information used as a proxy for assessment capacity, given the absence of

more precise data.

Figure 8.1  Process of an international assessment and the impact of the national context

[Figure 8.1 depicts three stages – Participation, Successful Participation, and Optimal Participation. (a) The country draws representative student samples, translates items, recruits schools, prepares test booklets, administers tests, and scores open-ended responses; the organizing agency reports test results; the country makes decisions based on assessment data. (b) Successful participation is conditioned on fidelity of implementation. (c) In optimal participation, the country enriches and adapts procedures – extra activities supported by its assessment capacity.]

Further, Table 8.1 also shows how financial and human resources relate to countries' experience in international assessments. Between 1995 and 2018, a total of 113 participations took place. In this period of 24 years, there were 15 years in which at least one of the four international assessments was administered. In some cases, a country participated

in ways that varied from those of the other countries. Five forms of variation can be identified. First, the test was given only to students of one of the populations or grades targeted by an international assessment program, as was the case of Argentina in 1995 and 2003, Chile in 2003, and Colombia in 1995, which gave TIMSS (which targets Grades 4 and 8) only to Grade

8 students. Second, the test was used with students of a grade other than the grade targeted by the assessment program, as was the case of Honduras in 2011, which gave PIRLS (a Grade 4 assessment) to Grade 6 students. Third, a country administered the assessment only to students in a city, not to a national student sample, which was the case of Argentina in 2015, in which Buenos Aires was a benchmarking participant. Fourth, a country participated in an assessment, but its results were not reported, as was the case of Mexico in 1995. Fifth, a country administered the test to a new sample of students outside the international assessment program, as was the case of Mexico in 2000.

Table 8.1  Participation of Latin American countries in four international assessments (1995–2018) and per capita gross national income (2017)

Country (total number of participations)    Per capita GNI (2017)
Belize (1)                                  4,390
Bolivia (1)                                 3,130
Trinidad Tobago (1)                         15,350
Venezuela (1)                               *
Cuba (2)                                    *
Ecuador (2)                                 5,890
El Salvador (2)                             3,560
Guatemala (2)                               4,060
Nicaragua (2)                               2,130
Panama (3)                                  13,100
Paraguay (3)                                4,060
Honduras (4)                                2,250
Dominican Republic (5)                      6,630
Costa Rica (7)                              11,040
Peru (8)                                    5,970
Uruguay (8)                                 15,250
Brazil (10)                                 8,580
Mexico (12)                                 8,610
Colombia (12)                               5,830
Argentina (13)                              13,040
Chile (14)                                  13,610

[The year-by-year columns of the original table mark each country's participations across the assessment years 1995 (T), 1996 (L), 1999 (T), 2000 (P), 2001 (I), 2003 (T, P), 2006 (L, P), 2007 (T), 2009 (P), 2011 (T, I), 2012 (P), 2013 (L), 2015 (T, P), 2016 (I), and 2018 (P).]

* Information for 2017 not available
Key: T = TIMSS; L = LLECE; P = PISA; I = PIRLS
Sources: per capita GNI: World Bank Group (2018); TIMSS: USDE-IES-NCES (2018c); LLECE: UNESCO (2000); UNESCO, OREAL, Laboratorio Latinoamericano de Evaluación de la Calidad de la Educación (2008, 2015); PISA: OECD (2018b); PIRLS: USDE-IES-NCES (2018a)

Table 8.1 shows that the number of participations of Latin American countries in different international assessment programs has been stable. Thus, of the 113 participations, 55 and 58 took place, respectively, in the first eight and the last seven years of the 15 years in which at least one international assessment was administered. At the same time, the table shows a tendency over time for Latin American countries to participate in fewer different assessment programs. For example, there were eight participations in TIMSS in the first half of this period and only four in the second half. This decrease in the variety of assessment programs has been accompanied by an increase in the relative frequency of PISA participations over other assessments: while in the first half of the period 25% of the participations were in PISA, in the second half 62% were. This trend indicates a decrease in the variety of international sources of data available (e.g., by age group, grade, and content area) and, therefore, in the variety of knowledge and skills assessed for Latin American countries to inform their education reform efforts. Table 8.1 also shows that Latin American countries vary tremendously in the amount of experience they acquire from participating in international assessments. There is a

tendency for countries with higher per capita annual income to participate in more assessment programs and for countries with lower per capita annual income to participate in fewer assessment programs (Panama and Trinidad Tobago are exceptions). Indeed, 76% (86) of the 113 participations were from seven of the nine countries with the highest per capita gross national income (Costa Rica, Brazil, Uruguay, Mexico, Colombia, Argentina, and Chile). Of course, per capita gross national income is only one of many economic indicators that can be used to examine the wealth of countries. Indeed, this measure does not take into account inequality in the distribution of wealth – a serious factor in Latin American countries. Further, it should not be assumed that only economic factors determine whether a country participates in an international assessment program – countries may differ in their motivation to participate in international assessments. Yet, taken together, the pattern shown in Table 8.1 suggests that financial factors play an important role in the extent to which a country is able to participate in international assessments and, ultimately, the extent to which that country is exposed to experiences in those assessments. Given this disparity of opportunities, LLECE seems to play an important buffering role, moderating the disparities between Latin American countries concerning international assessment. Not only is LLECE the international assessment in which most Latin American countries participate; with the exception of El Salvador's participation in TIMSS in 2007 and Honduras's participation in TIMSS and PIRLS in 2011, it is also the only international assessment in which Latin American countries with low per capita gross national income have participated since 1995. (Belize participated in PIRLS in 2011, but has never participated in LLECE.) Furthermore, LLECE is an international assessment generated by participating countries and, therefore, has great potential to be sensitive to the regional cultural


and linguistic characteristics of the students assessed (see Treviño, 2014; UNESCO, OREAL, Laboratorio Latinoamericano de Evaluación de la Calidad de la Educación, 2008, 2017).
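The income-related tendency described above can be examined directly with the data in Table 8.1. The sketch below is offered only as an illustration of the kind of secondary analysis a reader might run on the table – it is not an analysis reported in the chapter. It computes a Spearman rank correlation between per capita GNI and number of participations, omitting Venezuela and Cuba, whose 2017 GNI figures are unavailable.

```python
from scipy.stats import spearmanr

# Total participations and 2017 per capita GNI, taken from Table 8.1
# (Venezuela and Cuba omitted: GNI not available for 2017).
participations = [1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 5, 7, 8, 8, 10, 12, 12, 13, 14]
gni = [4390, 3130, 15350, 5890, 3560, 4060, 2130, 13100, 4060,
       2250, 6630, 11040, 5970, 15250, 8580, 8610, 5830, 13040, 13610]

# Rank-based correlation is appropriate here: both variables are
# ordinal-ish counts and the relation need not be linear.
rho, p_value = spearmanr(gni, participations)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```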

SUCCESSFUL PARTICIPATION

In current practice in international assessments, the participation of a country is deemed successful when it performs the activities it has committed to perform with fidelity, according to the requirements established by the corresponding organizing agency. Figure 8.1(b) represents the notion that fidelity of implementation is critical for a country to successfully participate in an international assessment. Consistent with the literature in the field (e.g., Fullan & Pomfret, 1977; O'Donnell, 2008; Ruiz-Primo, 2006), fidelity of implementation can be broadly defined as the extent to which the actions of a program are performed according to an original plan or design. In the context of international assessments, fidelity of implementation refers to the extent to which activities such as drawing student samples, translating items, recruiting schools, preparing test booklets, administering tests, and scoring open-ended responses are performed as established by the organizing agency. Ultimately, fidelity of implementation is relevant to the integrity of international assessments and the validity of interpretations of the test scores they produce. PISA refers to capacity as:

The ability of the individuals and institutions responsible for the project in [a] country to perform the necessary functions (as set out in the roles and responsibilities for the National Centre (NC) and the National Project Manager – NPM), solve the likely problems that will arise during implementation and set and achieve project objectives in a sustainable manner. (OECD, 2015c)

Recent attention to the factors that determine countries’ successful participation in international assessment programs has led to


examining the conditions that are critical to fidelity of implementation. For example, the PISA for Development initiative launched by the OECD seeks 'to identify how PISA can best support evidence-based policy making in low income and middle income economies' (Ward & Zoido, 2015, p. 21) and to prepare these countries to participate in the program by supporting them in building 'capacity for managing large-scale student learning assessment and using the results to support policy dialogue and decision making' (OECD, 2016a). Of the eight countries participating in this initiative, five (Ecuador, Guatemala, Honduras, Panama, and Paraguay) are Latin American. An important aspect of this initiative is the development of an analytical framework for assessing the capacity needs of each country to manage its participation. This analytical framework comprises a total of 112 elements identified as critical for successful implementation. As shown in Figure 8.2, each element is rated at one of four levels of capacity (latent, emerging, established, and advanced),2 based on evidence about the country gathered from interviews and document analysis. These elements are grouped into three dimensions (see Ward & Zoido, 2015, p. 24):

• the enabling environment, encompassing the legislative context and culture that facilitates the implementation, and the stakeholders who make use of the results;
• organisation, encompassing the National Centre and any sub-national institutions that are directly involved in the implementation of the project; and
• the individual, encompassing the staff of the National Centre and related organisations, in particular the National Project Manager(s) and his/her team.

While the analytical framework is currently used only with the eight countries participating in the PISA for Development initiative, it is possible to appreciate its potential as a tool for examining how


countries differ on the sets of conditions that shape fidelity of implementation.

Figure 8.2  An example of the rubric used to assess one of the 112 elements included in the PISA for Development capacity needs analytical framework: Guatemala (OECD, 2015b)

As an example, Figure 8.3 provides a partial summary of the reports on the capacity needs analysis of the five Latin American countries that partake in the initiative. This summary focuses on the percentage of elements within each of the three dimensions that were rated as 'Advanced'. Taking the five Latin American countries together, between 14% and 46% of the elements within a given dimension were rated as Advanced. The five countries exhibit different sets of strengths and weaknesses. For example, in terms of the percentage of elements rated as Advanced within each dimension, Individual is the highest dimension for Paraguay, whereas Enabling Environment is the highest for Honduras, and Organization is the highest for Panama. These differences speak to the heterogeneity of the political and cultural conditions, the levels of organization, and the availability of human resources that shape Latin American countries' participation in international assessments. Notice, however, that in the analytical framework, capacity is circumscribed to participation in PISA. Given this specificity,

implementing with fidelity the activities of the organizing agency does not constitute, by itself, a guarantee that a country obtains the maximum benefit from participating in PISA or in any other international assessment.
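The kind of per-dimension summary shown in Figure 8.3 is simple to compute once the 112 elements have been rated. The sketch below is a minimal illustration only; the element counts and ratings are invented, since the framework's per-dimension breakdown is not reproduced in this chapter.

```python
from collections import Counter

# Hypothetical ratings for the 112 elements of one country's capacity
# needs analysis, grouped by dimension. Counts are invented for
# illustration. Levels: latent < emerging < established < advanced.
ratings = {
    "Enabling environment": ["advanced"] * 6 + ["established"] * 14 + ["emerging"] * 10,
    "Organisation": ["advanced"] * 12 + ["established"] * 20 + ["latent"] * 8,
    "Individual": ["advanced"] * 10 + ["emerging"] * 22 + ["latent"] * 10,
}

# Share of elements rated 'Advanced' within each dimension,
# mirroring the quantity plotted in Figure 8.3.
for dimension, levels in ratings.items():
    share = Counter(levels)["advanced"] / len(levels)
    print(f"{dimension}: {share:.0%} of elements rated 'Advanced'")
```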

OPTIMAL PARTICIPATION

Participation in an international assessment is optimal when, beyond performing activities according to the requirements established by the corresponding organizing agency, a country performs additional activities intended to provide information relevant to national long-term educational goals. Figure 8.1(c) illustrates the role that assessment capacity plays for that maximum benefit to occur (Solano-Flores, 2008; Solano-Flores & Milbourn, 2016). This section discusses some issues (and the actions needed to address them) that are critical to optimal participation, as it concerns the valid and fair testing of culturally and linguistically diverse populations. These issues are grouped into two categories: issues of assessment development and administration, and issues of interpretation of results and decision making.

Figure 8.3  Percentage of elements rated as 'Advanced' within each of the three dimensions considered in the capacity needs analysis (OECD, 2015a, 2015b, 2015c, 2016a, 2016b, 2017a)

[Bar chart: percentage (0–50) of elements rated 'Advanced' in the Enabling Environment, Organization, and Individual dimensions, by country: Ecuador, Guatemala, Honduras, Panama, and Paraguay.]

Assessment Development and Administration

Item development

Items are created according to normative documents, such as the assessment's framework and sets of specifications that establish the numbers and types of items needed to assess different topics and different kinds of knowledge within a content area. Countries are an important source of items: at an early stage in the process of development, countries submit items that may be selected for inclusion in the test (e.g., OECD, 2017b). While participating countries are not required to contribute items, the importance of performing this activity should not be underestimated. Consistent with current trends in education reform emphasizing meaningful learning, high percentages of test items in international assessments contain contextual information in which problems are situated in a scenario or short story with the intent to make the problems tangible, interesting, and relevant. An investigation with

PISA 2006 and PISA 2009 items revealed that 38% and 19% of items provided contexts based respectively on schoolwork activities and household activities (Ruiz-Primo & Li, 2015). Because context cannot be dissociated from culture, the extent to which contexts make problems meaningful may be shaped by students' cultural experience. Indeed, there is evidence that personal experience, along with content knowledge, shapes how students make sense of science and mathematics items (Solano-Flores & Li, 2009; Solano-Flores & Nelson-Barber, 2001). Even if items are situated in meaningful contexts, students may be at a disadvantage when those contexts are not close to their everyday life experiences. Unfortunately, the participation of Latin American countries in the process of item development has been limited. As part of the development of PISA 2012, 20 countries submitted a total of about 500 items for the initial pool of items (OECD, 2014). Yet, of the eight Latin American participating countries, only Colombia, Mexico, and Uruguay submitted items. Since


PISA is intended to measure competencies for life, its items may present contexts, represent information, and require response styles that are a reflection of the everyday lives of industrialized countries (see Eivers, 2010; Wuttke, 2007). Due to the limited participation of countries in the process of item development, the pools of items included in international assessments may not have a balanced representation of different cultural contexts. Since the numbers or percentages of items submitted by each country and included in international assessments are not reported, those percentages may be considerably low and, therefore, this imbalance may be even more serious.

Translation

In current international assessment practice, items are created in English, then translated into the languages of the non-English-speaking participating countries. PISA also uses French as a source language. Countries translate the two source-language versions of the items separately and then compare and integrate them into one version of the items in the target language to ensure that the original meaning of the items is preserved (see OECD, 2017b). As experience from international assessments accumulates, the procedures for test translation and test translation verification have evolved. It is now well recognized that translation may alter the constructs that tests are intended to measure or the level of linguistic complexity of items, thus affecting comparability across languages (Arffman, 2010; Goldstein, 2018; Grisay & Monseur, 2007). Indeed, there is evidence that translation is a major contributor to differential item functioning (e.g., Yildirim & Berberoĝlu, 2009). While international assessment programs provide countries with guidelines for the translation of test items and questionnaires, only scant information is available on the ways in which these guidelines are implemented by each country. Also, there is evidence of the

limitations of the available statistical procedures used in international assessments with the intent of minimizing language as a variable that affects item difficulty and linguistic demands (El Masri, Baird, & Graesser, 2016; Goldstein, 2018). In addition, while assessment programs have robust translation verification procedures in place (see Hambleton, Merenda, & Spielberger, 2005; OECD, 2017b), there is evidence that speaks to the advantage for countries of using internal, enriched test translation review procedures that are sensitive to their own contexts. For example, having multidisciplinary teams (e.g., composed of content specialists, linguists, teachers, translators, and curriculum experts) examine translated items, and focusing on confirming rather than disconfirming evidence of test translation adequacy, may reveal very subtle ways in which the translation of items can be improved (Solano-Flores, Backhoff, & Contreras-Niño, 2009). It is important to mention that this room for improvement is not necessarily attributable to low-quality translation, but rather to the fact that languages encode meaning in different ways and have different sets of grammatical rules. Due to these differences, even an impeccable translation has translation errors. The job of an internal translation review team is to minimize that error (Solano-Flores, Backhoff, & Contreras-Niño, 2009). Using the multidisciplinary, consensus-based approach mentioned above, it is possible to quantify translation error and to estimate how this error can affect student performance. For example, in an investigation with science and mathematics TIMSS items translated into Spanish and used in Mexico, a translation review panel found that 24 of 319 items could be characterized as objectionable due to many or serious translation errors. There was a tendency for items with higher levels of translation error to be responded to correctly by fewer students (Backhoff, Contreras-Niño, & Solano-Flores, 2011; Solano-Flores,


Backhoff, & Contreras-Niño, 2005). Another investigation with PISA items (Solano-Flores, Contreras-Niño, & Backhoff, 2013) revealed a similar pattern. Of the 138 PISA 2006 items examined by a translation review panel, 26 were identified as having an objectionable translation. The proportion of students responding correctly was considerably lower for the items with objectionable translations than for the items with acceptable translations. Taken together, these findings indicate that translation can play an important role in the validity of generalizations based on translated test scores. While international assessment programs' translation guidelines ensure a minimum level of rigor and standardization, Latin American countries can operationalize those guidelines in ways that minimize translation error.
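A minimal sketch of the kind of comparison reported above – relating panel judgments of translation quality to item difficulty – is shown below. The severity ratings and p-values are simulated (the published item-level data are not reproduced here); the permutation test asks whether items flagged as objectionable have lower proportions of correct responses than chance relabeling would produce.

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated review data: one severity rating per item (1 = acceptable
# ... 4 = objectionable) and the item's p-value (proportion correct).
# Values are illustrative, not the published TIMSS/PISA results.
severity = rng.integers(1, 5, size=138)
p_value = np.clip(0.65 - 0.05 * severity + rng.normal(0, 0.08, 138), 0, 1)

flagged = severity == 4
observed_gap = p_value[~flagged].mean() - p_value[flagged].mean()

# Permutation test: how often does random relabeling of items produce
# an acceptable-minus-objectionable gap at least this large?
n_flagged = flagged.sum()
gaps = []
for _ in range(10_000):
    perm = rng.permutation(p_value)
    gaps.append(perm[n_flagged:].mean() - perm[:n_flagged].mean())
p = np.mean(np.array(gaps) >= observed_gap)
print(f"gap = {observed_gap:.3f}, permutation p = {p:.4f}")
```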

Collaboration between participating countries

Regional collaboration plays an important role in the ways in which Latin American countries participate in international comparisons. For example, collaborative efforts like LLECE and the Grupo Iberoamericano de PISA (GIP) contribute to addressing common educational challenges and goals (Martinez-Rizo, 2016). This collaboration may also contribute to minimizing issues of culture that result from the under-representation of cultural contexts in items. Furthermore, it contributes to the development of a culture of assessment that benefits participating countries (see UNESCO Regional Bureau for Education in Latin America and the Caribbean, 2017). However, regional collaboration needs to be nuanced and guided by principled practice. For example, at first glance, sharing the cost of test translation and distributing the translated items among all participating countries whose official language is Spanish seems to be a good idea. But there are some complexities that need to be taken into account, as the experience in a feasibility study of AHELO


(Assessment of Higher Education Learning Outcomes) illustrates (see Universidad Nacional de Colombia Bogotá, 2012). This assessment has constructed-response tasks. Each describes an event (e.g., the spread of a disease, a series of car accidents in a city) and several documents (e.g., a technical report, an interview, a newspaper clip, emails between government officials) with opinions and information on the event. Students are asked to make a decision or recommendation (e.g., what is the most likely factor contributing to the spread of the disease, how can the city reduce the frequency of accidents) based on examining the documents. The complexity of the task lies in the fact that the documents provide fragmented, biased, or conflicting information (see also Zlatkin-Troitschanskaia, Shavelson, & Kuhn, 2015).

154

THE SAGE HANDBOOK OF COMPARATIVE STUDIES IN EDUCATION

in another country. Thus, while countries have shared translations, the complexity and amount of a careful work of linguistic localization and cultural adaptation needed should not be underestimated, when time available for translation work is limited (Tremblay, Lalancette, & Roseveare, 2012). This adaptation work needs the participation of teams of experts.

Cultural and linguistic bias A wide range of analytical tools are available that allow examination of the extent to which the properties of items may be altered when tests are translated and adapted (see Sireci, Patsula, & Hambleton, 2005). A wellknown, empirical approach to examining cultural or linguistic bias is the analysis of differential item functioning (see Camilli, 2013; Camilli & Shepard, 1994; Hambleton, Swaminathan, & Rogers, 1991). The difficulty of an item for a group of examinees – the focal group – is compared to the difficulty of that item for a reference group. For example, in examining potential bias in a translated item, the reference group is the group tested in the language in which the test was originally created, and the focal group is the group tested in the language to which the test was translated. A test item functions differentially if the scores obtained by those groups on that item are substantially different after controlling for the groups’ overall performance on the test. While translation can unfairly make an item easier in the target than in the source, original language, in the majority of the cases items function differentially against the students tested with translated tests (see Hambleton, 2005). The analysis of differential item functioning is typically invoked as an approach to ensuring fairness (e.g., van de Vijver, 2016). However, several conditions may limit its use and effectiveness. Understanding these limitations is essential to having realistic expectations about the extent to which they can contribute to properly addressing cultural, linguistic, and socio-economic diversity.

Research on differential item functioning in international assessments shows that performance is considerably sensitive to the linguistic features of translated items. Even at the lexical level, a single word may contribute substantially to make a translated item differentially functioning (Ercikan, 1998). These differences are shaped by the fact that languages offer different sets of words and expressions to express the same given idea. For example, the colloquial word, ‘spec’, needs to be translated in Spanish as ‘partícula’ (particle) – a low-frequency word in Spanish – because no high-frequency equivalent to ‘spec’ is available in the Spanish used in Mexico (Solano-Flores, Backhoff, & Contreras-Niño, 2009). Research also shows that the numbers of items that may be differentially functioning are far from being small. For example, in a comparison performed between English- and Turkish-speaking students who took TIMSS-Repeat, 23% of the items were detected as differentially functioning (Arim & Ercikan, 2014). Similar or even higher percentages have been found in other large-scale assessments (see Yildirim & Berberoĝlu, 2009). Important actions need to be taken if differential item functioning analysis is to provide useful information on bias. First, careful attention must be paid to the characteristics of the samples of students used in differential functioning analyses. A limitation of differential item functioning analysis techniques is that they assume population homogeneity. Yet there is evidence that differentially functioning items may go undetected when the samples of students are heterogeneous but wrongly assumed to be homogeneous (Ercikan et  al., 2014). Given the tremendous student populations’ heterogeneity in most of the Latin American countries, differential functioning analyses can produce useful information on bias only if there is certainty that the student samples used in the analyses are representative of major factors related to culture, language, and economic income.


Second, because performing differential item functioning analyses is time-consuming and costly, careful planning and allocation of resources need to take place in order to try out large numbers of items with representative samples of pilot students, analyze their technical properties, and then make modifications to the items identified as differentially functioning. In practice, these activities are unlikely to be completed, even with small samples of items, especially when there are many items to develop or translate and when test developers are working under tight timelines. These practical constraints underscore the importance of judgmental procedures in which content experts and other professionals examine the characteristics of items and make decisions about potential bias (Allalouf, 2003). Comparable results in the ratings of items as biased using judgmental procedures and differential item functioning analyses have been obtained when several differential item functioning models, rigorous review procedures, and highly qualified reviewers are used (see Roth et al., 2013; Yildirim & Berberoĝlu, 2009). Third, in practice, differential item functioning analyses are difficult to perform during the process of item development in international assessments. The main reason is the lack of a reference group (students from an English-speaking country) during the process of test development. Notice that, whereas there are a number of studies on differential functioning in international assessments, those studies are conducted with data obtained after the assessments have been administered, results have been reported, and decisions have been made based on those results. While those studies can have a long-term impact on test development practices, they do not influence the process of item development because they cannot inform it in a timely manner. Clearly, the availability of analytical procedures for bias detection should not be equated with their viability in practice.
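To make the focal/reference comparison described above concrete, the sketch below implements a basic Mantel–Haenszel differential item functioning analysis for one dichotomously scored item, matching examinees on total score. It is a simplified illustration – operational analyses add purification of the matching criterion, significance tests, and effect-size classification – and the delta transformation with the |Δ| ≥ 1.5 convention follows common ETS practice.

```python
import numpy as np

def mantel_haenszel_dif(item, total, group):
    """Mantel-Haenszel common odds ratio for one dichotomous item.

    item  : 0/1 responses to the studied item
    total : total test scores, used as the matching (stratifying) variable
    group : 0 = reference group, 1 = focal group
    """
    item, total, group = map(np.asarray, (item, total, group))
    num = den = 0.0
    for k in np.unique(total):                       # one stratum per score level
        s = total == k
        a = np.sum(s & (group == 0) & (item == 1))   # reference, correct
        b = np.sum(s & (group == 0) & (item == 0))   # reference, incorrect
        c = np.sum(s & (group == 1) & (item == 1))   # focal, correct
        d = np.sum(s & (group == 1) & (item == 0))   # focal, incorrect
        n = a + b + c + d
        if n > 0:
            num += a * d / n
            den += b * c / n
    odds_ratio = num / den
    delta = -2.35 * np.log(odds_ratio)  # ETS delta; |delta| >= 1.5 flags large DIF
    return odds_ratio, delta
```

With the groups coded as above, negative delta values indicate that, after matching on total score, the item favors the reference group – in translated-test comparisons, typically the group tested in the source language.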


Interpreting Reports and Decision Making

Several authors have expressed concern about simplistic interpretations of test results from international assessments (Berliner, 2015) and the limitations of international assessment organizing agencies' recommendations to countries based on those results (e.g., Carnoy & Rothstein, 2013; Sjøberg, 2016). Assessment capacity is key to minimizing the perils of those simplistic interpretations. Figure 8.1(c) represents assessment capacity as key for optimal participation in international assessments, and therefore for a balanced use of test results. Assessment capacity allows a country to inform policy decisions based on multiple sources of information, without relying solely on external test reports and recommendations. For example, sufficient assessment capacity allows a country to perform its own analyses of assessment data and to make independent interpretations of those data. It is important to say that these analyses should not only concern test scores but also the study of the cognitive and cultural factors that influence how students understand items and respond to them. This section discusses two aspects of assessment capacity as relevant to proper test results interpretation and decision making: the need for multiple disciplinary perspectives in interpreting international assessment results, and the fact that different cultures ascribe different sets of values to different forms of knowledge.

Diversity of disciplinary perspectives

There is a growing consensus that different disciplinary perspectives and approaches to analyzing and interpreting data from international comparisons should inform educational policy decisions (see Aloisi & Tymms, 2017). Yet the perspective of educational researchers may not be influencing these decisions as much as it should. One indicator of this trend is the fact that research on PISA reflects a predominantly economic view. For example, a review of 116 studies on PISA published between 2001 and 2012 in periodical journals (Pereira, Perales, & Bakieva, 2016) revealed that 55% of those studies (64) were published in journals in the field of economics, whereas only 31% (36) were published in journals in the field of education. Moreover, whereas 81% of those publications (94) made recommendations on education policy, only about 38% (44) made recommendations on pedagogy. Another review (Hopfenbeck et al., 2018) found that there are more articles on PISA published in journals belonging to the category 'Economics' than in journals belonging to the category 'Assessment and Testing'. While these numbers do not take into consideration the impact of the journals in which the examined investigations were published, or the fact that not all studies are equally influential, they suggest that not all relevant views may weigh equally in decision making based on international assessments.

Content and knowledge valued

An important fact not considered in the interpretation of results from international assessments is the correspondence between what an international assessment is intended to measure and the knowledge that is taught and valued in each country (see Labaree, 2014; Suter, 2000). First, cultures ascribe different values to different forms of knowledge (see, for example, Gebril, 2016; Kennedy, 2016). Also, different types of items (multiple choice, completion, constructed response) tend to tap into different types of knowledge (e.g., declarative or factual, procedural, schematic, strategic; see Ruiz-Primo, 2007). As a result, the extent to which certain forms of knowledge are represented in a given international assessment may not correspond exactly to what society in a given country expects its citizens to learn and be able to do. Second, forms of representing information (e.g., through drawings, charts) in both formal and informal contexts are intimately related to culture (see Kress, 2012; Lemke, 1998). Figure 8.4 shows an example that questions the assumed universality of forms of representation of information. The example was developed by mimicking the characteristics of actual items used in TIMSS (e.g., International Association for the Evaluation of Educational Achievement, 2007). The example shows that an item may not be equally difficult across countries, not only due to possible differences in the knowledge assessed but also due to differences in the students' familiarity with the conventions used to represent information. It is important to note that this inequality may arise even between Latin American countries, as illustrated by important differences in the systems used to represent decimals and fractions (Solano-Flores, 2011). While the translation guidelines used in international assessment programs take cultural adaptation into consideration, this adaptation focuses on superficial differences in notation, monetary units, and names of characters. Typically, visual forms of representation are assumed to be equally explicit to individuals from different cultures.

[Figure 8.4 shows a shaded-squares diagram accompanying the item: 'Look at the following diagram. Which fraction describes the shaded squares? A) 2/4 B) 2/6 C) 3/4 D) 3/8.' Caption: Figure 8.4  A hypothetical item. Are students from all Latin American countries taught to represent fractions this way?]
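The notational point can be made concrete with a few lines of code. The sketch below is a hypothetical illustration of how the same numeric answer would have to be rendered under two decimal-notation conventions during item adaptation; which convention applies in which country is an empirical question, not assumed here.

```python
# Hypothetical adaptation step: render the same quantity under two
# decimal-notation conventions (decimal point vs. decimal comma).
def with_decimal_point(x, digits=3):
    return f"{x:.{digits}f}"                                 # e.g., 0.375

def with_decimal_comma(x, digits=3):
    return with_decimal_point(x, digits).replace(".", ",")   # e.g., 0,375

value = 3 / 8  # the answer to the hypothetical item in Figure 8.4
print(with_decimal_point(value), with_decimal_comma(value))
```

A student's familiarity with one convention rather than the other is exactly the kind of difference that superficial adaptation can miss when it stops at monetary units and character names.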


Third, the content assessed by an international assessment may not be exactly the kind of content that a country needs to assess. For instance, one of the main challenges for Uruguay in benefiting from its participation in PISA has been finding ways to make sense of results beyond country rankings. Despite Uruguay having one of the most sophisticated educational evaluation systems in Latin America, the country does not have a national middle school assessment system in place. This gap has contributed to the adoption of PISA results as an indicator of academic achievement (Peri et al., 2016). However, PISA focuses on competencies, while the traditional form of assessing academic achievement in Uruguay focuses on content (Peri et al., 2016). This experience illustrates that results from an international assessment can influence the decisions made by a country simply because the measures provided by that assessment are the only measures available. Clearly, international assessments and the assessments used by a country are not necessarily sensitive to the same content and skills. The information they produce needs to be used complementarily; one does not replace the other, and their results should not be confused.

CONCLUSIONS

This chapter has offered a conceptual framework that treats assessment capacity as key for Latin American countries to participate optimally in international assessments. The participation of a country is successful when it implements with fidelity the actions required by the international assessments' organizing agencies. Beyond successful participation, participation is optimal when the country performs activities beyond those required by the organizing agencies, with the intent of better addressing its own needs and social contexts. Among many examples, these extra activities include: conducting verbal protocols to validate the appropriateness of test translations; enriching population samples to address, with a fine-grained level of detail, socio-economic, linguistic, and cultural diversity (see the sketch below); and performing differential item functioning analyses to examine potential bias against groups of special interest. Also, the country may take advantage of its participation in international assessments to answer research questions that are part of a larger research agenda. Ultimately, these additional activities contribute to enriching the interpretation of test results and to informed policy decisions, without the need to rely solely on reports from the organizing agencies.

While Latin American countries have participated regularly in international assessments over the last three decades, the number of times different countries have participated reflects economic inequalities and indicates different levels of assessment capacity. The variety of international assessments in which Latin American countries have participated has decreased in recent years and is now almost limited to PISA. This trend should raise concerns because PISA focuses on competencies, not content knowledge, and targets a specific population defined by age. Indeed, there are concerns that PISA may underestimate the competencies of the populations of 15–16-year-olds in developing countries, because its sampling framework excludes school dropouts and student attrition rates are higher in developing countries than in industrialized countries (Spaull, 2018). These kinds of concerns underscore the need for countries to participate in different international assessments. Hopefully, LLECE will continue to be offered in the future, as this effort is a unique opportunity for Latin American countries to assess their students with instruments that are more likely to be sensitive to the characteristics of their cultures and education systems.
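As a minimal illustration of the sample-enrichment activity listed above, the sketch below oversamples a subgroup of special interest and then uses design weights so that national estimates remain representative. The strata, shares and score means are all hypothetical.

```python
# Hypothetical enriched sample: an 'indigenous_language' stratum is
# oversampled for fine-grained subgroup analysis; design weights restore
# population proportions for national-level estimates.
population_share = {"majority": 0.90, "indigenous_language": 0.10}
sample_share = {"majority": 0.60, "indigenous_language": 0.40}  # enriched

# Relative weight applied to each respondent in a stratum.
weights = {g: population_share[g] / sample_share[g] for g in population_share}
print(weights)  # majority respondents weigh 1.5, oversampled ones 0.25

# A weighted national mean computed from hypothetical stratum means.
stratum_mean = {"majority": 510.0, "indigenous_language": 462.0}
national_mean = sum(stratum_mean[g] * population_share[g] for g in stratum_mean)
print(national_mean)  # 505.2
```

The enrichment buys statistical precision for the small stratum without biasing country-level figures, which is what makes it an 'extra' activity compatible with the organizing agencies' requirements.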



Assessment capacity influences whether a country's participation in an international assessment is optimal. At the same time, participation in an international assessment constitutes an opportunity for a country to strengthen its assessment capacity. Two conditions appear to be critical for this to occur. First, the country needs to have an assessment research agenda in place that allows it to link the results of international assessments to information from other assessments. Ultimately, this research agenda should contribute to increasing the country's ability to make sound interpretations and uses of test results. Second, the country needs to ensure that its participation in international assessments always supports the development of human resources in fields relevant to assessment. This can be accomplished by creating diploma, master's, and doctoral programs in assessment and psychometrics associated with that participation. The content covered should emphasize test development, test translation and adaptation, and test review. The need for these efforts oriented to increasing countries' assessment capacity through the development of human resources is evident when one considers that many developing countries are not represented in the membership of professional testing organizations (see Solano-Flores & Milbourn, 2016). In an era of accountability and assessment, many Latin American countries may be experiencing a serious need for more professionals with formal training in the field of educational measurement and related areas.

While international assessments provide countries with valuable information relevant to policy and education reform, they cannot be expected to be sensitive to the specific social context of each country. For many Latin American countries, the most important benefit of participating in international assessments may not be knowing how their students perform compared to the students from other countries, but rather having the opportunity to strengthen their assessment capacity in accordance with their long-term educational goals.

Notes

1  LLECE is used in this chapter to refer to the three editions of the assessment: PERCE (Primer Estudio Regional Comparativo y Explicativo/First Regional Comparative and Explanatory Study), SERCE (Segundo Estudio Regional Comparativo y Explicativo/Second Regional Comparative and Explanatory Study) and TERCE (Tercer Estudio Regional Comparativo y Explicativo/Third Regional Comparative and Explanatory Study).
2  One or two levels do not apply for some elements.

REFERENCES

Allalouf, A. (2003). Revising translated differential item functioning items as a tool for improving cross-lingual assessment. Applied Measurement in Education, 16(1), 55–73.
Aloisi, C., & Tymms, P. (2017). PISA trends, social changes, and education reforms. Educational Research and Evaluation: An International Journal on Theory and Practice, 23, 180–220.
Arffman, I. (2010). Equivalence of translations in international reading literacy studies. Scandinavian Journal of Educational Research, 54(1), 37–59.
Arim, R., & Ercikan, K. (2014). Comparability between the American and Turkish versions of the TIMSS mathematics test results. Education and Science, 39(172), 33–48.
Backhoff, E., Contreras-Niño, L. A., & Solano-Flores, G. (2011). La teoría del error de traducción de pruebas y las evaluaciones internacionales de TIMSS y PISA. Reporte No. 36. Mexico: Instituto Nacional para la Evaluación de la Educación.
Baird, J. A., Johnson, S., Hopfenbeck, T. N., Isaacs, T., Sprague, T., Stobart, G., & Yu, G. (2016). On the supranational spell of PISA in policy. Educational Research, 58(2), 121–138. doi: 10.1080/00131881.2016.1165410
Berliner, D. C. (2015). The many facets of PISA. Teachers College Record, 117(1), 1–20.
Camilli, G. (2013). Ongoing issues in test fairness. Educational Research and Evaluation: An International Journal on Theory and Practice, 19(2–3), 104–120.
Camilli, G., & Shepard, L. (1994). Methods for identifying biased test items. Thousand Oaks, CA: Sage.
Capacity Development Group (2007, May). Capacity assessment methodology: User's guide. Bureau for Development Policy, United Nations Development Programme. New York, September 2005. Retrieved from the United Nations Development Programme website: www.unpei.org/sites/default/files/PDF/institutioncapacity/UNDP-Capacity-Assessment-User-Guide.pdf
Carnoy, M. (2015). International test score comparisons and educational policy: A review of the critiques. Boulder, CO: National Education Policy Center, University of Colorado, Boulder. Retrieved from the National Education Policy Center website: https://nepc.colorado.edu/publication/international-test-scores
Carnoy, M., & Rothstein, R. (2013). What do international tests really show about US students' performance? Washington, DC: Economic Policy Institute.
Clarke, M. (2012). What matters most for student assessment systems: A framework paper. Washington, DC: World Bank. Retrieved from the World Bank website: https://openknowledge.worldbank.org/bitstream/handle/10986/17471/682350WP00PUBL0WP10READ0web04019012.pdf?sequence=1
Cole, M. (1999). Culture-free versus culture-based measures of cognition. In R. J. Sternberg (Ed.), The nature of cognition (pp. 645–664). Cambridge, MA: Massachusetts Institute of Technology.
Eivers, E. (2010). PISA: Issues in implementation and interpretation. The Irish Journal of Education, 38, 94–118.
El Masri, Y. H., Baird, J. A., & Graesser, A. (2016). Language effects in international testing: The case of PISA 2006 science items. Assessment in Education: Principles, Policy & Practice, 23(4), 427–455. doi: 10.1080/0969594X.2016.1218323
Ercikan, K. (1998). Translation effects in international assessments. International Journal of Educational Research, 29, 543–553.
Ercikan, K., Roth, W.-M., Simon, M., Sandilands, D., & Lyons-Thomas, J. (2014). Inconsistencies in DIF detection for sub-groups in heterogeneous language groups. Applied Measurement in Education, 27(4), 275–285.
Ferrer, J. G., & Arregui, P. (2003). Las pruebas internacionales de aprendizaje en América Latina y su impacto en la calidad de la educación: Criterios para guiar futuras aplicaciones. Reporte PREAL, 26.
Figazzolo, L. (2009). Impact of PISA 2006 on the education policy debate. Report published by Education International. Retrieved December 12, 2012 from http://ei-ie.org/en/websections/content_detail/3272
Fullan, M., & Pomfret, A. (1977). Research on curriculum and instruction implementation. Review of Educational Research, 47, 335–397.
Gebril, A. (2016). Educational assessment in Muslim countries: Values, policies, and practices. In G. T. L. Brown & L. R. Harris (Eds.), Handbook of human and social conditions in assessment (pp. 420–435). New York: Routledge.
Gilmore, A. (2005). An evaluation of the value of World Bank support for international surveys of reading literacy (PIRLS) and mathematics and science (TIMSS). Retrieved January 24, 2012, from www.iea.nl/fileadmin/user_upload/Publications/Electronic_versions/Gilmore_Impact_PIRLS_TIMSS.pdf
Goldstein, H. (2018). Measurement and evaluation issues with PISA. In L. Volante (Ed.), The PISA effect on global educational governance (pp. 49–58). New York and London: Routledge.
Green Saraisky, N. (2015). The politics of international large-scale assessment: The Programme for International Student Assessment (PISA) and American education discourse, 2000–2012. New York: Columbia University Academic Commons.
Grisay, A., & Monseur, C. (2007). Measuring the equivalence of item difficulty in the various versions of an international test. Studies in Educational Evaluation, 33(1), 69–86.
Hambleton, R. K. (2005). Issues, designs, and technical guidelines for adapting tests into multiple languages and cultures. In R. K. Hambleton, P. F. Merenda, & C. D. Spielberger (Eds.), Adapting educational and psychological tests for cross-cultural assessment. Mahwah, NJ: Lawrence Erlbaum Associates.
Hambleton, R. K., Merenda, P. F., & Spielberger, C. D. (Eds.) (2005). Adapting educational and psychological tests for cross-cultural assessment. Mahwah, NJ: Lawrence Erlbaum Associates.
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage.
Hopfenbeck, T. N., Lenkeit, J., El Masri, Y., Cantrell, K., Ryan, J., & Baird, J. A. (2018). Lessons learned from PISA: A systematic review of peer-reviewed articles on the Programme for International Student Assessment. Scandinavian Journal of Educational Research, 62(3), 333–353. doi: 10.1080/00313831.2016.1258726
International Association for the Evaluation of Educational Achievement (2007). TIMSS 2003 mathematics items: Released set, fourth grade. Boston, MA: TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College.
Kamens, D. H., & McNeely, C. L. (2010). Globalization and the growth of international educational testing and national assessment. Comparative Education Review, 54(1), 5–25.
Kane, M., & Mislevy, R. (2017). Validating score interpretations based on response processes. In K. Ercikan & J. W. Pellegrino (Eds.), Validation of score meaning for the next generation of assessments: The use of response processes (pp. 11–24). New York: Routledge.
Kennedy, K. J. (2016). Exploring the influence of culture on assessment: The case of teachers' conceptions of assessment in Confucian heritage cultures. In G. T. L. Brown & L. R. Harris (Eds.), Handbook of human and social conditions in assessment (pp. 404–419). New York: Routledge.
Kress, G. (2012). Thinking about the notion of 'cross-cultural' from a social semiotic perspective. Language and Intercultural Communication, 12(4), 369–385. doi: 10.1080/14708477.2012.722102
Labaree, D. (2014). Let's measure what no one teaches: PISA, NCLB and the shrinking aims in education. Teachers College Record, 116, 14 pages.
Leighton, J. P. (2017). Collecting and analyzing verbal response process data in the service of interpretive and validity arguments. In K. Ercikan & J. W. Pellegrino (Eds.), Validation of score meaning for the next generation of assessments: The use of response processes (pp. 25–38). New York: Routledge.
Lemke, J. L. (1998). Multiplying meaning: Visual and verbal semiotics in scientific text. In J. R. Martin & R. Veel (Eds.), Reading science: Critical and functional perspectives on discourses of science (pp. 87–113). New York: Routledge.
Martínez-Rizo, F. (2016). Impacto de las pruebas en gran escala en contextos de débil tradición técnica: Experiencia de México y el Grupo Iberoamericano de PISA. RELIEVE, 22(1), 1–12. doi: http://dx.doi.org/10.7203/
Murillo, V., Tommasi, M., Ronconi, L., & Sanguinetti, J. (2002). The economic effects of unions in Latin America: Teachers' unions and education in Argentina (September). IDB Working Paper No. 171. Available at SSRN: https://ssrn.com/abstract=1814721 or http://dx.doi.org/10.2139/ssrn.1814721
O'Donnell, C. L. (2008). Defining, conceptualizing, and measuring fidelity of implementation and its relationship to outcomes in K–12 curriculum intervention research. Review of Educational Research, 78(1), 33–84. doi: 10.3102/0034654307313793
OECD. (2014). PISA 2012 technical report. Paris: OECD Publishing.
OECD. (2015a). PISA for Development capacity needs analysis: Ecuador. Paris: OECD Publishing. Retrieved from www.oecd.org/pisa/aboutpisa/NEW_Ecuador%20CNA%20ReportFINAL_SL.pdf
OECD. (2015b). PISA for Development capacity needs analysis: Guatemala. Paris: OECD Publishing. Retrieved from www.oecd.org/pisa/aboutpisa/NEW_Guatemala%20CNA%20reportFINAL_SL.pdf
OECD. (2015c). PISA for Development capacity needs analysis: Paraguay. Paris: OECD Publishing. Retrieved from www.oecd.org/pisa/aboutpisa/Paraguay%20CNA%20report_FINAL.pdf
OECD. (2016a). PISA for Development Brief – 2016/07 (July). Paris: OECD Publishing. Retrieved from www.oecd.org/pisa/aboutpisa/PISA-FOR-DEV-EN-1.pdf
OECD. (2016b). PISA for Development capacity needs analysis: Honduras. Retrieved from http://www.oecd.org/pisa/pisa-for-development/Honduras-Capacity-Needs-Analysis.pdf
OECD. (2017a). PISA for Development capacity needs analysis: Panama. Paris: OECD Publishing. Retrieved from www.oecd.org/pisa/pisa-for-development/Panama%20CNA_FINAL.pdf
OECD. (2017b). PISA 2015 technical report. Paris: OECD Publishing.
OECD. (2018a). How to join PISA: Requirements for OECD partner countries and economies to participate in PISA. Paris: OECD Publishing. Retrieved from www.oecd.org/pisa/aboutpisa/howtojoinpisa.htm
OECD. (2018b). PISA participants. Paris: OECD Publishing. Retrieved from www.oecd.org/pisa/aboutpisa/pisa-participants.htm
Pereira, D., Perales, M. J., & Bakieva, M. (2016). Análisis de tendencias en las investigaciones realizadas a partir de los datos del proyecto PISA. RELIEVE, 22(1), art. M10. doi: http://dx.doi.org/10.7203/relieve.22.1.8248
Peri, A., Sánchez-Núñez, M., Silveira, A., & Sotelo-Rico, M. (2016). Lo que PISA nos mostró: claroscuros de la participación de Uruguay a lo largo de una década. RELIEVE, 22(1), art. M4. doi: http://dx.doi.org/10.7203/relieve.22.1.8272
Ravela, P. (2002). ¿Cómo presentan sus resultados los sistemas nacionales de evaluación educativa en América Latina? Reporte PREAL, 22.
Reimers, F. (2003). El contexto social de la evaluación educativa en América Latina. Revista Latinoamericana de Estudios Educativos (México), 23(3), 9–52.
Roth, W.-M., Oliveri, M. E., Sandilands, D. D., Lyons-Thomas, J., & Ercikan, K. (2013). Investigating linguistic sources of differential item functioning using expert think-aloud protocols in science achievement tests. International Journal of Science Education, 35(4), 546–576. doi: 10.1080/09500693.2012.721572
Rubin, V., Ngo, D., Ross, A., Butler, D., & Balaram, N. (2018). Counting a diverse nation: Disaggregating data on race and ethnicity to advance a culture of health. New York: PolicyLink. Retrieved from www.policylink.org/sites/default/files/Counting_a_Diverse_Nation_08_15_18.pdf
Ruiz-Primo, M. A. (2006). A multi-method and multi-source approach for studying fidelity of implementation. Los Angeles, CA: National Center for Research on Evaluation, Standards, and Student Testing.
Ruiz-Primo, M. A. (2007). Assessment in science and mathematics: Lessons learned. In M. Hoepfl & M. Lindstrom (Eds.), Assessment of technology education, CTTE 56th Yearbook (pp. 203–232). Woodland Hills, CA: Glencoe-McGraw-Hill.
Ruiz-Primo, M. A., & Li, M. (2015). The relationship between item context characteristics and student performance: The case of the 2006 and 2009 PISA science items. Teachers College Record, 117(1).
Shavelson, R. J., & Webb, N. M. (1991). Generalizability theory: A primer. Newbury Park, CA: Sage.
Sireci, S. G., Patsula, L., & Hambleton, R. K. (2005). Statistical methods for identifying flawed items in the test adaptations process. In R. K. Hambleton, P. Merenda, & C. Spielberger (Eds.), Adapting educational and psychological tests for cross-cultural assessment (pp. 93–115). Hillsdale, NJ: Lawrence Erlbaum.
Sjøberg, S. (2016). OECD, PISA, and globalization: The influence of the international assessment regime. In C. H. Tienken & C. A. Mullen (Eds.), Education policy perils: Tackling the tough issues. New York: Routledge.
Solano-Flores, G. (2008). A conceptual framework for examining the assessment capacity of countries in an era of globalization, accountability, and international test comparisons. Paper presented at the 6th Conference of the International Test Commission, Liverpool, UK, 14–16 July.
Solano-Flores, G. (2011). Language issues in mathematics and the assessment of English language learners. In K. Tellez, J. Moschkovich, & M. Civil (Eds.), Latinos/as and mathematics education: Research on learning and teaching in classrooms and communities (pp. 283–314). Charlotte, NC: Information Age Publishing.
Solano-Flores, G., Backhoff, E., & Contreras-Niño, L. A. (2009). Theory of test translation error. International Journal of Testing, 9, 78–91.
Solano-Flores, G., Contreras-Niño, L. A., & Backhoff, E. (2013). The measurement of translation error in PISA-2006 items: An application of the theory of test translation error. In M. Prenzel, M. Kobarg, K. Schöps, & S. Rönnebeck (Eds.), Research on PISA: Research outcomes of the PISA research conference 2009 (pp. 71–85). Heidelberg: Springer.
Solano-Flores, G., & Li, M. (2009). Generalizability of cognitive interview-based measures across cultural groups. Educational Measurement: Issues and Practice, 28(2), 9–18.
Solano-Flores, G., & Li, M. (2013). Generalizability theory and the fair and valid assessment of linguistic minorities. Educational Research and Evaluation, 19(2–3), 245–263.
Solano-Flores, G., & Milbourn, T. (2016). Assessment capacity, cultural validity and consequential validity in PISA. RELIEVE, 22(1), M12. doi: http://dx.doi.org/10.7203/relieve.22.1.8281
Solano-Flores, G., & Nelson-Barber, S. (2001). On the cultural validity of science assessments. Journal of Research in Science Teaching, 38(5), 553–573.
Spaull, N. (2018). Who makes it into PISA? Understanding the impact of PISA sample eligibility using Turkey as a case study (PISA 2003–PISA 2012). Assessment in Education: Principles, Policy & Practice, 13 August. doi: 10.1080/0969594X.2018.1504742
Suter, L. E. (2000). Is student achievement immutable? Evidence from international studies on schooling and student achievement. Review of Educational Research, 70, 529–545.
Swaffield, S., & Thomas, S. (2016). Educational assessment in Latin America. Assessment in Education: Principles, Policy & Practice, 23(1), 1–7. doi: 10.1080/0969594X.2016.1119519
Tatto, M. T. (2006). Education reform and the global regulation of teachers' education, development and work: A cross-cultural analysis. International Journal of Educational Research, 45, 231–241.
Tremblay, K., Lalancette, D., & Roseveare, D. (2012). Assessment of higher education learning outcomes: Feasibility study report. Vol. 1 – Design and implementation. Paris: OECD Publishing.
Treviño, E. (2014). Factores asociados al logro de los estudiantes: Resultados del Segundo Estudio Regional Comparativo y Explicativo (SERCE). Informative document, December 2014. Presentation: Primera Entrega de Resultados, Comparación de Resultados, SERCE-TERCE, December 4, Asunción, Paraguay. www.mec.gov.py/cms
UNESCO. (2000). First international comparative study of language, mathematics, and associated factors in third and fourth grades of primary school. Santiago, Chile, June. Retrieved from http://unesdoc.unesco.org/images/0012/001231/123143eo.pdf
UNESCO, OREALC, Laboratorio Latinoamericano de Evaluación de la Calidad de la Educación. (2008). Student achievement in Latin America and the Caribbean: Results of the Second Regional Comparative and Explanatory Study (SERCE). Santiago, Chile, June. Retrieved from http://unesdoc.unesco.org/images/0016/001610/161045E.pdf
UNESCO, OREALC, Laboratorio Latinoamericano de Evaluación de la Calidad de la Educación. (2015). Informe de resultados, Tercer Estudio Regional Comparativo y Explicativo: Factores asociados. Santiago, Chile, July. Retrieved from http://unesdoc.unesco.org/images/0024/002435/243533s.pdf
UNESCO Regional Bureau for Education in Latin America and the Caribbean. (2017). A brief history of the Laboratorio Latinoamericano de Evaluación de la Calidad de la Educación. New York: UNESCO. Retrieved from www.unesco.org/new/en/santiago/press-room/newsletters/newsletter-laboratory-for-assessment-of-the-quality-of-education-llece/n14/a-brief-history-of-llece/
Universidad Nacional de Colombia Bogotá. (2012). OECD AHELO feasibility study: Institution report: Generic skills learning outcomes. Bogotá: Universidad Nacional de Colombia Bogotá.
USDE-IES-NCES. (2018a). Progress in the International Reading Literacy Study (PIRLS). Washington, DC: National Center for Education Statistics. Retrieved from https://nces.ed.gov/surveys/pirls/countries.asp
USDE-IES-NCES. (2018b). Program for International Student Assessment: Welcome to PISA 2015 results. Washington, DC: National Center for Education Statistics. Retrieved from https://nces.ed.gov/surveys/pisa/pisa2015/index.asp
USDE-IES-NCES. (2018c). TIMSS participating countries. Washington, DC: National Center for Education Statistics. Retrieved from https://nces.ed.gov/timss/countries.asp
van de Vijver, F. J. R. (2016). Assessment in multicultural populations. In G. T. L. Brown & L. R. Harris (Eds.), Handbook of human and social conditions in assessment (pp. 436–453). New York: Routledge.
Villar, A., & Zoido, P. (2016). Challenges to quality and equity in educational performance for Latin America: A PISA 2012 perspective. RELIEVE, 22(1), M9. doi: http://dx.doi.org/10.7203/relieve.22.1.8273
Ward, M., & Zoido, P. (2015). PISA for Development. ZEP: Zeitschrift für internationale Bildungsforschung und Entwicklungspädagogik, 38(4), 21–25.
World Bank Group (2018). Countries from the World Bank: Data. Washington, DC: World Bank. Retrieved from https://data.worldbank.org/country
Wuttke, J. (2007). Uncertainties and bias in PISA. In S. T. Hopmann, G. Brinek, & M. Retzl (Eds.), According to PISA: Does PISA keep what it promises? Berlin: LIT Verlag.
Yang, X., & Embretson, S. E. (2007). Construct validity and cognitive diagnostic assessment. In J. P. Leighton & M. J. Gierl (Eds.), Cognitive diagnostic assessment for education: Theory and applications (pp. 119–145). Cambridge: Cambridge University Press.
Yildirim, H. H., & Berberoğlu, G. (2009). Judgmental and statistical DIF analyses of the PISA-2003 mathematics literacy items. International Journal of Testing, 9(2), 108–121. doi: 10.1080/15305050902880736
Zegarra, E., & Ravina, R. (2003). Teacher unionization and the quality of education in Peru: An empirical evaluation using survey data (January 2003). IDB Working Paper No. 182. Available at SSRN: https://ssrn.com/abstract=1814733 or http://dx.doi.org/10.2139/ssrn.1814733
Zlatkin-Troitschanskaia, O., Shavelson, R. J., & Kuhn, C. (2015). The international state of research on measurement of competency in higher education. Studies in Higher Education, 40(3), 393–411. doi: 10.1080/03075079.2015.1004241

9
Validity Issues in Qualitative and Quantitative Research of Cross-National Studies
Jae Park

INTRODUCTION

This chapter looks into research design and analysis issues in international comparisons from the perspectives of education studies and comparative education. Education researchers usually have a salutary skepticism about whether students' experience in education, or entire education systems, can be reliably compared across boundaries determined by elusive and complex contexts such as culture and systems. How can we be sure that our research faithfully reflects what research participants experience and do in their spoken, written, or survey reports to us researchers? How can we be sure that our interpretations of a participant's report are intellectually and scientifically plausible? Such questions involve both external accountability and self-examination by the researcher. Researchers feel the need to outwardly justify the validity of their research and to adhere to institutionalized screening (e.g., research ethics committees or research offices).

However, our desired goal is not only external accountability but also the justification of moral coherence, integrity and emotional satisfaction in the work performed. The affective dimension of the investigator might not be visible externally, but it is a non-negligible aspect of international comparison studies, particularly in sociology and ethnography (Lakomski & Evers, 2012). The validity of empirical studies does not emanate exclusively from quanta, such as a sufficiently great number of participants, 'perfect sampling', or the large size and scope of a dataset. Big Data research, for example, tries to validate its claims through the analysis of large-scale data with ad hoc 'interpretations', such as multilevel analysis, path analysis and latent cluster analysis, yet these are mostly statistical models. The dataset used, moreover, has more often than not been collected by other research, whether by one party or by multiple parties. When there is more than one source of data, the 'lost in compiling' phenomenon is almost impossible to avoid, to the detriment of important nuances and details (a minimal illustration appears at the end of this introduction). Intriguingly, with their large datasets, Big Data studies usually assert the necessity of a theoretical framework, which I would differentiate from statistical models. There is something uncomfortable here, because they mostly draw on a dataset amassed by several investigations with unequal empirical frameworks and perhaps collected from unrelated or barely related social/system contexts. Each one of them, in turn, may have compiled data from different sources and contexts, which is very common, for example, in meta-analysis. This usual methodological mismatch among source investigations poses the question of whether the target investigation can render plausible and robust generalizations. Recourse to a unifying and all-encompassing research framework (not as a mere interpretive model) could and perhaps even should be abandoned or, at least, its generalizations falsified à la Popper, no less than in any small or medium-size research.

A second, and rather dated, question is whether the very idea of a scientific framework is a myth (Popper, 1994). It is not unusual to come across seasoned academics in comparative and international education who, as peer reviewers for journals, unflinchingly demand a theoretical framework even when the content of the paper does not call for one. This kind of blanket urge for, and faith in, borrowed frameworks can be criticized on the grounds that certain types of purely theoretical explorations, such as philosophy of education and life and value education, do not need empirical frameworks. Against all logic, comparable situations might occur in Big Data research. Let us perform a thought experiment. Suppose there is a Big Data set from two million refrigerators, impeccably sampled across countries, used or new, which indicates that all refrigerators have some unpleasant odor, from fresh plastic to camembert cheese. The study suggests the need for odorless refrigerators. What kind of 'grand framework' did this study and its suggestion need? In fact, it could be said that instead of relying on a borrowed framework, it generated a law-like generalization cum plausible suggestions for problem-solving. I prefer to leave this fascinating problem of 'framework by necessity' to the readers of this chapter, only recommending Karl Popper's related works in the philosophy of science (e.g., Popper, 1994).

Radical reductionists in the philosophy of science dismiss qualia and claim that every problem and every existing being will eventually and scientifically be explained solely with quanta (Churchland, 1981). In other words, all scientific problems will ultimately be explainable by the laws that govern hydrogen and helium. We have to note here, however, that reduction is a common goal of all sciences; for example, as Popper foresaw, chemistry will be reduced to atomic and particle physics. However, for international comparatists and other education studies researchers, the qualia could be the most rewarding and interesting, yet all the more challenging, due to their call for analytical depth and for discoveries of elusive particulars and nuances. The qualia aspect of any research, including hard-core empirical and quantitative studies, is important and non-negligible. For instance, a single infringement of research conduct, such as intrusion into a local culture and/or deception, could invalidate an entire ethnographic study. In addition, there are self-deceptions involving the loss of scientific integrity and humility that are essential but invisible to external observers.

The foregoing is a context-setting starter for this chapter. Now I move on to discuss the leitmotif of this chapter, namely research design and analysis issues in international comparisons. The opening discussion will be the problem of the gap between insider and outsider in international comparisons or cross-cultural studies. It will be followed by a critical analysis of the representation of experiences and observations, or what I call the 'graphein predicament', showcasing research in international comparisons with the Confucian Heritage Culture, which can be defined as a group of Asian nation-states, with their motherland and overseas diasporas, who share Confucian values that are consistently reflected in their social behavior and practices, including academic outcomes and learning approaches. The final section before the concluding remarks shall argue that the ethical dimension is the key to easing both the emic–etic tension and the twofold representation problem, or graphein predicament, in research design in international comparisons.
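As flagged above, here is a minimal, hypothetical illustration of the 'lost in compiling' phenomenon: two source studies that coded the same construct at different granularities can only be pooled at the coarsest common scheme, discarding nuance. The variable names and categories are invented for the example.

```python
import pandas as pd

# Two hypothetical source studies with unequal coding schemes.
study_a = pd.DataFrame({"parent_ed": ["primary", "lower_secondary",
                                      "upper_secondary", "tertiary"]})
study_b = pd.DataFrame({"parent_ed": ["no_tertiary", "tertiary"]})

# Pooling forces study A down to study B's coarser scheme.
to_coarse = {"primary": "no_tertiary", "lower_secondary": "no_tertiary",
             "upper_secondary": "no_tertiary", "tertiary": "tertiary"}
pooled = pd.concat([
    study_a.assign(parent_ed=study_a["parent_ed"].map(to_coarse), source="A"),
    study_b.assign(source="B"),
])
# Three of study A's four levels have collapsed into one category.
print(pooled.groupby(["source", "parent_ed"]).size())
```

The statistics computed on the pooled column are perfectly well defined; what is lost is precisely the nuance and detail that no subsequent model can recover.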

INSIDER–OUTSIDER TENSION IN INTERNATIONAL COMPARISONS

The problem of the mismatch between the reality perceived and explained by local/indigenous insiders and by researcher-outsiders was discussed early on by the cross-cultural psychologist John W. Berry (1969, 2009). Berry's comparative method in psychology included international comparisons of cultural, social and behavioral systems (1969). Berry opens his discussion with the works of the Polish ethnographer Bronisław Malinowski, expounded by Goldschmidt in 1966: 'Malinowski was most insistent that every culture be understood in its own terms; that every institution be seen as a product of the culture within which it developed. It follows from this that a cross-cultural comparison of institutions is essentially a false enterprise, for we are comparing incomparables' (cited in Berry, 1969, p. 120). Malinowski's concern resembles the 'incommensurability' argument of Thomas Kuhn: after a change in the observer's paradigm, the same object under observation is perceived by the same observer so differently that s/he is unable to understand or measure it as before. Hence, it is incommensurable. Consequently, this postulation of the change in paradigm has come to be considered something desirable in education, associated with mental breakthroughs such as creativity, lateral thinking and radically new perspectives. Kuhn's theories (1970, 1977) also brought about the imperatives of a paradigm and theoretical frameworks in scientific research, and effectively popularized his philosophical conjecture with the term 'paradigm shift'.

However interesting and plausible, incommensurability can and should be falsified. The theory of relativity draws from Newtonian physics its concepts of mass and speed; quantum physics boggles our minds because we have not completely abandoned past universals on atoms and particles. Even a fully successful application of quantum physics to strongly encrypted messaging would still rely on the old paradigms of human communication and the idea of science at the service of human development. Kuhn's philosophy of science received well-deserved criticism from Karl Popper, who claimed that knowledge growth does not occur by a total change in paradigm but by an evolutionary process of trial and error for the gradual development of science and society through problem solving.

John Berry has suggested a three-stage adjustment for cross-cultural comparisons (1969). The first stage is determining functional equivalence on either side of the cultures under study. A functional equivalence is a social behavior which 'has developed in response to a problem shared by two or more social/cultural groups, even though the behaviour in one society does not appear to be related to its counterpart in another society' (Berry, 1969, p. 122). Berry's second and third stages of adjustment for cross-cultural ethnography are identifying descriptive categories (equivalent to taxonomies or rubrics in education) and creating an adequate measuring instrument. In regard to the faultline between insider and outsider positions, John Berry borrowed the terms emic and etic from Kenneth Pike (1967) and suggested two research approaches in comparative studies (Berry, 1969, p. 123, adapted by author) (see Table 9.1).



Table 9.1  Comparison of emic and etic approaches

Emic approach:
•  from within the system
•  examines only one culture
•  structure discovered by the analyst
•  criteria are relative to internal characteristics

Etic approach:
•  from a position outside the system
•  compares one or more cultures
•  structure created by the analyst
•  criteria are considered absolute or universal

The foregoing brings us to the core of the research design and analysis issues in international comparisons. If the above synthesis is to be given credit, then for us education researchers, in the year this Handbook is published (2019), international comparisons of any kind (among nation-states or among cultures) would prove a futile if not 'fake' attempt whenever we dare claim to take a distilled emic approach. Even an ethnography 'from within', conducted with one group of people, is no comparison at all. Nobody can be sure how much time is enough, if any amount ever is, for the researcher-observer to become 'entitled' to use the emic approach, or how s/he managed to shed her/his own culture to become a 'true' local. More often, therefore, international comparisons rely on an etic approach. The social structure reported by the researcher is, thus, never something discovered from within but is always 'framed' from outside.

Let me illustrate the problem with a case analysis. Instead of aboriginal and ethnographic studies, I choose a global case involving the histories of many current nation-states, particularly in the developing world, which has been the subject of my latest research on colonial and postcolonial studies (Park, 2017). Colonization was a major source of emic–etic tension across all social institutions in the occupied territories. Scientists were not an exception. They not only took part in the colonization process but also constituted an intellectual community at the service of imperial apparatuses and fed colonialism with specific techniques of dominance and power. Knowledge production, the gradient of power, and the scientific superiority that they generated caused tensions, if not rebellions.

Scientists viewed colonial territories as uncharted locations in which to explore and enlarge their knowledge production through discoveries of new places, living species and human races/groups. Freire was not wrong in complaining about cultural invasion: 'In cultural invasion the actors … superimpose themselves on the people, who are assigned the role of spectators, of objects. In cultural synthesis, the actors become integrated with the people, who are co-authors of the action that both perform upon the world' (Freire, 1972, p. 147). The greatest loss under colonization qua an imposed structure of emic–etic tension was the loss of subjectivity and autonomy. The enacted system of differentiation – colonized and colonizer – in power relations was the consistent start of such a process (Foucault, 1983). Once its top-down power structure was reified, colonization widened the emic–etic gap, and local subjectivity and autonomy were replaced by a deep-seated idiosyncrasy: that the colonized (emic) will never become, or catch up with, the colonizers (etic). This gives way to the emergence of a new existential condition, say, a subject who is 'deferential, glad to be of use', borrowing from T. S. Eliot. The urge to catch up with the West still influences many former colonial territories and their social institutions, such as universities, today.

Imperial investigative machineries, from universities to espionage agencies, were not neutral with regard to imperial power structures. They played key roles in geopolitics, for example, in the looting of cultural heritage and the seeding of lasting chaos in the former Ottoman territories to preserve as much as possible the exploitation and enrichment of the Commonwealth. They sometimes functioned in the name of indigenous welfare, as in the case of the British Colonial Research Committee (1942–1948), which later became the Colonial Research Council (1948–1959) (Mishra, 2012). Colonial researchers, so to speak, often took advantage of the politically patronized emic–etic fissure, and this had its own research dynamics. By dynamics in international comparative studies, I mean the research rationale and the ensuing action, that is, from deciding on a unit of inquiry through the careful choice of a research framework to fieldwork and analysis. Colonial subjects became an object of aporia (a sense of wonder and intellectual curiosity) for imperial scientists. Between the etic researchers and the emic research participants or assistants, a distinct relation was established: 'the "West" is equipped with universalist "theory" and the rest of us have "particularist" empirical data, and eventually in writing, "we" become a footnote to either validate or invalidate theoretical propositions. Hence, theoretically minded researcher vs. native informant' (Chen, 2003, p. 879). This phenomenon can be understood through the Hegelian description of a basic human experience, namely the 'master–slave relation' in The Phenomenology of Mind (Hegel & Baillie, 1949). In regard to the subject at hand in this section, the emic role of research participants or assistants and the etic role of the investigator are clearly differentiated. Up to the present day, we can see some of the consequences of such differentiation and prerogatives, for example, publications with multiple authors where the first and corresponding author is usually the budget-holder of the project, regardless of the actual amount of research work and contribution to the publication.

Today, colonial and postcolonial studies are mostly devoted to critical historiography, and they usually denounce the negative consequences of colonialism/imperialism down to the present. These studies offer alternative voices to the ongoing globalization, led once more by traditional capitalists and by new forms of imperialism by neoliberal nation-states. This is widely covered by the media, with glamorous G-summits in tuxedos and evening gowns, and with aboriginal attires only for photo sessions. Colonialism and its consequences are but one illustration. The emic–etic tension is common to all investigative endeavors in international comparative studies involving human research participants. In terms of the approach to such comparisons, the etic approach overwhelmingly dominates, even in studies conducted by a researcher in his/her own country (e.g., minority studies).

Predicament of Twofold Representation

The emic–etic tension is not the only problem in research design and analysis in international comparisons. A second counterpoint I push to the fore is the problem of representation, that is, the disparity between the phenomenon (experience) of research participants, such as students, and their self-declared personal experience. The concern for 'triangulation' in this case is indicative of the etic researcher's suspicion about the validity of the information, if not remorse and a placebo for the pain of the intrusion caused. An additional glitch in representation is unavoidably generated by the researcher when s/he mentally categorizes and interprets what research participants have reported. The data might be recorded, and their analysis perhaps aided by software; yet whatever these render will remain at some distance from what research participants meant or displayed. After all, it is usually the prerogative of the researcher to cluster, code and categorize the dataset, in which part of the data might be regarded as negligible. This problem also occurs when a researcher (or her/his assistant) observes externally performed acts with a pre-set method that is often blind to nuances and to actors' innermost intentionality. The locus of the problem of representation in international comparisons is therefore twofold: first, in the minds of participants and, second, in the theoretically loaded researcher's brain cortex.


The same predicament may also emerge during the hermeneutics of policy documents or other sorts of texts, when the researcher tries to fathom, for example, the spirit in which the document was written. In the field of legal studies, this veiled yet huge challenge is called epikeia, a Greek term denoting benign interpretation, better known in legal circles as the interpretation of a law in the spirit of the one who wrote it. Among international comparisons, one issue deserves particular attention: interpretive intervention. Interpretation is a representation of a reality, but it is never the reality itself. The Kantian category of noumena in the Critique of Pure Reason (Kant, 1781/1998) as 'things in themselves,' 'intelligible existences,' or, in the singular, the 'object in itself', is directly related to the problem of representation: 'if we assume that our representation of things as they are given to us does not conform to these things as they are in themselves, but rather that these objects as appearances conform to our way of representing, then the contradiction disappears' (Kant, 1781/1998, p. xx). By 'things as they are in themselves' in this important passage of his first Critique, Kant means noumena. In Kantian Immanentism, our representation of things in the mundus sensibilis, namely our experience or phenomenon, might not correspond to the noumena (the 'real reality') of the mundus intelligibilis. If we take the Kantian stance, as John Rawls did in political philosophy and Lawrence Kohlberg did in moral psychology, we will be pushed to dismiss or devalue the phenomenon (experience). This could be subject to criticism. In the same passage quoted above, for example, we could argue that our representation of things in the mundus sensibilis is never 'given to us', as we are not passive recipients but free agents, which is paradoxically one of the core claims of Kant. Instead, we can contend that representation of the world is done by our intellect, acting as free agents.

For our discussion in this chapter, it suffices to say that in international comparisons, representations of personal and collective experiences are never passive and 'given' to researchers. They are, instead, a researcher's intellectual construct of what research participants reported about their actions and contexts. Is this double representation performed by the researcher free of limitations? Most likely it is imperfect. The mundus sensibilis and the mundus intelligibilis are not in totally opposing positions because, in simple terms, only what people perceive through their senses becomes intelligible. The reverse also holds true: people's intellectual state can greatly affect their mundus sensibilis; for example, the threshold of pain is said to be lower in more intelligent people. Dissenting from Kant, we could say that the noumenon might be far from reality, or inaccurate, insofar as it dismisses human experience. For example, a researcher's representation of why male Sikhs do not cut their hair and why male Koreans during the Joseon dynasty did likewise could vary widely in rationale and value system. Hence, in social science in general and in international comparisons in particular, we must pay more attention to phenomena than to noumena.

When an interpretivist researcher has not been adequately exposed to the culture of the research site, s/he should be particularly observant wherever and whenever an interpersonal relationship happens. Otherwise, the researcher could interpret results with distortions or biases. A number of pieces of practical advice for interpretivist research were suggested by Taylor and Waldrip (1999), such as: an initial contact that is tightly organized with local participants; prolonged engagement and observation; understanding the relation between ownership of knowledge (information) and social hierarchy; using careful triangulation to validate interviews; overcoming language barriers; and keeping the interview process sustainable through an awareness of local culture, for example, of gender-related misunderstandings. Conducting international comparison research necessitates analogous insistence at all interpersonal contact points, while ensuring high requirements of fairness, trust and acceptance by locals. All translated research tools, for example, interview questionnaires, research instruments and inventories, should aim for reliability by verifying their internal consistency and stability over time (Banville, Desrosiers, & Genet-Volet, 2000; Behling & Law, 2000; Brislin, 1986; Vallerand et al., 1992). Ideally, international comparison research should be conducted in the participants' native language, on the assumption that any translation is a liability.
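Internal consistency, one of the two reliability checks just mentioned, is commonly estimated with Cronbach's alpha. The sketch below is a minimal implementation with invented questionnaire scores; it is illustrative only and is not tied to any of the instruments cited above.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a (respondents x items) score matrix."""
    x = np.asarray(scores, dtype=float)
    k = x.shape[1]                              # number of items
    item_variances = x.var(axis=0, ddof=1).sum()
    total_variance = x.sum(axis=1).var(ddof=1)  # variance of total scores
    return k / (k - 1) * (1 - item_variances / total_variance)

# Hypothetical translated questionnaire: 5 respondents x 4 Likert items.
scores = [[3, 4, 3, 4],
          [2, 2, 3, 2],
          [4, 5, 4, 5],
          [1, 2, 1, 2],
          [3, 3, 4, 3]]
print(round(cronbach_alpha(scores), 2))  # 0.95; values near 1 suggest consistency
```

Stability over time, the complementary check, would require re-administering the translated instrument and correlating the two waves, which this sketch does not attempt.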



David Bridges and his defense of outsider research merit our consideration (Bridges, 2001). He correctly describes how outsider research can easily be stereotyped by insiders as intrusive, disrespectful and disempowering because 'an outsider cannot understand or represent accurately a particular kind of experience' (Bridges, 2001, p. 372). He continues, however, that outsider research can still contribute to the understanding of all participating parties and that constraining outsider research would only cause epistemological and moral isolation (ibid., p. 381). Outsider research is then justified by Bridges by indicating that the experience of the insider and its representation by the outsider are basically not the same and that neither has special authority in validity (ibid., p. 374), adding that optimal conditions for outsider research could be accomplished by including cordiality, respect, openness to criticism and so on (ibid., p. 384). Bridges' diagnosis of the problem echoes what I have discussed above. However, we need to take into account that, first, his optimal conditions are ideal scenarios, which cannot be naturalistically coherent and simplest (Ockham's razor). Second, the principal problem that concerns Bridges, namely the emic–etic tension, requires a methodological solution, and he does not suggest any. In my opinion, the core of the methodological problem is the inconsistency between the phenomenon as experienced and graphein (Greek verb, 'to write'). The issue here is how fairly the researcher represents the experiences of others.

Let us consider children's experience as a case example. We educationalists take children very seriously where research is concerned because we are aware that their experiences are unique, culturally permeated, immanent and non-transferable. When a researcher describes the experiences of children, the implication is that the sense-data endure a second interpretive intervention (the first being that of the children) and this, in turn, implies an unavoidable distortion. Thomas Nagel (1974) indicated an epistemological problem of mind–body theorization, proposing that an alien experience, be it that of a bat or of a person, is unfathomable for outsiders, and that radical reductionists and physicalists cannot conceivably articulate any credible epistemology apart from their own, and possibly not even their own. In my opinion, an answer to this concern about the disparity between phenomenon and graphein is to openly and systematically reveal the variation between the experience-description of research participants and the experience-description of the researcher. Indeed, this is the type of gap that the sub-field of phenomenography has been attempting to bridge for the past 35 years in education (Marton, 1981, 1988; Pang, 2003). What is pertinent to the present argument is not phenomenography qua pedagogical tool (Marton's original goal), but rather its use of variation as research methodology. Only the recognition and inclusion of a variation factor can break the interpretive upper hand of the researcher.

Emic–etic variation comes into sight through behavior which, in turn, depends significantly on culture. International comparison research should therefore consider the association between behavioral patterns and culture. I illustrate its importance with some extant scholarship on the Confucian Heritage Culture (CHC). When students with a CHC background see a picture of a fish tank, for example, they are inclined to attribute the way the fish move to the environment and the other animals living in the fish tank, whereas Western students attribute it to the fish themselves in the foreground. Nisbett and his collaborators (Nisbett & Miyamoto, 2005; Nisbett, Peng, Choi, & Norenzayan, 2001) conducted similar experiments and argued that significant variation occurs between the two groups, which seems paramount for data analysis in CHC-based research: 'East Asians [are more] holistic, attending to the entire field and assigning causality to it, making relatively little use of categories and formal logic, and relying on "dialectical" reasoning. Westerners are more analytic, paying attention primarily to the object and the categories to which it belongs and using rules, including formal logic, to understand its behaviour' (Nisbett et al., 2001, p. 291). The empirical findings of Nisbett and others propose that CHC informs behavior on the surface and, simultaneously, communicates hidden metaphysical categories (e.g., accident–substance, essence–existence) and epistemological processes, including beliefs and values. Jin Li's empirical research findings (2002) partially corroborate Nisbett's: CHC children's beliefs about learning and their motivation to learn are culturally forged as acts of valuing te (virtue in Chinese), such as diligence and personal effort, as opposed to the Western model, which places more value on talent, innate smartness and the biological basis of intelligence. It is in this light that parental socialization seems to be a powerful stimulus for CHC children's educational values and attitudes, with epistemological consequences. In my opinion, research on CHC students should preferably include information and data from parents.

Finding a solution to the concern about the disparity between phenomenon and graphein inherent to cross-cultural and international comparative education research will not completely placate the emic–etic tension. The reason for this is that a phenomenon–graphein inequality implies different degrees of alterity, cultural intrusion and disempowerment. This state of affairs creates an issue of research ethics.


Ethics as the Main Challenge of International Comparisons

In my opinion, ethics is the key challenge of international comparative research. Indeed, any research is inextricably a moral act insofar as free agents partake in all of its stages, and not only when, for example, researchers are seeking clearance from an ethics committee. In this section, I examine the sui generis methodological problems of international comparative research. A research framework consists of a conceptual component and an outlook, a body of pre-existing theories, also referred to more simply as the research paradigm. This offers the researcher epistemic soundness throughout all stages of inquiry. In my interpretation, this and comparable 'Western' research dynamics, and their basic rationale, are sine qua non conditions to which any international comparison in the educational field must pledge itself. Essentially, there are no other alternatives for international comparison researchers. They are obliged to comply with time-honoured 'Western methodologies' in the conduct of their work in order to be recognized both intraculturally and transculturally. If their research is to warrant any consideration, it also needs to make sense to, for example, Central Asian, African and Latin-American heritage cultures, and to the world research community at large. Researchers cannot casually take for granted Polanyi's principle of mutual control, '[the] simple fact that scientists keep watch over each other' (1983, p. 72). Jin Li, for example, argues that the difficulty lies in a 'continual reliance of researchers on Western concepts without considering indigenous or emic cultural meanings and their psychological manifestations' (Li, 2009, p. 42). She adds three further dilemmas: (1) the dichotomous conceptual frameworks of the West; (2) dominant Western individualistic and culture-free anthropological assumptions; and (3) an understanding of education univocally as a means to social mobility (Li, 2009).

Possibly, the very emic–etic dualism is itself a strong essentialization of culturally sensitive research. Consider, for example, a super-etic position: an overseas research consultant who is simultaneously studying a population and writing a report destined to join the many already accumulating. Such a researcher may be not merely an outsider, but also an individual harboring resentment or hyper-bias towards the population being studied. We could also consider an infra-emic situation: a local research participant occupying the lowest rank within a research team that is, very frequently, strikingly hierarchical. Expressed differently, the variation within as well as without could be considered a continuum rather than two opposing poles. If we were still unconvinced, we could also ask how long it takes for an etic researcher to become 'reasonably emic' enough to allow contact with research participants. Are six months adequate for an etic researcher to become fully accepted as an 'insider' in a school? Maybe eight, or twelve? (We must remember that history is often the toughest judge of the validity of invasive research undertaken in the past.) These quandaries, however, appear not to be exclusive to the West. Asian researchers with both Asian and Western backgrounds do not seem to be particularly immune, as many received their training either in the West or in academic environments in Asia that could be considered heavily 'Westernized'. Moreover, even those who did not will be unable to escape Western philosophies of science, professional ethos and fora, all of which, frankly, come together in the dominant Western Anglo scholarly publication set-up. This does not mean that such research is less valid, but rather that, more often than not, it employs the same dichotomous conceptual frameworks when pointing the finger at 'Western misunderstandings'. One such example is the empirical research series on the 'Paradox of the Chinese Learner' by Biggs and Watkins (cf. Biggs & Watkins, 1996; Watkins, Biggs, & Comparative Education Research Centre, 2001).

While observing Western research requirements, I choose Karl Popper's tetradic schema of knowledge growth in order to analyze critical areas of international comparative research methodology. This schema is preferred to Baconian inductive models for two reasons. First, it is not essential for international comparisons to be inductive. For example, research on art-critique education employs a deductive method: general aesthetic notions permit the interpretation of particular cases (Eisner, 1985). Second, the schema is a research model with the capacity to dispel the obsession with total generalizability and predictability to which some international comparative research circles still aspire. Popper (1963) outlined his schema as follows:

P₁ → TT₁ → EE₁ → P₂ → TT₂

To explain this more straightforwardly: a researcher discovers an intriguing problem (P₁); in order to explain it, s/he attempts a first tentative theory (TT₁); subsequently, errors are eliminated (EE₁), bringing to consciousness an ensuing problem (P₂). Ultimately, this endless process is an 'ever discarding errors' or 'problem solving by trial and error' heuristic, the core of Popper's philosophy of science, epistemology and social theory. Next, I develop the above in order to identify how ethical and methodological issues overlap in research.

The selection of an initial research problem (P₁) from the hundreds that exist on the phenomenological horizon depends upon the exercise of the researcher's freedom and choice. Bafflement or a sense of wonder, namely aporia, is how all human inquiry starts, including science, and it is the initial prompt for the researcher's selection of one or more research problems, followed by the assignment of subjective relevance to each. The act of selecting and defining a research problem while demoting all others, and of not altering that choice lightly before it has been tested, is a moral act of a free and rational agent. Ethical-methodological problems are already visible at this stage, for example in turning research participants into a 'case' or a 'research problem', that is, the risk of 'problematizing' or 'pathologizing' them (Nind, Benjamin, Sheehy, Collins, & Hall, 2004). Broadly speaking, international comparative research as a whole is about 'problematizing' people of different nationalities or cultures by classifying them according to the researcher's mental rubric. At the very beginning of the research, a process of 'cultural invasion' and interruption of local social conventions occurs.

More than one tentative theory (TT) or method is available to a researcher. However, is s/he entirely free to select any of them? Karl Popper's answer is unexpectedly illiberal. He contends that the fittest theory should be selected. The dilemma here is that all the available theories are at the 'still-to-be-tested' stage and no test result is yet known, so there can be neither inner certainty nor outer justification of their validity. At this point, Popper's argument takes a subtle turn: 'the testing of a theory depends upon basic statements whose acceptance or rejection, in its turn, depends upon our decisions. Thus, it is decisions which settle the fate of theories' (Popper, 1959, p. 108). The kind of decision he is discussing is a free decision: 'With the conventionalist I hold that choice of any particular theory is an act, a practical matter … I differ from positivist in holding that basic statements are not justifiable by our immediate experiences, but are, from logical point of view, accepted by an act, a free decision' (Popper, 1959, p. 109). An international comparative researcher's freedom plays a significant role in the choice of theoretical framework; indeed, freedom is the ultimate justification of why a researcher selects one theoretical framework in preference to the others available. It should be added that naturalistic criteria for selecting a theoretical framework should not be ignored. To define naturalistic criteria, I suggest something like 'empirical adequacy, or squaring with observational evidence, consistency, simplicity, comprehensiveness, explanatory unity, fecundity and learnability' (Evers & Lakomski, 2001, p. 503). The choice of theoretical framework should be both purposeful and rational, yet the decision depends on the freedom the researcher has and not on distilled, pure sola ratio. It is on a similar basis that Kuhn, Lakatos and Feyerabend, as well as Popper, can loosely be referred to as irrationalists (Stove, 1982).

The dualism of Immanuel Kant's freedom–moral law offers researchers clues for comprehending the captivating association between free decision and the fittest theory. His second Critique states that 'freedom is indeed the ratio essendi of the moral law, the moral law is the ratio cognoscendi of freedom' (Kant, 1997, p. 4). A vernacular reading could be: moral law exists because of human freedom, and the reason why we attempt to find out more about our freedom is that there is a moral law which helps to maintain and monitor it. This also puts forward the notion of a mutual dependence between freedom and moral law on existential and epistemological grounds. Towards the end of his productive life, in a remote footnote of his Anthropology from a Pragmatic Point of View, Kant defined freedom quite cryptically, as briefly as 'pure spontaneity' (2006, p. 30). This can be regarded as the point at which ethical and methodological issues in research converge, suggesting that critical issues in research methodology arise wherever and whenever moral interactions occur among research actors.

The stage of tentative-theory testing includes research framework formulation, fieldwork and data collection, and is particularly rich in interpersonal relationships. Two issues might be considered especially pertinent to international comparative research. First, it is simply implausible to conduct international comparisons atheoretically, without any 'research biases' being apparent. To some extent, every research framework is itself a hard-core bias.

From an ethical perspective, participants in international comparative research can pose some baffling difficulties for an etic approach to research. A continuum of etiquette and ethics can quietly yet critically twist the research process, from the planning stage through the participant's consent, interview participation and outcomes, even in experimental research (cf. Brashen, 1974).

After testing a freely chosen theory, research outcomes can be analyzed. Karl Popper argues that in science knowledge is attained by error elimination (EE), the act of discarding wrong hypotheses or inconsistent, and therefore false, theories that have failed to resolve the problem (or, for disciplines such as sociology and education, theories that have failed to enhance understanding or that have no better descriptive power than previous ones). One of Popper's favorite examples, from Darwinian biology, was that when a tentative solution (mutation) does not work, the error elimination proves fatal 'for the bearer of the mutation, for the organism in which they occur' (Popper, 1999, p. 5). The error elimination stage is no less critical for educational research and for cross-cultural and international comparative studies. At this later stage of research, researchers confront a moral dilemma: they either discard the errors together with the failed theories, or they continue to drag them along. The predicament is whether the researcher is in a position to recycle a theory that has failed to clarify the problem that puzzled him/her at the beginning of the research project.

Is international comparative research invulnerable to this critical view of science and research methodology? Let us consider this more closely. A characteristic of social science and humanities research is that it is already restricted to a limited number of inquiry areas (e.g., 'assessment', 'administration-policy' or 'language instruction' in the field of education), where researchers are more often than not confined to their own specialism for decades, possibly even for their entire lives.

The possible imagery here is that of a furrow-browed scholar dedicated to a rigid and small set of research problems and frameworks, duplicating inbred variables (disciplines, cultures, social sectors and organizations) and investigating the same research area for life. Such researchers will claim an outcome that is both positive and enlightening and that contributes to a field or, as meta-analyst John Hattie has described it, the 'what-works-recipe' or 'theories of their own about what works (for them)' (Hattie, 2009, p. 3), in an academic world that is agitatedly publication-centered and output-dependent (Adler & Harzing, 2009). Putting reality-checks, self-criticism and laments aside, whatever the reasoning, the point here is that it is uncommon for a second or third theory to be tested subsequently in the same research project. Any effort to force the emerging data to fit a failed hypothesis or theoretical framework would amount to a regress for the research, if not a full-blown deception. Such hesitation in eliminating errors is known as akrasia, an enigma of moral philosophy: a failure of self-dominion, or a lack of the motivation (free will) to follow the rules of reason or, in our scenario, of scientific rationality.

Popper states that after EE₁ a new problem (P₂) materializes (cf. Popper, 1972, p. 288). The term emergence is probably inaccurate here, since an offspring-problem is not spontaneous but a subsequent version of an earlier one. The researcher now needs to consider the 'tail' of the first cycle and the 'head' of the second. The 'tail' is the result of the inquiry process that has just finished, produced by the researcher's upholding of the rational and ethical rigor of scientific inquiry. Transforming the 'tail' into the 'head' of a second cycle can follow if the researcher freely and consciously decides to take it on. There are, however, ethical implications here too: a second cycle of the schema can be initiated if it is considered ethical, and so long as the researcher wishes to, and is free to, do so.

Popper candidly defined his Critical Rationalism in the following way: 'I may be wrong and you may be right, and by an effort, we may get nearer to the truth' (Popper, 1994, p. xiii). He further explained what he was really alluding to: 'a confession of faith, expressed simply, in unphilosophical, ordinary English; a faith in peace, in humanity, in tolerance, in modesty, in trying to learn from one's own mistakes; and in the possibilities of critical discussion. It was an appeal to reason' (ibid.). Perhaps it is neither methodological precision nor sophistication and logic in publishing international comparison research that will eventually settle the problem of emic–etic tension and the problem of representation. In addition to these, the imperative principle of international comparison research may be the moral position of the researcher: her/his humility in recognizing that limitations exist (as opposed to the common excuse of limited generalizability due to small sample size), and her/his absolute disposition of respect towards others.

FINAL REMARKS

To conclude: we live in a world that still aims its cannons at enemy targets, and our car consoles still display a speed based on Newtonian physics. Our grasp of relativity theory, at whatever level, still depends on Newtonian concepts such as mass and speed, while we wait to see what the next 'real use' of quantum physics will be. None of these so-called past models, or the world views corresponding to them, is entirely bygone. Science seems to share the fate of postmodernity, in which much past knowledge and many past theories coexist, with the difference that we now regard them with irony and a salutary dose of incredulity (Lyotard, 1984).

We are said to be in a fast-globalizing world. As we contemplate and experience with awe its quickly changing contexts, some of us might even opt to deny or resist the rampant neoliberalism engulfing our mundus sensibilis. I believe that, like it or not, international comparisons in many fields, including sociology and education, have a timely and important role to play in human development. International comparisons have great relevance today and the potential to expand in the future. I am personally skeptical that only Big Data research can meaningfully contribute to this process. Another reason for this optimism (though never in a naïve sense) is the fact that we human beings, by nature, cannot learn without analogy and comparison; the brain processes information through synapses and neural networks. From brain science to philosophical epistemology, comparative studies are in a solid position to attain further advances in knowledge. To do so in the social sciences and education, this chapter has argued, we need to overcome the emic–etic tensions in international and intercultural comparisons and be realistic about our limitations in tackling the problem of twofold representation. It was suggested, therefore, that only authentic recognition, ethical soundness and the 'scientific humility' suggested by Karl Popper can ease such predicaments.

REFERENCES

Adler, N. J., & Harzing, A.-W. (2009). When knowledge wins: Transcending the sense and nonsense of academic rankings. The Academy of Management Learning and Education (AMLE), 8(1), 72–95.
Banville, D., Desrosiers, P., & Genet-Volet, Y. (2000). Research note. Translating questionnaires and inventories using a cross-cultural translation technique. Journal of Teaching in Physical Education, 19(3), 374–387.
Behling, O., & Law, K. S. (2000). Translating questionnaires and other research instruments: Problems and solutions. Thousand Oaks, CA: Sage.



Berry, J. W. (1969). On cross-cultural comparability. International Journal of Psychology, 4(2), 119–128.
Berry, J. W. (2009). Imposed etics-emics-derived etics: The operationalization of a compelling idea. In P. B. Smith & D. L. Best (Eds.), Cross-cultural psychology (Vol. 1, pp. 54–64). London: Sage.
Biggs, J. B., & Watkins, D. (1996). The Chinese learner: Cultural, psychological and contextual influences. Hong Kong: Comparative Education Research Centre.
Brashen, H. M. (1974). Research methodology in another culture: Some precautions. Paper presented at the Annual Meeting of the International Communication Association, April 17–20, New Orleans, LA, USA.
Bridges, D. (2001). The ethics of outsider research. Journal of Philosophy of Education, 35(3), 371–386.
Brislin, R. W. (1986). The wording and translation of research instruments. In W. Lonner & J. Berry (Eds.), Field methods in cross-cultural research (pp. 137–164). Beverly Hills, CA: Sage.
Chen, K.-H. (2003). Civil society and Min-Jian: On political society and popular democracy. Cultural Studies, 17(6), 877–896.
Churchland, P. M. (1981). Eliminative materialism and the propositional attitudes. The Journal of Philosophy, 78(2), 67–90.
Eisner, E. W. (1985). The art of educational evaluation: A personal view. London: Falmer Press.
Evers, C. W., & Lakomski, G. (2001). Theory in educational administration: Naturalistic directions. Journal of Educational Administration, 39(6), 499–520.
Foucault, M. (1983). The subject and power. In H. Dreyfus & P. Rabinow (Eds.), Michel Foucault: Beyond structuralism and hermeneutics (2nd Ed., pp. 208–226). Chicago, IL: University of Chicago Press.
Freire, P. (1972). Pedagogy of the oppressed. London: Penguin Books.
Hattie, J. (2009). Visible learning: A synthesis of over 800 meta-analyses relating to achievement. London and New York: Routledge.
Hegel, G. W. F., & Baillie, J. B. (1949). The phenomenology of mind (2nd Ed.). London: Allen & Unwin.

Kant, I. (1997). Critique of practical reason (M. J. Gregor, Trans.). Cambridge: Cambridge University Press.
Kant, I. (1781/1998). Critique of pure reason (P. Guyer & A. W. Wood, Trans. and Ed.). Cambridge and New York: Cambridge University Press.
Kant, I. (2006). Anthropologie in pragmatischer Hinsicht [Anthropology from a pragmatic point of view]. Cambridge and New York: Cambridge University Press.
Kuhn, T. S. (1970). The structure of scientific revolutions (2nd Ed.). Chicago: University of Chicago Press.
Kuhn, T. S. (1977). The essential tension: Selected studies in scientific tradition and change. Chicago: University of Chicago Press.
Lakomski, G., & Evers, C. W. (2012). Emotion and rationality in educational problem-solving: From individuals to groups. Korean Journal of Educational Administration, 30(1), 653–677.
Li, J. (2002). A cultural model of learning: Chinese 'heart and mind for wanting to learn'. Journal of Cross-Cultural Psychology, 33(3), 248–269.
Li, J. (2009). Learning to self-perfect: Chinese beliefs about learning. In C. K. K. Chan, N. Rao, & Comparative Education Research Centre (Eds.), Revisiting the Chinese learner: Changing contexts, changing education (pp. 35–69). Hong Kong: Comparative Education Research Centre, The University of Hong Kong.
Lyotard, J.-F. (1984). The postmodern condition: A report on knowledge. Manchester: Manchester University Press.
Marton, F. (1981). Phenomenography: Describing conceptions of the world around us. Instructional Science, 10(2), 177–200.
Marton, F. (1988). Phenomenography: A research approach to investigating different understandings of reality. In R. R. Sherman & R. B. Webb (Eds.), Qualitative research in education: Focus and methods (pp. 141–161). London: Falmer Press.
Mishra, P. (2012). From the ruins of empire: The revolt against the West and the remaking of Asia. London and New York: Allen Lane.
Nagel, T. (1974). What is it like to be a bat? The Philosophical Review, 83(4), 435–450.
Nind, M., Benjamin, S., Sheehy, K., Collins, J., & Hall, K. (2004). Methodological challenges in researching inclusive school cultures. Educational Review, 56(3), 259–270.
Nisbett, R. E., & Miyamoto, Y. (2005). The influence of culture: Holistic versus analytic perception. Trends in Cognitive Sciences, 9(10), 467–473.
Nisbett, R. E., Peng, K., Choi, I., & Norenzayan, A. (2001). Culture and systems of thought: Holistic versus analytic cognition. Psychological Review, 108(2), 291–310.
Pang, M. F. (2003). Two faces of variation: On continuity in the phenomenographic movement. Scandinavian Journal of Educational Research, 47(2), 145–156.
Park, J. (2017). Knowledge production with Asia-centric research methodology. Comparative Education Review, 61(4), 760–779.
Pike, K. L. (1967). Language in relation to a unified theory of the structure of human behavior (2nd rev. Ed.). The Hague: Mouton.
Polanyi, M. (1983). The tacit dimension. Gloucester, MA: Peter Smith.
Popper, K. R. (1959). The logic of scientific discovery. London: Hutchinson.
Popper, K. R. (1963). Conjectures and refutations: The growth of scientific knowledge. London: Routledge and Kegan Paul.


Popper, K. R. (1972). Objective knowledge: An evolutionary approach. Oxford: Oxford University Press.
Popper, K. R. (1994). The myth of the framework: In defence of science and rationality (M. A. Notturno, Ed.). London and New York: Routledge.
Popper, K. R. (1999). All life is problem solving. London and New York: Routledge.
Stove, D. C. (1982). Popper and after: Four modern irrationalists. Oxford: Pergamon Press.
Taylor, P. C., & Waldrip, B. G. (1999). Standards for cultural contextualization of interpretive research: A Melanesian case. International Journal of Science Education, 21(3), 249–260.
Vallerand, R. J., Pelletier, L. G., Blais, M. R., Briere, N. M., Senecal, C., & Vallieres, E. F. (1992). The Academic Motivation Scale: A measure of intrinsic, extrinsic, and amotivation in education. Educational and Psychological Measurement, 52(4), 1003–1017.
Watkins, D., Biggs, J. B., & Comparative Education Research Centre. (2001). Teaching the Chinese learner: Psychological and pedagogical perspectives. Hong Kong: Comparative Education Research Centre, The University of Hong Kong.

10 Mixed Methods in Education: Visualising the Quality of Quantitative Data
David A. Turner

INTRODUCTION

It has become common in recent years to speak of 'mixed methods' as a combination of quantitative and qualitative methods, as though quantitative and qualitative methods were distinct, unitary and homogeneous. Brannen (2005) offers a more nuanced understanding of qualitative and quantitative methods, but still argues that 'research that combines qualitative and quantitative methods needs particular attention' (Brannen, 2005: 175), and regards both approaches as providing 'data'. The issue of 'data' is particularly important, as the idea that some observations, particularly quantitative measurements, are 'given' renders them unproblematic and unquestionable. For those who believe in the inductive method, data form the foundational starting point on which robust theory can be built. This chapter argues that qualitative and quantitative data are inseparable, as quantitative data always raise questions of quality: both the quality of the data and the qualities that are being measured. Both the collection and the interpretation of quantitative data depend upon a range of theoretical and common-sense assumptions that are rarely made explicit. Referring to Aaron Cicourel's (1964) classic work, Method and Measurement in Sociology, it is shown that the concerns Cicourel identified are, if anything, more relevant now than when he originally examined the problems associated with qualitative data. But it is also suggested that methods of data visualisation now available make it possible to understand the issues without going into a great deal of computational complexity. The chapter exemplifies the arguments in relation to a small selection of data on higher education taken from the database of the UNESCO Institute for Statistics.

In contrast with the idea that data can provide a sound foundation upon which theorising can develop, more than 50 years ago Cicourel (1964) pointed out that measurement (in which he included dichotomous divisions and the simple labelling of categories, which would usually be described as 'qualitative') requires a theoretical basis. In the absence of a formal theory which explains the division of observations into clusters (these observations are 'the same' while those are 'different'), the researcher depends on the implicit theories and common-sense views which he or she holds in common with those who collect or create the data. If such implicit theory is not examined, then the resulting process, in which it is merely assumed that what is being measured corresponds to the numbers allocated, is what Cicourel (1964: 12) describes as 'measurement by fiat'. By criticising observations and measurements that lack theoretical support, Cicourel seeks to bring closer a time when such implicit theoretical assumptions have been thoroughly examined and made explicit, so that the theoretical foundations of measurement are more soundly based. However, we appear to have made relatively little progress in the direction that Cicourel proposed; indexes of happiness, social justice, institutional efficiency and the like abound, with little discussion of whether there are good theoretical reasons for thinking that the components of such indexes can legitimately be combined into a single measure. So long as the measurements can be reliably repeated, it is assumed that they must be measuring something. We are, therefore, in a situation which Cicourel addresses in some detail, where we are in danger of assuming that measurements are robust merely because the numbers are relatively well behaved in statistical terms:

One dangerous consequence of measurement by fiat is that the measurement scales assume logical relationships which may not correspond to our implicit theories. Ideally, we would like our theories to generate numerical properties that would correspond to the measurement scales and their postulates. Our implicit theories do not generate numerical properties except after they have been transformed into explicit theories: after the language of measurement has imposed some measurement scale or set of logical relationships or some set of arbitrary or semi-theoretical categories upon them. (Cicourel, 1964: 28)


For the purposes of illustration, I have selected a few education-related variables from the database held by the UNESCO Institute for Statistics (UNESCO, 2017). I have chosen the proportion of university students studying in each of seven different areas of study. In addition, I have selected five countries from the many available. This is not a random selection; the small sample of data shown here is picked with the specific purpose of examining key points as clearly as possible. However, those points are relevant to all data in all data sets. Cicourel's central contention is that such statistics are collected with a purpose, and if the researcher fails to examine that purpose critically then he or she is likely to be adopting an implicit theoretical framework, 'by fiat', that may or may not fit the researcher's own purposes. The division of university disciplines into 'sciences' and 'non-sciences' has a long history in the Anglo-Saxon world, but not in continental Europe and many other parts of the world (Snow, 1959). More recently, such divisions between cognate areas have received increased emphasis as governments have adopted funding mechanisms that distinguish between fields in terms of the resources required to teach them. Typically, laboratory-based subjects, such as science and engineering, have attracted higher levels of funding. This creates some contested boundaries, such as whether some areas of experimental psychology should properly be classified as 'science', and whether parts of mathematics or graphic design, which require considerable computing power, should be classed as engineering. The problem is compounded when many national governments adopt policies to promote science, technology, engineering and mathematics (the STEM subjects) (UK Parliament, 2012: Chapter 2). Such bureaucratic concerns will affect the definitions employed by those reporting the statistics, and there is no reason to suppose that those definitions will be common across all countries and cultures.



Figure 10.1  Education statistics for Australia

The practical and common-sense assumptions used in creating the data become problematic in the absence of explicit, shared theory across different national settings. Different actors may have strong, vested interests in producing categorisations that serve particular purposes. Consequently, although the data selected from the UNESCO Institute for Statistics Data Centre are undoubtedly quantitative, we are faced with at least two categories of question which are qualitative:

1 To what extent do the data reported reflect the implicit assumptions of those reporting the data, and are those assumptions uniform across the whole data set?
2 Assuming that those implicit assumptions allow the definition of an ideal measurement, to what extent do the reported data approximate to that theoretical ideal?

These questions focus on issues of the quality of quantitative data in both senses: the qualities that the measurements represent, and the accuracy of the measurement of those qualities. But the qualitative questions imply quantitative answers, in the sense that 'To what extent…?' invites a quantitative response, whether that is 40 per cent or 80 per cent, or a little or a lot. The qualitative and the quantitative are therefore inextricably linked. But these observations also invite us to start thinking of quantitative values as composed of separate components. In its simplest terms, there are the data, or ideal measurement, that would theoretically be produced by applying measurement and observation as accurately as possible, and there is 'error', which arises when the procedures of measurement fall short of that ideal standard.

In this connection I want to draw attention to Figures 10.1 and 10.2, and the contrast between the data for Australia and Kyrgyzstan. In the data for Australia, we can see trends: slow changes from year to year, following an increasing or a decreasing path. Sometimes, as is the case for the percentage of students studying Social Sciences, Business and Law, the trend rises and then falls, but the process can be seen to be smooth, with little variation around the main trend. This feature of the Australian data is seen more clearly if we contrast it with the case of Kyrgyzstan, where the data exhibit random movements up and down around whatever the trend may be, while the trend itself is more difficult to discern. Simply by looking at the data we can see that the random variations are minor or negligible in the one case and dominate any underlying trend in the other.

Figure 10.2  Education statistics for Kyrgyzstan

This observation highlights the first point: an extensive data set combined with visualisation software can make qualitative differences in the data immediately apparent. This is a process best described by the term 'eyeballing the data', a process of making a primary assessment of the data by simple inspection. In the case of Figures 10.1 and 10.2, the contrast between data that show distinct trends (Figure 10.1) and data where the random variations swamp any underlying trend (Figure 10.2) is obvious. With training and experience, these judgements can be made between data where the characteristics are less clear. One trick I learned as an undergraduate student was to lay the paper down and view the data from an angle, viewing along the trends in the data, as shown in Figure 10.3. With experience, even quite small deviations from a main trend can be identified. I was taught to use this technique to assess how successfully a smooth curve had been drawn through a series of data points, as quite small changes in trend can become very obvious. It should be noted, however, that there is no a priori reason for dividing the data into ideal measurements and error in this way. It is perfectly possible that the percentages of students studying different subjects in different years in Kyrgyzstan actually do show dramatic fluctuations from year to year. It is no less possible that the percentages in Australia also fluctuate dramatically from year to year, and that errors in measurement systematically obscure those short-term variations. But our common-sense, implicit assumptions about education lead us to presuppose continuity rather than discontinuity. Without such presuppositions, interpretation of the data would be impossible, whether those presuppositions are introduced in defining the measurement scale or, as here, introduced afterwards in the interpretation phase.



Figure 10.3  Rotating an image (Australia, Figure 10.1) to see trends and errors

It is important to recognise the seductive nature of language here. By adopting the statistician's language of 'error', we are more inclined to think of a separation between the data and something that is really going on behind the mask of measurement. Error can be dismissed, leaving behind the quantity that we are really interested in. This process helps to reify concepts that were previously rather vague and disparate into something that appears very concrete. This is what Cicourel means when he refers to the language of measurement imposing a set of logical relationships on data where they do not necessarily belong. Figures 10.4 and 10.5 make the same point; the differences between smooth trends and random variations are again quite obvious, with the data for Latvia looking similar to those for Australia, while the data from Malaysia look more like those from Kyrgyzstan. Each data point can therefore be seen as being composed of two components, one predictable and explicable, the other random and unpredictable, and visualisation of the data can help us to distinguish between those two components, an important step in evaluating the quality of the data. It must be borne in mind throughout the discussion, however, that this is both a common way of talking about quantitative data and, at the same time, a perspective steeped in common-sense ideas, or implicit theory, about how we expect systems to behave. To the extent that the division of data into components is a useful tool, the techniques are valuable and should be well understood by all who engage with quantitative data, whether as producers or consumers of research. To the extent that such interpretation depends upon implicit understandings, researchers should also be critical of quantitative measures and, most importantly, should not take data as unquestionable. It is also possible to distinguish between two sub-components of each of those components. This is made clear by looking at each component, the predictable and the unpredictable, in turn.
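To make this two-component picture concrete, the following sketch, which is mine rather than the chapter's, simulates two invented series of the kind contrasted in Figures 10.1 and 10.2: one in which a smooth trend dominates and one in which the random component swamps it. The trend slope and noise levels are illustrative assumptions, not UNESCO data.

```python
# A minimal sketch (illustrative, not UNESCO data): two simulated series,
# one where the trend dominates and one where random variation swamps it.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
years = np.arange(1999, 2015)
trend = 20 + 0.6 * (years - years[0])             # the 'ideal measurement'
smooth = trend + rng.normal(0, 0.3, years.size)   # small random component
noisy = trend + rng.normal(0, 3.0, years.size)    # random component dominates

fig, axes = plt.subplots(1, 2, sharey=True, figsize=(8, 3))
axes[0].plot(years, smooth, marker='o')
axes[0].set_title('Trend dominates')
axes[1].plot(years, noisy, marker='o')
axes[1].set_title('Noise dominates')
for ax in axes:
    ax.set_xlabel('Year')
    ax.set_ylabel('% of students')
plt.tight_layout()
plt.show()
```

Plotting the two side by side reproduces, in miniature, the judgement the eye makes when 'eyeballing' Figures 10.1 and 10.2.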



Figure 10.4  Education statistics for Latvia

Figure 10.5  Education statistics for Malaysia

RANDOM COMPONENTS

The first thing to be clear about is that large systems behave differently from small systems in relation to random variations. To take a ridiculously small system for illustrative purposes, we might imagine a system with only seven students, one in each area of study. That would mean that 14 per cent of students were in each subject area. If just one student transferred subject area, the percentage in the area he or she left would fall to zero, while the proportion in the receiving area would rise to 28 per cent. That is to say, changes can only be registered in steps of 14 per cent, and smooth trends are impossible (a minimal simulation of this size effect appears at the end of this section). In contrast, a large system with thousands of students would show only infinitesimal changes if one or two students moved from one area to another. In general, therefore, we might expect small systems to exhibit greater volatility: larger, apparently random movements around an underlying trend. Statistical tests register this difference by requiring changes in small samples to show larger movement before they reach statistical significance. Relatively small changes in large numbers can be statistically significant, while much larger changes in small numbers are more likely to occur by chance.

The overall size of the respective systems of higher education can be judged from the total number of graduates from each in 2013. In that year, 22,000 students graduated in Latvia, 50,000 in Kyrgyzstan, 237,000 in Malaysia, and 402,000 in Australia. On that basis, we would expect Latvia and Kyrgyzstan to show greater volatility than Malaysia and Australia. This would explain some of the off-trend movement in the figures for Kyrgyzstan, but it does not explain the volatility in the figures for Malaysia. In looking at the figures for Malaysia, we need to consider a second source of random variation, namely random measurement error, which is a direct indication of the quality of the data. The difficulty is that it is not possible by visual inspection to distinguish between these two sources of random variation. But comparing the various systems we are looking at here, we can see that small systems, like Kyrgyzstan and Latvia, which we might expect to show greater volatility, can yield data of either high quality, as is the case for Latvia, or low quality, as is the case for Kyrgyzstan. Conversely, we would expect large systems, such as Australia and Malaysia, to show little off-trend variation, as is the case for Australia.

Overall, then, we are able to examine the data visually and conclude that the Australian and Kyrgyzstan systems behave roughly as we would expect, with the larger system showing trends more clearly and random variation about those trends being less pronounced, while Kyrgyzstan shows greater random variation about trends, which is understandable for a small system. Comparison with those two systems, which might be considered as a kind of benchmark, indicates that the quality of the data from Latvia is remarkably good, while the data from Malaysia are relatively poor.

In sum, by applying deductive reasoning on the foundation of what we know about large and small numbers, we are able to separate out, theoretically, two elements in the random variations in the data, one of which is explicable in terms of how small systems function, and the other of which is not. We are left attributing the unexplained variation to measurement error, which may be unfortunate, as some texts describe everything that cannot be explained as 'error'. There is, however, the parallel problem that the judgement about the size of the sample, and the consequent smoothing of trends, will be taken, on its own, to be an indication of the quality of the data.

If we look at the progress that has been made in the quantitative measures available to comparative educationists since the 1970s, one would be bound to conclude that the data have improved dramatically. More variables are collected, over larger samples, so that measurement errors can be assumed to have been reduced. If we look, for example, at the data on school performance collected by the OECD PISA programme, the quality and availability of the data are beyond comparison with anything that could have been wished for in the 1970s. But in one sense this progress has been purely technical, and atheoretical. Looking at measurement in sociology in the 1960s, Cicourel (1964) argued that the theoretical foundations of measurement were under-developed.


After all, it only makes sense to add together all students in higher education in the Social Sciences, Business and Law if we believe, from some theoretical perspective, that all students in the Social Sciences, Business and Law are in some way equivalent. This qualitative identification of a class of actors, actions or events that are equivalent is at the heart of any measuring system. On most occasions, a measurement is an attempt to quantify, and therefore to clarify or make more precise, an idea which is implicit and rather vague. Thus, an everyday concept such as 'intelligence' becomes quantified and operationalised in the measurement of IQ. The same applies to many measurements in education, from literacy to attendance at school. But in the common-sense and vague meanings of those terms, the former can range from decoding a system of symbols to active engagement with reading literature for pleasure, while the latter can range from bodily presence in a building to the pupil being engaged with everything that is presented to him or her. There is the danger that assigning a number to the concept will produce a spurious sense of certainty. In the process of quantification, such implicit groupings of equivalent events should be examined and improved upon to provide a theoretical basis for measurement and for quantification, and it was this improvement of theory that Cicourel advocated and hoped for. Instead, over the past 50 years we have focused more on the technicalities of measurement, seeking to increase sample sizes and reduce measurement error, so that the data appear well behaved and tests for statistical significance can be applied more readily. The result has been, to a great extent, that far from developing and elaborating our concepts, such as intelligence, by the addition of the concept of IQ, we have created new concepts that are not theoretically attached to the original ones. It is neither particularly original, nor particularly surprising, to hear it argued that a person with a high IQ can be remarkably stupid. We are left looking at measurements that have no very strong foundation in theory. As Cicourel (1964: 29) notes, 'The danger remains, however, that the vocabulary [of measurement] will displace the search for the theoretical rationale for classification that assumes reflexiveness, symmetry, transitivity, and the other properties basic to a measurement system'. For us, that danger is ever more present as we see the sprouting of measures of performance, personality and attitudes which are advocated on the grounds that they are reliable, or statistically well behaved, rather than having any claim to be rooted in theory. We need to return our attention to the question of how measurement can assist in the development of robust theory.
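Before moving on, here is the simulation promised earlier in this section. It is a hedged sketch of my own, not part of the chapter: a system of N students spread over seven fields, in which one randomly chosen student switches field each year. The field counts and switching rule are invented assumptions, chosen only to exhibit the step-size effect.

```python
# A hedged sketch (assumptions invented): why a seven-student system can only
# move in ~14-percentage-point steps, while a large system changes smoothly.
import numpy as np

rng = np.random.default_rng(1)

def share_in_field_0(n_students, n_fields=7, n_years=10):
    """Percentage of students in field 0 over time, when one randomly
    chosen student switches to a randomly chosen field each year."""
    counts = np.full(n_fields, n_students // n_fields)
    shares = []
    for _ in range(n_years):
        src = rng.choice(np.flatnonzero(counts > 0))  # field losing a student
        counts[src] -= 1
        counts[rng.integers(n_fields)] += 1           # field gaining a student
        shares.append(100 * counts[0] / counts.sum())
    return shares

print([round(s, 1) for s in share_in_field_0(7)])     # jumps of ~14 points
print([round(s, 2) for s in share_in_field_0(7000)])  # near-invisible changes
```

In the seven-student run, the share of field 0 can only take values such as 0, 14.3 or 28.6 per cent, whereas in the 7,000-student run a single transfer moves the share by roughly 0.014 points, which is why large systems look smooth.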

Turning attention to the underlying trends, which might be expected to be more predictable, it is also possible to discern two underlying influences.

COMPONENTS IN TRENDS

This component can also be divided into two sub-components: those that can be related to, and explained by, other variables in the database, and those that can only be explained by looking outside the database. For example, if we look at the data for Australia, we can see that the proportion of students studying programmes in Health and Welfare has risen fairly consistently over the period for which we have data. We could therefore look for variables in the database that also show a rising trend, to see whether there are correlations that might be associated with, and by implication explain, the rise in student enrolments in this sphere. This is, in essence, the classic approach to statistical methods in this context. We are seeking a regression equation of the form:

Y = a₀ + a₁·X₁ + a₂·X₂ + a₃·X₃ + … + aₙ·Xₙ

where Y is the variable to be explained (in this case the proportion of students studying Health and Welfare), the aᵢ are numerical coefficients, and the Xᵢ are variables that can be used to explain Y (which might be such things as graduation rates from secondary education, demographic factors that indicate an ageing population, and therefore greater demand for health and welfare specialists, or investment in tertiary education). The ideal, for this sub-component of the data, is that put forward by Noah and Eckstein (1969), namely that the selection of the series X should be based entirely on statistical criteria: a variable being included if it contributes to the explanation of variance in Y, and omitted otherwise. As an aside, the importance of a variable X in explaining Y is generally taken to be the proportion of the variance of Y which is associated with that variable; explaining more variance is taken to indicate importance. The problem with this is that there is not a unique way of calculating the series X and, generally speaking, variables that are put into the regression early account for more of the variance. This is a question that I have dealt with elsewhere in relation to Gilbert Peaker's analysis of the IEA (International Association for the Evaluation of Educational Achievement) studies in the 1970s (Turner, 2004: 38–42). In this connection it is only necessary to remark that the order in which items are entered into the regression is arbitrary, and therefore based on non-statistical criteria (a small simulation at the end of this section illustrates the point).

In Peaker's case this involved the decision that 'early events can affect later ones, but later ones cannot affect earlier ones' (Turner, 2004: 41). This may be a reasonable assumption on one view of the world, but it is not based on the statistical qualities of the data, and is therefore what we might describe as 'ideological'. In this context, I do not mean 'ideological' to be pejorative; it simply means that the methodological decision is made a priori, before the data have been examined. How the data can be, and should be, analysed, and therefore interpreted, depends upon our implicit theories about the phenomena we are observing, unless, of course, those implicit theories are critically examined and made explicit.

Turning to trends that can only be explained by looking outside the database, we might consider the pattern exhibited by the system of New Zealand (Figure 10.6). In 2003 there is a sudden increase in the proportion of students studying programmes in Social Sciences, Business and Law. At the same time, there is a corresponding sharp drop in the proportion of students studying Humanities and Arts.

Figure 10.6  Education statistics for New Zealand

This sudden change in the distribution is maintained in 2004, after which the change is reversed, and both series of data revert to trend lines that correspond to what had happened in 2002 and before. This does not look particularly like random error, as the data, apart from those two sudden changes, have all the indications of being of good quality. And it seems unlikely, though not impossible, that such a result could have been brought about by a sudden change in government policy and its subsequent reversal, for example a change in fee policy. On the face of it, it is more believable that something entirely outside the database has had an impact: for example, a group of programmes that stand in an ambiguous position between Humanities and Arts and Social Sciences, Business and Law (such as Architecture or History) was reclassified from one cognate area to the other. A reversal of that policy two years later would account for the change back. This is speculation, of course, and it would require considerable effort in following up to be clear about exactly what happened here. But it is speculation based on experience, assumptions and expectations about how systems of the sort that actually collected these data and passed them on to UNESCO function. Even picking out what needs to be explained, what in the database is particularly interesting for one reason or another, involves reliance on a host of implicit background theories, which examination of the data should help us to make explicit. On many occasions looking at the data may stimulate a speculation about relationships that are not immediately obvious. The first recourse will be to look at other items in the database to see whether an explanation can be found. (For example, changes in the law on the length of compulsory education clearly have a major impact on the level of enrolments in schools for the appropriate ages.) What is in the database and what is not will be contingent on many things, and to a great extent is purely accidental, so the distinction between trends that can be explained within the database and those that need recourse to data outside the database is largely arbitrary.


However, the distinction is useful in drawing attention, once again, to the idea that in evaluating data the researcher must often rely on information, interpretations and intuitions that are not strictly quantitative or statistical. Moreover, this discussion of the patterns discerned in the data for New Zealand again highlights the way in which simply looking at the data in an appropriate form can raise questions about their quality. In this case the question is not so much whether the data are good or bad as whether the categories are appropriate, and what exactly is included in each. Does it make sense to group Law (a traditional university subject) with Business (a more recent concern in academia)? What exactly is included in each grouping? And is the grouping properly represented by its label? Is it clear where the division between Humanities and Social Sciences is to be drawn? There should be more behind such groupings than simply whether the numbers can be reproduced reliably. Looking critically at the data can help the reader of data to explore the underlying assumptions used in defining the measurement model that produces the data we find in databases such as those held by the UNESCO Institute for Statistics. But the closer one looks at these questions, the clearer it becomes that even such critical judgements are based upon assumptions about the way that data are supposed to behave. Critically for this discussion, we implicitly assume that systems involving a large number of people will not exhibit large discontinuities. But this is itself an assumption that, on another occasion, might be the subject of examination. As Cicourel notes, without any underlying theoretical basis to support our system of measurement, the interpretation of quantities becomes a problem in the sociology of knowledge (Cicourel, 1964: 7).
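The simulation promised above illustrates the order-of-entry problem. It is a minimal sketch of my own, on synthetic data, not a re-analysis of Peaker or the IEA: two correlated predictors are entered into an ordinary least-squares regression in both orders, and the variance credited to the first-named variable changes substantially.

```python
# A minimal sketch on synthetic data: with correlated predictors, the share
# of variance credited to X1 depends on whether it enters first or second.
import numpy as np

rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)   # X2 is correlated with X1
y = x1 + x2 + rng.normal(size=n)

def r2(y, columns):
    """R-squared of an ordinary least-squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    return 1 - residuals.var() / y.var()

r_x1_alone = r2(y, [x1])
r_x2_alone = r2(y, [x2])
r_both = r2(y, [x1, x2])

print(f"variance credited to X1 entered first : {r_x1_alone:.2f}")
print(f"variance credited to X1 entered second: {r_both - r_x2_alone:.2f}")
```

The same variable appears important or marginal depending only on the order of entry, which is exactly the non-statistical, a priori choice discussed above.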



BLURRING THE EDGES In the preceding passages, I have distinguished four elements that contribute to any data point in a series of data in an educational database. There are trends that we might hope to explain by looking at other educational data, or closely related economic and demographic data, that we might expect to be included in the database. Then there are trends which we might expect to be explained but would involve remote connections or connections that are unlikely to have been monitored in the database. We might think of these as shocks that are entirely extraneous to the educational system, although, as the discussion makes clear, they might include changes in the definitions used and other intrinsic sources of change that have escaped immediate attention. Then there are the random errors in the data, or sudden and unexpected changes away from a perceived trend, first one way and then the other. These may result from measurement error, or they may arise in small systems because the numbers involved make such volatility more likely. With appropriate experience, the human eye can be remarkably good at distinguishing between those changes that follow trends and those that are random, and I have suggested that the ratio of trends to errors can be seen as an indication of the quality of the data. (Actually, the human ear is also remarkable at distinguishing between trends and errors, or in this case what information engineers refer to as the signal to noise ratio. We have relatively little difficulty distinguishing between what Beethoven intended us to hear and the random errors, or hiss, that has been added by the recording medium or radio transmission.) But, again, the distinction cannot be maintained in any absolute sense. A competent official might well be able to produce a fictitious series of data that look good enough to fool an observer. Indeed, if we look at the case of Cyril Burt, it has been argued that
he falsified his data by reporting results that were too good to be true (Joynson, 1989). Overall, then, there is a great deal to be said about data, the relationships to be found in data, and what can be learned simply by looking critically at data in an appropriately visualised format. Qualitative and quantitative methods meet at this point, where researchers are invited to judge the quality of quantitative data. These are judgements about the shapes, forms and patterns that can be discerned in data. And if these judgements can be made, and if they can be improved by practice, it follows that researchers and those who read the results of research should be given full preparation in such techniques, so that they are in a position to make judgements about the inferences that are being presented to them. Any use of quantitative data will require judgement on the part of the researcher, as the researcher chooses to focus on some aspects of the data and to dismiss other features as ‘random errors’, or as produced by extraneous shocks that are not relevant to the matter being researched. In this sense, any use of data is selective, and this needs to be recognised in the way that research is presented and received. Researchers need to make a plausible case for why they were right to focus on the elements of data that they chose, while readers of research need to be able to recognise how that focus has been produced and make their own judgement about the plausibility of the decisions made by the researcher. These are judgements that the experienced user of quantitative data is making all the time, linking through in a complete process, from the implicit assumptions to the derived conclusions. The real hazards arise when the tasks are divided between the quantitative researcher and the qualitative reader, when, without looking at the data, it is possible that implicit assumptions remain unexamined. It is for this reason that the possibility of visualising the data, and of forming judgements about the whole process from measurement to inference, even without a great deal of statistical sophistication, is extremely important.
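The trend-to-error idea can be made concrete in a few lines of code. The following is a minimal sketch, not anything computed from the chapter's own data: it builds a synthetic enrolment series in Python, fits a straight-line trend, and compares the movement explained by the trend with the scatter of the residuals, in the spirit of an engineer's signal-to-noise ratio. All figures are invented.

```python
# A minimal sketch with invented numbers: separate a synthetic enrolment
# series into a fitted linear trend and residual 'errors', then compare
# their sizes as a crude trend-to-error (signal-to-noise) indicator.
import numpy as np

years = np.arange(2000, 2016)
trend = 50_000 + 1_200 * (years - 2000)        # steady underlying rise
rng = np.random.default_rng(0)
observed = trend + rng.normal(0, 4_000, size=years.size)  # plus volatility

slope, intercept = np.polyfit(years, observed, 1)
fitted = slope * years + intercept
residuals = observed - fitted

trend_range = fitted.max() - fitted.min()      # movement the trend explains
error_sd = residuals.std(ddof=2)               # spread of the 'errors'
print(f"trend over the period: {trend_range:,.0f}")
print(f"residual spread (sd):  {error_sd:,.0f}")
print(f"trend-to-error ratio:  {trend_range / error_sd:.1f}")
```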


SPARSE DATA SETS

There is a further complication to interpretation when one looks at sparse data sets, such as the educational data held by the UNESCO Institute for Statistics. A sparse data set is one in which there are relatively few instances (in this case countries) in comparison with the number of variables about each of those instances (in this case anything from school enrolments and investment in education to progression rates and resources). If we go back to the earliest serious use of statistical data in comparative education, the IEA studies of science and mathematics performance in approximately 20–30 countries, these were not particularly sparse data sets. Although there were few countries involved, the IEA collected data on a very limited range of variables which they thought to be of direct relevance to student performance in those studies. The data were then used to construct regression equations, as specified above. The disappointment with those early studies (if there was a disappointment) was that only a very small proportion of the variance in academic achievement (typically between 5 and 10 per cent) could be accounted for using such regression techniques. It was possible then, in the 1970s and 1980s, to imagine a time when more data, and better data, had been collected, and it would be possible to account for 80 per cent, 90 per cent, or perhaps even 100 per cent of the variance in academic achievement. After all, more must mean better, and accounting for 100 per cent of the variance in academic performance would mean that, at last, we would know everything about exactly what goes on in school classrooms. We can now move forward to 2017, when anybody can access the UNESCO Institute for Statistics database. This holds data on roughly 200 countries, so the number of instances has increased dramatically. But the corresponding number of variables has increased even more dramatically, reaching nearly 2,000 over nearly 20 years. The UNESCO database is, therefore, now a sparse data set.
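The consequence of sparseness is easy to reproduce. The sketch below is a hypothetical construction, not a computation on the UIS database: it uses purely random data of roughly the same shape (about 200 instances by 2,000 variables) to show that ordinary least squares can then reproduce any chosen variable exactly, even though the 'predictors' contain no information at all.

```python
# A minimal sketch of the sparse-data point, using purely random numbers
# whose shape loosely echoes the UIS case (about 200 countries and about
# 2,000 variables). Nothing here comes from the real database.
import numpy as np

rng = np.random.default_rng(1)
n_countries, n_variables = 200, 2000
X = rng.normal(size=(n_countries, n_variables))  # unrelated 'indicators'
y = rng.normal(size=n_countries)                 # any target variable

# With at least as many variables as instances, the minimum-norm least
# squares solution fits y exactly: 100 per cent of variance 'explained'.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"largest residual: {np.abs(y - X @ beta).max():.2e}")  # effectively zero
```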


One of the important features of sparse data sets is that it becomes increasingly easy, with the kind of computational support that has also been developed between the 1970s and now, to construct regression equations that will account for more and more of the variance in chosen dependent variables. Suppose that we restrict ourselves to one variable, the proportion of students studying programmes in Social Sciences, Business and Law, then with roughly 2,000 variables at our disposal it should be possible to account for all the variation that we observe in terms of the other variables in the database. As a brief aside, let me stipulate that the 2,000 variables used to construct the regression will exclude the proportions of students who study programmes in other areas. There is a trivial and not very interesting solution that can be found to the effect that students who are not studying Agriculture, Education, Humanities and Arts, and so on, must be studying Social Sciences, Business and Law. This will probably be the simplest way of accounting for fluctuations in the variable of interest, and cannot be ruled out on purely statistical grounds, but pursuing this line of research will be about as valuable as distributing a questionnaire in the hope of identifying married bachelors. There are some points where logic is of greater importance than statistical accuracy. Coming back to the less obvious solution, it should be possible, in a sparse data set, to construct a regression equation that accounts for nearly all of the variation in the dependent variable, and that includes all of those up and down movements in the data for Malaysia and Kyrgyzstan that we have previously dismissed as random error. When we set out on this path in the 1970s, we agreed that more would mean better, and that the ability to construct such a perfect regression was the holy grail of quantitative research. Now that we can achieve it, we need to reflect on whether it was really such a good idea after all. If we can construct a regression which involves hundreds of ‘explanatory’ variables,
but which is impossible to interpret in a way that makes sense in terms of the mechanisms that we can envisage operating, is it valuable to do so? Statisticians describe such a process of creating a regression that is so good it even accounts for some of the random errors as ‘over-fitting’. And it turns out that over-fitting not only takes up a lot of time, computer power and energy, but it is actually worse at predicting future trends than a regression that is less well fitted. So, the question immediately arises for us, as it did not arise for the pioneers in this field in the 1970s, how far should we go down this route? When is a regression good enough? There is no simple answer to this question. Most obviously, the answer that I suggested before, that more is better, is inadequate. The answer will require judgement, and it will be a judgement about matters that are not intrinsically statistical. It will involve a judgement about which variables are plausibly linked to an explanation, or about the underlying mechanisms that the researcher, or reader, thinks can be used in constructing an explanation. That is to say, it will be a judgement that is based on theoretical considerations and will therefore be dependent on how good the theories are that are embodied in the data.
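The predictive penalty of over-fitting can also be demonstrated directly. In the deliberately artificial sketch below, a regression padded with ninety irrelevant variables fits its training cases essentially perfectly, yet predicts held-out cases worse than a small model using the one genuinely relevant variable; none of this reproduces any real educational data.

```python
# A minimal over-fitting sketch with synthetic data: the 'big' model fits
# the training half almost perfectly but predicts the held-out half worse
# than the small, correctly specified model.
import numpy as np

rng = np.random.default_rng(2)
n = 100
x_true = rng.normal(size=(n, 1))            # the one real influence
X_junk = rng.normal(size=(n, 90))           # noise dressed up as 'data'
y = 2.0 * x_true[:, 0] + rng.normal(0, 1.0, n)

train, test = slice(0, 50), slice(50, None)

def fit_and_score(X):
    """OLS fit on the training half; mean squared error on both halves."""
    Xb = np.column_stack([np.ones(len(X)), X])           # add intercept
    beta, *_ = np.linalg.lstsq(Xb[train], y[train], rcond=None)
    pred = Xb @ beta
    return tuple(np.mean((y[s] - pred[s]) ** 2) for s in (train, test))

print("small model train/test MSE: %.2f / %.2f" % fit_and_score(x_true))
print("big model   train/test MSE: %.2f / %.2f"
      % fit_and_score(np.column_stack([x_true, X_junk])))
```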

QUANTITATIVE AND QUALITATIVE

The focus of this account has been the use and visualisation of quantitative data. But discussion of that data has taken us beyond the quantitative and into the qualitative in at least two ways. Looking at the data, and the quantities involved, has been used to give an account of the quality of the data – whether it is good, bad or indifferent – by considering the relative importance of underlying trends and random errors that can be seen there. But discussion has not merely related to the quality of the data, but also to its qualities, and whether certain features of the data can be explained by thinking about such things as the size of the sample involved or the plausibility of underlying mechanisms connecting variables.

The crucial point here is not merely that there is no sharp division between the quantitative and the qualitative, although that is certainly true. The relationship between the qualitative and the quantitative works in both directions. Just as the quantitative raises questions about the qualitative, so the use of qualitative data, such as the single case study, necessarily raises questions about the quantitative, and the probability that such a phenomenon will be repeated in similar circumstances. So, the qualitative approach is closely linked to the quantitative approach, to the extent that it should not be possible for a researcher merely to assert that they have adopted the quantitative method, or that they have adopted the qualitative method. Equally impossible should be the claim that they have adopted mixed methods, by using both qualitative and quantitative methods in a single study. All methods must necessarily imply using both qualitative and quantitative approaches to some extent or another. The central point that should be clear from examining quantitative data is that there can be no sense in asserting that one is using the quantitative method, as there are many quantitative methods: one can plot graphs or draw up tables, one can perform linear regressions or non-linear regressions, use frontier analysis, or probit models. Each of these approaches is different, and choices have to be made between the many and various quantitative methods that can be applied. The discussion of data in this chapter has merely addressed the preliminary examination of the data in an appropriate format in order to provide some sort of basis for those subsequent methodological decisions. In much the same way, it makes no sense to claim that one is using the qualitative method: individual interviews, group interviews, focus groups, questionnaires, observation, participant observation and action research are all qualitative methods, and some of them may lead to quantitative analysis. And observations and the categories into which they are grouped may depend on some of the issues that have
been discussed here. If trends are small and errors are large, it may make little sense to examine only a few cases, as the results are likely to be dominated by the errors and drawing valuable conclusions may not be possible. There really is no substitute for a preliminary examination of the data to give some indication of what the data might be useful for, or which additional studies might be fruitful. The argument here has been based on Cicourel’s (1964) critique of methods of measurement that fail to provide an adequate link between the theoretical basis for the categorisation of events as the same or different, a theory of measurement that builds upon those categories and the uses to which the quantified variables are put. The case could as easily have been made on the basis of Toulmin’s (1958) work on the nature of arguments, and the requirement that there should be a warrant that links the eventual claim made to the evidence that is provided in support of the claim. In the absence of such a warrant, the link between what the evidence is and what it is purported to show is insecure, and can only be established by assertion, or as Cicourel puts it, ‘by fiat’. To draw sound conclusions on the basis of evidence, the strand of argument needs to be linked through from initial measurement to final conclusion, and both quantitative and qualitative considerations must be taken into account. As Gorard (2010) argues, an attempt to set up quantitative and qualitative approaches as somehow oppositional or providing contrasting paradigms is fundamentally mistaken.
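A short simulation, again entirely hypothetical, illustrates why a handful of cases tells us little when errors dominate: with a modest true difference buried in much larger random variation, samples of five cases per group frequently point in the wrong direction altogether.

```python
# A minimal sketch: a small true 'trend' (difference between two groups)
# swamped by large 'errors'. With five cases per group, repeated draws
# often get even the direction of the difference wrong. Invented numbers.
import numpy as np

rng = np.random.default_rng(3)
true_difference, noise_sd, cases = 0.2, 1.0, 5

correct_sign = [
    np.mean(rng.normal(true_difference, noise_sd, cases))
    > np.mean(rng.normal(0.0, noise_sd, cases))
    for _ in range(10_000)
]
print(f"draws pointing the right way: {np.mean(correct_sign):.0%}")  # ~60%
```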

CONCLUSION

The 1960s and 1970s saw a massive increase in the use of quantitative methods in studies related to education. Statistical analyses associated with the Coleman (1966) and Plowden (1967) Reports pioneered new techniques which were then used in the IEA studies and other work. In that early period, most of the
data were relatively poor, and involved very restricted data sets with few variables. As a result, it was possible to explain only a small amount of the variance in any dependent variable. But quantitative methods created their own mythology at that time; with improved techniques and better data, it would eventually be possible to understand all the relationships between all of the variables in education, identifying important relationships purely on the basis of statistical measures, such as tests of statistical significance. That is to say, when the techniques were still in their infancy, indeed because they were still in their infancy, quantitative methods held out a vision of the future in which judgements about data would be unnecessary, and conclusions would be objective. This hope for the future drove a movement that maintained that every student and every researcher would have to understand quantitative methods, and that every postgraduate student should follow a tedious course on quantitative methods, which would result in them being able to calculate the standard deviation of a set of data, but would ensure that they had no real feeling for what might be meant by 'the quality of quantitative data'. Since those early days, statistical methods have come on by leaps and bounds, developing multi-level analyses implemented on ever faster computers. But the hope of interpreting those analyses in simple and intelligible terms has receded as fast as the techniques have developed. And among the latest developments in the field is the creation of computer programs that make visualisation of data relatively easy. And this raises new hopes that it may be possible for students and researchers to gain a more intuitive insight into quantitative data. These changes have created an important new context for educational research. We can now see, as explained above, that improvements in data collection have led to the development of sparse data sets, and that these data sets have characteristics that the pioneers in the 1960s and 1970s never considered. In an environment where the data are very poor,
one can hope for a future in which there is better data to analyse, without ever worrying about what one could, or could not, do with all that data. Now that we are living that future, and we could explain all, or nearly all, of the variance in a variable, we need to start worrying about whether we should do that. And this is a debate that cannot be limited to quantitative researchers. All researchers need to have the skills to develop and express an opinion on the issues. But such skills are not developed automatically. We need now to be thinking seriously about how researchers and consumers of research should be prepared in order to improve the overall quality of research. I have deliberately avoided introducing detailed discussion of quantitative methods in this chapter. Implicitly, many statistical concepts have been introduced that could be given a mathematical treatment. But my concern is not exactly with the quantitative or the qualitative, but what I like to think of as qualitative mathematics – the general shape of mathematical forms. These might be rising trends, falling trends, or errors (which require a mathematical treatment all of their own). In much the same way, Euclidean geometry is about geometrical shapes, but does not admit the use of measurements with a protractor or ruler. It is that kind of feel for what data are doing that we need to provide for our students. It also needs to be noted that the drive to improve data has been, to some extent, at the expense of developing theory. We have prized reliability and stability in measurements above validity, or a strong theoretical link to what we hoped to be measuring in the first place. Fifty years ago, Cicourel's work on measurement and method in sociology, which as Smith and Atkinson (2016) observe, has probably been cited more than it has been read, must have appeared very abstract and distant from the immediate concerns of social scientists. Today, when huge advances have been made in data collection, statistical analysis and visualisation, Cicourel's concern
about the dangers of quantitative measures that have come loose from their theoretical moorings seems prescient, and it may well be time to go back to re-examine the basis on which measurements are made. The publication of 'data' is taken up avidly by the popular press, whether that is PISA scores or university rankings, without any thought being given to the measurements that are being used. Whether there is any theoretical justification for taking movements of academics and students into a small state like Singapore and grouping it together as the 'same thing' as movements of academics and students into a large country such as France or the US is rarely questioned. Whether a score on a particular reading comprehension test on scientific issues should be counted as a test of scientific ability is a question that is not raised by commentators, most of whom have not read the tests that are being used by PISA. All that is of popular concern is how well the particular system in which we feel we have a stake has done. These are examples, although by no means the only examples, of measurements that have become completely detached from their theoretical foundations, or as Cicourel would put it, 'measurement by fiat'. Most scholars engaged in comparative education recognise the sterility of the current developments, but the way to progress is less clear. Above all, any suggestion that there are two ways of conducting research that are in opposition, the qualitative and the quantitative, is a major obstacle to relocating measurement on firmer theoretical underpinnings. What is needed is a concerted look at measurement from the perspective of both qualitative and quantitative research. If the qualities and categories in which events, actions and people are grouped are inadequate, the philosophical examination that can improve measurement is bound to be inadequate. The story is told of a curator at the Natural History Museum who explained to a group of visitors to the dinosaur exhibit, 'This dinosaur skeleton is eight million and six years
old’. ‘That is extraordinary’, said one of the visitors. ‘How can you be so precise?’ ‘Well’, replied the curator, ‘When I started work here they told me it was eight million years old, and that was six years ago’. We need to work for a time when everybody who works in education understands why that is funny, even if they do not understand the difference between expressing a variable to one significant figure and expressing it to seven significant figures. Perhaps not very funny but quantifying that may take longer.

REFERENCES

Brannen, Julia (2005) Mixing methods: The entry of qualitative and quantitative approaches into the research process. International Journal of Social Research Methodology, Vol. 8, No. 3, pp. 173–184.
Central Advisory Council for Education (England) (1967) Children and their Primary Schools. London: Her Majesty's Stationery Office.
Cicourel, Aaron V. (1964) Method and Measurement in Sociology. New York: The Free Press.
Coleman, James S. (1966) Equality of Educational Opportunity. Washington, D.C.: US Government Printing Office for the Office of Education.
Gorard, Stephen (2010) Research design, as independent of methods. In Abbas Tashakkori & Charles Teddlie (Eds.), Sage Handbook of Mixed Methods in Social & Behavioral Research (second edition). Thousand Oaks, CA: Sage.
Joynson, R. B. (1989) The Burt Affair. London: Routledge.
Noah, Harold J. & Eckstein, Max A. (1969) Toward a Science of Comparative Education. London: Macmillan.
Smith, Robin James & Atkinson, Paul (2016) Method and measurement in sociology, fifty years on. International Journal of Social Research Methodology, Vol. 19, No. 1, pp. 99–110.
Snow, Charles Percy (1959) The Two Cultures. Cambridge: Cambridge University Press.
Toulmin, Stephen E. (1958) The Uses of Argument. Cambridge: Cambridge University Press.
Turner, David A. (2004) Theory of Education. London: Continuum.
UK Parliament (2012) Science and Technology Committee – Second Report: Higher Education in Science, Technology, Engineering and Mathematics (STEM) Subjects. London: UK Parliament. Available at: www.publications.parliament.uk/pa/ld201213/ldselect/ldsctech/37/3702.htm (accessed 5 April 2017).
UNESCO (2017) UIS.Stat. UNESCO Institute for Statistics Data Centre. Available at: http://data.uis.unesco.org/ (accessed 14 February 2017).


PART III

Research Practices in Comparative Studies of Education


11

Growth and Development of Large-Scale International Comparative Studies and their Influence on Comparative Education Thinking

Larry E. Suter

INTRODUCTION

The general purpose of formal educational systems is to transmit cultural knowledge from the older generation to the upcoming cohorts of youth, thereby preparing them for adult responsibilities. The field of educational research explores the wide range of organizational systems and individual behaviors that exist to support the expansion of youthful learning. While some aspects of learning, such as speaking a language, may occur naturally without formal instruction, other domains of knowledge, especially those which were constructed by humankind – such as science, mathematics and technology – require formal and informal forms of instruction for students to become expert.

Therefore, when the fathers of large-scale international comparative studies introduced the idea of measuring student learning through empirical methods of assessment by testing, they opened a Pandora's jar (some say box) of problems. Some say they may have released evils in the form of unsolved political and educational problems that would torment educational researchers for generations. In the Greek myth, the gods punished humans (because Prometheus had given them fire stolen from the gods) by creating Pandora, who released and spread evils throughout the earth (except for Hope, who remained in the jar). As in the myth, many researchers and administrators feared that cross-national measures of student achievement would release a torment of unfair comparisons between countries
(Husén, 1979). Once released, the comparisons, whether fair or unfair, could not be returned to the jar. Today, the field of comparative education applies various methods of comparative analysis to grapple with the consequences and meaning of these studies. This chapter, along with all chapters in this volume, describes the development and conduct of large-scale surveys from their date of inception. The purpose is to provide a critical review of how these studies were intended to contribute to the understanding of educational processes and to place these large-scale studies in the context of the field of comparative education. Issues involved with the data collection, analysis, publication and acceptance of large-scale international comparative studies are presented in this chapter in three sections. The first section discusses how the developers of the first large-scale international surveys overcame the challenges of obtaining funding for cross-national studies and the challenges of creating new forms of student assessment that met standards of scholarly research. It begins with the origins of selective large-scale student assessment surveys in the 1950s and documents the forces that interacted to establish them in research centers worldwide. This section is informed by the experiences of a participant in the decision making about the nature and funding of these studies from 1982 to 2011. The review is guided by the question of whether the original intent of the founding educational researchers to improve a science of educational research was accomplished and whether the studies were useful in settling questions about educational practices. The second section describes the growth in number and diversity of large-scale surveys between 1960 and 2015, focusing mostly on the largest ongoing surveys of science and mathematics. It discusses the role of the international organizations that managed the cross-national collaboration for conducting the studies, the costs of the studies, the type of publicity received in the public domain, and the criticisms that they faced. It also discusses
and considers specific topics that affected the quality and use of large-scale surveys. The third section continues the analysis of the utility and acceptability of large-scale surveys by the field of comparative education by examining a large collection of research publications found in academic journals, books and governmental agencies between 1960 and 2017. The analysis addresses whether the field of comparative education incorporated large-scale surveys of many countries into the development of theory and practical knowledge of educational practices. It will provide new information about which aspects of the studies received academic attention and explore whether the publication evidence shows that they have become a significant aspect of comparative education as a field of study.

WHAT ARE LARGE-SCALE INTERNATIONAL SURVEYS OF EDUCATION?

Large-scale international surveys use samples of students, teachers and adults to produce measurements of achievement, attitudes, educational experiences and career preparation. The surveys cover pre-primary, primary, secondary, adult and tertiary (teacher preparation) education. The topics of the surveys have included mathematics, science, reading, civics education, computer technology, writing, and adult literacy. The surveys contain items (questions) that have been standardized across languages and cultures specifically for the purpose of comparing differences between economic units and cultures (several chapters in this volume discuss the validity of the methods used). The surveys were created at first to supplement subjective observational descriptions of the conditions of schooling and for conducting research and analyses of causes and consequences of student achievement (Husén, 1967; Martin, Mullis, & Chrostowski, 2004; OECD, 2009). They are known as 'large-scale' studies because each 'study' engages

[Figure 11.1: Number of individual large-scale international surveys from all sources by world region (Europe, Americas, Asia, Oceania, East Asia & Pacific, Africa): 1980 to 2016. Chart not reproduced; the vertical axis ran from 0 to 250 surveys.]

dozens of countries and thousands of students in one common assessment of achievement and backgrounds. Analysis methods require sophisticated statistical and psychometric models to achieve reliable comparisons. The large-scale international surveys did not originate by proclamation by an existing international organization; instead, the studies were developed sui generis by educational researchers who shared a common interest in the study of student assessment and an interest in countries outside their own (Husén, 1983). The organization of the surveys became more formalized in the twenty-first century. By 2016, 33 different comparative surveys that followed the model of the earliest surveys had been administered by the International Association for the Evaluation of Educational Achievement (IEA), the Organization for Economic Cooperation and Development (OECD), UNESCO, the World Bank, and the ASER Centre, India. The total number of countries (and economic units) that participated in large-scale international studies has increased since 1960 to over 125 by 2015. The number of participants has increased at an especially rapid rate since 2000. The first experimental surveys in 1960 were constrained to
12 countries. The total number of participants in surveys since 2010 represents about half of all world countries listed by the International Organization for Standardization (ISO, 2018) and about 84 percent of the world population. Representative samples of an additional 50 units within country boundaries have been drawn for some of the large-scale surveys. (For example, the US has enlisted 18 states, Spain included four regions, China included four areas, and Canada included six provinces.) The total number of separate geographic areas tested is over 200. If each survey is counted for each year and country for which it was administered, the accumulated total reached 1,500 individual surveys by 2015. The number of individual surveys conducted each year is shown in Figure 11.1 for six world regions.

HISTORY OF LARGE-SCALE SURVEY DEVELOPMENT

Founding Events

Like many topics in the social sciences (such as the increased growth of social sciences following the civil rights movement during the
1960s), the international research program in education was initiated in response to a crisis brought on by the 'science scare' created when the Soviet Union launched Sputnik in 1957 (Husén, 1979). The launching of the first satellite by the Soviet Union convinced many national leaders that the US and Western European educational systems had failed (Dow, 1991; National Academy of Sciences, 1997). It was in this environment that researchers in education proposed that an objective method be derived to evaluate the relative achievements of students in different countries. Pioneering researchers, led by education researchers such as Benjamin Bloom, Robert L. Thorndike, and John Anderson of the USA, Torsten Husén of Sweden, and Bill Wall of the National Foundation for Educational Research in England, first proposed that students be assessed with an objective test and that empirical studies be conducted across countries (Grisay & Griffin, 2006; Husén, 1979; Ross & Genevois, 2006). Although standardized testing had existed for many years, it had not yet been attempted across countries. The originators of the comparative studies were academic researchers in psychometrics, education, and psychology, who were more interested in developing improvements in theories of education than they were in engaging in the politics of international comparisons (Husén, 1979; Ross & Genevois, 2006). They believed that if research about education was conducted only in a single country (the country of residence of the researcher), then other possible policies and practices that do not occur in that country would be unknown. Thus, by using variations in educational practices across the world as a natural laboratory, they expected to be able to identify the reasons for differences in achievement caused by teacher preparation methods, curriculum, study practices, and time spent on study. Searching for these independent factors could produce a new body of scientific observations that might be used to improve understanding of education processes (Husén, 1973). Publications of comparative education
have presented many differing points of view about whether this expectation was achieved (Grek, 2009; Prais, 2003). My own belief is that increased public attention to otherwise unobserved conditions of schools and instruction has been the main contribution of making invidious comparisons of student achievement across countries. The surveys are more likely to raise good questions than to propose good answers. Over time, the research and development required to mount an international study has sharpened the questions about educational practices and the resulting studies have become more relevant to educational policy. Some of the challenges of large-scale cross-national surveys which were known by the designers at the outset, and which continue to concern users of large-scale survey student assessments, are these:

(1) How to define the expected content of a subject matter across countries so that a test would be fair to all countries (Travers & Westbury, 1989);

(2) Whether translated test items could be a fair assessment of populations of different historical, cultural and social contexts;

(3) Whether differences in curriculum practices from country to country overwhelm the common thread in growth of student learning (Schmidt & McKnight, 1995);

(4) Whether tests, in which students respond to a small set of questions written on paper or computer about mathematics, science, or reading, validly represent the student's ability in that subject area (Berliner, 2015);

(5) Whether a score derived from a self-administered test provides sufficient information to be useful to an instructor attempting to assist the student's learning (Rutkowski & Rutkowski, 2009);

(6) Whether the results of an achievement test in specific subject areas and designed to be used internationally provide an accounting of subject matter achievement that is meaningfully different from general ability, aptitude, 'G' or 'intelligence' (Baumert, Lüdtke, Trautwein, & Brunner, 2009; Rindermann, 2007).

These methodological and philosophical topics are general concerns for all large-scale international surveys. Without knowing the
answers to any of these questions, the founders of the studies implemented a series of studies ('experiments') to learn by experience whether international comparisons of educational achievement were possible. The following paragraphs describe the steps taken by scholars and international agencies to achieve the breadth of large-scale surveys available today. Since the methods for conducting student assessment surveys were not common knowledge in academic research or governmental agencies, the establishment of the international large-scale studies has provided a training ground for data collection, administration and analysis (Lockheed, 2011).

First-generation IEA Studies

The first generation of large-scale international studies of student achievement was carried out during the 1960s using the methods of sampling and student assessment that were available at that time and which had been applied in two major national studies: the US study of Equality of Educational Opportunity by Coleman and associates (1966) and a British study of primary education by Plowden and the Central Advisory Council for Education (1967). The leaders of these international experiments in education research were academic researchers in several universities (University of Chicago, Columbia University, and the University of Stockholm). In 1967, these research directors formed an association of educational research institutes called the 'International Association for the Evaluation of Educational Achievement' (IEA) to continue research on methods for conducting comparative surveys of student achievement (Husén, 1996). Today, the IEA is an organization with 60 members (institutes, government agencies, and universities) from around the world and is headquartered in Amsterdam (International Association for the Evaluation of Educational Achievement, 2017).
be insufficient and the experiment would fail. Possibly, they thought, student achievement in different cultures was unmeasurable (Husén, 1996). The founders, such as Anderson, Thorndike, Husén, and Bloom, realized that they had not created a complete model of assessment and learning (Foshay, Thorndike, Hotyat, Pidgeon, & Walker, 1962). In fact, they openly engaged in scholarly debates about the meaning and value of rankings and comparisons (Husén, 1983). For example, in 1973, following the release of one of the early studies, Husén reflected on the status of the comparative education venture in a preface to the first IEA Science study. These paragraphs, succinctly written, provide insight into the conditions faced by the researchers of these first studies:

What is the rationale for embarking on a venture with such far reaching administrative and financial implications and such frustrating technical complexities? We, the researchers who almost 14 years ago decided to cooperate in developing internationally valid evaluation instruments, conceived of the world as one big educational laboratory where a great variety of practices in terms of school structure and curricula were tried out. We simply wanted to take advantage of the international variability with regard both to the outcomes of the educational systems and the factors which caused differences in those outcomes. The aim of the IEA international surveys has been to go beyond the purely descriptive identification of salient factors which account for cross-national differences and to explain how they operate. Thus, the ambition has been the one prevalent in the social sciences in general, that is to say to explain and predict and to arrive at generalizations. Apart from studying descriptively the teaching of Science and the conditions under which it is conducted in various countries, the overriding thrust of the present study, as of all recent IEA work, has been an attempt to relate factors in the social, economic, and pedagogical domains characteristic of a series of national systems to output factors measured by international tests and covering both cognitive and affective outcomes. (Husén, in Preface of Comber & Keeves, 1973, pp. 10–12)

Some of the original intentions and hopes of the founding members of the IEA continue
to be debated by scholars in the field of comparative education today (Berliner & Biddle, 1995; Fairbrother, 2005; Rutkowski, von Davier, & Rutkowski, 2013; Singer & Braun, 2018; van de Vijver & Leung, 1997), but with the difference that researchers today have access to the results of many years of international comparative studies as evidence for claims of strengths and weaknesses. A brief description of each of the first international student assessment studies, with references to publications, is provided next.

12-Country Pilot-study (1960)

The first experiment to determine whether it was possible to conduct large-scale studies of student achievement cross-nationally was executed in 1960 in 12 countries (Foshay et al., 1962; Härnqvist, 1975; Husén, 1979, 1983). The international coordination of the first study was funded by the US Office of Education by a grant to C. Arnold Anderson at the University of Chicago and located in the UNESCO Institute for Education in Hamburg (Härnqvist, 1975; Husén, 1967). The results of this study provided sufficient evidence that the proposed research methods were feasible in a variety of countries so that additional studies were anticipated.

First International Mathematics Study, 1964 (FIMS)

The IEA conducted the first cross-national studies of mathematics in 12 countries (Husén, 1979). Mathematics was chosen because the study directors believed that mathematics would be largely understood the same way across countries. The First International Mathematics Study (FIMS) resulted in a two-volume publication (Husén, 1967). The results of this study were reported in a front-page story of the New York Times on March 7, 1967. The news story led with a headline that '12-nation study ranks US low in math teaching; Japan Leads' (Hechinger, 1967), a harbinger of the attention that was given to later studies. One of the issues in education during this period was whether schools that admitted all students (comprehensive schools) could provide as high a quality of education as selective schools (Dahllöf, 1966). The reporter noted that the percentage of 13-year-olds enrolled in mathematics courses was lower in the other participating countries than in the United States; a potential reason for low US achievement, but also promising evidence that the US educational system was reaching a broader segment of the population. This is one of the comparative issues that continued to be a matter of debate in all studies.

Six-subject study (1966–1973)

The six-subject study was conducted in 1971 and included testing in science, civics, reading, geography, foreign language, and writing (Husén, 1979; Peaker, 1975; Postlethwaite, 1974a). An issue of the Comparative Education Review was dedicated to 18 articles about this study and another 13 volumes were published (Carroll, 1975; Choppin, 1974; Comber & Keeves, 1973; Lewis & Massad, 1975; Noonan, 1976; Oppenheim & Torney, 1974; Peaker, 1975; Postlethwaite, 1969; Postlethwaite, 1974; Purves, 1973; Thorndike, 1973; Torney, Oppenheim, & Farnen, 1975; Walker, 1976). The authors of the 1974 issue are notable for representing a broad set of countries (Sweden, Hungary, England, Germany, India, the USA, Australia, Finland, Belgium, Israel, and Japan). The content of the articles reflects the breadth of the survey content in that the authors addressed survey techniques, assessment methods, curriculum, reading, science, civics, mathematics, noncognitive measures, and social attitudes, as well as race and sex differences.

Second-generation IEA Studies

Second International Mathematics Study (SIMS)

Following the completion of the first generation of IEA studies, a new survey of mathematics was planned in the 1970s and conducted in 1981. It was called the Second
International Mathematics Study (SIMS) and took place in 18 countries (Travers & Westbury, 1989). It was designed to address the concerns of critics of mathematics education that the first study was an inadequate study of the breadth of mathematics curriculum, did not contain measures that enabled the drawing of causal inferences, and assumed that mathematics was reflective of all of school learning (Freudenthal, 1975; Travers & Westbury, 1989). This survey was organized by the New Zealand Government Department of Education and the University of Illinois. One of the unique contributions of this survey was to include a pre- and post-test for students in eight countries, thereby allowing an examination of the conditions that led to change (Burstein, 1993; Suter, 2017). The survey was broader than most, with samples of Grade 4, Grade 8, and advanced secondary education. The framework designed for the mathematics assessment was carefully constructed to reflect as much of the mathematics curriculum of the participating countries as possible (Travers et al., 1988). The survey was conducted prior to the development of the internet and therefore suffered some of the same consequences as other studies conducted under conditions of low funding, such as low school participation rates, delayed communication between countries, and late production of analysis. Nevertheless, by including a strong central committee of researchers and mathematics educators, the IEA second-generation study improved the mathematics content and research design (by adding an optional pre- and post-measurement of achievement). The study was considered useful to the field of mathematics education, to the development of improved mathematics frameworks, and to comparative education research (Dossey & Wu, 2014; Heyneman, 2003; Heyneman & Lykins, 2007; Suter, 2017; Travers & Westbury, 1989). Also, the release of country achievement averages in newspapers at the time was further evidence that the rankings of countries would be of national concern.

Science and reading: 1980s

Additional second-generation international surveys of science and reading were designed by academic institutions in the IEA (Binkley, Rust, & Williams, 1996; Elley, 1992; Lundberg & Linnakylä, 1993; Rossier & Keeves, 1991) and by the Educational Testing Service in mathematics and science (Lapointe, Mead, & Askew, 1992; Lapointe, Mead, & Phillips, 1988). The Second International Science Study and the International Reading Study each sought to improve measurement methods of student assessment and techniques for data collection across countries (Papanastasiou, Plomp, & Papanastasiou, 2011). An important contribution of these surveys was to continue the training and involvement of countries participating in the earlier studies and the improved development of survey methods for comparative studies. These studies also received increased levels of attention by the public press and elected officials (Cavanagh, 2012). All first- and second-generation IEA surveys were carried out by research institutes usually associated with a university rather than a statistical company or agency. The international coordination of the effort also was located at a research institute or university. Although the first study was associated with UNESCO in Hamburg in 1960, that relationship with an existing international organization lapsed after 1965 and was not renewed until much later. For the period of these first studies, statistical quality was secondary compared to the importance of the research and development effort. Upon establishing the third generation of studies, three international centers for design, collection, and analysis were supported with US statistical agency and international funding (National Center for Education Statistics, 2018; Suter, 2011). During this period, the number of publications about methods and findings from international assessments, by academic and non-academic authors, increased (see Chapter 1, this volume), adding a wide range of researchers who responded to the challenges of conducting and interpreting large-scale surveys.


Third-generation IEA and the OECD

TIMSS, 1995

The IEA was able to continue with a new generation of studies with support for the international organization in 1991 from US governmental agencies and participating countries. It received funding to establish an international center to design a third study that would include two subjects, science and mathematics, with data collection to be carried out in 1995. The Third International Mathematics and Science Study (TIMSS) was conducted in two subjects for three populations (Grade 4, Grade 8, and advanced high school). The study also included analyses of classrooms by video observation and ethnographic observation (National Center for Education Statistics, 2018; Stevenson & Nerison-Low, 1999). In the tradition of earlier IEA studies, the third generation of assessments began with an extensive research effort to identify differences in curriculum coverage that could account for student achievement differences (Robitaille et al., 1993; Schmidt, Jakwerth, & McKnight, 1998; Schmidt, Jorde et al., 1996). The TIMSS eighth-grade survey was designed to analyze the relationships between school curriculum and student achievement by quantifying the coverage of curriculum in science and mathematics at three levels: as intended by the country education officials, as implemented in the classroom, and as achieved by the students (International Association for the Evaluation of Educational Achievement, 2011; Mullis et al., 2009; Robitaille et al., 1993). Published research studies identified a complex relationship that resulted in a small causal effect of curriculum differences on student achievement (Schmidt, McKnight, & Raizen, 1997). The basic design of the 1995 TIMSS assessments has been repeated for at least one grade every four years from 1995 to 2015 as the 'Trends in International Mathematics and Science Study' and has spawned both official and academic reports (Martin, Mullis, & Chrostowski, 2004). The informal gathering of interested
researchers that created the IEA in 1960 had been transformed into an international communication center in The Hague, a data collection center in the Australian Council for Educational Research (ACER) in Melbourne, and a study directing center at Boston College in Massachusetts. These centers organize the extensive communication that is required among countries for creating the questionnaires, conducting the training for countries, and preparing the statistical reports.

PISA, 2000

A second, competing approach to the measurement of student achievement in secondary school was introduced by the member countries of the OECD in 2000. It was called the Program for International Student Assessment (PISA) and was designed for the population of 15-year-olds (McGaw, 2008; Schleicher, 2016). The new study supported a demand for information about the preparation of a technical workforce and would be administered every third year. The test instruments included measurement of reading, mathematics and science, with a special emphasis on one of the subjects in each cycle. Survey items in the student questionnaires were modified each year to suit the analysis of the emphasized subject. The OECD maintains a comprehensive website providing access to published reports, survey data, and discussions of the studies (OECD, 2016a). The set of countries participating in either of these large-scale studies varies from year to year and from study to study. In more recent years, the number of participants in Western Europe has declined in TIMSS, while the number of countries from Asia and Africa has increased. Country participation in PISA increasingly included Southern and Eastern Europe and South America (see Figure 11.1). Both studies have added special populations, such as cities and regions within countries, as independent samples. The change in coverage of countries affects how the questions are designed and interpreted; therefore, the
questionnaire and test frameworks have been modified to cover the topics taught in those countries (Mullis & Martin, 2011). The key survey overseeing agencies, the IEA and OECD, had established quality guidelines for systematizing statistical data collection, sample design, and monitoring of field procedures to ensure that survey practices would be reliable across countries (Martin, Gregory, & Stemler, 2000; OECD, 2005). Each country participating in a study selects samples of students that range from 3,000 to 6,000 students. The total sample size for PISA 2015 was about 500,000 students in 68 countries and the total sample for TIMSS in 2015 was about 272,000 students in 47 countries. The existing international agencies (IEA, OECD and UNESCO) have always played an important role in stimulating and guiding cross-national studies. The different frameworks of the IEA compared with the OECD allow independent analyses of differences in school policies and curriculum (IEA) or of student preparation for the workforce (PISA). The IEA was originally organized informally as a loose network of willing participants which sought to avoid restrictions of governmental organizations as they experimented with ideas and approaches to measuring student achievement. However, once a workable foundation of ideas and practice was established, the IEA turned to national and international agencies for financial support, thereby restricting their ability to experiment. The OECD, on the other hand, developed its surveys within a framework of informing policies of national economic development consistent with the purpose of that organization. Their surveys overtly are organized to link educational practices to student preparation for work (through increases in cognitive and affective achievements) more often than exploring educational goals, such as moral behavior.

Developing Country Assessments

The number of low- and middle-income countries in Africa, Latin America, and Asia
that have conducted assessments of students has been increasing in recent years due to encouragement by development agencies. The Australian Council for Educational Research (ACER) reviewed 68 studies conducted in 35 countries of the Asian and Pacific region (Australian Council for Educational Research, 2014). The International Bureau of Education reviewed assessment frameworks in 40 middle- or low-income countries (Marope, 2017). UNESCO also has developed an online guide for student assessments for those who wish to implement them or to study their effects (UNESCO, 2018). The website lists assessments for 69 countries (mostly in Africa). The goal of the project is to improve learning by supporting national strategies for assessment and developing internationally comparable indicators and methodological tools to measure progress. UNESCO has established a Technical Cooperation Group (TCG), which discusses and develops indicators for monitoring education. The TCG is composed of representative members and is hosted by the UNESCO Institute for Statistics. A review by ACER of the implementation of large-scale assessments in development programs found that the assessments are used for evaluation and monitoring and for setting agendas and implementing policy. Interestingly, the authors of the ACER report explain that a key aspect of the use of large-scale surveys is to develop communication between policy makers and the public through the dissemination of results of the comparisons by the public media. These very goals of assessments concern other researchers and policy makers, who believe that the use of these data, especially when distributed mainly through the public media, tends to pressure policy makers to adopt policies without deep understanding (Lingard & Rawolle, 2004). In 2015, the OECD established a new program called 'PISA for Development', which is aimed at student learning outcomes for middle- and low-income countries. The
PISA survey instruments are being enhanced to be more relevant for the contexts found in middle- and low-income countries. A pilot project is being conducted between 2015 and 2018 involving the World Bank, UNESCO, UNICEF and OECD (OECD, 2016b).

CONDITIONS AFFECTING LARGE-SCALE SURVEY DEVELOPMENT

The establishment of large-scale surveys between 1960 and 2000 occurred within an environment that demanded new funding sources, international systems of communication, and new survey technologies to enable the rapid exchange of documents across large distances, as well as being a response to public attention and criticism, and to controversy regarding the use of student assessments as a means of evaluating the condition of an entire education system. This section discusses how these conditions affected their establishment and growth.

International and National Organization

The first and second generation of studies in the 1960s through the 1980s were conducted inexpensively within the halls of universities. The centers of research for each participating country were reviewed by the international coordinating body (the IEA) for their ability to conduct education research. The international coordination operated with small amounts of financial support until the 1990s, when the number of country participants grew above 25. Consequently, these early surveys were little more than informally conducted experiments in survey methods for cross-national-level studies that rarely used known methods of survey sampling, data collection, and management. They suffered from uneven coverage of age groups, inclusion of all school types, and low response rates. The results of such
informal methods of survey analysis were not highly regarded by statistical agencies and using the results for policy making and accountability was widely criticized (Berliner & Biddle, 1995; Bracey, 1996; Glass, 2008; Rotberg, 1998). International large-scale education studies of student achievement are an expensive investment by national and international agencies. Total costs for large-scale international surveys are difficult to arrive at because funding is not provided by one central source. Records derived from my work as program officer from 1990 to 2001 at the National Science Foundation for the costs of developing and executing the 1995 TIMSS provide an approximate guide. The development of the international survey instruments and administration required around $20 million over a five-year period (US 1995 dollars) and about $10 million for US data collection (for three populations, video collection, and ethnographic observation). Each country supports its own costs of data collection with materials and training provided by the international organization. Following the initial investment of research agencies in the early 1990s, the data collection organizations established a fee for country-level participation that supported the international management. Individual country expenses were only required for survey data collection and processing (saving the cost of test and questionnaire development).

Research Conditions

The computer revolution of the 1980s altered the quality and quantity of large-scale cross-national research and increased researchers' access to large databases. The process of collecting survey data across countries, combining them into a useful statistical database, and analyzing surveys with hundreds of thousands of respondents was greatly enhanced by the development of increased computing power, statistical software, and the internet.
Prior to the internet, survey documents were sent from one country to another by air or ship, and the data then had to be keyed into a database by hand. It was a slow and error-prone process, and much of the analysis was correspondingly difficult. This increased ability to conduct analyses of large-scale surveys not only improved the quality of the surveys themselves but also allowed a larger number of researchers and statistical agencies to become engaged in analyses and debates about methodology. Increased computing power and techniques gave educational researchers easy access to large-scale surveys covering hundreds of thousands of respondents and dozens of countries. After the launch of PISA at the OECD in 2000, competition between the OECD and the IEA pushed developments and analyses to new levels (as originally imagined by the early proponents of large-scale research). The quality and quantity of sophisticated analyses found in the reports of the two international organizations have continued to evolve, as have the methods of data collection, such as the use of laptop computers (OECD, 2014). Many of the analytical reports produced by both survey organizations provide administrators with a more thorough and nuanced view of the machinery of educational systems (OECD, 2001, 2007, 2009, 2014). The modern large-scale surveys can be depended upon to provide reliable estimates of the size and direction of relationships between essential educational components, such as family background and student achievement. Sometimes these relationships are large enough to affect policy (such as the PISA analysis of the strength of family resources on student achievement; OECD, 2007), and sometimes they negate an expected positive relationship, such as that between hours of after-school attendance and achievement (OECD, 2011). And as Husén pointed out in 1983, the results of cross-national analyses may be as likely to support points of view on one side of an argument as on the other, but they do not provide a fundamental basis for making choices about the values of good and bad educational practices (Husén, 1983). The power of survey analysis to settle long-standing educational issues about length of school time, curriculum, or study habits is limited. For one thing, the content of the surveys is restricted to what can be asked and answered in short testing periods. For another, unique relationships frequently occur across cultures, so the analysis of patterns within countries often brings new problems to a policy discussion.

Publicity, Criticism and Revision

The content and administration of the large-scale surveys were affected by the public discussion of their value as a method of assessing educational systems. After each study, published criticisms of their design, analysis, and potential political influence on education policy appeared (for example, Barrett & Crossley, 2015; Berliner & Biddle, 1995; Brown, 1996; Fairbrother, 2005; Freudenthal, 1978; Goldstein, 1996; Inkeles, 1978; Jerrim, 2013; Rutkowski, Rutkowski, & Zhou, 2016). Even Husén, the chairman of the IEA, reviewed the failings of these studies: problems with timing and coordination, incomplete analysis, inadequate and uneven funding, weak management, deficient sampling and content-area frameworks, and a study design that did not include an accompanying ethnographic study that might have illuminated the observed differences (Husén, 1979). While some of these limitations were considered serious enough to justify discontinuing the international efforts, the IEA leaders regarded the limitations of this new approach as an inevitable cost of initiating an original body of research (Husén, 1979), and they continued with new studies. While the intentions of the educational researchers conducting large-scale surveys were mainly of academic interest, concerning the causes of student achievement, the
researchers knew early in the process that inevitably comparisons would be made between countries to find the ‘best’ system. Husén wrote in 1979, ‘When the IEA study was launched, what in the minds of some academics was perceived as a major exercise in basic research was perceived by others as an international contest in mathematics. Now, it would at last be possible to find out which country scored highest’ (Husén, 1979, p. 379). Husén, then Chairman of the IEA, articulated the difficulties in explaining the researchers’ cautions to policy makers: At the first, great efforts were made to play down the ‘horse race’ aspect by referring to the fact that countries had different curricula. We pointed out that differences in average performance between countries could not without great reservations be interpreted as reflecting differences in the efficacy of mathematics education because of the impact of social and economic differences on student competence. Furthermore, the structure and selectivity of the systems played an important role. Although 13-year-olds in England and Germany, who had transferred to academic, selective secondary schools, had already been confronted with algebra and geometry, this was generally not the case in Sweden and the United States, the two countries with the lowest average performances at that age level. Despite efforts to point out such causes for differences in national scores, the outcry was tremendous in both these countries. (Husén, 1979, p. 380)

Even though the international studies were considered flawed in their methods of data collection, they received public attention in news stories. As already mentioned, one of the first journalistic reports of the comparative studies was published in the New York Times by Hechinger in 1967 under the headline 'U.S. Ranked Low in Mathematics Teaching: 12-Nation study says Japan does best job in subject'. Little else about international studies was available publicly until an academic paper that reanalyzed the rankings of countries was published 15 years later (Lerner, 1982). The paper was written by a lawyer who had been a student at the University of Chicago during the data collection and analysis of the IEA mathematics study and

thus was aware of its existence. The article presented the original statistics as rankings of countries, displaying US achievement as too low, especially compared with the amount of funds spent on education. This was the type of comparison that Husén had been trying to avoid. However, the Lerner paper was instrumental in supporting the conclusions of the report by the US Commission on Excellence, 'A Nation at Risk' (1983). The report stated that the US educational system was failing compared with those of other countries and that government attention to education was required. This report in turn increased governmental attention and eventually led to greater financial support for large-scale international surveys.

International Coordination, Management and Accountability

The increased number of studies by the 1980s, and the substantial attention they were receiving in the national presses of the countries involved, led government funding agencies to become concerned about the quality and value of the surveys. In the United States, for example, national leaders, including the President and members of Congress, referred to the results of these studies in public speeches to support their policy positions (Kuenzi, 2008; Ravitch, 1990). At the same time, researchers were concerned that the studies were not rigorous enough to merit such public attention (Berliner & Biddle, 1995; Bracey, 1996). Consequently, the leaders of these studies were forced to seek sufficient funding and establish the organizational leadership necessary to respond to the criticisms and recommendations of educational researchers, leading to improvements in the conceptualization and measurement of student learning. To ensure that the organization that carried out the studies was coherent, a committee of statisticians, anthropologists, educational policy makers, and researchers was established at
the National Research Council (NRC) of the US National Academy of Sciences in 1988 to address the promises and limitations of the existing international surveys (Bradburn & Gilford, 1990; Porter & Gamoran, 2002). The Board on International Comparative Studies in Education (BICSE) existed from 1988 to 2002 as a resource for guiding US participation in, and the international coordination of, large-scale international comparative studies. The board members included experts in the subject areas of the studies (reading, mathematics, and science) to help ensure that publicity about the studies would be relevant to the ongoing policy recommendations of professional education associations. The recommendations in its published reports guided the international as well as the US system of large-scale international research. In 1990, BICSE released A Framework and Principles for International Comparative Studies in Education, which established guidelines for funding agencies and researchers to improve and maintain the statistical and conceptual rigor of the survey methods and the quality of the measurements (National Research Council, 1990). Later the committee released A Collaborative Agenda for Improving International Comparative Studies in Education, which elaborated on the purposes and uses of comparative studies of education (National Research Council, 1993). During the 1990s, many governments around the world had a growing concern for educational accountability and therefore increased their financial support of cross-national survey organizations to ensure high survey quality (Elliott & Chabbott, 2002). Fewer of the studies were carried out solely by academic institutions without the involvement of government agencies. This process may have reduced the influence of educational researchers on the content of the studies, but it also improved their statistical quality and reduced the criticisms that had been leveled at prior studies. Concerns about the quality of the survey research methods used in these studies had been addressed by academic researchers as well as by the National Academy of Sciences (Barrett & Crossley, 2015; Benavot, 2005; Heyneman, 2003; Loveless, 2007; Porter & Gamoran, 2002; Seeber & Lehmann, 2013; Wolf, 1998).

Summary of Large-scale Survey Development

This review of the growing pains experienced during the development of the first large-scale surveys found that the studies were expected to serve the differing purposes of academic educational research, student assessment, journalistic reporting on education, and policy applications such as organizational accountability. The initial studies conducted by the IEA were experiments designed by researchers seeking answers to questions about the measurement of school-based student learning, teacher instruction, and curriculum. By 2000, the viability of a cross-national enterprise had been demonstrated, and new studies were launched by the OECD and later by other international organizations to provide a basis for evaluating the school systems of each participating country. The motivation to maintain ongoing systems of international coordination came from political and educational policy leaders in the participating countries and from the international organizations, which used the evaluations of the status of educational systems for policy purposes (Elliott & Chabbott, 2002). The content of the frameworks for the systematic study of student learning, and of the relationship of background, attitudes, teacher instruction, and parental influences to student learning, was designed by academic researchers from many fields. The expertise required knowledge of psychometrics, content domains (mathematics, science, reading, writing, pre-school, computer technology, civics, and teacher training), and statistical survey design (Dossey & Wu, 2012).

These competing interests have accompanied the creation of the large-scale international studies over the past 60 years. Beginning with doubts and competing purposes, the originators of the large-scale surveys expected to be able to increase understanding of the functioning of large educational systems and of the causes of individual differences in learning. By the early twenty-first century, it was accepted that significant differences in student achievement exist at the aggregate level of whole countries. Interest in such comparisons has led governments and international agencies to expand participation in the studies to all continents. Less clear, however, is whether the reasons for the observed country-to-country differences in student achievement are understood. The next section examines the extent to which large-scale surveys are represented in published articles and reports in the field of comparative education. The analysis examines whether they have provided useful evidence about educational practices, as originally envisioned.

BIBLIOGRAPHIC ANALYSIS

To describe changes in how large-scale surveys have been represented in the comparative education literature, a bibliography of publications relevant to the development, execution, and evaluation of large-scale surveys was accumulated for analysis. The bibliography provided a basis for analyzing the types of studies published, the sources of the publications, and the trend over time in the topics addressed by researchers and international organizations. The questions are these: Are the researchers who publish analyses and reviews of large-scale studies members of academia, or are they from the research foundations that carried out the studies (and thus unlikely to be objective reporters)? Have critical reviews of the use of large-scale studies increased or decreased over time?

Have large-scale surveys played a significant role in contributing knowledge to the field of comparative education research? Are the data collected by the surveys used only as a resource for government agencies seeking information about the accountability of school systems? The answers to these questions will help inform the field about how much, if at all, large-scale studies have affected the field of Comparative Education.

Methods

A list of comparative education publications was accumulated from several sources, reviewed for relevant content, and organized into categories for analysis. The initial collection included titles relevant to commentary about the contribution of cross-national comparisons to methods and epistemology. An initial list of approximately 5,000 titles was reduced to 2,300 published reports and peer-reviewed research papers. Creation of a unique set of publications was necessary because no existing database of comparative education articles could be located that included both empirical research on large-scale studies (with national surveys) and relevant small-scale studies. The Comparative and International Education Society (CIES) had assembled and organized a bibliography of journal articles on Comparative Education until 2015 and made the database publicly available at Florida State University (Easton, 2014, 2015, 2016). The CIES collection was large, about 8,000 papers, and is growing annually. Unfortunately, the articles in that collection do not include the representative statistical and small-scale research papers that are published in disciplinary journals. Nor was it possible to use existing publicly available online search systems. A search for 'IEA Education' with Google Scholar returned about 120,000 citations; a search for 'PISA survey' identified about 200,000 titles; and a search for 'TIMSS education' resulted in 27,000 titles. The Google Scholar
searches include many duplicate publications, and the content was impossible to assemble and analyze. Therefore, publications by the international survey organizations and by academic scholars commenting on the status of the field of Comparative Education were identified through several collection methods. The author assembled a bibliography from a collection of titles maintained throughout 30 years of study in the field of Comparative Education. This collection was supplemented with references from the bibliographies of papers in 20 review articles, five comparative education textbooks, five handbooks, two syllabi for college courses in comparative education, and published lists of research and statistical reports by the international survey organizations. The collection includes publications issued by the IEA, the OECD, UNESCO, and the World Bank. The IEA listed over 250 statistical reports between 1970 and 2016; the OECD website lists about 165 reports from PISA and the Program for the International Assessment of Adult Competencies (PIAAC), including 70 major reports for the six cycles, 73 'Focus' reports, and 22 reviewed papers between 2000 and 2015. A few papers by UNESCO and the World Bank were also included. The bibliography included papers issued during the first years of the first IEA studies on topics of education that could be addressed with measurements of schools, students, curriculum, and teachers. It includes early theoretical papers by Husén and Dahllöf (1965), Foshay et al. (1962), and Carroll (1963) describing the issues that were considered while creating the frameworks for the first international surveys. Representative papers by researchers who were involved with assessing the effects of the Swedish reform during the 1960s are purposively included (Dahllöf, 1966, 1974; Husén, 1974, 1996; Husén & Boalt, 1967). The collection includes articles by authors in many countries, including some in French, Spanish, German, Russian, and Chinese; however, the dominant language was English.

It does not include statistical and analytical reports produced by individual countries reporting on their participation in an international cooperative study. References to all publications were collected and organized with Mendeley, a bibliographic management system. Copies of papers were obtained from the University of Michigan library. The accumulated list was reduced by eliminating studies of individual countries and other titles that did not directly discuss the content of Comparative Education as a field. The final bibliographic set was intended to be a focused but fair representation of academic and non-academic publications on large-scale international studies and, for comparison, on smaller-scale studies in the field of comparative education research. The collection of papers included titles from 1900 to 2017; however, the analysis of change was conducted on publications between 1960 and 2017. This selective bibliography of reports and academic research papers provides a basis with which to illustrate change in the coverage and analysis of large-scale surveys compared with comparative education research that did not rely on large-scale survey analysis. The period of coverage allows an examination of trends in academic use of and interest in large-scale surveys. After the bibliography was collected, the references were coded from their titles into major categories so that comparisons could be made by method of study. The coding included whether the reference was an official report or a peer-reviewed paper, the year of publication, whether the paper referenced a large-scale survey, the name of the large-scale survey mentioned, whether the author was from an academic institution, the general content area, and whether the article critiqued the value and quality of large-scale studies.
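To make the coding workflow concrete, the following minimal sketch in Python shows how a reference-manager export might be deduplicated and coded. The file name, column names, and keyword list are hypothetical stand-ins; the coding described above was in fact performed by hand from titles rather than by automated keyword matching.

    # A minimal sketch of the coding workflow, assuming a hypothetical CSV export
    # with 'title', 'year', and 'publisher' columns; the actual coding was manual.
    import pandas as pd

    SURVEY_TERMS = ["TIMSS", "PISA", "PIRLS", "FIMS", "SIMS", "PIAAC"]
    ORGS = ["IEA", "OECD", "UNESCO", "World Bank"]

    refs = pd.read_csv("bibliography_export.csv")

    # Drop duplicates that differ only in capitalization or surrounding whitespace.
    refs["title_key"] = refs["title"].str.lower().str.strip()
    refs = refs.drop_duplicates(subset="title_key")

    # Code whether the title mentions one of the major large-scale surveys.
    pattern = "|".join(SURVEY_TERMS)
    refs["large_scale"] = refs["title"].str.contains(pattern, case=False, na=False)

    # Code whether the item was issued by a survey organization or a journal.
    refs["org_report"] = refs["publisher"].isin(ORGS)

    # Tabulate publications per year by the large-scale coding.
    trend = refs.groupby(["year", "large_scale"]).size().unstack(fill_value=0)
    print(trend.tail(10))

A keyword match on titles is, of course, only a rough proxy for the manual judgments described above; it would, for example, miss papers that discuss a large-scale survey without naming it in the title.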

The bibliography includes about 2,300 different articles for the years 1960 to 2017. Although I cannot test how well this bibliography represents all comparative education research, its size and variability of authorship provide useful guidance about the changes in the use of large-scale surveys in the field of Comparative Education. Readers should note that the apparent decline in numbers between 2012 and 2016 is a bias caused by the difficulty of identifying all recent studies; the most recent years are not representative of the true number of publications or authors. Many academic papers in this set included discussions of the nature of the field of Comparative Education itself (its identity, goals, and methods); another 10 percent were textbooks or handbooks for instruction; and about 40 percent were discussions of specific research reports or methods of research. Not included in the list of titles is the comprehensive set of published news stories about international comparisons. News organizations reported widely on the international comparisons issued by survey organizations (Baroutsis & Lingard, 2016; Hechinger, 1967). For example, Yemini and Gorden (2015) report that journalists in Israel largely ignored the student testing results issued by the school system but reported widely on the international comparisons to criticize the quality of the national education system. Is this same attention to large-scale studies evident in the number of publications in the journals of comparative education?

Publication Type and Source

The total number of publications in the bibliography for 1960 to 2017 is shown in Table 11.1 by the type of organization issuing the publication and by the source of the large-scale data.

Table 11.1  Number of comparative education publications by academic researchers and large-scale research organizations by source of data: 1960 to 2017

                                               Data source
Publisher                          Total      IEA     OECD     Other
Total                               2455      363      477      1615
Research journal                    2056      200      289      1567
  Large-scale study only             312       66      144       102
IEA                                  178      152       16        10
OECD                                 172        1      170         1
Other                                 49       10        2        37

The references obtained for the bibliography are not dominated by publications from the large-scale international organizations. The IEA and the OECD each published around the same number of items during this period, and a similar proportion of the total papers on the major surveys, TIMSS and PISA, appeared in academic journals (about 60 percent). Overall, the field of Comparative Education is dominated by academic researchers, who accounted for 85 percent of all cross-national publications identified for the bibliography. Another way to see the connection between large-scale studies and who publishes about them is that about a fourth of all publications about comparative education (including both peer-reviewed journal articles and issued reports) concern large-scale surveys, whereas only 15 percent of the peer-reviewed publications were about large-scale surveys during this entire period.
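As a rough check, the proportions quoted above can be recovered from the counts in Table 11.1. The short calculation below is illustrative only; it relies on the reconstructed row and column assignments, including the assumption that every organization-issued report concerns a large-scale survey.

    # Shares implied by Table 11.1 (counts copied from the table above).
    total_pubs = 2455          # all publications in the bibliography
    journal_pubs = 2056        # peer-reviewed research journal articles
    journal_large_scale = 312  # journal articles based solely on a large-scale study
    org_reports = total_pubs - journal_pubs  # IEA, OECD, and other issued reports

    # Organization reports plus large-scale journal papers, over all publications.
    overall_share = (org_reports + journal_large_scale) / total_pubs
    print(f"{overall_share:.0%} of all publications")   # ~29%, 'about a fourth'

    # Large-scale papers as a share of peer-reviewed journal articles only.
    journal_share = journal_large_scale / journal_pubs
    print(f"{journal_share:.0%} of journal articles")   # ~15%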

[Figure 11.2 about here: a line chart for 1960–2016 showing the number of publications per year (vertical axis 0 to 140), with separate series for IEA reports, OECD reports, and researcher-authored papers.]

Figure 11.2  Number of comparative education publications per year by author type and whether large-scale or not: 1960 to 2017

Figure 11.2 shows the changes between 1960 and 2017 in the number of publications per year on comparative education, by whether the publication referenced a large-scale research study and by whether it was written by an academic researcher or by one of the international data collection organizations (the OECD or the IEA). This analysis shows just how quickly the number of comparative education publications grew in the twenty-first century. Prior to 1970, fewer than 10 papers a year had been written about large-scale comparative studies. After 2000, the number of papers per year increased to
as high as 145 in 2012 and 2013 (the bibliographic search ended in 2017). The proportion of comparative education research papers in this bibliography that referenced large-scale surveys grew from 10 percent of all researcher-written publications in 2000 to over one-third by 2014. The chart shows how the early IEA studies represented a very significant portion of large-scale cross-national research prior to 2000. With the creation of PISA in 2000, the number of annual reports by the OECD surpassed that of the IEA, and the number of academic publications also increased. Was the increase in academic papers a response to the growing interest in large-scale comparisons or to the general growth in the number of comparative education scholars? One way to answer that question is by classifying the content of the publications.

Who Writes About Large-scale Studies?

The total number of individual authors of comparative education research represented in the database is over 2,000. Of these, about

23 percent were writing about large-scale research. Figure 11.3 shows the distribution of the number of different first authors of papers on large- and small-scale research between 1970 and 2016. While the number of different authors per year for small-scale international comparative research peaked at about 65, the number of authors conducting large-scale analysis was around 20 per year in the second decade of the twenty-first century. The proportion of first authors writing about large-scale research increased from 20 percent of all authors in 2000 to 30 percent in 2015.

[Figure 11.3 about here: a line chart for 1970–2016 showing the annual counts of different first authors (vertical axis 0 to 80), with separate series for large-scale and small-scale papers.]

Figure 11.3  Number of different authors for large-scale and small-scale research

The change over time in the authorship of publications on comparative education indicates that for about 30 years large-scale surveys received significant public attention in the national media of participating countries but were less likely to be discussed in comparative education publications until after 2000. Several technical reasons can explain why large-scale studies were not common in comparative education journals prior to 2000. First, few researchers paid attention to the first IEA studies in 1967 and 1972 (de Landsheere, Grisay, & Henry, 1974; Kazamias, 1974; Pollock, 1974; Postlethwaite, 1969, 1974a;
Sandbergen, 1974). Individual researchers not directly associated with the data collection would have had a difficult time analyzing the large-scale surveys before the data were later archived and distributed. Although fellowships allowed a few graduate students to work in Stockholm with the leaders of the IEA during the 1970s, few students received university training in how to use the survey data prior to 2000. Some of those who were trained during this period became leaders of the second round of IEA studies, such as Leigh Burstein and Judith Torney-Purta (Papanastasiou, Plomp, & Papanastasiou, 2011). The release of the results of SIMS in 1986 (McKnight et al., 1987) brought further attention from other researchers and the public to the comparisons of educational achievement between countries. However, the study did not generate a large body of analysis in the academic world, because the surveys were complex and difficult to analyze (although see Baker, 1993). The largest influence on general academic interest in large-scale surveys appears to have been the success of the PISA surveys, especially in Europe after 2000. For example, 60 percent of the publications on PISA in the bibliography were written by academics, compared with only 20 percent for TIMSS. These estimates of academic influence are based on a rough analysis of publication titles. Also, allocating the roles of published papers is sometimes fuzzy, because many researchers switched between producing large-scale research and writing academic papers. Nevertheless, the changes noted in this analysis of publications and authorship provide evidence that the large-scale surveys of the OECD and the IEA stimulated the field of Comparative Education to increasingly discuss all types of comparative studies.
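Continuing the earlier sketch, the authorship tabulation behind Figure 11.3 could be approximated as below. The 'first_author' column is again a hypothetical field, and the refs table is the one coded in the sketch above; the figures quoted in this section came from the author's manual coding, not from code of this kind.

    # Distinct first authors per year, split by the large-scale coding
    # (reuses the hypothetical 'refs' DataFrame built in the earlier sketch).
    authors = (refs.groupby(["year", "large_scale"])["first_author"]
                   .nunique()
                   .unstack(fill_value=0)
                   .rename(columns={False: "small_scale", True: "large_scale"}))
    print(authors.loc[2000:2016])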

Content of the Research Papers in the Bibliography

The titles of the papers were coded into categories of research method to estimate the relative importance of the major analytical approaches used by comparative education researchers. About three-quarters of the research papers contained comments about the methods of research and the use of
comparative methods. Another 18 percent of the research papers commented on the nature of the field of Comparative Education itself, such as the ideologies and epistemologies represented by the various researchers. Very few papers in this set discussed educational practices or education policy as a main theme. Some papers in this collection include discussion of and commentary on the quality and proper use of large-scale surveys. Among the critical points made in these papers are that large-scale survey data collections for international comparative studies: (1) lack a valid conceptual design for the content areas measured, especially in the early studies (Freudenthal, 1978); (2) have serious weaknesses in sample selection and response (Bracey, 1996; Bradburn & Gilford, 1990); (3) cause policy makers to use results without acknowledging the limitations of the survey execution; and (4) are not appropriate for making policy in a single country (Benavot, 2005; Berliner & Biddle, 1995; Fairbrother, 2005; Glass, 2008; Goldstein, 1996, 2004; Loveless, 2003). Many of these issues were known to the originators of the IEA studies in the 1960s (Husén, 1974; Thorndike, 1962; Wolf, 1998). In recent years, a growing number of published papers have addressed the value of large-scale studies and have discussed issues of measurement technique, with attention to the negative political impact of comparative studies on public education policy (Heyneman, 2003; Loveless, 2003; Rutkowski, Rutkowski, & Zhou, 2016; van de Vijver & Leung, 1997). These changes in the acceptability of international large-scale survey studies no doubt reflect improvements in the quality of survey design and analysis. However, the application of a set of statistics to policy making is still a matter of open debate in the field of Comparative Education; these issues are addressed throughout this volume (Baird et al., 2017; Barrett & Crossley, 2015; Carnoy & Rothstein, 2013). The introduction of PISA in 2000 has had a strong influence on the role of large-scale
surveys in comparative education research over the past 15 years (see Figure 11.2). In part, the influence of PISA may stem from the large number of analytical reports produced by the OECD staff, including about 80 reports directed at specific policy issues in education, such as (to name a few) gender, school resources, opportunity to learn, parent–teacher relationships, immigrants, technology, and vouchers. The influence of the PISA assessments on European education systems has received substantial academic commentary (Baird et al., 2017; Crossley, 2014; Ertl, 2006; Figazzolo, 2009; Grek, 2009; Strajn, 2014). Grek argues that the OECD became both a technologically strong agency and a policy-making agency, and thus acquired a significant level of credibility among governments in the European Union (Grek, 2009). She reports that policies on elementary and secondary education in Germany and Finland were dramatically affected by the analyses produced in the OECD reports on PISA; neither country had an ongoing assessment program prior to PISA. Grek concluded that policy making in England was not influenced by PISA results and that PISA had its biggest effect in Germany, where it led to curricular reforms and other reforms, such as new quality control measures in the form of teacher and student evaluations and tests (Grek, 2009, p. 8). Now that large-scale surveys have accumulated a large body of evidence over a significant period of time, such analyses of their influence, or lack thereof, are a popular topic of academic and governmental discussion. The papers in this Handbook reflect a portion of these discussions.

CONCLUSION

This chapter has reviewed the reasons for the creation of large-scale comparative studies of education and has examined trends in the academic and non-academic publication of
the studies over the past 60 years. Researchers involved with the implementation of 'empirical' assessments of student learning across diverse countries intended the studies to provide unbiased reference points for better understanding the quality of education. Although the originators of the studies had attempted to restrict their use to the scientific analysis of educational practices, the availability of comparisons of student achievement across countries immediately drew the attention of the public media, which reported the results as a judgement of the quality of each country's education system. Debates about the quality and usefulness of international comparisons from large-scale surveys began when the first reports were issued in the 1960s. Critical analysis of survey methods, of definitions of student knowledge, of the role of curriculum and teaching methods, of the use of student time, and of cultural and historical differences between countries has been the topic of many publications. However, rather than stopping the growth of these studies, the criticisms challenged their producers to strengthen the methods of data collection and analysis (Bradburn & Gilford, 1990). By the second decade of the twenty-first century, over 100 countries were conducting surveys of student assessment in mathematics, science, reading, and literacy. The publication of cross-country comparisons has had an impact on public policy in many countries. Yet academic debates should continue to address how to use empirical results from diverse settings to inform scientific knowledge and educational policy. Unvarnished comparisons of very large, developed countries with smaller, less developed countries should raise major questions in the minds of researchers about the validity of the measurement of concepts of achievement and social status. The beliefs of the originators of large-scale studies were summarized by Husén (1983), who wrote that the empirical results provided by the first IEA studies were useful as guides for policy making and for theoretical analysis, but not
as a guide for establishing goals for education or society (Husén, 1983). He noted that in the 1950s empirical research was regarded differently in different countries: some, like Germany, were not certain that empiricism could provide useful answers to policy questions, whereas others, such as the USA at the time, took its usefulness largely for granted. Regarding the relationship between empirical research on education and its interpretation, Husén noted that the results of experiments in Sweden during the period of reform from selective to comprehensive schools could be used for arguments on both sides of the debate about the relative effectiveness of selective or comprehensive school organization (this is also noted by Dahllöf, 1966, 1974). Husén's themes for considering the role of large-scale surveys in policy making and comparative education research are still relevant today. Empirical results derived from student responses to a limited set of questions raise as many questions about the validity of the questions and responses as they answer about the causes of educational achievement. Many researchers and policy makers are unaware of, or have forgotten, the guidance of philosophers who explain that empirical evidence is useful for negating a claim but not very useful for positively supporting one (Zarebski, 2009). The clear lesson from the analysis of cross-national studies is that human behavior appears to be motivated by so many possible sources of action at once that it is impossible for researchers to recommend the single key factor, desired by policy makers, that would improve student learning. Researchers who closely examine the results of the comparisons are aware that empirical studies may be useful as guidance but always present complex problems of interpretation that may nullify the positive aspects; note, for example, the discussion of the relationship between economic growth and student attitudes toward science careers by the OECD (2016b). As Porter and Gamoran (2002) suggested, international comparisons may be mostly useful
for generating new hypotheses, because they expand the pool of possible behaviors rather than reduce it to a minimum. This chapter has therefore sought to determine whether the large-scale studies have been used mainly for policy debates or more for scientific study. The bibliographic analysis of thousands of publications over 60 years provided a descriptive device for discussing the extent to which scientific research about educational practices was conducted by scholars. The trends in publications demonstrated that hundreds of reports of large-scale research have been produced by the responsible data collection organizations and by governmental and international policy-analysis organizations. However, the dominant trend of the past 20 years has been a large increase in academic attention to both large-scale and small-scale internationally comparative studies. The changes in the publication patterns of the field of Comparative Education shown in this chapter, and the evidence of the recent expansion of data collection activities to many countries, suggest a momentum that is not likely to let up. The domain of comparative education research will benefit from the increased number of researchers examining international differences in educational practices, applying not only large-scale survey methods. Likewise, the field of Comparative Education will contribute to the development of improved empirical measurement methods, frameworks, and theories for understanding cross-national differences.

REFERENCES

The Center for Global Education Monitoring (2014). The Latin-American laboratory for assessment of the quality of education: Measuring and comparing educational quality in Latin America (Assessment series no. 3). Australian Council for Educational Research (ACER). ISSN 2203-9406.
Baird, J., Johnson, S., Hopfenbeck, T. N., Isaacs, T., Stobart, G., & Yu, G. (2017). On

the supranational spell of PISA in policy. Educational Research, 58(2), 121–138. http://doi.org/10.1080/00131881.2016.1165410
Baker, D. P. (1993). Compared to Japan, the US is a low achiever… really: New evidence and comment on Westbury. Educational Researcher, 22(3), 18–20. http://doi.org/10.3102/0013189X022003018
Baroutsis, A., & Lingard, B. (2016). Counting and comparing school performance: An analysis of media coverage of PISA in Australia, 2000–2014. Journal of Education Policy, 1–18. http://dx.doi.org/10.1080/02680939.2016.1252856
Barrett, A. M., & Crossley, M. (2015). The power and politics of international comparisons. Compare: A Journal of Comparative and International Education, 45(3), 467–470. http://doi.org/10.1080/03057925.2015.1027509
Baumert, J., Lüdtke, O., Trautwein, U., & Brunner, M. (2009). Large-scale student assessment studies measure the results of processes of knowledge acquisition: Evidence in support of the distinction between intelligence and student achievement. Educational Research Review, 4(3), 165–176. http://doi.org/10.1016/j.edurev.2009.04.002
Benavot, A. (2005). School knowledge in comparative and historical perspective (Vol. 18). Dordrecht: Springer Netherlands. http://doi.org/10.1007/978-1-4020-5736-6
Berliner, D. C. (2015). The many facets of PISA. Teachers College Record, 117(10), 519–521.
Berliner, D. C., & Biddle, B. J. (1995). The manufactured crisis. New York: Addison-Wesley.
Binkley, M., Rust, K., & Williams, T. (Eds.) (1996). Reading literacy in an international perspective: Collected papers from the IEA Reading Literacy Study. Washington, DC: National Center for Education Statistics.
Bracey, G. W. (1996). International comparisons and the condition of American education. Educational Researcher, 25(1), 5–11.
Bradburn, N., & Gilford, D. M. (Eds.) (1990). A framework and principles for international comparative studies in education. Washington, DC: National Academy Press.
Brown, M. (1996). FIMS and SIMS: The first two IEA international mathematics surveys. Assessment in Education. London: Taylor & Francis.

Burstein, L. (Ed.) (1993). The IEA Study of Mathematics III: Student growth and classroom processes. Oxford: Pergamon.
Carnoy, M., & Rothstein, R. (2013). What do international tests really show about US student performance? Washington, DC: Economic Policy Institute.
Carroll, J. B. (1963). A model of school learning. Teachers College Record, 64, 722–733.
Carroll, J. (1975). The teaching of French as a foreign language in eight countries. Stockholm: Almqvist & Wiksell.
Cavanagh, S. (2012). Concern over American students' middling scores on high-profile tests vies with caution about cultural and political factors that shape school improvement. Education Week, 1 December. Retrieved from edweek.org/ew/articles/2012/01/12/16overview.h31.html
Choppin, B. (1974). The correction for guessing on objective tests: A report of the general findings of the IEA study of guessing behavior. Stockholm: IEA.
Coleman, J. S., Campbell, E. Q., Hobson, J. C., McPartland, J., Mood, A. M., Weinfield, F. D., & York, R. L. (1966). Equality of educational opportunity. Washington, DC: U.S. Government Printing Office.
Comber, L. C., & Keeves, J. P. (1973). Science education in nineteen countries: An empirical study. Stockholm: Almqvist & Wiksell.
Commission on Excellence (1983). A nation at risk. Washington, DC: U.S. Department of Education.
Crossley, M. (2014). Global league tables, big data and the international transfer of research modalities. Comparative Education, 50(1), 15–26.
Dahllöf, U. (1966). Recent reforms of secondary education in Sweden. Comparative Education, 2, 71–92.
Dahllöf, U. (1974). Trends in process-related research on curriculum and teaching at different problem levels in educational sciences. Scandinavian Journal of Educational Research, 18(1), 55–77. http://doi.org/10.1080/0031383740180104
de Landsheere, G., Grisay, A., & Henry, G. (1974). High achievers in Belgium: A partial analysis of IEA science, literature and reading comprehension data. Comparative Education Review, 18(2), 180–187.

Dossey, J. A., & Wu, M. L. (2012). Implications of international studies for national and local policy in mathematics education. In A. J. Bishop, C. Keitel, J. Kilpatrick, & F. K. S. Leung (Eds.), Third international handbook of mathematics education. London: Springer.
Dow, P. (1991). Schoolhouse politics: Lessons from the Sputnik era. Cambridge, MA: Harvard University Press.
Easton, P. B. (2014). Documenting the evolution of the field: Reflections on the 2013 Comparative Education Review bibliography. Comparative Education Review, 58(4), 555–574. http://doi.org/10.1086/678047
Easton, P. B. (2015). Comparative Education Review bibliography 2014: Catching up with the rapid growth of the field. Comparative Education Review, 59(4), 743.
Easton, P. B. (2016). Comparative Education Review bibliography 2015: Galloping growth and concluding reflections. Comparative Education Review, 60(4), 833–843. http://doi.org/10.1086/688766
Elley, W. B. (1992). How in the world do students read? IEA Study of Reading Literacy. The Hague: IEA.
Elliott, E. J., & Chabbott, C. (2002). Long-term research agenda for international comparative education: Understanding others, educating ourselves. Washington, DC: National Academies Press.
Ertl, H. (2006). Educational standards and the changing discourse on education: The reception and consequences of the PISA study in Germany. Oxford Review of Education, 32(5), 619–634.
Fairbrother, G. P. (2005). Comparison to what end? Maximizing the potential of comparative education research. Comparative Education, 41(1), 5–24. http://doi.org/10.1080/03050060500073215
Figazzolo, L. (2009). Impact of PISA 2006 on the education policy debate. Education International. Retrieved from www.ei-ie.org/research/en/documentation.php
Foshay, A. F., Thorndike, R. L., Hotyat, F., et al. (1962). Educational achievements of thirteen-year-olds in twelve countries. Hamburg: UNESCO Institute for Education. Retrieved from http://unesdoc.unesco.org/images/0013/001314/131437eo.pdf

Freudenthal, H. (1975). Pupils' achievement internationally compared; the IEA. Educational Studies in Mathematics, 6, 127–186.
Freudenthal, H. (1978). Weeding and sowing: Preface to a science of mathematical education. Amsterdam: Reidel.
Glass, G. V. (2008). Fertilizers, pills, and magnetic strips: The fate of public education in America. USA: Information Age Publishing.
Goldstein, H. (1996). The IEA studies [Special issue]. Assessment in Education: Principles, Policy & Practice, 3(2).
Goldstein, H. (2004). International comparisons of student attainment: Some issues arising from the PISA study. Assessment in Education: Principles, Policy & Practice, 11(3), 319–330. http://doi.org/10.1080/0969594042000304618
Grek, S. (2009). Governing by numbers: The PISA 'effect' in Europe. Journal of Education Policy, 24(1), 23–37.
Grisay, A., & Griffin, P. (2006). What are the main cross-national surveys? In K. N. Ross & J. Genevois (Eds.), Cross-national studies of the quality of education: Planning their design and managing their impact. New York: UNESCO, International Institute for Educational Planning.
Harnqvist, K. (1975). The International Study of Educational Achievement. Review of Research in Education, 3(1), 85–109. http://doi.org/10.3102/0091732X003001085
Hechinger, F. (1967). U.S. ranks low in mathematics teaching. New York Times, March 7.
Heyneman, S. (2003). History and problems of making educational policy at the World Bank, 1960–2000. International Journal of Educational Development, 23, 315–337.
Heyneman, S. P., & Lykins, C. R. (2007). The evolution of comparative and international education statistics. In H. F. Ladd & E. B. Fiske (Eds.), Handbook of research in education finance and policy (pp. 107–130). London: Routledge.
Husén, T. (1967). International study of achievement in mathematics: A comparison of twelve countries (Vols 1 & 2). New York: Wiley.
Husén, T. (1973). Foreword. In L. C. Comber & J. P. Keeves (Eds.), Science education in nineteen countries: International studies in evaluation (Vol. 1). Stockholm: Almqvist & Wiksell.

Husén, T. (1974). In review: Introduction to the reviews of three studies of the International Association for the Evaluation of Educational Achievement (IEA). American Educational Research Journal, 11(4), 407–408. http://doi.org/10.3102/00028312011004407
Husén, T. (1979). An international research venture in retrospect: The IEA surveys. Comparative Education Review, 23.
Husén, T. (1983). The international context of educational research. Oxford Review of Education, 9(1), 21.
Husén, T. (1996). Lessons from the IEA studies. International Journal of Educational Research, 25(3). http://doi.org/10.1016/0883-0355(96)82850-3
Husén, T., & Boalt, G. (1967). Educational research and educational change: The case of Sweden. Stockholm: Almqvist & Wiksell; New York: John Wiley.
Husén, T., & Dahllöf, U. (1965). An empirical approach to the problem of curriculum content. International Review of Education, 11(1), 51–76.
Inkeles, A. (1978). The International Evaluation of Educational Achievement. Proceedings of the National Academy of Education, 4, 139–200. Washington, DC: National Academy of Education.
International Association for the Evaluation of Educational Achievement (IEA) (2011). Brief history of the IEA. Retrieved from www.iea.nl/brief_history_of_iea.html
International Association for the Evaluation of Educational Achievement (IEA) (2017). TIMSS and PIRLS. Retrieved from https://timssandpirls.bc.edu/isc/publications.html
International Organization for Standardization (ISO) (2018). Codes for the representation of names of countries and their subdivision: Country Codes – 3166. Retrieved from www.iso.org/iso-3166-country-codes.html
Jerrim, J. (2013). The reliability of trends over time in international education test scores: Is the performance of England's secondary school pupils really in relative decline? Journal of Social Policy, 42(2), 259. http://doi.org/10.1017/S0047279412000827
Kazamias, A. M. (1974). Editor's note. Comparative Education Review, 18(2), 330–332.
Kuenzi, J. J. (2008). Science, technology, engineering, and mathematics (STEM) education:
Background, federal policy, and legislative action. Congressional Research Service Reports, 35. Retrieved from http://digitalcommons.unl.edu/crsdocs/35
Lapointe, A. E., Mead, N. A., & Askew, J. M. (1992). Learning mathematics. Princeton, NJ: Educational Testing Service.
Lapointe, A. E., Mead, N. A., & Phillips, G. W. (1988). A world of differences: An international assessment of mathematics and science. Princeton, NJ: Educational Testing Service.
Lerner, B. (1982). American education: How are we doing? The Public Interest, 69, 69–82.
Lewis, C., & Massad, C. (1975). The teaching of English as a foreign language in ten countries. Stockholm: Almqvist & Wiksell.
Lingard, B., & Rawolle, S. (2004). Mediatizing educational policy: The journalistic field, science policy, and cross-field effects. Journal of Education Policy, 19(3), 361–380.
Lockheed, M. (2011). Reflections on IEA from the perspective of a World Bank official. In C. Papanastasiou, T. Plomp, & E. C. Papanastasiou (Eds.), IEA 1958–2008: 50 years of experiences and memories (Vol. 1). Nicosia, Cyprus: International Association for the Evaluation of Educational Achievement (IEA).
Loveless, T. (2003). How well are American students learning? Washington, DC: Brookings Institution–Brown Center, 3(3), 40. Retrieved September 4, 2017 from www.brookings.edu/~/media/research/files/reports/2001/9/education/09education.pdf
Loveless, T. (Ed.) (2007). Lessons learned: What international assessments tell us about math achievement. Washington, DC: Brookings Institution.
Lundberg, I., & Linnakylä, P. (1993). Teaching reading around the world: IEA Study of Reading Literacy. The Hague: IEA.
Marope, M. (2017). Current and critical issues in curriculum, learning and assessment: Monitoring progress towards SDG 4.1: Initial analysis of national assessment frameworks for mathematics. Paris: UNESCO Institute for Statistics.
Martin, M., Gregory, K. D., & Stemler, S. E. (Eds.) (2000). TIMSS 1999 technical report. Chestnut Hill, MA: TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College.

Martin, M., Mullis, I., & Chrostowski, S. (2004). TIMSS 2003 technical report. Chestnut Hill, MA: TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College.
McGaw, B. (2008). The role of the OECD in international comparative studies of achievement. Assessment in Education: Principles, Policy & Practice, 15(3), 223–243.
McKnight, C. C., Crosswhite, F. J., Dossey, J. A., Kifer, E., Swafford, J. O., Travers, K. J., & Cooney, T. J. (1987). The underachieving curriculum: Assessing US school mathematics from an international perspective. Champaign, IL: Stipes Publishing.
Mullis, I. V. S., & Martin, M. O. (2011). TIMSS 2011 item writing process and guidelines. Chestnut Hill, MA: TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College.
Mullis, I. V. S., Martin, M. O., Ruddock, G. J., O'Sullivan, C. Y., & Preuschoff, C. (2009). TIMSS 2011 assessment frameworks. Chestnut Hill, MA: TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College.
National Academy of Sciences (NAS) (1997). Symposium, reflecting on Sputnik: Linking the past, present, and future of educational reform. A symposium hosted by the Center for Science, Mathematics, and Engineering Education, Washington, DC. Retrieved from www.nas.edu/sputnik/agenda.htm
National Center for Education Statistics (2018). About NCES international comparisons. Retrieved from https://nces.ed.gov/surveys/international/about.asp
National Research Council (1993). A collaborative agenda for improving international comparative studies in education. Washington, DC: National Academy of Sciences.
Noonan, R. (1976). School resources, social class, and student achievement: A comparative study of school resource allocation and the social distribution of mathematics achievement in ten countries. Stockholm: Almqvist & Wiksell.
OECD (2001). Knowledge and skills for life: First results from the OECD Programme for International Student Assessment (PISA) 2000. Paris: OECD Publishing.
OECD (2005). PISA 2003 technical report: Programme for International Student Assessment. Paris: OECD Publishing.

OECD (2007). PISA 2006: Science competencies for tomorrow's world. Paris: OECD Publishing.
OECD (2009). PISA 2006 technical report. Paris: OECD Publishing.
OECD (2010). PISA 2009 results: Overcoming social background: Equity in learning opportunities and outcomes (Volume II). Paris: OECD Publishing. Retrieved from http://dx.doi.org/10.1787/9789264091504-en
OECD (2011). Quality time for students: Learning in and out of school (PISA). Paris: OECD Publishing. Retrieved from www.oecd-ilibrary.org/education/quality-time-for-students-learning-in-and-out-of-school_9789264087057-en
OECD (2014). PISA 2012 technical report. Paris: OECD Publishing. Retrieved from www.oecd.org/pisa/pisaproducts/PISA-2012-technical-report-final.pdf
OECD (2016a). PISA products. Retrieved from www.oecd.org/pisa/pisaproducts/
OECD (2016b). PISA for Development brief – 2016/07 (July). Paris: OECD Publishing. Retrieved from www.oecd.org/pisa/aboutpisa/PISA-FOR-DEV-EN-1.pdf
Oppenheim, A., & Torney, J. (1974). The measurement of children's civic attitudes in different nations. Stockholm: Almqvist & Wiksell.
Papanastasiou, C., Plomp, T., & Papanastasiou, E. C. (Eds.) (2011). IEA 1958–2008: 50 years of experiences and memories (Vols I & II). Nicosia, Cyprus: International Association for the Evaluation of Educational Achievement (IEA).
Peaker, G. (1975). An empirical study of education in twenty-one countries: A technical report. Stockholm: Almqvist & Wiksell.
Plowden, B., & Central Advisory Council for Education (England) (1967). Children and their primary schools. London: Her Majesty's Stationery Office. Retrieved from www.educationengland.org.uk/documents/plowden/
Pollock, J. G. (1974). Some reflections on the Scottish national data. Comparative Education Review, 18(2), 268–278.
Porter, A. C., & Gamoran, A. (Eds.) (2002). Methodological advances in cross-national surveys of educational achievement. Washington, DC: National Academy Press.
Postlethwaite, T. N. (Ed.) (1969). International Project for the Evaluation of Educational
Achievement (IEA) [Special issue]. International Review of Education, 15(2).
Postlethwaite, T. (Ed.) (1974a). What do children know? International studies on educational achievement [Special issue]. Comparative Education Review, 18(2), 155–156.
Postlethwaite, T. N. (1974b). Target populations, sampling, instrument construction and analysis procedures. Comparative Education Review, 18(2), 157–163.
Purves, A. (1973). Literature education in ten countries: An empirical study. Stockholm: Almqvist & Wiksell.
Prais, S. J. (2003). Cautions on OECD's recent educational survey (PISA). Oxford Review of Education, 29(2), 139–163.
Ravitch, D. (1990, January 10). Education in the 1980s: A concern for 'quality'. Education Week, 9(16), 48. Retrieved from www.rfwp.com/
Rindermann, H. (2007). The g-factor of international cognitive ability comparisons: The homogeneity of results in PISA, TIMSS, PIRLS and IQ-tests across nations. European Journal of Personality, 21(5), 667–706. Retrieved from http://dx.doi.org/10.1002/per.634
Robitaille, D. F., Schmidt, W. H., Raizen, S., McKnight, C., Britton, E., & Nicol, C. (1993). Curriculum frameworks for mathematics and science. Vancouver: Pacific Educational Press.
Rosier, M. J., & Keeves, J. P. (Eds.) (1991). The IEA Study of Science I: Science education and curricula in twenty-three countries. Oxford: Pergamon Press.
Ross, K. N., & Genevois, I. (2006). Cross-national studies of the quality of education: Planning their design and managing their impact. Paris: UNESCO, Institute for Educational Planning.
Rotberg, I. C. (1998). Interpretation of international test score comparisons. Science, 280(5366), 1030–1031.
Rutkowski, L., & Rutkowski, D. (2009). Trends in TIMSS responses over time: Evidence of global forces in education? Educational Research and Evaluation, 15(2), 137–152.
Rutkowski, L., Rutkowski, D., & Zhou, Y. (2016). Parameter estimation methods and the stability of achievement estimates and system rankings: Another look at the PISA model. International Journal of Testing, 16(1), 1–20. doi: 10.1080/15305058.2015.1036163

222

THE SAGE HANDBOOK OF COMPARATIVE STUDIES IN EDUCATION

Rutkowski, L., von Davier, M., & Rutkowski, D. (2013). Handbook of international largescale assessment: Background, technical issues, and methods of data analysis. London: Chapman and Hall/CRC. Sandbergen, S. (1974). The prediction of two non-cognitive educational criteria. Comparative Education Review, 18(2), 262–267. Schleicher, A. (2016). International assessments of student learning outcomes. In D. Wyse, L. Hayward, & J. Pandya (Eds.), The SAGE handbook of curriculum, pedagogy and assessment. London: Sage. Schmidt, W. H., Jakwerth, P. M., & McKnight, C. C. (1998). Curriculum sensitive assessment: Content does make a difference. International Journal of Educational Research, 29(6), 503–527. doi:10.1016/ S0883-0355(98)00045-7 Schmidt, W. H., Jorde, D., Cogan, L. S., Barrier, E., Gonzalo, I., Moser, U., Shimizu, K., Sawada, T., Valverde, G., McKnight, D., Prawat, R., Wiley, D. E., Raizen, S., Britton, E. D., & Wolfe, R. G. (1996). Characterizing pedagogical flow: An investigation of mathematics and science teaching in six countries. Dordrecht: Kluwer. Schmidt, W. H., & Mcknight, C. C. (1995). Surveying educational opportunity in mathematics and science: An international perspective. Educational Evaluation and Policy Analysis, 17(3), 337–353. doi:10.3102/01623737017003337 Schmidt, W. H., McKnight, C. C., & Raizen, S. A. (1997). A splintered vision: An investigation of US science and mathematics education. Dordrecht: Kluwer. Seeber, S., & Lehmann, R. (2013). The impact of international large-scale assessments on work-related educational monitoring and policy-making in Germany. Research in Comparative and International Education, 8(3), 342–348. http://doi.org/10.2304/rcie.2013. 8.3.342 Singer, J. D., & Braun, H. I. (2018). Testing international education assessments. Science, 360(6389), 38–40. Stevenson, Harold W., & Nerison-Low, Roberta (1999). To Sum it up: Case studies of education in Germany, Japan, and the United States. Washington, DC: National Institute on Student Achievement, Curriculum, and

Assessment, Office of Educational Research and Improvement, US Department of Education. Retrieved March 25, 2017 from www. ed.gov/offices/OERI/SAI Strajn, D. (2014). The PISA syndrome: Can we imagine education without comparative testing? Solsko Polje, 25(5/6), 13. Suter, L. E. (2011). Guiding international comparative studies 1981 to 2009: Reflections of a US program director. In C. Papanastasiou, T. Plomp, & E. C. Papanastasiou (Eds.), IEA 1958–2008: 50 Years of Experiences and Memories (Vol. 1). Nicosia, Cyprus: International Association for the Evaluation of Educational Achievement (IEA). Suter, L. E. (2017). How international studies contributed to educational theory and methods through measurement of opportunity to learn mathematics. Research in Comparative & International Education, 12(2), 174–197. Thorndike, Robert L. (1973). Reading comprehension education in fifteen countries: An empirical study. Stockholm: Almqvist & Wiksell. Thorndike, Robert L. (1962). International comparison of the achievement of 13-year-olds. In Foshay, A. F, Thorndike, R. L., Hotyat F. et  al. (Eds.), Educational achievements of thirteen-year-olds in twelve countries. Hamburg: UNESCO Institute for Education. Retrieved from http://unesdoc.unesco.org/ images/0013/001314/131437eo.pdf Torney, J., Oppenheim, A., & Farnen, R. (1975). Civic education in ten countries: An empirical study. Stockholm: Almqvist & Wiksell. Travers, K. J. et al. (1988). Introduction to the second international mathematics study. In D. F. Robitaille & R. A. Garden (Eds.), The IEA study of mathematics: Volume II: Contexts and outcomes of school mathematics (pp. 1–16). London: Pergamon Press. Travers, K. J., & Westbury, I. (Eds.) (1989). The IEA study of mathematics II: The analysis of mathematics curricula. Oxford: Pergamon. UNESCO (2018). Global Alliance to Monitor Learning (GAML). New York: UNESCO. Retrieved from http://uis.unesco.org/en/ topic/learning-outcomes van de Vijver, Fons & Leung, Kwok (1997). Methods and data analysis for cross-cultural research. London: Sage. Walker, David A. (1976). The IEA six subject survey. An empirical study of education in

GROWTH AND DEVELOPMENT OF LARGE-SCALE INTERNATIONAL COMPARATIVE STUDIES 223

twenty-one countries. Stockholm: Almqvist & Wiksell. Wolf, R. M. (1998). Validity issues in international assessments. International Journal of Educational Research, 29(6), 491–501. http:// doi.org/10.1016/S0883-0355(98)00044-5 Yemini, M., & Gordon, N. (2015). Media representations of national and international

standardized testing in the Israeli education system. Discourse: Studies in the Cultural Politics of Education, 1–15. https://doi.org/ 10.1080/01596306.2015.1105786 Zarebski, T. (2009). Toulmin’s model of argument and the logic of scientific discovery. Studies in Logic, Grammar and Rhetoric, 16(29), 267–283.

12 The Meaning of Motivation to Learn in Cross-National Comparisons: A Review of Recent International Research on Ability, Self-Concept, and Interest

Ming-Te Wang¹, Jessica L. Degol, and Jiesi Guo

INDIVIDUAL AND GENDER DIFFERENCES IN ACHIEVEMENT MOTIVATION

Academic learning is an incremental process in which expertise in a particular domain is gained by building new knowledge and skills on previously learned material. However, successful and productive learning is more likely to occur when skill acquisition is preceded or accompanied by high motivation to learn. When individuals feel both capable and interested in a particular subject domain, they are more likely to stay engaged and excel in that area (Wang & Degol, 2016). Although motivation has been defined in a number of ways, our chapter focuses on the competence beliefs, interests, and value that one attaches to relevant subject domains. These competence beliefs and task values are drawn mainly from expectancy-value theory and play a key role in achievement-related success (Eccles, 1983). Expectancy-value theory (EVT) is a long-standing motivational theory that describes why differences in academic performance, course selection, and major and career choice emerge throughout adolescence and early adulthood (Eccles, 1983). Theoretical and empirical research has converged to demonstrate that academic and career choices are partly explained by individual differences in academic motivation. EVT proposes that individuals do not simply select courses, majors, or careers in areas in which they are highly skilled. They also
choose areas that they are highly motivated to pursue. In other words, individuals must not only have the skills to pursue a subject domain, but also the motivational drive to pursue and persist in that domain. These domain-specific motivational beliefs comprise two main constructs: expectancy beliefs (e.g., ability self-concept) and subjective task values. Expectancy beliefs refer to an individual's expectations of future success in a particular field. If individuals believe that they have a high likelihood of successfully completing a task in a particular subject, then they have high expectancy beliefs in that domain. Subjective task values, on the other hand, refer to the value that individuals place on a particular subject domain. These task values comprise attainment value, utility value, intrinsic value, and cost. Attainment value refers to the extent to which an individual feels a domain aligns with their core personal goals. Utility value represents a person's perceived usefulness of the subject for achieving their future goals. Intrinsic value refers to an individual's interest in or enjoyment of engaging with a particular subject; it is closely related to the construct of intrinsic motivation, whereby students are more likely to invest time and energy in an activity that they find personally meaningful and relevant to their lives (Frank & Scharff, 2013). Cost represents how burdensome it may be to pursue the field (e.g., in time, effort, or emotional well-being). Taken together, expectancy beliefs and task values are distinct yet closely related constructs that define the important role of domain-specific motivation in determining personal investment in related subject domains. In Western cultures, research has supported the EVT framework by demonstrating the importance of motivational beliefs in explaining both individual and gender differences in academic choices (Wang & Degol, 2014). For example, a study utilizing a US sample found cross-lagged associations between expectancy beliefs and math
performance as well as between task values and math performance, measured in both 7th and 10th grade (Wang, 2012). By 12th grade, adolescents who had higher expectancy beliefs and task values in math in 10th grade were more likely to have enrolled in a greater number of math courses and to have higher career aspirations in math-intensive fields, regardless of their past performance in math (also see Guo, Parker, Marsh, & Morin, 2015a, for similar findings based on an Australian sample). Another US study found that expectancy beliefs and task values in literacy measured in 4th grade predicted pleasure reading in 10th grade, and course choices and career aspirations in 12th grade (Durik, Vida, & Eccles, 2006). Additional research has found similar associations, although some work suggests that expectancy beliefs are more consistently linked to academic performance, while task values are more consistently linked to future major or career goals (Degol, Wang, Zhang, & Allerton, 2018; Eccles, 2007; Guo et al., 2016, 2017; Meece, Glienke, & Burg, 2006; Wang, Degol, & Ye, 2015). For example, in a sample of US youth, not only was higher math task value in 12th grade linked to a greater likelihood of choosing a STEM career as an adult, but math task value also played a unique role in women's STEM career decisions (Wang et al., 2015). Therefore, task values are not only important predictors of math outcomes but also seem to be important determinants of future career choices, particularly for females with a high interest in math. The theoretical framework of EVT has also been applied to East Asian cultures to identify whether the motivational patterns in these cultures mirror findings from Western populations (e.g., Bong, Cho, Ahn, & Kim, 2012; Chen, Hwang, Yeh, & Lin, 2012; Chiu & Xihua, 2008; Guo, Marsh, Parker, Morin, & Yeung, 2015b; Jack, Lin, & Yore, 2014; Kadir, Yeung, & Diallo, 2017; Lee, 2014; Lee, Bong, & Kim, 2014). In general, empirical EVT studies across Western and East Asian countries follow a similar
pattern: expectancy is more predictive of academic achievement, while task value is a better predictor of achievement-related choices. For example, one study using a sample of Taiwanese adolescents found cross-lagged reciprocal associations between math self-concept and math achievement and between Chinese self-concept and Chinese achievement (Chen et al., 2012). Other research has found that, for Korean students, mathematics self-efficacy predicted middle school students' achievement indirectly through math task value and math anxiety (Bong et al., 2012). Similarly, interest in mathematics predicted math achievement indirectly through self-regulation in a Korean sample (Lee, Lee, & Bong, 2014). Gender differences in motivation have also been examined, and gender-stereotypic patterns of male and female performance in East Asian cultures mirror those of Western samples. For example, one study found an interaction between gender and motivational beliefs such that males in Hong Kong, on average, had higher math self-concept than females, which led to their higher achievement (Guo et al., 2015b). However, when males and females had equivalent levels of both self-concept and intrinsic value, females actually had higher math achievement than males. These findings demonstrate that while motivational beliefs are important for all individuals, they may be especially important in promoting positive math outcomes for females. While these findings parallel Western data, few studies to our knowledge have directly compared EVT predictions between Western and East Asian countries.
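The cross-lagged designs cited in this section share a common schematic form. As an illustrative sketch only, and not the exact specification estimated by Wang (2012) or Chen et al. (2012), a two-construct cross-lagged panel model for self-concept (SC) and achievement (Ach) can be written as:

```latex
\begin{aligned}
\mathrm{SC}_{i,t+1}  &= \alpha_1 + \beta_{11}\,\mathrm{SC}_{i,t} + \beta_{12}\,\mathrm{Ach}_{i,t} + \varepsilon_{1,i,t+1},\\
\mathrm{Ach}_{i,t+1} &= \alpha_2 + \beta_{21}\,\mathrm{SC}_{i,t} + \beta_{22}\,\mathrm{Ach}_{i,t} + \varepsilon_{2,i,t+1},
\end{aligned}
```

where β11 and β22 are stability paths and β12 and β21 are the cross-lagged paths. The 'reciprocal associations' reported above correspond to both cross-lagged paths being nonzero after prior levels of each construct are controlled.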

INDIVIDUAL CHANGES IN ABILITY SELF-CONCEPT AND INTEREST FROM PRIMARY TO SECONDARY SCHOOL

A number of studies have examined the development of ability self-concepts and interest task values in students across several school years (Wang & Degol, 2014, 2016). The main premise of this research is that if motivation is closely linked to achievement, then a better understanding of changes in self-concept and task values will help predict which students are at greater risk of academic failure and which are on track toward academic success. Much of this research has demonstrated that motivation for math and language arts learning declines over time, with decreases becoming especially pronounced as children transition from elementary to middle and secondary school environments (Fredricks & Eccles, 2002; Jacobs, Lanza, Osgood, Eccles, & Wigfield, 2002). Although these declines have been widely documented, two main factors potentially explain them: maturational changes in brain development (Stipek & MacIver, 1989), such that young children are not as adept as older children or adults at assessing their strengths and weaknesses across domains (Eccles, 1999; Eccles, Wigfield, & Schiefele, 1998; Wigfield et al., 1997), and environmental changes, such that elementary school settings, compared with secondary school settings, are frequently characterized by smaller class sizes and less rigidly structured classroom routines (Eccles, 1999; Eccles, Wigfield, Harold, & Blumenfeld, 1993). Previous studies have consistently reported that academic motivation generally declines for most students as they advance through their elementary and secondary school years; these findings hold across cultures and for both math and reading self-concept and subjective interest, in Western countries (the US, Canada, Australia, Germany) and East Asian societies (e.g., China [Shi, Wang, Wang, Zuo, Liu, & Maehr, 2001; Wang & Pomerantz, 2009]; Korea [Lee & Kim, 2014]; Singapore [Yeung, Lau, & Nie, 2011]; Hong Kong [King & McInerney, 2014]). For example, in Australia, students' self-perceptions of
ability and subjective task values (i.e., interest and utility) in math and English declined across grades 7 through 11 (Watt, 2004). Similarly, in a US sample, competence beliefs and task values (i.e., utility and interest) in math and language arts declined on average for students from 1st through 12th grade (Jacobs et al., 2002). It is noteworthy that few cross-cultural studies have directly compared motivational changes between Western and East Asian countries. Wang and Pomerantz (2009) present a notable exception, in which they examined motivational trajectories during early adolescence in the US and China. They found that students' motivational beliefs deteriorated over the 7th and 8th grades in both countries. American children also came to value academics less, with declines in their motivational behavior as well; in contrast, Chinese children continued to value academics and sustained their motivational behavior. In both countries, children's motivational beliefs and behavior predicted their grades over time, suggesting that declines in motivation can have deleterious effects on academic performance. Interestingly, research also seems to suggest that students experience a faster decline of motivation in math than in English over time, particularly in East Asian countries (e.g., Gottfried, Marcoulides, Gottfried, Oliver, & Guerin, 2007; Jacobs et al., 2002; Lee & Kim, 2014; Yeung et al., 2011). Based on a Hong Kong sample, King and McInerney (2014) found that youths' self-concept in English slightly increased during junior high school. This finding was also replicated in a Korean sample, in which math intrinsic motivation declined continuously from 7th through 11th grade, while English self-concept decreased during the middle school years but increased during high school (Lee & Kim, 2014). Two possible reasons may explain these differences in math and English motivation. First, English is not the first language of East Asian students, and it may be the case that
as students progress through higher educational levels, they become more confident in their English language abilities. In addition, students also learn English as a second language for leisure goals such as traveling abroad and communicating with people from different cultures (Gardner, 2010). Because of this additional instrumental value of learning English for leisure, intrinsic motivation for English may decline less than intrinsic motivation for math. Overall, self-concept and interest tend to decline from elementary to middle school and then stabilize in the high school years, although the specific developmental trends differ somewhat across countries and cultures. While math motivation seems to decline across Western and East Asian cultures, motivation for English language learning declines less rapidly in Eastern countries. Additionally, self-concepts and task values mutually reinforce each other and follow similar developmental trends over time, becoming more closely linked as children age. However, most of these studies reflect average changes in motivation. Studies examining heterogeneity in Western samples have revealed more nuanced changes in math, reading, and science motivation than steady declines (Jacobs et al., 2002). More research is needed to examine these developmental changes across Western and East Asian samples to determine the extent to which heterogeneity in change is universal or culture-specific.
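The average declines summarized in this section are typically estimated with growth models. As a generic sketch, and not the specification of any particular study cited here, a linear latent growth curve for motivation can be written as:

```latex
y_{it} = \eta_{0i} + \eta_{1i}\,t + \varepsilon_{it},
\qquad \eta_{0i} = \mu_0 + \zeta_{0i}, \quad \eta_{1i} = \mu_1 + \zeta_{1i},
```

where y_it is student i's self-concept or interest at occasion t. A negative mean slope (μ1 < 0) captures the average decline, while the variance of ζ1i captures individual differences in the rate of decline, which is exactly the heterogeneity that the studies above note is usually masked when only average trends are reported.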

THE INTERACTION BETWEEN EXPECTANCY AND VALUE

While self-concept and task values are clearly linked to achievement and career choices, recent research addresses the importance of examining interaction effects between self-concepts and task values. Several researchers have argued that while most non-experimental studies have examined the additive effects of
self-concept and task value on various learning outcomes (i.e., self-concepts and task values modeled as separate, independent predictors of achievement outcomes in regression models), expectancies and values actually have multiplicative effects on learning outcomes (e.g., Guo et al., 2015a, 2015b; Trautwein, Marsh, Nagengast, Lüdtke, Nagy, & Jonkmann, 2012). While we typically expect individuals with higher expectancies or self-concepts to also have higher task values, and vice versa, some individuals report high expectancies but low value in a subject, while others report low expectancies but high value. Comparing these subgroups reveals whether high value can compensate for low expectancies, or high expectancies for low value, depending upon the outcome in question (see the schematic model at the end of this section). Interestingly, different patterns emerge when examining the interaction effect between self-concept and task value in Western and East Asian countries. PISA 2012 examined the relationship between students' math self-concept/interest and performance across countries. Results indicated that the correlation between math self-concept/interest and performance is stronger in East Asian countries (e.g., Korea, Chinese Taipei, Japan, Hong Kong-China) than in Western countries (OECD, 2013; also see Chiu & Klassen, 2010). In addition, researchers found that the interaction of students' self-concepts and values predicted a variety of student outcomes in different academic domains in Western contexts (i.e., Australia, Germany) (Guo et al., 2015a, 2016, 2017; also see Trautwein et al., 2012; Lauermann, Tsai, & Eccles, 2017). For example, Guo et al. (2015a) found that high school students were more likely to take an advanced math course, have higher university entry rankings, and enter STEM fields of study when their expectancies and values in math were both high, indicating a synergistic interaction between self-concepts and task values. More specifically, when it comes to achievement, ability self-concepts or expectancies seem to matter more than task value. In a German sample, for example, students with low expectancies and high value in both math and English had the lowest achievement in their respective domains, whereas students with both high expectancies and high value in math and English had the highest achievement (Trautwein et al., 2012). When it comes to career choices, interest seems to matter more than self-concepts, at least in math (Guo et al., 2015a; Lauermann et al., 2017). For example, in the Lauermann et al. (2017) study, math self-concept could not compensate for low levels of intrinsic interest in determining the likelihood of attaining a math-related career. This supports the relative inability of expectancy beliefs and task values to compensate for one another when determining learning outcomes. In East Asian contexts, however, researchers (Guo et al., 2015b; Lee, Bong, & Kim, 2014; Lee, Lee, & Bong, 2014) found that the value students attach to some achievement activities predicts their achievement behavior more strongly when their self-concepts for that activity are low, indicating a compensatory interaction. For instance, based on three cohorts of Hong Kong's TIMSS dataset, Guo et al. (2015b) found that utility value predicted students' scores on an international standardized math exam and their intentions to pursue advanced education more strongly when their self-concepts were low. Lee and colleagues (2014) also found that Korean middle school students with high intrinsic or utility value for learning English were more likely to procrastinate and to cheat on their English test when they had low self-efficacy. This indicates that high self-concept can compensate for low utility value when it comes to achievement, but that self-concept cannot fully compensate for low utility value when it comes to educational aspirations. In summary, most studies examining motivational beliefs as predictors of achievement,
educational choices, and career decisions focus on the additive role of expectancy beliefs and task values. However, studies that have examined the multiplicative effects of expectancy beliefs and task values have shown that interactions do appear to exist between these two constructs. In both Western and East Asian cultures, self-concept seems to matter more in determining achievement, while task value matters more in determining educational or career choices. However, in Western cultures, high expectancy beliefs or high task values cannot fully compensate for low levels of the accompanying construct, with high levels of both often producing the most positive outcomes. In East Asian cultures, by contrast, evidence suggests a slight compensatory role whereby high self-concept can compensate for low utility value and vice versa, depending on the outcome in question. More research is needed to fully understand how these cultural differences in the multiplicative function of self-concepts and task values manifest.
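The additive versus multiplicative distinction running through this section can be made concrete with a schematic regression (our illustration, not a specification taken from the studies reviewed). For an outcome Y (achievement or an achievement-related choice), expectancy E, and task value V:

```latex
Y_i = \beta_0 + \beta_1 E_i + \beta_2 V_i + \beta_3\,(E_i \times V_i) + \varepsilon_i .
```

A purely additive model fixes β3 = 0. A positive β3 corresponds to the synergistic pattern reported for Western samples, in which outcomes are best when expectancy and value are both high, whereas a negative β3 corresponds to the compensatory pattern reported for East Asian samples, in which value predicts outcomes more strongly when self-concept is low. In latent-variable applications the product term is formed from latent predictors, but the interpretive logic is the same.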

PARADOX (ACHIEVEMENT–INTEREST/SELF-CONCEPT)

Researchers who study ability perceptions have long noted a paradox in the development of self-concepts and achievement across various domains. According to the Internal/External Frame of Reference model (IE; Marsh, 2007), the higher an individual performs in a particular subject area, the more competent the individual will feel in that domain. Higher achievement in math, for example, should lead to higher self-concept in math over time. These positive correlations are apparent across both math and verbal domains (e.g., higher math ability increases math self-concept, while higher verbal ability increases verbal self-concept). However, a paradox exists in the development of math and verbal self-concepts among
adolescents. Despite moderate to strong correlations between math and verbal achievement, statistical models consistently detect near-zero correlations between math and verbal self-concepts (Marsh, 2007). While these findings initially appear counterintuitive, verbal and math skills are frequently viewed as contrasting skill sets, the prevailing belief being that an individual can be highly skilled in one area but not the other. This phenomenon is represented by the contrast effects documented in IE research, in which higher math ability predicts higher math self-concept but also predicts lower verbal self-concept, after controlling for verbal ability. In other words, if we compare two students who have the same level of verbal achievement but differing levels of math achievement, the student with higher math achievement should have lower verbal self-concept than the student with lower math achievement. Likewise, higher verbal ability predicts higher verbal self-concept and lower math self-concept, after controlling for math ability. The cross-cultural generalizability of the IE model was first tested by Marsh and Hau (2004). Using PISA 2000 data from 26 countries (103,558 15-year-old students), they found that both within-domain comparison processes (mean effect sizes of .51/.47 from math/verbal achievement to math/verbal self-concept) and cross-domain comparison processes (−.22/−.21 from verbal/math achievement to math/verbal self-concept) held invariant across countries, providing strong cross-cultural evidence for the IE paradox. A subsequent meta-analysis based on 69 datasets with 125,308 students reported that math and verbal achievements were highly correlated (r = .67), but that math and verbal self-concepts were nearly uncorrelated (r = .10). In further corroboration, the paths from math achievement to math self-concept were positive (β = .61), but the paths from verbal achievement to math self-concept were negative (β = −.27). Therefore, when evaluating their own competence in a
particular domain (e.g., math self-concept), students use their performance in both the corresponding domain (math ability) and the contrasting domain (verbal ability) to guide the development of their self-concept. In addition to using internal comparison processes (e.g., comparing and contrasting one's own skills across multiple domains) to develop domain-specific self-concepts, students also engage in external social comparison processes to determine their own level of skill in a domain. In other words, students also judge their competence in a domain by using their classmates or peers as a reference group against which to rate their own relative performance. For example, if we compare two students with equal math achievement, the student who attends a school with a high proportion of math achievers will rate their math self-concept lower than the student attending a school with a lower proportion of math achievers. This phenomenon is known as the 'Big Fish Little Pond Effect' (BFLPE; Marsh, 2007). According to BFLPE theory, students compare their own academic achievement with the achievements of their classmates and use this social comparison as a frame of reference against which they form their own academic self-concepts. The BFLPE indicates that students attending low-achieving schools judge themselves more positively than comparable students (equal in ability) attending high-achieving schools. Therefore, the BFLPE describes the tendency for students to base self-evaluations on local comparison groups that they can use as a reference point. The BFLPE, in conjunction with the IE model, describes an important paradox in the development of self-concept. In most cases, high math ability will predict high math self-concept. However, being surrounded by high-ability students in math will predict lower math self-concept, regardless of individual ability. In other words, high achievement alone is not sufficient to produce high levels of self-concept in the respective domain; students must also feel competent relative to everyone else around them. Recent research has found that similar frames of reference operate in the formation of academic interest, given that students tend to develop interests and value in areas where they feel competent (Guo et al., 2015a, 2017; Schurtz, Pfost, Nagengast, & Artelt, 2014). Multiple comparative studies have provided strong support for the cross-cultural generalizability of BFLPE predictions in relation to math and English self-concepts and interest (e.g., Chiu, 2012; Marsh & Hau, 2004; Marsh et al., 2014, 2015).
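The two comparison processes described in this section can also be summarized schematically. In the following sketch the signs reflect the findings reviewed here rather than coefficients from any single study:

```latex
% IE contrast effects: math self-concept (MSC) from math and verbal achievement
\mathrm{MSC}_i = \beta_0 + \beta_m\,\mathrm{MAch}_i + \beta_v\,\mathrm{VAch}_i + \varepsilon_i,
\qquad \beta_m > 0,\ \beta_v < 0.

% BFLPE: student i in school j, with school-average achievement as a contextual predictor
\mathrm{MSC}_{ij} = \gamma_0 + \gamma_1\,\mathrm{MAch}_{ij} + \gamma_2\,\overline{\mathrm{MAch}}_j + u_j + \varepsilon_{ij},
\qquad \gamma_1 > 0,\ \gamma_2 < 0.
```

In the first equation, math self-concept rises with math achievement but falls with verbal achievement once math achievement is controlled (the verbal equation mirrors it). In the second, a student's self-concept rises with his or her own achievement (γ1) but falls with school-average achievement (γ2); that negative contextual coefficient is the BFLPE.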

NATIONAL-LEVEL 'BIG FISH LITTLE POND' EFFECT

While much of the research on the BFLPE has focused on social comparisons within schools, cross-cultural researchers (Min, Cortina, & Miller, 2016; Peng, Nisbett, & Wong, 1997; Van de gaer, Grisay, Schulz, & Gebhardt, 2012) argue that adolescents may also use peers within their respective nation, region, or culture as a frame of reference against which to contrast academic performance. For example, Min et al. (2016) analyzed three TIMSS datasets from 2003 to 2011 and found that national-level achievement has a negative effect on individuals' self-concept and interest while controlling for both individual- and school-level achievement (also see Van de gaer et al., 2012). In other words, national achievement levels accounted for individual differences in self-concept and interest even after individual differences in achievement and in school-wide achievement were controlled for. Clearly, on some level, students are aware of the expectations and standards of performance in their country or culture and judge their own abilities against those standards. However, the TIMSS study (Min et al., 2016) did not use anchoring vignettes to adjust for cross-cultural differences in perceptions of response scales (He & Van de Vijver, 2016;
Vonkova, Zamarro, Deberg, & Hitt, 2015), which is a problem considering that students in East Asian cultures are more likely to favor the midpoint of response scales, while students in Western cultures are more likely to favor the extreme points. Interpretation of these findings is therefore limited by the absence of a direct comparison between the two cultures. Based on both international TIMSS and PISA assessments, researchers have also found evidence spanning decades that academic self-concept/interest and achievement are positively correlated at the level of individual students within each country (e.g., a student with high achievement is more likely to also have high self-concept or interest) but negatively correlated at the country level (e.g., a student attending school in a high-achieving country has lower self-concept or interest) (e.g., Min et al., 2016; Mullis, Martin, Foy, & Arora, 2012; Mullis, Martin, & Loveless, 2016; Van de gaer et al., 2012). This cross-cultural paradox seems to exist across school subjects and academic domains (e.g., math, science, reading), grades, and cohorts (e.g., Chang, McBride-Chang, Stewart, & Au, 2003; Kennedy & Trong, 2006; Van de gaer et al., 2012). For example, the TIMSS 2015 assessment (Mullis et al., 2016) reported that high-scoring countries have large percentages of students who do not believe they usually do well in math and do not enjoy learning math, whereas students in low-scoring countries express more confidence and enjoyment in math. More specifically, the five East Asian countries – Singapore, Korea, Hong Kong SAR, Chinese Taipei, and Japan – register high TIMSS achievement along with low levels of confidence and enjoyment. In contrast, England and the United States did not score as high as these East Asian nations, but they exhibited much lower percentages of students lacking confidence and disliking math – England (11%) and the United States (13%) (Mullis et al., 2016; also see Mullis et al., 2012).
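The sign reversal between the student level and the country level described above is easy to demonstrate. The following minimal simulation (our illustration with made-up parameters, not TIMSS or PISA data) shows a positive within-country correlation between achievement and self-concept coexisting with a negative between-country correlation:

```python
import numpy as np

rng = np.random.default_rng(0)
n_countries, n_students = 20, 500

# Hypothetical country means: higher-achieving systems have *lower*
# average self-concept (the national-level paradox).
country_ach = rng.normal(500, 50, n_countries)
country_sc = 5 - 0.004 * (country_ach - 500) + rng.normal(0, 0.1, n_countries)

within_r, mean_ach, mean_sc = [], [], []
for mu_a, mu_s in zip(country_ach, country_sc):
    ach = rng.normal(mu_a, 30, n_students)
    # Within each country, higher achievers report higher self-concept.
    sc = mu_s + 0.01 * (ach - mu_a) + rng.normal(0, 0.3, n_students)
    within_r.append(np.corrcoef(ach, sc)[0, 1])
    mean_ach.append(ach.mean())
    mean_sc.append(sc.mean())

print("average within-country r:", np.mean(within_r))              # positive
print("between-country r:", np.corrcoef(mean_ach, mean_sc)[0, 1])  # negative
```

Correlating country means would therefore badly mischaracterize the student-level relationship, which is why the analyses cited above model the individual, school, and national levels separately.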

POTENTIAL EXPLANATIONS FOR THE PARADOXICAL RELATIONSHIP

Cultural Differences

While the BFLPE suggests that students may make comparisons with their peers at the classroom or school level, it is unlikely that students compare their performance to everyone else in the entire nation when they rate their own abilities. It is far more likely that students evaluate their performance against the expectations and standards of their cultures. One example is the relative individualistic/collectivistic focus of Western and Eastern cultures. Western cultures value individual expression and autonomy, while East Asian cultures value conformity and supporting group needs above individual interests. As such, East Asian cultures value modesty more than Western cultures do, which may lead students to seek upward comparisons, resulting in lower self-concepts (modesty bias). In contrast, individuals in individualistic societies are more likely to choose downward comparisons, resulting in higher self-concepts (self-enhancement bias) (Heine, Lehman, Peng, & Greenholtz, 2002; Peng et al., 1997). The prevalence of Confucian traditions in East Asia also results in cross-cultural differences in response styles on Likert-type scales. Previous research has demonstrated that, compared to North Americans, East Asians tend to avoid the extreme points of Likert-type scales when responding to statements reflecting emotions (Heine et al., 2002; Peng et al., 1997). More recently, researchers have tried to estimate the cross-cultural associations between academic motivation and achievement while controlling for the modesty bias prevalent in East Asian cultures. For example, the TIMSS team combined two negative categories ('disagree a little' and 'disagree a lot' with the statement 'I enjoy learning mathematics') when examining the percentage of 4th and 8th grade students who reported disliking mathematics (Mullis et al.,
2016). One benefit of collapsing two questionnaire categories that differ in degree but not in direction is the dampening of potential cultural bias in response styles. Nevertheless, the results still indicated a national-level paradoxical relationship between self-concept/interest and achievement (Mullis et al., 2016). However, some researchers suggest that response-scale differences between East Asian and Western cultures confound the interpretation of cross-cultural comparison studies (He & Van de Vijver, 2016; Vonkova et al., 2015). Such research suggests that the best method of making cross-cultural comparisons is to ask students to evaluate a hypothetical scenario and directly compare cultural differences in responses. After adjusting for these differences, a more accurate evaluation of motivational differences can be made; without these adjustments, differences between cultures may be overestimated. Another potential explanation for the BFLPE paradox could be cross-cultural differences in educational structure. In North America and the United Kingdom, secondary school is compulsory, meaning that all students must attend regardless of achievement or educational goals. In contrast, many East Asian systems have compulsory education for approximately nine years, after which examinations are used to place students in different tracks for senior secondary school. While many American public secondary schools, for example, also have different tracks (e.g., technical, academic), these tracks are often self-selected, and students in different tracks are not necessarily segregated into different school systems (e.g., students in non-college tracks may be heavily mixed into many of the same classes as students in college tracks). As a result, many American students may perceive greater competence in school subjects, as they are likely to be comparing themselves against less demanding course expectations and against students with more diverse skills and abilities than are students in East Asian cultures. Indeed, research examining the effects of tracking on the relationship between self-esteem and achievement confirms this speculation. For example, the correlation between self-esteem and achievement is lower in European countries where academic tracking is used than in countries where tracking is not used (Van Houtte, Demanet, & Stevens, 2012). While this does not exemplify a cross-cultural difference between Western and East Asian cultures, it does suggest that the reference group against which students compare themselves is an important consideration in examining both between- and within-cultural differences in STEM motivation.
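Returning to the response-style problem raised in relation to Min et al. (2016), the anchoring-vignette adjustment can be illustrated with a small sketch. This is our simplified rendering of the standard nonparametric recoding, under the assumption that each student also rates two hypothetical vignette persons designed to be low and high on the trait (ties and reversed vignette orderings require extra handling in practice):

```python
def recode_with_vignettes(self_rating: int, vign_lo: int, vign_hi: int) -> int:
    """Re-express a raw Likert self-rating relative to the respondent's own
    ratings of a 'low' and a 'high' vignette person (assumes vign_lo < vign_hi)."""
    if self_rating < vign_lo:
        return 1  # places self below the low vignette person
    if self_rating == vign_lo:
        return 2
    if self_rating < vign_hi:
        return 3  # between the two vignette persons
    if self_rating == vign_hi:
        return 4
    return 5      # places self above the high vignette person

# Two students give the same raw self-rating (4) but use the scale differently:
print(recode_with_vignettes(4, vign_lo=3, vign_hi=5))  # modest scale use -> 3
print(recode_with_vignettes(4, vign_lo=2, vign_hi=3))  # extreme scale use -> 5
```

Because the recoded score is defined relative to each student's own use of the response scale, a culture-wide tendency to avoid (or favor) the scale extremes no longer distorts the comparison.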

Competitive Examination Systems

An additional factor explaining the BFLPE paradox may be cultural differences in educational standards and rigor. Higher-achieving countries tend to have more demanding curricula, higher academic standards, more pressure to achieve, and higher expectations from teachers and parents, which may result in a less enjoyable learning experience in school, lowering absolute levels of self-concept or enjoyment (Eccles & Roeser, 2009). Conversely, students in countries with relatively low academic standards and expectations may actually develop more positive perceptions of their own abilities because they are able to meet teachers' tasks and demands, even though their average achievement is rather low by international ('real' or 'absolute') standards.

GENDER DIFFERENCES IN ABILITY SELF-CONCEPT AND INTEREST FROM PRIMARY TO SECONDARY SCHOOL

Achievement differences between males and females have been examined across several cultures using TIMSS and PISA assessment data. Five East Asian countries (Korea,
Hong Kong SAR, Chinese Taipei, Japan, and Singapore) were among the highest-achieving countries in math in both TIMSS and PISA assessments (Mullis et al., 2012, 2016; OECD, 2013, 2016). However, the TIMSS assessments showed that gender differences in math achievement were generally small or negligible in both 4th and 8th grades in these highest-achieving countries (Mullis et al., 2016). A similar pattern of negligible gender differences in performance was found for 15-year-olds in the PISA assessments (OECD, 2013, 2016). In fact, 20-year trends for TIMSS countries that participated in both 1995 and 2015 (including several East Asian and Anglo countries) indicate that the gender gap became even smaller over time, with a decrease in the difference favoring boys and the emergence of a difference favoring girls (Mullis et al., 2016). In mathematics specifically, boys outperformed girls in just six countries, with an average achievement difference of nine points. By contrast, 26 countries (two-thirds of the trend countries) showed no gender difference in achievement. In the most noticeable development, girls outperformed boys in seven countries, with an average difference of 17 points. For example, by 2015, Hong Kong, Japan, and South Korea no longer had significant gender differences in math performance. Singapore, which had a gender disparity in mathematics achievement in 1995, was the only 20-year trend country where girls outperformed boys – by a difference of nine points. Research examining self-concepts and interest in math and English has found clear gender differences across both Western and Eastern cultures. Despite gender similarities in achievement, boys reported more positive math self-concept and interest than girls in Western cultures (Else-Quest, Hyde, & Linn, 2010; Hyde, 2014). Similarly, in East Asian cultures, such as Hong Kong and Korea, girls have higher intrinsic motivation and self-concept in English, while boys have higher intrinsic motivation and self-concept in math (Guo et al., 2015b; King
& McInerney, 2014; Lee & Kim, 2014). A large body of studies has demonstrated that girls have higher English self-concept and interest in both Western and East Asian countries (OECD, 2007; also see Watt, 2016, for a review). In Western contexts, some research shows that the gender gap in math motivation (favoring boys) decreases as students progress through grades 1 to 12 because boys' motivation declines more rapidly than girls' during this time (Fredricks & Eccles, 2002; Jacobs et al., 2002). Other studies tracking development across middle and high school showed that gender differences in math and English motivation remain constant across time (Nagy et al., 2010; Petersen & Hyde, 2017; Watt, 2004). In East Asian contexts, Lee and Kim (2014) found that Korean boys seemed to experience a faster decrease in math intrinsic motivation than girls, leading to a smaller gender gap from grades 7 to 11. Similarly, King and McInerney (2014) found that Hong Kong girls experienced a less steep decline in math self-concept than boys from grades 7 to 9. These findings suggest that gender differences in math motivation are more likely to occur in the younger grades (Fredricks & Eccles, 2002; Jacobs et al., 2002). With regard to English motivation, Jacobs et al. (2002) found that girls had higher English values than boys in 1st grade; but because girls' values declined more rapidly than boys', the gap narrowed by late elementary school, and then widened again during high school as girls' values for English increased and boys' values leveled off. Similarly, Lee and Kim (2014) found that Korean girls' intrinsic motivation in English decreased more slowly during junior high school and increased at a faster rate during senior high school when compared to boys (see Yeung et al., 2011, for a similar pattern among Singaporean adolescents). Overall, these studies suggest that the gender gap in English motivation has become smaller over time.
begun to unpack the relationships between national-level gender equality indices and math motivation. These studies found that global measures of gender equality (i.e., the Gender Empowerment Measure [GEM] and the Gender Gap Index [GGI]) were significantly associated with gender gaps in math motivation based on the TIMSS 2003 and PISA 2003 datasets. Results showed that in nations with greater gender equity, gender differences in self-concept, self-efficacy, interest, and utility value were larger (favoring males) than in nations with less gender equity (Else-Quest et al., 2010). Similarly, Stoet et al. (2016) showed that countries with higher levels of gender equality, as measured by the GGI, exhibited larger gender differences in math anxiety (females had higher math anxiety) in both the PISA 2003 and PISA 2012 datasets. However, it seems that this pattern is strongly driven by Muslim countries (e.g., Jordan, UAE, Qatar, Turkey; see OECD, 2013), where girls tend to have math motivation similar to or slightly higher than boys', accompanied by the lowest measures of national-level gender equity. Indeed, there are no significant differences in gender gaps in math self-concept and interest between Western and East Asian countries, although East Asian societies have lower levels of gender egalitarianism than Western societies (Mullis et al., 2016; OECD, 2013; Stoet et al., 2016). For example, the latest TIMSS 2015 report (Mullis et al., 2016) showed that 8th grade boys and girls had similar levels of math self-concept and interest in both East Asian (e.g., Japan, Korea, Hong Kong, Chinese Taipei) and Western countries (e.g., the US, England, Australia, New Zealand).

To summarize, ability self-concepts do not develop in a vacuum but depend on multiple factors. First, as described by the IE model, math and verbal self-concepts are formed by comparing and contrasting individual ability across these two domains. Not only do high verbal ability and high math ability predict high self-concept in their respective domains, but they also predict low self-concept in the opposing domains. Second, as described by the BFLPE model, self-concepts are also formed by contrasting individual performance against the performance of students at the school level and at the national/cultural level. When students attend a school, or live in a country, in which standards are high, the curriculum is rigorous, and a large percentage of students are high-achieving, they are more likely to have lower self-concept regardless of their ability. This BFLPE phenomenon helps explain the cultural differences observed across Western and East Asian cultures, in which Western students residing in nations that rank lower on international math and science assessments are more likely to report higher math self-concepts than students matriculating in East Asian countries that rank higher. Internal (performance across domains) and external (performance of peers) comparisons are therefore used by individuals in forming their own ability self-concepts. Both comparison processes also help individuals to identify domains in which they can specialize and for which they could develop particular interests. An additional factor that shapes the development of ability self-concept and interest is gender. Research across both Western and East Asian cultures suggests that males have stronger motivational beliefs in math, while females have stronger motivational beliefs in verbal domains. However, despite gendered patterns in motivational beliefs that are somewhat consistent across cultures, it is not necessarily the case that gender gaps in math performance favoring males are prevalent across both cultures. In the highest-performing countries in East Asia, females outperform their Western counterparts and demonstrate performance equal or superior to that of their male counterparts in East Asia. Future research, therefore, is needed to better understand the cultural factors that influence gender differences in math performance and motivational beliefs.


IMPLICATIONS FROM CULTURAL AND COUNTRY COMPARISONS

Motivation is an increasingly important determinant of academic learning, educational choices, and career decisions during adolescence and early adulthood. Recent intervention research suggests that motivation is malleable and thus can be successfully targeted and improved via training and instruction. Indeed, a growing body of research has successfully boosted motivation and persistence through psychosocial approaches that target student attitudes and mindsets about learning (e.g., Blackwell, Trzesniewski, & Dweck, 2007; Walton & Cohen, 2011). For example, Blackwell and colleagues (2007) showed through a randomized controlled trial that growth mindsets about intelligence can be taught to 7th grade students via the repeated delivery of engaging messages and discussion (e.g., the brain forms new connections when it learns, intelligence grows much as muscles grow, fixed labels such as 'smart' and 'dumb' should be avoided). Students who took the growth mindset curriculum had better math grades following the intervention than students in the control condition. While motivational beliefs have been widely examined in Western populations, recent work has examined the applicability of EVT across diverse cultures. In particular, much can be gleaned from the existing cross-cultural research on the similarities and differences in the development of self-concept and interest across Western and Eastern cultures. First, while it appears that declines in math and English motivational beliefs (i.e., ability self-concept and interest), and gender gaps favoring males in math and females in verbal domains, occur across multiple cultures, we must keep in mind that local context matters. One cannot expect the results of a policy enacted in one country to be duplicated in another. All countries have their own unique challenges in combating declines in achievement
and motivation throughout adolescence. In particular, local and regional differences can occur in the quality and rigor of educational systems within a country. Prejudice and unequal access to resources experienced by particular groups, not just girls and women, can also impact motivation, demonstrating how the intersectionality of race, gender, and class can affect educational choices. In other words, the needs of individuals in a country are so varied and complex that a policy or practice that is successful in one culture will not necessarily translate well to another. Second, the level of analysis of motivation matters. Surveys may be measuring different perceptions of self-concept and interest at the individual, class, school, regional, and national levels. Policy and practice crafted in response to research findings at one level of analysis may not produce the same outcomes at another level. For example, the statement 'I am good at math' may elicit a different response on a Likert-type scale than the statement 'Compared to most other people, I am good at math.' The second statement deliberately asks students to rate their competence against the competence of those around them, while the first does not identify the method by which students are supposed to weigh their abilities. According to the BFLPE model, students given the second statement may rate their self-concept lower or higher depending on the relative performance of their school-based peers (i.e., turning their comparisons outward). Students given the first statement may simply rely on their past grades in math and on their performance in comparable and contrasting domains (i.e., turning their comparisons inward rather than outward). Therefore, special attention should be given to the wording of survey items that measure self-concept. In addition, in gender-stratified countries, girls often have higher motivation in math, potentially because they are comparing themselves to other girls in their society, while girls in egalitarian societies are comparing their performance to both boys and girls (Else-Quest et al., 2010).
Analyzing attitudes and motivation in math and English would benefit from establishing the reference group that students use when judging their own abilities. Third, there may be differences in ratings of self-concept and interest due to variability in the meaning of ability beliefs and values across cultures. Van de Vijver and Leung (2001) discussed the importance of ensuring that constructs are equivalent across different cultural groups when doing cross-cultural research. Differences in perceptions of constructs can jeopardize the equivalence of data across cultural groups (see Van de gaer, Grisay, Schulz, & Gebhardt, 2012, for more discussion). This is particularly relevant when comparing self-concept across individualistic and collectivist cultures. In East Asian cultures, which tend to be more collectivist and focused on maintaining strong social bonds and group harmony, individuals may feel discouraged from rating themselves as highly competent in a subject area, viewing it as boastful or immodest. Western cultures, on the other hand, are more likely to encourage children to emphasize their skills and find ways to stand out from the crowd. Other research suggests that these cultural differences may also factor into differences in task values. Wigfield, Tonks, and Klauda (2016), for example, speculated about ways that each of the task value components might differ across cultures. In collectivistic cultures, where high importance is placed on the group, usefulness to the group may play a large role in determining an individual's utility value for a task. In Western cultures, by contrast, utility value is likely to be perceived in terms of a field's usefulness for achieving personal goals. Similarly, in Western cultures the cost of pursuing a field may be viewed in terms of individual costs (e.g., time, money, effort), while East Asian cultures may view a field as costly if pursuing it would prove less beneficial or useful to the group. Additional research is needed to determine whether expectancies and values are truly perceived and ranked differently across these cultures.
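Construct equivalence of the kind discussed here is commonly probed with multi-group factor models. As a schematic sketch in standard measurement-invariance notation, and not a model taken from the studies cited above, a motivation item y for student i in cultural group g can be written as:

```latex
y_{ig} = \tau_g + \lambda_g\,\eta_{ig} + \varepsilon_{ig},
```

where η is the latent motivation construct, λg the loading, and τg the intercept. Configural invariance requires only the same loading pattern across groups; metric invariance constrains λg = λ; scalar invariance additionally constrains τg = τ. Only under at least approximate scalar invariance can observed mean differences in self-concept or value across cultures be interpreted as latent mean differences rather than as differences in how items are understood or how scales are used.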

CURRENT LIMITATIONS AND FUTURE DIRECTIONS

While a number of studies have been conducted to better understand individual and gender differences in achievement and educational decision-making, there are still a number of limitations that need to be addressed. First, motivational researchers have mostly relied on Western-based measures to study student motivation, and the cultural validity of these instruments has not always been carefully considered. Psychologists need to develop instruments that are culturally valid and sensitive (see King & McInerney, 2016, p. 293). For example, most Western studies on intrinsic interest assume that individuals pursue a field because it is intrinsically rewarding to them, they enjoy pursuing it, and they want to spend their personal time engaging with it. However, in East Asian cultures, the decision to pursue a field may be less intrinsically motivated and more group-oriented. In other words, the pursuit of a field may reflect less of a personal drive and more of a desire to pursue work that is meaningful or useful to the family. Therefore, the emphasis on motivation as it is currently defined may rely too heavily on Western assumptions about what makes a field or career valuable to pursue. In addition, the assumption that the high extremes of Likert-type scales represent the most positive or desirable responses on measures of motivation largely conforms to Western norms and values. In East Asian cultures, modesty and humility are more valued traits, indicating that answers falling in the mid-range of response scales may be perceived as more adaptive and desirable. These cultural differences in response patterns need to be taken into account in all research examining cross-cultural differences in self-reports of motivation. Second, while EVT is well suited to cross-cultural investigations, very little cross-cultural work has been done (Wigfield
et al., 2016). Empirical EVT studies based on Western cultures have dominated the literature on academic motivation. This creates a problem in that not only are East Asian cultures understudied but, when they are researched, Western assumptions about motivational drives are applied to these contexts. Perhaps as a next step, future research should conduct qualitative studies to determine whether students' definitions of academic self-concept and task value differ across cultures. For example, researchers could ask East Asian and Western students whether self-identifying as highly skilled in math is viewed positively, or whether pursuing a career out of high personal interest is a priority in career decision-making. A third limitation concerns the lack of studies examining heterogeneity in motivational trajectories throughout the school years. Most extant research has examined average trends in a sample, masking individual differences in developmental trends. Only a few studies have examined heterogeneity in math and English self-concept and interest in recent years, and all of these studies have utilized Western samples. For example, a study on literacy motivation identified seven groups with unique developmental trajectories; all seven groups experienced declines in literacy motivation from 1st through 12th grade, though the rates of decline differed across groups (Archambault, Eccles, & Vida, 2010). Similarly, in a US sample tracking changes in math motivation from 4th grade to college, three different groups were identified: a group with high self-concept, a group with slow declines in self-concept, and a group with fast declines in self-concept (Musu-Gillette, Wigfield, Harring, & Eccles, 2015). Similar groups were also identified for math interest. There is, to our knowledge, no study that has examined heterogeneity in motivational trajectories in East Asian cultures. Future research is needed to better understand which students in each culture are at increased risk of experiencing motivational declines throughout adolescence.
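The trajectory-group studies described in this limitation typically rely on growth mixture or group-based trajectory models. As a generic sketch, and not the exact models of Archambault et al. (2010) or Musu-Gillette et al. (2015), the population is treated as a mixture of K latent trajectory classes:

```latex
f(\mathbf{y}_i) = \sum_{k=1}^{K} \pi_k\, f_k\!\left(\mathbf{y}_i \mid \mu_{0k}, \mu_{1k}, \Sigma_k\right),
```

where y_i is student i's vector of repeated motivation measures, πk is the proportion of students in class k, and each class has its own mean intercept and slope (μ0k, μ1k). The studies above identified seven literacy classes and three math classes in US samples; fitting such models to East Asian samples would directly address the gap the authors identify.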

A fourth limitation of this research is the overemphasis on using cross-cultural work on motivation to establish universal patterns across cultures (etic), rather than focusing on culturally specific (emic) findings. In other words, the prevailing interest is typically to identify whether a theory or guiding framework is universal (consistent across all groups, settings, and time periods), rather than to adapt or revise a theory or framework to fit a specific culture. Identifying unique challenges and strengths within specific cultures can advance and streamline efforts to improve motivational outcomes for individual students. For example, a closer examination of cultural differences between Western and East Asian cultures may also explain gender differences in performance and motivation. In East Asian cultures, girls achieve in math at higher rates than girls in Western cultures. This could be due to the high value placed on math and science in East Asian cultures and a potential desire for girls to succeed in a field that the group (e.g., family, school, country) emphasizes. In Western cultures, by contrast, math and science have been marketed as fields that are largely incompatible with women's personal interests and goals, which tend to be more people-oriented and altruistic (Diekman, Weisgram, & Belanger, 2015). Applying a different cultural lens may therefore help us to better understand why differences or similarities between cultures occur, without simply assessing whether a theory or model is universal.

CONCLUSION

The motivation to pursue a field or domain is an important driving force in adolescent career decision-making. High ability is not sufficient to determine persistence in a particular area. Adolescents must also feel competent in the domain and be interested in pursuing it further. Motivation to succeed is important across both math and English domains, but what is particularly noteworthy is the universal application of EVT. While most research has focused on validating EVT across Western cultures, there is emerging support for the prominence of ability self-concept and task value in determining achievement and career decisions across East Asian cultures. However, more systematic research is needed to understand how best to intervene across diverse nations and cultures to set individuals on an optimal path toward success. Understanding how cultural values, gender norms and expectations, and educational systems operate both across and within diverse cultures is necessary to design and implement effective and culturally sensitive strategies. Understanding differences across cultures may also inform strategies elsewhere. For example, East Asian cultures are much more successful than Western cultures at promoting high female achievement and career orientation in math, despite attempts in Western nations to draw more females into math and science careers. Exploring strategies that have been successful in other nations may provide useful insights that can be adapted to other cultures. Despite cultural differences, the motivation to succeed is a key to success in any field.

Note

1 Jessica L. Degol and Jiesi Guo made equal intellectual contributions to this chapter, so both share the second authorship.

REFERENCES

Archambault, I., Eccles, J. S., & Vida, M. N. (2010). Ability self-concepts and subjective value in literacy: Joint trajectories from grades 1 through 12. Journal of Educational Psychology, 102, 804–816.

Blackwell, L. S., Trzesniewski, K. H., & Dweck, C. S. (2007). Implicit theories of intelligence predict achievement across an adolescent transition: A longitudinal study and an intervention. Child Development, 78, 246–263.

Bong, M., Cho, C., Ahn, H. S., & Kim, H. J. (2012). Comparison of self-beliefs for predicting student motivation and achievement. The Journal of Educational Research, 105, 336–352.

Chang, L., McBride-Chang, C., Stewart, S. M., & Au, E. (2003). Life satisfaction, self-concept, and family relations in Chinese adolescents and children. International Journal of Behavioral Development, 27, 182–189.

Chen, S.-K., Hwang, F.-M., Yeh, Y. C., & Lin, S. S. J. (2012). Cognitive ability, academic achievement and academic self-concept: Extending the internal/external frame of reference model. The British Journal of Educational Psychology, 82, 308–326.

Chiu, M.-S. (2012). The internal/external frame of reference model, big-fish-little-pond effect, and combined model for mathematics and science. Journal of Educational Psychology, 104, 87–107.

Chiu, M. M., & Klassen, R. M. (2010). Relations of mathematics self-concept and its calibration with mathematics achievement: Cultural differences among fifteen-year-olds in 34 countries. Learning and Instruction, 20, 2–17.

Chiu, M. M., & Xihua, Z. (2008). Family and motivation effects on mathematics achievement: Analyses of students in 41 countries. Learning and Instruction, 18, 321–336.

Degol, J. L., Wang, M.-T., Zhang, Y., & Allerton, J. (2018). Do growth mindsets in math benefit females? Identifying pathways between gender, mindset, and motivation. Journal of Youth and Adolescence, 47, 976–990.

Diekman, A. B., Weisgram, E. S., & Belanger, A. L. (2015). New routes to recruiting and retaining women in STEM: Policy implications of a communal goal congruity perspective. Social Issues and Policy Review, 9, 52–88.

Durik, A. M., Vida, M., & Eccles, J. S. (2006). Task values and ability beliefs as predictors of high school literacy choices: A developmental analysis. Journal of Educational Psychology, 98, 382–393.

Eccles, J. S. (1983). Expectancies, values, and academic behaviors. In J. T. Spence (Ed.), Achievement and achievement motives: Psychological and sociological approaches (pp. 75–146). San Francisco, CA: Freeman.

Eccles, J. S. (1999). The development of children ages 6 to 14. The Future of Children: When School is Out, 9, 30–44.

Eccles, J. S. (2007). Where are all the women? Gender differences in participation in physical science and engineering. In S. J. Ceci & W. M. Williams (Eds.), Why aren't more women in science: Top researchers debate the evidence. Washington, DC: American Psychological Association.

Eccles, J. S., & Roeser, R. W. (2009). Schools, academic motivation, and stage-environment fit. In R. M. Lerner & L. Steinberg (Eds.), Handbook of adolescent psychology (3rd ed., pp. 404–434). Hoboken, NJ: John Wiley & Sons.

Eccles, J. S., Wigfield, A., Harold, R. D., & Blumenfeld, P. (1993). Age and gender differences in children's self- and task perceptions during elementary school. Child Development, 64, 830–847. doi:10.1111/j.1467-8624.1993.tb02946.x

Eccles, J. S., Wigfield, A., & Schiefele, U. (1998). Motivation to succeed. In N. Eisenberg (Ed.), Handbook of child psychology. Vol. 3 of Social, emotional, and personality development (pp. 1017–1095). Hoboken, NJ: John Wiley & Sons.

Else-Quest, N. M., Hyde, J. S., & Linn, M. C. (2010). Cross-national patterns of gender differences in mathematics: A meta-analysis. Psychological Bulletin, 136, 103–127.

Frank, T., & Scharff, L. F. V. (2013). Learning contracts in undergraduate courses: Impacts on student behaviors and academic performance. Journal of the Scholarship of Teaching and Learning, 13, 36–53.

Fredricks, J. A., & Eccles, J. S. (2002). Children's competence and value beliefs from childhood through adolescence: Growth trajectories in two male-sex-typed domains. Developmental Psychology, 38, 519–533.

Gardner, R. C. (2010). Motivation and second language acquisition: The socio-educational model. New York: Peter Lang.

Gottfried, A. E., Marcoulides, G. A., Gottfried, A. W., Oliver, P. H., & Guerin, D. W. (2007). Multivariate latent change modeling of developmental decline in academic intrinsic math motivation and achievement: Childhood through adolescence. International Journal of Behavioral Development, 31, 317–327.

Guo, J., Marsh, H. W., Parker, P. D., Morin, A. J. S., & Dicke, T. (2017). Extending expectancy-value theory predictions of achievement and aspirations in science: Dimensional comparison processes and expectancy-by-value interactions. Learning and Instruction, 49, 81–91.

Guo, J., Marsh, H. W., Parker, P. D., Morin, A. J. S., & Yeung, A. S. (2015b). Expectancy-value in mathematics, gender and socioeconomic background as predictors of achievement and aspirations: A multi-cohort study. Learning and Individual Differences, 37, 161–168.

Guo, J., Nagengast, B., Marsh, H. W., Kelava, A., Gaspard, H., Brandt, H., … Trautwein, U. (2016). Probing the unique contributions of self-concept, task values, and their interactions using multiple value facets and multiple academic outcomes. AERA Open, 2, 1–20.

Guo, J., Parker, P. D., Marsh, H. W., & Morin, A. J. S. (2015a). Achievement, motivation, and educational choices: A longitudinal study of expectancy and value using a multiplicative perspective. Developmental Psychology, 51, 1163–1176.

He, J., & Van de Vijver, F. J. (2016). The motivation–achievement paradox in international educational achievement tests: Toward a better understanding. In R. B. King & A. B. I. Bernardo (Eds.), The psychology of Asian learners (pp. 253–268). Singapore: Springer.

Heine, S. J., Lehman, D. R., Peng, K., & Greenholtz, J. (2002). What's wrong with cross-cultural comparisons of subjective Likert scales? The reference-group effect. Journal of Personality and Social Psychology, 82, 903–918.

Hyde, J. S. (2014). Gender similarities and differences. Annual Review of Psychology, 65, 373–398.

Jack, B. M., Lin, H., & Yore, L. D. (2014). The synergistic effect of affective factors on student learning outcomes. Journal of Research in Science Teaching, 51, 1084–1101.

Jacobs, J. E., Lanza, S., Osgood, D. W., Eccles, J. S., & Wigfield, A. (2002). Changes in children's self-competence and values: Gender and domain differences across grades one through twelve. Child Development, 73, 509–527.

Kadir, M. S., Yeung, A. S., & Diallo, T. M. O. (2017). Simultaneous testing of four decades of academic self-concept models. Contemporary Educational Psychology, 51, 429–446.

Kennedy, A., & Trong, K. (2006). A comparison of fourth-graders' academic self-concept and attitudes toward reading, mathematics, and science in PIRLS and TIMSS countries. Proceedings of the IEA International Research Conference, 2, 49–60.

King, R. B., & McInerney, D. M. (2014). Mapping changes in students' English and math self-concepts: A latent growth model study. Educational Psychology, 34, 581–597.

King, R. B., & McInerney, D. M. (2016). Culture and motivation: The road travelled and the way ahead. In K. R. Wentzel & D. B. Miele (Eds.), Handbook of motivation at school (2nd ed., pp. 275–299). New York, NY: Routledge.

Lauermann, F., Tsai, Y.-M., & Eccles, J. S. (2017). Math-related career aspirations and choices within Eccles et al.'s expectancy–value theory of achievement-related behaviors. Developmental Psychology, 53, 1540–1559.

Lee, H., & Kim, Y. (2014). Korean adolescents' longitudinal change of intrinsic motivation in learning English and mathematics during secondary school years: Focusing on gender difference and school characteristics. Learning and Individual Differences, 36, 131–139.

Lee, J. (2014). Universal factors of student achievement in high-performing Eastern and Western countries. Journal of Educational Psychology, 106, 364–374.

Lee, J., Bong, M., & Kim, S. (2014). Interaction between task values and self-efficacy on maladaptive achievement strategy use. Educational Psychology, 34, 538–560.

Lee, W., Lee, M. J., & Bong, M. (2014). Testing interest and self-efficacy as predictors of academic self-regulation and achievement. Contemporary Educational Psychology, 39, 86–99.

Marsh, H. W. (2007). Self-concept theory, measurement and research into practice: The role of self-concept in educational psychology. Leicester, UK: British Psychological Society.

Marsh, H. W., Abduljabbar, A. S., Morin, A. J. S., Parker, P. D., Abdelfattah, F., Nagengast, B., & Abu-Hilal, M. M. (2015). The big-fish-little-pond effect: Generalizability of social comparison processes over two age cohorts from Western, Asian, and Middle Eastern Islamic countries. Journal of Educational Psychology, 107, 258–271.

Marsh, H. W., Abduljabbar, A. S., Parker, P. D., Morin, A. J. S., Abdelfattah, F., & Nagengast, B. (2014). The big-fish-little-pond effect in mathematics: A cross-cultural comparison of U.S. and Saudi Arabian TIMSS responses. Journal of Cross-Cultural Psychology, 45, 777–804.

Marsh, H. W., & Hau, K.-T. (2004). Explaining paradoxical relations between academic self-concepts and achievements: Cross-cultural generalizability of the internal/external frame of reference predictions across 26 countries. Journal of Educational Psychology, 96, 56–67.

Meece, J. L., Glienke, B. B., & Burg, S. (2006). Gender and motivation. Journal of School Psychology, 44, 351–373.

Min, I., Cortina, K. S., & Miller, K. F. (2016). Modesty bias and the attitude–achievement paradox across nations: A reanalysis of TIMSS. Learning and Individual Differences, 51, 359–366.

Mullis, I. V. S., Martin, M. O., Foy, P., & Arora, A. (2012). TIMSS 2011 international results in mathematics. Chestnut Hill, MA: TIMSS & PIRLS.

Mullis, I. V. S., Martin, M. O., & Loveless, T. (2016). 20 years of TIMSS: International trends in mathematics and science achievement, curriculum, and instruction. Boston, MA: TIMSS & PIRLS International Study Center, Boston College.

Musu-Gillette, L. E., Wigfield, A., Harring, J. R., & Eccles, J. S. (2015). Trajectories of change in students' self-concepts of ability and values in math and college major choice. Educational Research and Evaluation, 21, 343–370.

Nagy, G., Watt, H. M. G., Eccles, J. S., Trautwein, U., Lüdtke, O., & Baumert, J. (2010). The development of students' mathematics self-concept in relation to gender: Different countries, different trajectories? Journal of Research on Adolescence, 20, 482–506.

OECD (2007). PISA 2006 science competencies for tomorrow's world. Paris: OECD Publishing.

OECD (2013). PISA 2012 results: Ready to learn – Students' engagement, drive and self-beliefs (Vol. III). PISA. Paris: OECD Publishing.

OECD (2016). PISA 2015 results: Excellence and equity in education (Vol. I). PISA. Paris: OECD Publishing.

Peng, K., Nisbett, R. E., & Wong, N. Y. C. (1997). Validity problems comparing values across cultures and possible solutions. Psychological Methods, 2, 329–344.

Petersen, J. L., & Hyde, J. S. (2017). Trajectories of self-perceived math ability, utility value and interest across middle school as predictors of high school math performance. Educational Psychology, 37, 438–456.

Schurtz, I. M., Pfost, M., Nagengast, B., & Artelt, C. (2014). Impact of social and dimensional comparisons on student's mathematical and English subject-interest at the beginning of secondary school. Learning and Instruction, 34, 32–41.

Shi, K., Wang, P., Wang, W., Zuo, Y., Liu, D., & Maehr, M. L. (2001). Goals and motivation of Chinese students: Testing the adaptive learning model. In F. Salili, C.-Y. Chiu, & Y.-Y. Hong (Eds.), Student motivation: The culture and context of learning (pp. 249–270). New York: Kluwer Academic/Plenum.

Stipek, D., & MacIver, D. (1989). Developmental change in children's assessment of intellectual competence. Child Development, 60, 521–538.

Stoet, G., Bailey, D. H., Moore, A. M., & Geary, D. C. (2016). Countries with higher levels of gender equality show larger national sex differences in mathematics anxiety and relatively lower parental mathematics valuation for girls. PLoS One, 11, e0153857.

Trautwein, U., Marsh, H. W., Nagengast, B., Lüdtke, O., Nagy, G., & Jonkmann, K. (2012). Probing for the multiplicative term in modern expectancy–value theory: A latent interaction modeling study. Journal of Educational Psychology, 104, 763–777.

Van de gaer, E., Grisay, A., Schulz, W., & Gebhardt, E. (2012). The reference group effect: An explanation of the paradoxical relationship between academic achievement and self-confidence across countries. Journal of Cross-Cultural Psychology, 43, 1205–1228.

Van de Vijver, F. J. R., & Leung, K. (2001). Personality in cultural context: Methodological issues. Journal of Personality, 69, 1007–1031.

Van Houtte, M., Demanet, J., & Stevens, P. A. (2012). Self-esteem of academic and vocational students: Does within-school tracking sharpen the difference? Acta Sociologica, 55, 73–89.

Vonkova, H., Zamarro, G., Deberg, V., & Hitt, C. (2015). Comparisons of student perceptions of teacher's performance in the classroom: Using parametric anchoring vignette methods for improving comparability. EDRE Working Paper No. 2015-01. Available at SSRN: https://ssrn.com/abstract=2652400 or http://dx.doi.org/10.2139/ssrn.2652400

Walton, G. M., & Cohen, G. L. (2011). A brief social-belonging intervention improves academic and health outcomes of minority students. Science, 331, 1447–1451.

Wang, M.-T. (2012). Educational and career interests in math: A longitudinal examination of the links between classroom environment, motivational beliefs, and interests. Developmental Psychology, 48, 1643–1657.

Wang, M.-T., & Degol, J. (2014). Motivational pathways to STEM career choices: Using expectancy-value perspective to understand individual and gender differences in STEM fields. Developmental Review, 33, 304–340.

Wang, M.-T., & Degol, J. (2016). Gender gap in STEM: Current knowledge, implications for practice, policy, and future directions. Educational Psychology Review, 28, 1–22.

Wang, M.-T., Degol, J., & Ye, F. (2015). Math achievement is important, but task values are critical, too: Examining the intellectual and motivational factors leading to gender disparities in STEM careers. Frontiers in Psychology, 6. https://doi.org/10.3389/fpsyg.2015.00036

Wang, Q., & Pomerantz, E. M. (2009). The motivational landscape of early adolescence in the United States and China: A longitudinal investigation. Child Development, 80, 1272–1287.

Watt, H. M. G. (2004). Development of adolescents' self-perceptions, values, and task perceptions according to gender and domain in 7th- through 11th-grade Australian students. Child Development, 75, 1556–1574.

Watt, H. M. G. (2016). Gender and motivation. In K. R. Wentzel & D. B. Miele (Eds.), Handbook of motivation at school (2nd ed., pp. 320–339). New York: Routledge.

Wigfield, A., et al. (1997). Change in children's competence beliefs and subjective task values across the elementary school years: A three-year study. Journal of Educational Psychology, 89, 451–469.

Wigfield, A., Tonks, S., & Klauda, S. L. (2016). Expectancy-value theory. In K. R. Wentzel & D. B. Miele (Eds.), Handbook of motivation at school (pp. 55–74). New York: Routledge.

Yeung, A. S., Lau, S., & Nie, Y. (2011). Primary and secondary students' motivation in learning English: Grade and gender differences. Contemporary Educational Psychology, 36, 246–256.

13

Examining Change over Time in International Large-Scale Assessments: Lessons Learned from PISA

Christine Sälzer and Manfred Prenzel

International comparative studies, like the IEA's Trends in International Mathematics and Science Study (TIMSS) and Progress in International Reading Literacy Study (PIRLS), the OECD's Programme for International Student Assessment (PISA), as well as the Programme for the International Assessment of Adult Competencies (PIAAC), provide important data for educational monitoring. Yielding a collection of relevant indicators, these large-scale assessments help to identify relative strengths and weaknesses of educational systems. These international comparative student studies have made their way into educational policy around the world by publishing representative results every three to five years. Many countries nowadays use an orchestrated set of national and international comparative educational studies for educational monitoring (see Martin, Mullis, Foy, & Hooper, 2016; Mullis, Martin, Foy, & Hooper, 2016, 2017; OECD, 2016a).

Large-scale student assessments generally have two main functions: monitoring and benchmarking (Seidel & Prenzel, 2008). Both functions imply comparisons, either with a set of standards (monitoring) or with other educational systems' structures, procedures, and outcomes (benchmarking). With regard to monitoring, large-scale student assessments provide thorough descriptions of the state of educational indicators as well as a database for considerations about changes and improvement of educational processes. The normative frame of reference in this case is at first located at the national level, since results are compared to nationally defined policy objectives or – after some time – to results of earlier rounds of a study. The benchmarking function of international large-scale assessments makes relative strengths and weaknesses of educational systems visible through comparison to other educational systems (Drechsel, Prenzel, & Seidel, 2014). In that sense, the benchmarking function focuses on the comparison to other countries and their educational systems (international frame of reference), whereas the monitoring function focuses more on the single state and especially on the development of indicators over time (i.e., trends; national or regional frame of reference). Benchmarks identified by international comparisons, however, may be adopted by a country as a set of references for systematic monitoring activities at a national level.

Central results from large-scale assessments refer, on the one hand, to the level of students' competencies, their distributions, and certain relationships, for example with context factors indicating disparities. On the other hand, international comparisons of student proficiencies at different times during schooling allow capturing and describing changes over time (Rutkowski & Prusinski, 2011). In addition to an international classification of student competencies, the findings of large-scale assessments can be compared over several rounds. In this way, the possible effects and consequences of measures and interventions taken in the meantime can be estimated, and potentially problematic developments can be detected at an early stage (see Drechsel & Prenzel, 2008; Drechsel et al., 2014). Usually, trend analyses are carried out, comparing results of different cohorts of students who have been tested in different years, but longitudinal designs accompanying students over time are feasible as well. The results of such studies do not speak for themselves; they need to be read and interpreted carefully, given the complexity of the data and methods and the challenge of taking into account the manifold processes occurring in an educational system across time. In this regard, measuring student competencies and educational contexts over time is one of the main strengths of studies like PISA, TIMSS, and PIRLS. Both educational monitoring and benchmarking gain additional meaningfulness through such trends, as indicators captured in different rounds of the assessment can reflect changes over time and thus help with understanding the interplay of different characteristics in education.

In this chapter, we present different possibilities for using large-scale assessments systematically for the analysis of changes across time. In a first step, the development of large-scale assessments in one country (Germany) serves as an example for carving out different ways in which large-scale assessments can be used to identify and describe problems in educational systems, to reflect on benchmarking options, and to monitor progress in the educational sector. Subsequently, we discuss – at a general and systematic level – the potential of large-scale assessments and the challenges for the analysis of trends in educational systems.

FROM CASES TO PATTERNS: LARGE-SCALE ASSESSMENTS IN GERMANY

Unfolding results from large-scale assessments from single country cases to overall patterns in the data is not trivial. In this section, we describe how Germany, as one case, has received and processed data from PISA and other large-scale studies. In doing so, we elaborate the considerations necessary for broadening the scope from one to several countries and argue why comparisons at the international level are so demanding and complex.

In some countries, like Germany, the publication of the first PISA cycle in 2001 set off a more or less troubling experience, sometimes referred to as the 'PISA-shock' (cf. Roeder, 2003). Educational researchers, practitioners, and policymakers started looking for answers to why student performance was below the international average and equity was disastrous. They also discussed which other educational systems could serve as some kind of role model and could give ideas for how to improve. In any case, this 'PISA-shock' triggered a comprehensive agenda of educational reform in Germany, which is best represented by a document called PISA 2000: Central Fields of Action, published by The Standing Conference of the Ministers of Education and Cultural Affairs of the Länder in the Federal Republic of Germany (KMK, 2002). The Central Fields of Action later became part of a long-term strategy (KMK, 2006). The seven central fields of action identified in these strategic considerations, however, needed to be translated into specific measures to improve the educational situation in the whole country. The fields of action express priorities and general goals, which can serve as a reference when interpreting findings and changes of subsequent large-scale assessments with respect to improvements over time. After PISA 2000, such measures of educational policymaking included improving language skills and elementary schooling, better links between pre-schooling and elementary schooling, more support for children at risk, quality assurance in teaching and school, professional teacher education, and all-day schooling.

One concrete example of how these fields of action have been addressed is the implementation of national educational standards in a federal system of 16 states that carry the responsibility for education. A group of researchers and experts in the field of education conceptualized a framework for the development and implementation of standards (Klieme et al., 2003). This framework served as a roadmap for combining societal objectives, scientific findings about competence development, and concepts and procedures of test development. At the same time, educational standards were defined for relevant subject areas and different levels of schooling, and for this purpose a new institution (Institut zur Qualitätsentwicklung im Bildungswesen (IQB) [Institute for Educational Quality Improvement]) with responsibility for the development of standards and respective assessment tools was established. The knowledge generated by regular assessments of educational standards is useful for steering the educational system toward the goals defined by the KMK. Trends help to check whether identified problems are about to increase or decrease and whether set objectives will be reached within a reasonable time.

This 'PISA-shock' (Roeder, 2003) was the beginning of a new era in Germany's educational policy. The federally structured educational system, with its 16 states, called 'Länder', was confronted with a common problem and a common challenge. The average achievement in reading, mathematics, and science, just below the OECD average, was alarming. Along with blatant differences and inequalities between the 16 German federal states and their educational systems (identified in an oversampling allowing intra-national comparisons) (see Baumert et al., 2003), the weak performance of 15-year-olds set the German educational system in motion (Bieber, Martens, Niemann, & Windzio, 2014; Hartong, 2012). After five more rounds of PISA, performance in all domains was significantly above the OECD average, and the shock has faded away (Reiss, Sälzer, Schiepe-Tiska, Klieme, & Köller, 2016).

Besides PISA, Germany participates in a number of comparative educational studies, at both the national and the international level. At first, the IEA's TIMSS and PIRLS were administered, followed by the OECD's PISA and PIAAC. From 2009 on, the National Educational Panel Study (NEPS) has been conducted as well, a multicohort longitudinal study spanning the human lifespan (Blossfeld, Rossbach, & von Maurice, 2011). These large-scale assessments differ in research design, varying from cross-sectional trend studies to complex panel designs (Seidel & Prenzel, 2008). Such an orchestrated portfolio of educational monitoring enables researchers and policymakers to cross-check the plausibility of results across different studies and across several student cohorts at different points of their school careers. For example, when students at the primary level in one country excel in the reading literacy assessment, but students at the secondary level in the same country have apparent problems in this domain, an issue that needs to be pursued is the sustainability of primary schooling in this country. Of course, differences in the theoretical framework, test conception, and structure of the captured competence need to be taken into account. Yet all of the mentioned studies can reference each other, as their frameworks and objectives show sufficient overlap.

PISA provides interesting data, but standing alone, these data are not sufficient for understanding comparisons and development. In response to this limitation, PISA in Germany was, from the beginning, more than a study run by the OECD and administered through service providers. It was a study pursuing relevant educational questions, enhanced by additional tests, additional questionnaires, and a national oversample of students to make the sample representative at the Länder level, and conducted by accomplished educational researchers. The additional tests were based on national curricula and comprised instruments to measure student competencies in reading, mathematics, and science. The extra questionnaires focused on student background variables and attitudes, such as motivation with regard to learning and test-taking or perceptions of schooling and instruction (cf. Baumert et al., 2001; Baumert et al., 2002; Baumert et al., 2003). Such additions increase the significance of a study even when data are collected only once. The value added by these enhancements increases further when a study is carried out continuously over a long timespan and collects data on a regular basis. Today, PISA has completed six rounds, which formed two cycles of three studies each (PISA 2000–2006, PISA 2009–2015). Each of the three competence domains has been the major domain twice: reading in 2000 and 2009, mathematics in 2003 and 2012, and science in 2006 and 2015.

Taken together, Germany is one of only a few countries looking back on an overall positive development over the course of six rounds of PISA. In PISA 2015, 72 countries participated, and only 15 of those reached average proficiency scores significantly above the OECD average in all three domains (cf. OECD, 2016b; Reiss & Sälzer, 2016). Germany is one of them and, as opposed to numerous other countries that have experienced pronounced variations in student achievement throughout the past six rounds, the trend lines for students in Germany show either increasing scores or consolidated levels of competence. Such trend measures are relevant for all participating educational systems, as they provide evidence for the effectiveness of measures taken in order to improve or change elements of educational systems. Taking an international comparative perspective is a solid foundation for describing and judging change.

After six rounds of PISA, several issues characterize the public debate about PISA in our case of Germany: structural change processes, disparities, the share of low-performing students, and the support of top-performing students. Looking at the 16 federal states, the degree of expansion of schools offering a full-time schedule still varies considerably. At the same time, a trend toward a two-strand secondary school system can be observed, comprising Gymnasium schools in all 16 federal states and one alternative school type, rarely two or three. Although the Gymnasium as a school type exists throughout Germany, there are differences with regard to the duration of this school program: in some federal states, students attend it for eight years; in others, for nine. Besides attending a Gymnasium school, students nowadays have more and more alternative pathways available when they want to be granted access to higher education, for example by graduating from vocational training with a master craftsman's certificate ('Meisterbrief'). While disparities and low-performing students have been a topic of discussion from the beginning, the group of highly capable students came into focus more recently.

In the German educational system, systematic inequalities with regard to several characteristics have been found throughout all rounds of PISA so far. Gender differences became obvious, especially in reading (in favor of girls) (e.g., Artelt, Stanat, Schneider, & Schiefele, 2001; Hohn, Schiepe-Tiska, Sälzer, & Artelt, 2013) and mathematics (in favor of boys) (e.g., Klieme, Neubrandt, & Lüdtke, 2001; Sälzer, Reiss, Schiepe-Tiska, Prenzel, & Heinze, 2013), but in PISA 2015 also in science (in favor of boys) (cf. Schiepe-Tiska et al., 2016). The data from PISA 2015 give the impression that boys may have considerably caught up in reading, as the difference in average reading proficiency between boys and girls shrank from approximately 40 points in earlier rounds to 21 points in PISA 2015. Interpreting such a precipitous decline in the difference needs to be done with great care, as the mode of test administration changed from paper and pencil (PISA 2000 until PISA 2012) to computer (from PISA 2015 on; see the section on 'Change of Assessment Mode' below), and comparability between PISA 2015 and earlier cycles may be limited (Robitzsch, Lüdtke, Köller, Kröhne, Goldhammer, & Heine, 2017). After all, it may be a technical issue related to mode effects, or a substantive difference related to students' familiarity with information presented digitally – or both. The share of low-performing students in each of the three domains has decreased over time in Germany but, given that other educational systems prove that this group can be even smaller alongside a higher average proficiency, this issue remains a topic of discussion. Again, for our case of Germany, the monitoring shows some progress in decreasing gender differences, indicating, at the same time, clear challenges that have to be mastered in the future.

Besides the gender gap, another disparity requires special attention in Germany's educational policy. During the first rounds of PISA, students attending a Gymnasium, the academic track at the secondary level of schooling, appeared to be strong performers with sound potential for excellent achievement. The two most recent rounds of PISA, 2012 and 2015, however, revealed that a high average score in PISA cannot guarantee a stable development of competencies when highly talented students are not supported in a way that enables them to excel. In PISA 2012, the overall improvement of the PISA cohort in Germany was mainly due to the fact that students in school types other than the Gymnasium caught up – students with a rather low socio-economic status and with an immigrant background (cf. Gebhardt, Rauch, Mang, Sälzer, & Stanat, 2013; Müller & Ehmke, 2013). In contrast, the average achievement of Gymnasium students stagnated (cf. Sälzer et al., 2013). The impression that the group of top-performing students, who mostly attend a Gymnasium in Germany, was in need of specific and focused support was strengthened by the findings of PISA 2015. In science, the average performance of students at Gymnasium schools declined significantly, although this difference needs to be interpreted with care due to the modified test design in PISA 2015 (computer-based testing) (Schiepe-Tiska et al., 2016). Compared to other participating countries, Germany cannot keep up with the best with regard to how top-performing students are supported through educational institutions and the system as a whole. As a reaction, profound decisions at the level of educational policy have been made, enabling systematic diagnostics and a network of measures to be taken both inside and outside school (KMK, 2015).

Yet another disparity is still troubling in our case: the relationship between performance and the social background of students. Looking at the international picture, social background factors and performance are obviously correlated in all countries participating in PISA. However, the strength of this relationship varies considerably, indicating different levels of success in attaining equity. Observing the social gradient and the amount of variance in student achievement explained by social background provides parameters to classify the strength of the association between social background and proficiency in PISA. In all six completed rounds of PISA, the slope of the social gradient in Germany, as well as the amount of variance explained by the students' socio-economic status (HISEI = Highest International Socio-Economic Index of Occupational Status), was significantly higher than the OECD average (Müller & Ehmke, 2016). However, if the index of Economic, Social, and Cultural Status (ESCS) is applied instead of the students' socio-economic status (cf. Ehmke & Siegle, 2005), the social gradient in Germany does not differ from the one in OECD countries; the amount of explained variance, however, is significantly higher than the OECD average. Using a rescaled ESCS index to compare the association between students' scientific competence and their social backgrounds in PISA 2006 and 2015 (when science was the major domain), a partial decoupling of scientific competence and social background appears. This means that, when not only socio-economic indicators but also a set of economic, social, and cultural resources at home are taken into account, the interrelation of social background and scientific competence decreases over time in terms of the amount of explained variance (Müller & Ehmke, 2016). It can thus be useful to use different indices to operationalize students' social backgrounds in order to yield more differentiated information on problematic issues as well as on partial progress in decreasing inequalities. According to this finding, it seems that, in Germany, the cultural and social resources of students' homes can be better compensated for now than was the case in earlier rounds of PISA. The shrinking distance in reading proficiency found between groups of different social backgrounds over time supports this interpretation (Müller & Ehmke, 2016).
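To make the notion of a social gradient concrete: in PISA reporting it essentially corresponds to the slope of a regression of proficiency on a social background index, with the explained variance as the second parameter. A minimal sketch, using ESCS as the background index:

$$Y_i = \beta_0 + \beta_1 \,\mathrm{ESCS}_i + \varepsilon_i,$$

where $Y_i$ is student $i$'s proficiency score, $\beta_1$ is the social gradient (the expected score difference associated with a one-unit difference in ESCS), and $R^2$ is the share of achievement variance explained by social background. The German findings reported above can thus be read as statements about $\beta_1$ and $R^2$ relative to the OECD average.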

Another set of indicators gained attention in our case from the beginning and, due to increasing international migration movements, is of particular importance nowadays. Students with an immigrant background have constituted a significant subpopulation of the PISA cohort in Germany in all six rounds of PISA. In PISA 2000, the proportion of students with an immigrant background in Germany was 21.6 percent (Baumert & Schümer, 2001; Stanat, Rauch, & Segeritz, 2010), and it increased to 27.8 percent in PISA 2015 (Rauch, Mang, Härtig, & Haag, 2016). Within this group, it is useful to distinguish first- and second-generation immigrants in order to describe differences in achievement and learning contexts in more detail. In Germany, the group of first-generation immigrants shrank over time (from 6.6 percent in PISA 2006 to 3.7 percent in PISA 2015), whereas the group of students with one parent born abroad and second-generation immigrants increased (from 7.6 percent in PISA 2006 to 13.1 percent in PISA 2015) (cf. Rauch et al., 2016).

For in-depth analyses comparing the situation of students with an immigrant background across countries, even more details regarding the context of migration are required. For one, countries can be distinguished according to their history of migration. Baumert and Schümer (2001) already suggested differentiating four groups of countries: classic immigrant countries founded by immigrants (such as the USA or Australia), former colonial empires (such as the United Kingdom and France), mid-European countries that invited work-related migration (such as Germany, Austria, Luxembourg, or Switzerland), and Nordic countries that received work-related and humanitarian immigrants. Recent changes in migration flows underline how relevant but, at the same time, how fragile such differentiations are (e.g., because of the different compositions and scope of these migrant groups). Obviously, immigration policies also vary within these groups, as they follow different political objectives and are not consistent over time (Walter & Taskinen, 2007).

In observing the consistently found achievement gap between students with and without an immigrant background in our case of Germany, an interesting finding appears: the second cycle of PISA, from 2009 until 2015, enabled researchers to thoroughly compare the results of each major domain with the PISA cohort of nine years before – PISA 2000 and 2009 in reading, 2003 and 2012 in mathematics, and 2006 and 2015 in science. In PISA 2009, an improvement in reading was noted for Germany, and analyses show that especially students with an immigrant background in the first generation were able to significantly catch up compared to classmates without an immigrant background (Stanat et al., 2010). A similar picture emerged in PISA 2012 for mathematics, when students with an immigrant background were among those who significantly improved compared to 2003 (Gebhardt et al., 2013). Such improvement, however, was not found in science in PISA 2015: neither students with nor those without an immigrant background achieved better than in PISA 2006 (Rauch et al., 2016). So the current situation is characterized by partial improvements, but there is still a considerable gap in proficiency between students in Germany with and without an immigrant background. Large-scale assessments can help to localize the problems, especially if they provide detailed information on disparities by type of migration (e.g., first- or second-generation), country of origin, or language spoken at home.

In the case of Germany, large disparities in performance between federal states were blatant at the beginning of PISA, and the differences between the top- and low-performing federal states are still considerable. The findings indicate differential developments in federal states that stimulate questions and call for explanation, because they occur inside separate national sub-contexts. Differential developments at an international level seem to be far more complex. Within the country, changes have become visible at both the primary and the secondary level of schooling, where the average performance of some federal states declined, while others were able to improve or maintain their level of proficiency (cf. Stanat, Schipolowski, Rjosk, Weirich, & Haag, 2017). The association between students' social and family backgrounds and their average proficiency still varies between federal states. The challenge of reducing such correlations and disparities remains.

When single countries want to use evidence from international comparative studies, the perspective shifts from intra-national to international. Especially with regard to observations of trends, educational systems that have undergone a positive development could be chosen as role models from which to learn. Problems identified in the case of Germany, such as large disparities between students from different social and cultural backgrounds and also between students attending schools in different federal states, can prompt a look abroad in order to see what other systems may have done better. Comparing different educational systems and identifying potential keys to success, however, requires a number of prerequisites. These are elaborated in the following two sections.

COMPARATIVE INTERNATIONAL STUDIES AND USE OF THEIR RESULTS

International student assessments are often perceived as mere rankings of educational systems by their average proficiency scores. They offer, however, far more, and different, types of empirical evidence that can be used as a point of reference for educational policy (e.g., Cresswell, Schwantner, & Waters, 2015; Wagemaker, 2014). Examples of different types of evidence are sketched in the following.

Large-scale assessments are, in general, structured along the so-called CIPO model (Kuger, Klieme, Jude, & Kaplan, 2017; Purves, 1987), combining context, input, process, and output factors related to an educational system. This model separates structural and background variables (context, input) from measures and activities and their alleged effects (process, output). It provides a theoretical frame that enables researchers to arrange indicators relevant for educational monitoring. Accordingly, not only indicators for output shed light on the quality of an educational system; indicators for processes (e.g., homework practices, teacher supportiveness) or structures (e.g., class size) can also be used for comparisons. Further evidence is gained when additional relations between indicators (within or between the CIPO fields) are considered statistically, for example by regression analysis. Indicators taken from the CIPO model can be used for international comparisons, and especially for comparisons with certain standards. They also help identify problems and can serve as the basis for building a set of trend indicators in educational monitoring.

Another frame of reference for large-scale assessments with respect to educational monitoring is the set of objectives within the responsibility of educational policy. These objectives range from general educational goals (explicated in the constitution or in laws) to concrete objectives in curricula. Contemporary political ambitions (like reducing inequality or grade repetition, fostering competences for the digital era, or improving quality development at schools) could also be addressed. Monitoring then means relating indicators (and findings) systematically to explicit objectives of educational policy. Trend analyses allow the systematic description of changes across time with respect to such goals, as well as to problems or relative weaknesses of educational systems that had been identified in earlier rounds of an assessment program. For instance, differences in indicators between PISA 2006 and PISA 2015 may be used to point out development in the desired direction: an increased proportion of schools using measures for school development, a decline in the share of grade repetitions in the PISA cohort, or a decreased achievement gap between students with and without an immigrant background. Thus, combining several points of measurement makes trends observable and enables researchers and policymakers to identify potentially problematic developments.

It has to be highlighted, however, that the analysis of trends comes with a number of challenges. First, there is a need to transform data into evidence. Analysis of trends allows for educational monitoring, which produces knowledge with regard to the steering of an educational system. But such knowledge must be placed and interpreted within a contextual framework in which changes and measures can be controlled. Trend analyses should also be embedded in other systems of monitoring and should involve theoretical assumptions about delayed effects of changes, reforms, and measures. Second, several requirements should be met when trends are analyzed by means of data from large-scale assessments. These requirements are discussed in the following sections.
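Before turning to these requirements, it is worth noting what even the simplest trend comparison involves statistically. When two rounds are compared, the uncertainty of the difference combines the sampling errors of both rounds and – in PISA practice – an additional linking error reflecting the re-scaling between cycles (see the OECD technical reports). The following is a minimal, hypothetical sketch; the helper function and all numbers are illustrative, not actual PISA code or results:

```python
import math

def trend_difference(mean_t2, se_t2, mean_t1, se_t1, link_error):
    """Change between two assessment rounds with its standard error.

    The standard error of the difference pools the sampling errors of
    both rounds and the linking error between cycles.
    """
    diff = mean_t2 - mean_t1
    se_diff = math.sqrt(se_t2**2 + se_t1**2 + link_error**2)
    return diff, se_diff, diff / se_diff  # estimate, SE, z-statistic

# Illustrative values only: country mean 495 (SE 2.8) in one round,
# 504 (SE 3.0) in a later round, with an assumed linking error of 2.4.
diff, se, z = trend_difference(504, 3.0, 495, 2.8, 2.4)
print(f"change = {diff:+.1f} points, SE = {se:.1f}, z = {z:.2f}")
```

With |z| > 1.96, the change would be reported as statistically significant at the 5 percent level; without the linking-error term, trend changes would look more certain than they actually are.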

Link Items Versus Situated Test Conception

Studies that repeatedly collect data face a number of challenges that come with the long-term approach of these assessments. For instance, in PISA 2015, the participating cohort of 15-year-olds had just been born when the very first PISA cohort took the test in 2000. As the concept of literacy defined in PISA states that the proficiency captured by the test has to be situated in 15-year-olds' everyday life and experience (e.g., OECD, 1999), this demand obviously requires the topics and items in the test to be adapted and updated in order to remain close to the respective environment of the participating cohort of students. At the same time, results from different rounds of PISA need to be comparable to each other in order to describe changes in proficiency and also changes in the relationships between contextual and background factors and student achievement. To ensure such comparability, some of the items need to be administered in several rounds of the study. When results from two or more rounds of PISA are compared, the data from all selected rounds must be linked to each other so that the scores can be directly interpreted on a common scale (Carstensen, 2013).
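How such linking works can be sketched in simplified form. Under a Rasch-type model of the kind used in PISA scaling, the probability that student $i$ answers item $j$ correctly is

$$P(X_{ij} = 1 \mid \theta_i) = \frac{\exp(\theta_i - b_j)}{1 + \exp(\theta_i - b_j)},$$

with $\theta_i$ the student's proficiency and $b_j$ the item difficulty. In one standard linking approach (mean–sigma linking), proficiencies estimated in a new round are mapped onto the reference scale by a linear transformation $\theta^{*} = A\theta + B$, with $A$ and $B$ chosen so that the mean and standard deviation of the difficulty estimates of the common link items agree across rounds. This is a deliberately simplified sketch – operational PISA scaling involves concurrent calibration, plausible values, and an explicit linking error – but it captures the core logic: without items administered in several rounds, observed 'change' would be confounded with change of the measurement instrument itself.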

Sampling

A second crucial aspect of ensuring comparability over time in international large-scale assessments is the sampling strategy. Samples in different rounds of a study like PISA must be as similar as possible in terms of previously defined aspects, such as representativeness, stratification, or coverage rate of the population. While the definition of the student population assessed in PISA remains the same over time, the composition of this population is not static, and this will be reflected in the student sample. If, for example, the proportion of students with an immigrant background in a country increased between PISA 2000 and PISA 2015, the student samples are composed accordingly. A relevant and helpful approach to dealing with such differences between samples over time is to distinguish countries of origin as well as first- and second-generation immigrants, in order to be able to describe group-specific contexts and backgrounds when interpreting differences in proficiency scores between several rounds of PISA (e.g., Rauch et al., 2016). Meticulous documentation of sampling over several rounds of a study also helps to detect possible distortions of results, which may be due either to unexpectedly large changes in the student population or to an incorrect sample.

of possible distortions of results, which may be due either to unexpectedly large changes in the student population or to an incorrect sample.

Participation

Over time, participation rates in countries taking part in large-scale assessments can vary. One issue is the voluntariness of participation in PISA, at both the school and the student level. Whether participation is voluntary is determined by each participating country; especially where PISA is not mandatory, participation rates depend on the conditions facing sampled schools. If the test falls during a very busy period, such as right at the beginning or toward the end of a school year, participation rates are likely to be lower than during less busy times. In order to reach the minimum required sample size, it is important to take such conditions into account and, if needed, to place the test window slightly earlier or later than in other countries. To maintain comparability over several rounds of PISA, however, it is crucial that any change in participation and in the realization of the sample be taken into account. Given that the composition of the target population is also subject to change, for example with regard to the group of students with an immigrant background, instruments capturing detailed information on the type of immigrant background (e.g., first- or second-generation) are required.

Change of Assessment Mode

Not only large-scale assessments but also educational systems are dynamic, and have to be in order to prepare current and future generations of students for the changing demands of work and everyday life. Probably the most obvious adaptation of PISA to the real life of 15-year-old students was the shift from paper-based to computer-based assessment, which was completed in PISA 2015 (OECD, 2016a, 2016b, 2017a, 2017b). Conducting a large-scale assessment using computers offers several possibilities (colored tasks, different response formats, integration of video, interactive problems) that were not available during the paper-based period. Test items that allow students to interact with the stimulus and run simulations, while capturing response times and solution strategies, enable researchers to describe in more detail than before what distinguishes a highly competent student from less competent participants (Goldhammer, Lüdtke, Martens, & Christoph, 2016). Changing the mode of assessment, however, comes with possible effects (so-called mode effects) that may influence the results of the assessment. Results are affected by a mode effect when groups of students are systematically disadvantaged or advantaged by either the paper-based or the computer-based mode of assessment. In Germany, for example, gender differences in favor of girls had consistently been found in the first five rounds of PISA (2000–2012). The gap in reading proficiency between boys and girls was considerable and had been found in different studies, both nationally and internationally. In PISA 2015, this previously stable finding changed: boys caught up with girls, so that the difference between the genders was now only about half the size observed in previous rounds of PISA (Weis, Zehner, Sälzer, Strohmaier, Artelt, & Pfost, 2016).
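A common way to probe for such a mode effect, given data in which comparable students took the test in different modes, is to test whether mode interacts with group membership in predicting scores. The following is a minimal sketch with statsmodels; the data are simulated here purely for illustration, and a serious analysis would additionally need to account for the sampling design, plausible values, and item-level differential functioning:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical pooled data: one row per student, with a reading score,
# gender, and assessment mode; simulated with a built-in mode-by-gender
# interaction so the check below has something to detect.
rng = np.random.default_rng(7)
n = 4000
gender = rng.choice(["girl", "boy"], size=n)
mode = rng.choice(["paper", "computer"], size=n)
score = (500
         + 20 * (gender == "girl")                           # baseline gender gap
         - 10 * ((gender == "girl") & (mode == "computer"))  # gap shrinks on computer
         + rng.normal(0, 90, size=n))
students = pd.DataFrame({"score": score, "gender": gender, "mode": mode})

# A significant gender-by-mode interaction indicates that the gender gap
# differs between paper-based and computer-based testing, i.e., that the
# mode change does not affect both groups equally.
model = smf.ols("score ~ C(gender) * C(mode)", data=students).fit()
print(model.summary().tables[1])
```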

LIMITS OF INTERNATIONAL COMPARATIVE STUDIES Apparently, a lot of effort is required when conducting an international comparative study as a trend observation. Educational benchmarking serves as an international

comparison of selected indicators, aiming to put national data in a larger context. However, as soon as national findings are expected to provide more than benchmarking with regard to an international perspective, a number of limits of international comparative largescale assessments become obvious. The question about why indicators develop differentially in different countries cannot be answered with data generated by large-scale assessments. In fact, a solid theoretical framework on the development and dynamics of educational systems undergoing change processes is needed. Such a theory would explain how effects and consequences of reform processes could be noticed and captured. The core difficulty here is that effects of reforms and changes at the level of an educational system can only be measured with a certain – yet undetermined – delay. Furthermore, indicators measured in largescale assessments are helpful for continuous monitoring, but they cannot answer the question of what one educational system could or should do to become as successful as another educational system. Such indicators help to draw a picture of the state of educational systems at the time of data collection; the knowledge generated in this manner, however, is not projective and does not point out which measures could be taken to achieve specific objectives. Theories based on patterns detected in the data, such as what educational systems defined as successful have in common, are falsified in the end by just one example for which the pattern does not apply. Drawing conclusions from one case to another is risky, and even when multiple cases are available, the number of cases (countries) are always too small for systematic statistical analyses, because the number of factors that could have an effect far and away exceed the number of cases. While one can search for similarities between educational systems, the number of cases available will always be limited, so that the danger of fallacies is a threat when going beyond benchmarking at the international level.

Referring to indicators, international large-scale student assessments provide information on several societal challenges. These are, for instance, captured when the relationship between different variables characterizing students' social background and average achievement is analyzed. Current societal challenges include integrating students with an immigrant background into the respective educational system (including students with special educational needs), balancing gender differences in achievement and interest, and addressing the systematic correlation of social background and achievement, which in many countries is higher than desired (see OECD, 2016b). By representatively depicting the target population of 15-year-old students in participating countries, PISA enables researchers, politicians, and the public to see how immigrant status can be related to student achievement and, as in numerous countries, to a systematic achievement gap in favor of non-immigrant students. At the same time, such evidence makes good practices visible and offers opportunities for educational policymakers to learn from other countries. Learning in that sense does not mean copying structures or procedures from other countries, but rather reasoning and debating about what can make a difference in one's own educational system with regard to selected findings, and what can be a worthwhile initiative for improving undesired situations.

The intention to compare the situation of students with an immigrant background across countries, described above, is an example of how such learning from other educational systems is limited. Differences in the achievement of this subgroup of students across countries are only interpretable when the contextual conditions and regulations of migration are known and taken into account. Reasons for migration and immigration policies vary between countries, and such variation is partly systematic and partly individual. Even though countries can be tentatively classified according to their immigration policies or according to the most prevalent reasons for immigration, there is always some residual variance that cannot be explained without thorough knowledge of the concrete context. An example of similar difficulties is the role of digitalization in the learning context. Recent findings from PISA and PIAAC indicate that the availability of ICT devices, per se, predicts neither achievement nor familiarity with such devices (e.g., OECD, 2013, 2017a). Nor does availability say much about how the specific added value of ICT devices is realized in lessons. There is a considerable difference between teachers using computers as digital typewriters with their students and teachers integrating computers or tablets into their lessons in a way that allows students to get individual feedback on tasks. In order to understand what makes a difference in student achievement with regard to digital devices, deep knowledge and understanding of the actual use of such devices is necessary. All in all, the findings from large-scale student assessments are best interpreted with sound knowledge of the structural, administrative, and procedural context conditions in which schooling and learning take place. Otherwise, the danger of fallacies is quite high.

LESSONS LEARNED FROM EXAMINING CHANGES IN INTERNATIONAL LARGE-SCALE ASSESSMENTS

We close this chapter by pointing out the lessons that can be learned from examining change over time in international comparative studies. A major challenge for such studies is that changes have to be controlled for at several levels. For one, the assessment has to keep up with the target population and its characteristics in order to remain adequate and suitable. This means that it has to combine stable elements, in order to be linked to earlier rounds of the study, with innovative elements, in order to be up to date. At a broader level, changes around the assessment and the assessed group of students have to be controlled. Taking the case of Germany as an example, it became obvious how difficult it is to control and assess change, but also how useful it is to strive for a thorough and long-term assessment. When the perspective shifts from a national level to an international, comparative one, a systematic observation of changes in relevant context factors throughout the assessed countries would be worthwhile and helpful. For such an objective, a theoretical framework is needed that reflects how change takes place within and across educational systems and how and when such change can be measured for analysis. Starting from Germany as one case, an international comparison would need to carefully select countries in which the contextual conditions are known over a certain period of time. Importantly, more than one large-scale assessment should be considered for such a comparison. The case of Germany shows how an overall picture across the educational landscape can be drawn when PIRLS gives an idea of primary school children's reading competencies and PISA, by adding the secondary level, allows for examining whether high scores at the primary level are sustained. On a regular basis, the OECD publishes a series called Education at a Glance, which combines key information on the output of educational institutions and the impact of learning across countries, as well as the resources invested in education. This series is an impressive example of the challenge facing researchers who collect a multitude of indicators worldwide and publish sophisticated analyses with them: without a sound theory of how educational systems work, develop, and change, any analysis remains arbitrary and routine.

Adding to this, international comparisons are highly useful for monitoring and benchmarking, and international studies gain value when enriched by nationally relevant options. Such options can be inserted via extra questions or test items, extra instruments (questionnaires or tests), or extra samples of students to better describe subpopulations with special characteristics (e.g., students with an immigrant background or special educational needs). When something in the respective educational system has changed, or earlier results have revealed potentially negative outcomes needing to be improved, adding national options to the baseline program of the international comparative study is a useful approach. It could also serve as a starting point for advancing international comparisons when several countries agree on adding options, including a control of contextual factors in all countries that are part of the comparison. All in all, the value of results from PISA, PIRLS, and other studies is much higher when they are not only seen at one point in time but also observed over a longer period. The database is then more stable, and trends can be anticipated when several points of measurement are known. Interpretation can be aided by other countries' results, especially when there are similar developments or unexpectedly positive ones. With regard to educational policymaking, the ensemble of national and international large-scale assessments can be used as an early warning system to identify potentially problematic developments. It cannot, however, elaborate on how such undesired developments can be prevented. For this, we need a theoretical framework to refer to.

REFERENCES

Artelt, C., Stanat, P., Schneider, W., & Schiefele, U. (2001). Lesekompetenz. Testkonzeption und Ergebnisse. In J. Baumert, E. Klieme, M. Neubrand, M. Prenzel, U. Schiefele, W. Schneider, P. Stanat, K.-J. Tillmann, & M. Weiß (Eds.), PISA 2000: Basiskompetenzen von Schülerinnen und Schülern im internationalen Vergleich [PISA 2000: Basic student competencies in an international comparison] (pp. 69–140). Opladen: Leske + Budrich.

Baumert, J., Artelt, C., Carstensen, C. H., Sibberns, H., & Stanat, P. (2002). Untersuchungsgegenstand, Fragestellungen und technische Grundlagen der Studie. In J. Baumert, C. Artelt, E. Klieme, M. Neubrand, M. Prenzel, U. Schiefele, … M. Weiß (Eds.), PISA 2000 – die Länder der Bundesrepublik Deutschland im Vergleich (pp. 11–38). Opladen: Leske + Budrich. Baumert, J., Artelt, C., Klieme, E., Neubrand, M., Prenzel, M., Schiefele, U., Schneider, W., Tillmann, K.-J., & Weiß, M. (Eds.) (2003). PISA 2000 – Ein differenzierter Blick auf die Länder der Bundesrepublik Deutschland [PISA 2000 – A detailed look into the federal states of Germany]. Opladen: Leske + Budrich. Baumert, J., Klieme, E., Neubrand, M., Prenzel, M., Schiefele, U., Schneider, W., Stanat, P., Tillmann, K.-J., & Weiß, M. (Eds.) (2001). PISA 2000: Basiskompetenzen von Schülerinnen und Schülern im internationalen Vergleich [PISA 2000: Basic student competencies in an international comparison]. Opladen: Leske + Budrich. Baumert, J., & Schümer, G. (2001). Familiäre Lebensverhältnisse, Bildungsbeteiligung und Kompetenzerwerb. In J. Baumert, E. Klieme, M. Neubrand, M. Prenzel, U. Schiefele, W. Schneider, P. Stanat, K.-J. Tillmann, & M. Weiß (Eds.), PISA 2000: Basiskompetenzen von Schülerinnen und Schülern im internationalen Vergleich [PISA 2000: Basic student competencies in an international comparison] (pp. 323–410). Opladen: Leske + Budrich. Bieber, T., Martens, K., Niemann, D., & Windzio, M. (2014). Grenzenlose Bildungspolitik? Empirische Evidenz für PISA als weltweites Leitbild für nationale Bildungsreformen. Zeitschrift für Erziehungswissenschaft, 17(S4), 141–166. https://doi.org/10.1007/s11618-014-0513-6 Blossfeld, H.-P., Rossbach, H.-G., & von Maurice, J. (2011). Education as a life-long process: The German National Educational Panel Study (NEPS). Wiesbaden: VS Verlag für Sozialwissenschaften. Carstensen, C. H. (2013). Linking PISA competencies over three cycles: Results from Germany. In M. Prenzel, M. Kobarg, K. Schöps, & S. Rönnebeck (Eds.), Research on PISA: Research outcomes of the PISA Research Conference 2009 (pp. 199–213). Netherlands: Springer. https://doi.org/10.1007/978-94-007-4458-5_12 Cresswell, J., Schwantner, U., & Waters, C. (2015). A review of international large-scale assessments in education: Assessing component skills and collecting contextual data. Paris: OECD Publishing. Drechsel, B., & Prenzel, M. (2008). Aus Vergleichsstudien lernen: Aufbau, Durchführung und Interpretation internationaler Vergleichsstudien. Schulmanagement – Handbuch (Vol. 126). München: Oldenbourg. Drechsel, B., Prenzel, M., & Seidel, T. (2014). Nationale und internationale Schulleistungsstudien. In E. Wild (Ed.), Pädagogische Psychologie (2nd ed., pp. 343–368). Berlin: Springer. Ehmke, T., & Siegle, T. (2005). ISEI, ISCED, HOMEPOS, ESCS. Zeitschrift für Erziehungswissenschaft, 8(4), 521–540. Gebhardt, M., Rauch, D., Mang, J., Sälzer, C., & Stanat, P. (2013). Mathematische Kompetenz von Schülerinnen und Schülern mit Zuwanderungshintergrund. In M. Prenzel, C. Sälzer, E. Klieme, & O. Köller (Eds.), PISA 2012: Fortschritte und Herausforderungen in Deutschland (pp. 275–308). Münster u.a.: Waxmann. Goldhammer, F., Lüdtke, O., Martens, T., & Christoph, G. (2016). OECD education working papers (Vol. 133). Paris: OECD Publishing. Hartong, S. (2012). Overcoming resistance to change: PISA, school reform in Germany and the example of Lower Saxony. Journal of Education Policy, 27(6), 747–760. https://doi.org/10.1080/02680939.2012.672657 Hohn, K., Schiepe-Tiska, A., Sälzer, C., & Artelt, C. (2013). Lesekompetenz in PISA 2012: Veränderungen und Perspektiven. In M. Prenzel, C. Sälzer, E. Klieme, & O. Köller (Eds.), PISA 2012: Fortschritte und Herausforderungen in Deutschland [PISA 2012: Progress and challenges in Germany] (pp. 217–244). Münster: Waxmann. Klieme, E., Avenarius, H., Blum, W., Döbrich, P., Gruber, H., Prenzel, M., Reiss, K., Riquarts, K., Rost, J., Tenorth, H.-E., & Vollmer, H. J. (2003). Zur Entwicklung nationaler Bildungsstandards: Expertise. Bonn: BMBF.


Klieme, E., Neubrand, M., & Lüdtke, O. (2001). Mathematische Grundbildung: Testkonzeption und Ergebnisse. In J. Baumert, E. Klieme, M. Neubrand, M. Prenzel, U. Schiefele, W. Schneider, P. Stanat, K.-J. Tillmann, & M. Weiß (Eds.), PISA 2000: Basiskompetenzen von Schülerinnen und Schülern im internationalen Vergleich [PISA 2000: Basic student competencies in an international comparison] (pp. 141–191). Opladen: Leske + Budrich. KMK (2002). PISA 2000: Zentrale Handlungsfelder. Zusammenfassende Darstellung der laufenden und geplanten Maßnahmen. Beschluss der 299. Kultusministerkonferenz vom 17./18.10.2002 [PISA 2000: Central fields of action. Summary of current and planned measures]. Available at www.kmk.org/fileadmin/Dateien/veroeffentlichungen_beschluesse/2002/2002_10_07-Pisa2000-Zentrale-Handlungsfelder.pdf KMK (2006). Sekretariat der Ständigen Konferenz der Kultusminister der Länder in der Bundesrepublik Deutschland. Gesamtstrategie der Kultusministerkonferenz zum Bildungsmonitoring. München: Wolters Kluwer. KMK (2015). Sekretariat der Ständigen Konferenz der Kultusminister der Länder in der Bundesrepublik Deutschland: Förderstrategie für leistungsstarke Schülerinnen und Schüler (Beschluss der Kultusministerkonferenz vom 11.06.2015) [Strategy to support top-performing students: Order of the KMK on June 11, 2015]. Available at www.kmk.org/fileadmin/Dateien/pdf/350-KMK-TOP-011Fu-Leistungsstarke_-_neu.pdf Kuger, S., Klieme, E., Jude, N., & Kaplan, D. (2017). Assessing contexts of learning: An international perspective. Methodology of educational measurement and assessment. Berlin: Springer International. Martin, M. O., Mullis, I. V. S., Foy, P., & Hooper, M. (2016). TIMSS 2015 International results in science. Chestnut Hill, MA: International Association for the Evaluation of Educational Achievement. Müller, K., & Ehmke, T. (2013). Soziale Herkunft als Bedingung der Kompetenzentwicklung. In M. Prenzel, C. Sälzer, E. Klieme, & O. Köller (Eds.), PISA 2012: Fortschritte und Herausforderungen in Deutschland (pp. 245–274). Münster u.a.: Waxmann.

Müller, K., & Ehmke, T. (2016). Soziale Herkunft und Kompetenzerwerb. In K. Reiss, C. Sälzer, A. Schiepe-Tiska, E. Klieme, & O. Köller (Eds.), PISA 2015: Eine Studie zwischen Kontinuität und Innovation [PISA 2015: A study between continuity and innovation] (pp. 285–316). Münster: Waxmann. Mullis, I. V. S., Martin, M. O., Foy, P., & Hooper, M. (2016). TIMSS 2015 International results in mathematics. Chestnut Hill, MA: International Association for the Evaluation of Educational Achievement. Mullis, I. V. S., Martin, M. O., Foy, P., & Hooper, M. (2017). PIRLS 2016 International results in reading. Chestnut Hill, MA: International Association for the Evaluation of Educational Achievement. OECD (Ed.) (1999). Measuring student knowledge and skills: A new framework for assessment. Paris: OECD Publishing. OECD (2013). OECD skills outlook 2013: First results from the survey of adult skills. Paris: OECD Publishing. OECD (2016a). PISA 2015 assessment and analytical framework: Science, reading, mathematic and financial literacy. Paris: OECD Publishing. OECD (2016b). PISA 2015 results. Vol. I: Excellence and equity in education. Paris: OECD Publishing. OECD (Ed.) (2017a). PISA 2015 results. Vol. III: Students' well-being: Students' use of ICT outside of school. Paris: OECD Publishing. OECD (2017b). PISA 2015 technical report. Paris: OECD Publishing. Purves, A. C. (1987). The evolution of the IEA: A memoir. Comparative Education Review, 31(1), 10–28. Rauch, D., Mang, J., Härtig, H., & Haag, N. (2016). Naturwissenschaftliche Kompetenz von Schülerinnen und Schülern mit Zuwanderungshintergrund. In K. Reiss, C. Sälzer, A. Schiepe-Tiska, E. Klieme, & O. Köller (Eds.), PISA 2015: Eine Studie zwischen Kontinuität und Innovation (pp. 317–348). Münster and New York: Waxmann. Reiss, K., & Sälzer, C. (2016). Fünfzehn Jahre PISA: Bilanz und Ausblick. In K. Reiss, C. Sälzer, A. Schiepe-Tiska, E. Klieme, & O. Köller (Eds.), PISA 2015: Eine Studie zwischen Kontinuität und Innovation [PISA 2015: A study between continuity and innovation] (pp. 375–382). Münster: Waxmann. Reiss, K., Sälzer, C., Schiepe-Tiska, A., Klieme, E., & Köller, O. (Eds.) (2016). PISA 2015: Eine Studie zwischen Kontinuität und Innovation [PISA 2015: A study between continuity and innovation]. Münster: Waxmann. Robitzsch, A., Lüdtke, O., Köller, O., Kröhne, U., Goldhammer, F., & Heine, J.-H. (2017). Herausforderungen bei der Schätzung von Trends in Schulleistungsstudien: Eine Skalierung der deutschen PISA-Daten [Challenges in estimating trends in large-scale assessments: Scaling the German PISA data]. Diagnostica, 63, 148–165. https://doi.org/10.1026/0012-1924/a000177 Roeder, P. M. (2003). TIMSS und PISA: Chancen eines neuen Anfangs in Bildungspolitik, -planung, -verwaltung und Unterricht. Endlich ein Schock mit Folgen? Zeitschrift für Pädagogik, 49(2), 180–197. Rutkowski, D., & Prusinski, E. L. (2011). The limits and possibilities of international large-scale assessments. Education Policy Briefs, 9(2). Retrieved from https://files.eric.ed.gov/fulltext/ED531823.pdf Sälzer, C., Reiss, K., Schiepe-Tiska, A., Prenzel, M., & Heinze, A. (2013). Zwischen Grundlagenwissen und Anwendungsbezug: Mathematische Kompetenz im internationalen Vergleich. In M. Prenzel, C. Sälzer, E. Klieme, & O. Köller (Eds.), PISA 2012: Fortschritte und Herausforderungen in Deutschland [PISA 2012: Progress and challenges in Germany] (pp. 47–98). Münster: Waxmann. Schiepe-Tiska, A., Rönnebeck, S., Schöps, K., Neumann, K., Schmidtner, S., Parchmann, I., & Prenzel, M. (2016). Naturwissenschaftliche Kompetenz in PISA 2015: Ergebnisse des internationalen Vergleichs mit einem modifizierten Testansatz. In K. Reiss, C. Sälzer, A. Schiepe-Tiska, E. Klieme, & O. Köller (Eds.), PISA 2015: Eine Studie zwischen Kontinuität und Innovation [PISA 2015: A study between continuity and innovation] (pp. 133–176). Münster: Waxmann. Seidel, T., & Prenzel, M. (2008). Large-scale assessment. In J. Hartig, E. Klieme, & D. Leutner (Eds.), Assessment of competencies in educational contexts (pp. 279–304). Göttingen: Hogrefe & Huber. Stanat, P., Rauch, D., & Segeritz, M. (2010). Schülerinnen und Schüler mit Migrationshintergrund [Students with an immigrant background]. In E. Klieme, C. Artelt, J. Hartig, N. Jude, O. Köller, M. Prenzel, & P. Stanat (Eds.), PISA 2009: Bilanz nach einem Jahrzehnt (pp. 200–230). Münster, New York, München and Berlin: Waxmann. Stanat, P., Schipolowski, S., Rjosk, C., Weirich, S., & Haag, N. (Eds.) (2017). IQB-Bildungstrend 2016: Kompetenzen in den Fächern Deutsch und Mathematik am Ende der 4. Jahrgangsstufe im zweiten Ländervergleich [IQB trends in student achievement 2016]. Münster and New York: Waxmann. Wagemaker, H. (2014). International large-scale assessments: From research to policy. In L. Rutkowski, M. von Davier, & D. Rutkowski (Eds.), Handbook of international large-scale assessment: Background, technical issues, and methods of data analysis (pp. 11–36). New York: Taylor & Francis. Walter, O., & Taskinen, P. (2007). Kompetenzen und bildungsrelevante Einstellungen von Jugendlichen mit Migrationshintergrund in Deutschland: Ein Vergleich mit ausgewählten OECD-Staaten [Competencies and educational attitudes of students with an immigrant background in Germany: A comparison with selected OECD countries]. In M. Prenzel, C. Artelt, J. Baumert, W. Blum, M. Hammann, E. Klieme, & R. Pekrun (Eds.), PISA 2006: Ergebnisse der dritten internationalen Vergleichsstudie (pp. 337–366). Münster: Waxmann. Weis, M., Zehner, F., Sälzer, C., Strohmaier, A., Artelt, C., & Pfost, M. (2016). Lesekompetenz in PISA 2015. Ergebnisse, Veränderungen und Perspektiven. In K. Reiss, C. Sälzer, A. Schiepe-Tiska, E. Klieme, & O. Köller (Eds.), PISA 2015: Eine Studie zwischen Kontinuität und Innovation [PISA 2015: A study between continuity and innovation] (pp. 249–284). Münster: Waxmann.

14
Qualitative Comparative Education Research: Perennial Issues, New Approaches and Good Practice
Michele Schweisfurth

INTRODUCTION

Comparative education's roots lie in 'travellers' tales' of exotic educational practices found abroad by mobile scholars. While these were narrative rather than quantitative, it would be a misnomer to label them 'qualitative research', as they lacked clear purpose and rigour in data collection and analysis (and so, effectively, were not research at all). In their ethnocentricity, they also bypassed some of the most basic precepts of qualitative educational research: the need to capture the lived realities of educational experience, reflecting as closely as possible the perspectives of the actors involved. When rigour was introduced, notably by Marc-Antoine Jullien de Paris in 1817, the focus was on objectivity and comparability within a positivist framework: a 'science of education' from which to extract general principles that might be universally applied to improve practice:

to build up, for this science, as has been done for other branches of knowledge, collections of facts and observations arranged in analytical tables so that they can be correlated and compared with a view to deducing therefrom firm principles and specific rules so that education may become virtually a positive science instead of being left to the whims of the narrow-minded, blinkered people in charge of it or diverted from the straight and narrow path by the prejudices of blind routine or by the spirit of system and innovation. (Jullien 1817, cited in Gautherin, 1993: 6)

Jullien’s grand ambition was to gather comparable data across Europe using an extensive questionnaire and periods of fieldwork. The questionnaire items included themes such as administrative arrangements in schools, about the numbers, ages, training and appointment of staff, and the types of institution. To complement these, he intended to gather non-numerical data, including ‘reputation’ and ‘trustworthiness’, but still within an ostensibly ‘scientific’ approach and an evidently normative framework (Gautherin, 1993).


Jullien’s ambition was never realised, but some of the best and worst things about his then-innovative ideas still prevail within the field of comparative education. While there is a very rich qualitative tradition in the field, quantitative studies are particularly influential in policy circles and are well known beyond the boundaries of comparative education. In particular, testing regimes such as PISA, TIMMS and PIACC have become almost synonymous with the field to those not familiar with its more finely-grained nuances (Schweisfurth, 2014). These testing regimes generate league tables of national results and a quest for the right mix of ingredients, isolated as discrete variables, to improve results. But do they get at the heart of teaching and learning, and do they tell us anything about how culture and the ‘sticky materiality’ (Tsing, 2005) of local practice shape experiences and outcomes? This is where good qualitative comparative research is needed, and it treats holistically the issues that quantitative approaches alone cannot reach. In the world of educational research, there are purists who would call themselves ‘qualitative researchers’ (as opposed to researchers who use qualitative methods) and who would argue that the isolation of variables atomises educational experiences to the extent that meaning is lost. They might also argue that the subjectivity of educational experience is paramount in understanding why, for example, some students are more ‘successful’ than others (and possibly whether the label of success is doing violence to them) (for a fuller discussion of qualitative research in education, see Torrance, 2010). That subjectivity would necessarily extend beyond the usual boundaries set by quantitative school effectiveness research, to include culturallybound questions of how education is valued, and what governs relationships between particular teachers and learners or between learners. For some, that commitment to the ‘interpretive paradigm’ is such that they would see it as epistemological inconsistency to include quantitative methods as part

259

of their approach to gathering data. Equally, researchers wedded to quantitative methods might feel the same way about their exclusive use. This chapter will, however, avoid setting out qualitative and quantitative methods as dichotomous, focusing instead on what cannot be answered through quantitative data alone. It will not rehearse the ‘paradigm wars’ and general arguments about why researchers may choose to use qualitative methods in education more generally. In the spirit of this Handbook, it will therefore focus on how the particularities of comparative research demand effective qualitative methods, even if they are not used exclusively. Some of these particularities of comparative research include insider and outsider cultural perspectives; the need to understand the inner workings and influence of context (however defined); and the moral obligation in a post-colonial world to avoid imposing inappropriate agendas. The chapter starts by considering the contributions that qualitative methods can make to comparative research in education. It then turns to relevant longstanding discussions about insider/outsider perspectives, units of analysis, and functional, structural and cultural equivalences. It considers how we might revisit these in the contemporary context of international education governance, forces of global convergence, increasing diversity within sites of research, and the resilience of local practice, and how comparative qualitative research can help us to understand these phenomena. Finally, it explores the promise of new approaches to comparative case study research to make qualitative research meaningful in policy arenas while remaining true to history, context, lived experiences and felt priorities. As such, it sets out an agenda for the practice of comparative education research that incorporates traditional comparative education perspectives while including new methodological insights relevant to the contemporary context of globalisation. Throughout, based in part on my own expertise as a comparative researcher, I draw on the

260

THE SAGE HANDBOOK OF COMPARATIVE STUDIES IN EDUCATION

example of researching pedagogy to illustrate the value and practice of effective qualitative research. What is true for pedagogy is probably true for all human aspects of the educational endeavour – in other words, all of them.

WHY USE QUALITATIVE METHODS IN COMPARATIVE EDUCATION RESEARCH?

Crossley and Vulliamy's (1997) volume on qualitative international research collectively sets out two central and still relevant rationales for the use of qualitative methods: 'the importance of context and the dangers of uncritical international transfer' (Crossley & Vulliamy, 1997: 23). While their volume is now 20 years old and focused on developing countries, these principles still pertain and apply to all settings. How do we understand context, and how can qualitative methods help us to do so? Let us use the example of studying pedagogical practice in comparative perspective. At its simplest and most easily-researched level, the national context usually sets the policy framework within which educators act and learners learn (while noting that in some countries, such as Canada or Germany, education policy is a matter devolved to sub-national units). This framework is of course influential in determining what happens in educational situations, as teachers are accountable to it and learners' entitlements are dictated by it. If it were so simple, there would be little need for in-depth, qualitative, 'processual' (Bartlett & Vavrus, 2017) approaches. Researchers could do a content analysis of educational policy in two countries, identifying similarities and differences, and that would reveal much of what they want to know. If they wanted to compare the extent to which the prescriptions of policy are being followed in classrooms, an itemised observation schedule based on the policy would reveal in which country there is a closer match between policy and practice. But context is far more complex than the black-and-white contents of education policy, and this complexity requires finer-grained and participant-informed approaches. And if we want to know why people are doing what they do, regardless of whether it reflects policy, we need to go beyond what is readable, observable and countable to understand the wider and more subtle workings of context.

Policy sociology teaches us that there is rarely a good fit between policy prescriptions and practice. This is first because the nature of policy is that it is enacted by people who did not write it; who have their own interpretations of it; who may resist, ignore or selectively adapt it, or not understand its intentions; and who may have their own agendas based on what they feel is best for the learners in their care and for themselves (Ball, Maguire, & Braun, 2012). All the way along the policy-into-practice line, this will be shaped by actors' own perceptions. These may be idiosyncratic, but they are also shaped by cultural realities, including how education is valued by teachers and learners, and how much 'power distance' (Hofstede, 2001) is normal between teachers and learners (or adults and children) in a given cultural setting. Methods such as ethnography can help the researcher to understand these through the perspectives of the actors involved, and to glean why differences between policy and practice and between different settings are evident. 'Purist' qualitative researchers would probably be oriented to studying processes of making meaning in the research setting, but some researchers might prefer to take a more variable-oriented approach in order to determine the kinds of factors that appear to shape the phenomenon (for a discussion, see Bartlett & Vavrus, 2017). Sometimes 'qualitative' is a question of degree, and where qualitative research begins and ends is a contested matter.

In considering what context means for this kind of study, it is also important to note that while policy is (usually) a national matter, a multiscalar approach to comparing classroom practice is more fruitful in a context of globalisation and policy borrowing. This is in part because of the way that local practice is nested in the wider national context, as above. However, in the contemporary world, national policy is also nested in a context of global governance, policy borrowing and the more nebulous workings of international fashion. The latter can also affect practice directly, as teachers and learners can be transnationally networked through, for example, social media. Again, qualitative approaches can help to uncover the vectors of influence: whether, for example, policymakers feel coerced into particular decisions by power imbalances (as can occur with aid recipients) or whether they are using international comparators selectively to promote particular agendas. Equally, the study of teachers' and learners' perspectives on the influence – good or bad – of international ideas of good practice can be very revealing of why and how they are adopted, adapted or rejected.

Finally, qualitative research in comparative studies of education is particularly valuable for showing how different contextual factors work together holistically (not simply exist or correlate, and not necessarily work as 'causal factors'). A good example of this is Hufton and Elliott's concept of a 'pedagogical nexus'. Rather than starting with a predetermined set of variables and seeing how these ingredients interact, longer-term immersion in the Russian context (and implied comparison with their own national setting of England) led them inductively to set out a description of 'a set of linked, interactive and mutually reinforcing influences on pupils' motivation to learn with and because of the schooling process' (Hufton & Elliott, 2000: 115). They were struck by how neighbourhood comprehensive schools in Russia were able to provide an undifferentiated entitlement to apparently highly-engaged and motivated students, and also by the continuity and stability of this provision over time (see also Alexander, 2000). This 'pedagogical nexus' was not immediately discernible in the classroom but required attentive and openly-framed observation and discussions over an extended period. It included, for example, continuity of school and class teacher; intergenerational continuity; a particular quality of home–school relations; preparation of children for school readiness; articulation of national curriculum, textbooks and pedagogy; and the nature of lessons, textbooks, homework and assessment. They also acknowledge wider cultural forces. The resilience and internally-reinforcing nature of the nexus points not only to the value of immersive and inductive approaches to research, but to the dangers of attempting to transfer one or a few ingredients of a successful system to another in isolation from other parts of the nexus.

This question of the avoidance of inappropriate transfer is the second of Crossley and Vulliamy's key points about the value of qualitative methods in international research. Qualitative methods can pre-empt or reveal inappropriate transfer in a number of ways. First, by taking an inductive approach, researchers can avoid arriving at the site of research with a predetermined frame derived outside the context of the research. So, for example, arriving in a foreign classroom with an itemised observation schedule based on 'good practice' in one's home country could easily lead to ethnocentric judgements about the quality of what is being observed. This can then lead to policy advice which promotes the fixed notion of 'good practice' without due attention to the pedagogical nexus in the studied context or to the intentions behind what is being observed, which may reflect local priorities invisible to the observer. It also encourages a harmful deficit perspective on local practice. Fascinating qualitative work has been conducted in post-colonial contexts to understand the tensions between indigenous approaches to and values in education and externally-imposed agendas. These agendas could be defined as inappropriate, not least for how they marginalise local knowledge and generate a deficit view of people who have not had 'Western' education. One example is Brian Street's influential work on literacies, which has demonstrated that what has been narrowly defined technically and 'neutrally' as the skills of reading and writing is far more complex in lived practice. The narrow definitions dismiss and overlook a range of sophisticated everyday literacies (Street, 1984). A further relevant example is Robert Aman's study of how interculturality is understood among indigenous people in the Andes region, in comparison to how it is framed by international organisations such as UNESCO and the European Union. He demonstrates how indigenous knowledge has been marginalised but that new pressures for decolonisation are emerging (Aman, 2014). What these studies have in common is humanist perspectives, time in the field, and detailed attention to understanding the worldview of the research subjects. They constitute important counterpoints to dominant discourses about who should learn what, why and how. This approach, however, comes at a cost in a world looking for universal, monitorable policy solutions, as Street notes:

The more ethnographers focus on specific local contexts, the harder does it seem to 'upscale' their projects to take account of the large numbers of people seen to be in need. As one of the generals in charge of the literacy programme in Egypt said to me, 'your micro accounts of different literacies for different people are no help to us in our Campaign; I have 10 million illiterates'. (Street, 2000, Foreword to Robinson-Pant, 2000: vi–vii)

While in policy terms such studies may not be convenient, they are essential steps in reducing the ‘colonial entanglement’ (Takayama, Sriprakash, & Connell, 2016) of the field of comparative education.

KEY CONCEPTS: ONGOING DEBATES AND NEW INSIGHTS

Comparative education has evolved over time to reflect methodological advances, changing value perspectives, and new balances of power and influence. While more traditional approaches to comparative education remain of value and in use, new layers are added in response to such changes (see Phillips & Schweisfurth, 2014). In this section, I will explore three longstanding debates in comparative education, considering how they are shifting and what these shifts mean for qualitative research: (1) units of analysis; (2) functional, structural and cultural equivalences; and (3) insider/outsider perspectives. Table 14.1 provides a summary of some of the contrasts between traditional comparative approaches and contemporary revisions, in relation to these debates.

Table 14.1  Summary of the contrasts between traditional comparative approaches and contemporary revisions

| Methodological theme | Traditional comparative approaches | Recent additions/revisions |
| --- | --- | --- |
| Units of analysis | Methodological nationalism; extended units of analysis (Bray & Thomas, 1995); context as bounded | 'Policyscapes' (Carney, 2008); the global 'in' the local |
| Functional, structural, cultural equivalences | Ensuring comparability ('like with like'); explaining similarities and differences | Plurality of belief systems (and therefore functions, cultural meanings) within as well as across units; official positions as discourse; greater freedom to select comparators in context of global influence |
| Insider/outsider perspectives | Researchers as insiders or outsiders; dominance of western gaze on 'the other' | Hybridities, transnational identities – 'in-betweeners' (Milligan, 2016); emphasis on partnership and decolonisation |

Units of Analysis

The field of comparative education has in the past been accused of 'methodological nationalism' (Robertson & Dale, 2008; Wimmer & Glick Schiller, 2002). The most traditional of comparative approaches to education would involve comparing some education phenomenon within the formal education system in Country A with the same phenomenon in Country B (and, potentially, in Countries C, D and E). The national context, as noted above, in most cases sets out a policy framework which may itself be the object of study but, in any case, is assumed to frame whatever is being analysed. It also dictates structures within which a phenomenon is bounded: whether, for example, students start secondary schooling at 11 or 14 years of age, and how much choice they have in what they study when they do. The provision of educational resources, human, financial or material, is also treated as bounded by national economic constraints and political choices about how well funded education systems are and how resources are distributed. Beyond this focus on policy and structures, historically, the national context is seen to contain a wide range of historical and cultural variables. The notion of 'national character' (Hans, 1949) has historically also been influential in accounting for how education is shaped by culture and how it in turn shapes it. This tendency continues into the present day with the use of frameworks such as Hofstede's quantified national profiles (2001) to explain in sometimes deterministic ways how education is shaped and enacted. So, using the example of pedagogy, in Confucian Heritage Cultures such as China, pedagogy is explained as being shaped by high power distance and collectivism, leading with a degree of inevitability to authoritarian and authoritative whole-class approaches (Watkins & Biggs, 1996).

Bray and Thomas's influential work on units of analysis (1995) went some way to unbinding context from the national level and provided a detailed set of alternatives for comparative education researchers, including supra-national units, such as the global context and global regions, and sub-national units, such as regions, institutions, classrooms and even individual teachers and learners. The precise range of alternatives is presented as a cube, which opens up the range of possibilities but continues to suggest that context is bounded and nameable – this is particularly helpful for variable-oriented studies or when studies are intended to inform policy or practice interventions at a particular level of analysis. Some contemporary approaches to comparison draw on fine-grained qualitative studies in different settings to unbind context. Carney (2008), for example, uses the concept of 'policyscapes' to show how in vastly different settings we find resonances in learner aspirations which suggest that what is influencing them is not easily contained within traditional units of analysis. By working inductively and with an open frame in relation to units, such scapes are made visible and help to inform the ways that the global can be found 'in' the local in a world of new connectivities.

Equivalences: Functional, Structural and Cultural

Many readers will recall adages about 'comparing like with like' as a foundation for solid comparisons. Comparing apples with oranges, for example, is dismissed as not fair to either and would put rigour in jeopardy, not least if the oranges are in China and the apples in Finland. In order for comparison to be rigorous, the expectation would be that what was being compared was functionally, structurally and culturally equivalent (Phillips & Schweisfurth, 2014) and that this would need to be established prior to doing research. So, for example, a researcher comparing early childhood education (ECE) in two countries would first ask: does ECE serve the same function? Does it, for example, prepare children for primary schooling by teaching basic literacy skills? Second, the researcher would look at structures, asking, for example, at what age children start and finish ECE, and when it is compulsory and/or free. Finally, the question of cultural equivalence would be considered, checking, for example, whether ECE is valued culturally or whether parents are expected to stay at home with children as part of typical family cultural practice. Ensuring these equivalences were in place prior to comparison would avoid some of the pitfalls of misreading a situation and ascribing observations wrongly, for example, by assuming that teachers decide what to teach in both contexts being compared, or that parents choose whether to send their children to ECE. It would also ensure that differences between what was found in two or more settings being compared could be explained beyond these fundamental differences. Variables could be isolated more rigorously within this tight framing.

Qualitative research still demands attention to function, structure and culture, but how these are conceptualised and framed has moved on from the fixed perspectives on contextual 'factors' that characterised comparative education in the past. Policy positions on functions and structures are more likely to be analysed as discourses which may be enacted substantially differently in local settings, and close qualitative analyses of those discourses and their differential effects are important projects for comparative research. Culturally, a far less deterministic approach is adopted to explaining how culture shapes policy and practice. In a global context of mobility, diversity and the ready transfer of cultural norms as well as policies, there is also a recognition of pluralism. In a given country, for example, there may be groups for whom staying at home with children is seen as an obligation and gender-based duty, and other groups for whom work is seen as a right or necessity. To understand this cultural richness requires study in its own right, as opposed to framing it out of the study as a generalised equivalence at the national level.

Insider/Outsider Perspectives

The relationship that a comparative researcher has to the contexts under study is an important shaper of the research. Qualitative traditions acknowledge the subjectivity not only of the actors being researched; they note that researchers bring with them their own ways of seeing and interpreting the world, shaped by their experiences and their cultural frames of reference. In comparative studies, these subjectivities can be amplified by the fact that the researcher is usually a relative outsider in at least one of the contexts under study. As Phillips and Schweisfurth (2014) elaborate, this can have a number of effects. The greatest risk is ethnocentrism. An ethnocentric perspective makes the lens of the researcher judgemental. So, for example, someone who studied in schools steeped in learner-centred traditions of pedagogy, with relative freedom for learners to explore their own interests and high levels of collaborative engagement, might, as an outsider, judge more teacher-centred lessons to be restrictive, boring, or even cruel. However, a more nuanced view might reveal ways in which students and teachers work creatively and effectively within these different pedagogical traditions, or the ways in which students demand those relationships as much as teachers impose them. This nuanced view requires time and attention to detail, as well as an understanding of how those practices and relationships work on the inside. An outsider might also have a tendency to exaggerate differences in what he or she observes and reports, when the foundations of the teaching and learning are in fact largely shared between the settings under study. Intercultural skills and an informed comparative perspective can mitigate these risks, given that being an outsider some of the time is virtually inevitable in comparative research. As a dimension of this, language skills matter, and constant cross-checking of meaning will be important if one is working in a second or third language.

Insider status is not unproblematic either. It is easy to ignore and to avoid questioning what is perceived as normal practice. In my own comparative experience as a researcher using qualitative methods to study pedagogy, I have often combined observation of lessons with teacher interviews about their life histories and their practice. This has allowed me to triangulate what I see with what teachers say about what they are doing and why. As an outsider, in this process I am likely to notice things that fade into the background for an insider. So, for example, when I asked in Russia why teachers always asked students to put their books on the same side of the desk at the end of a lesson, I was met with blank expressions: this is simply what was always done, including by their own teachers, and they hadn't thought about it. This demonstrated the power of tradition and formula in this context, in a way that would have been more challenging for an insider to unpack. In the contemporary world, though, many researchers as well as their subjects have hybrid identities that blur the boundaries of insider and outsider. So, using myself as an example again, as a Canadian who has lived and worked in the UK for most of her adult life, which context is 'inside' for me? And when I lived and taught in Indonesia for three years, did I ever stop being an outsider? New ways of conceptualising this hybridity – such as 'in-between-ness' (Milligan, 2016) – are helpful. There is also a growing emphasis – particularly in post-colonial contexts – on the importance of combining perspectives. This needs to be part of the conceptualisation of the project from the outset, not just at the data-gathering stage, to ensure that the potential for qualitative nuance is not curtailed by inappropriate outside framing of the research.


CONTEMPORARY APPROACHES TO CASE STUDY RESEARCH: BEYOND PARADIGM WARS AND TOWARDS RICHER UNDERSTANDING

Case study is one of the foundational research designs for qualitative comparative research (although I would argue that the label is overused and more researchers think they are doing case studies than actually are). Comparative case studies offer in-depth insights into two or more 'cases' that are comparative by virtue of being in different contexts. Comparative cases could be schools, local authority offices, adult education programmes, or educational inspectorates, for example. Recent methodological advances and insights have done much to break down boundaries between qualitative and quantitative approaches, and to promote a richer, more holistic understanding of context in case study research. Here I focus briefly and illustratively on the work of Charles Ragin (2014 [1987]) and Lesley Bartlett and Fran Vavrus (2017) as examples of contemporary approaches to case study which elevate the traditional comparative 'case'.

Ragin's work helps to address the problem of policy relevance, which is frequently raised as a rebuff to comparative work, through innovative ways of inferring causal relationships. Determining causality is generally considered problematic without the use of quantitative methods such as randomised controlled trials (what Ragin (2014 [1987]) and others refer to as 'gold standard' causality methods). So, we cannot know unequivocally whether a pedagogical intervention led to better teaching practices by studying these separately and qualitatively, even if they are happening in the same place and one is intended to affect the other. However, using both a case-oriented and a variable-oriented approach helps to address this challenge. Case-orientation allows sensitivity to complexity, which can be hard to sustain across a wide range of cases, while variable-orientation aims to understand the parts of the whole and how they influence each other. Across several cases, patterns emerge from which the researcher can infer: if you do x, y is a likely outcome (a logic illustrated in the sketch at the end of this section).

Bartlett and Vavrus have presented a systematic way to deepen and elaborate the study of context in comparative case study, to ensure that the full potential of qualitative research is realised. Their original presentation of comparative case studies systematised the vertical nature of cases – that is, how they are nested in, for example, regional imperatives, national policy, and international agreements. Recent work by these authors has extended this approach to include not only horizontal comparison (across cases) and vertical comparison (across stratified levels), but also transversal comparison (across time). For example, the introduction of a pedagogical innovation compared horizontally across two sites would also include analysis of how it is nested in regional needs and priorities, national policies, and global agendas around how quality education is conceptualised. It would also take into consideration policies and practices that existed before the innovation was introduced, and how the innovation plays out over time.
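
To make Ragin's combination of case- and variable-orientation concrete, the sketch below tabulates a crisp-set, QCA-style truth table. Everything here is hypothetical (cases, conditions, outcome), and a real analysis would use dedicated QCA software with Boolean minimisation rather than this bare tabulation:

```python
# Hedged sketch of a crisp-set truth table: which configurations of
# conditions are consistently associated with the outcome across cases?
import pandas as pd

# Hypothetical cases: 1 = condition/outcome present, 0 = absent
cases = pd.DataFrame(
    [
        (1, 1, 1, 1),
        (1, 1, 0, 1),
        (1, 0, 1, 0),
        (0, 1, 1, 0),
        (1, 1, 1, 1),
        (0, 0, 1, 0),
        (1, 1, 0, 1),
    ],
    columns=["training", "materials", "leadership", "improved_practice"],
)

truth_table = (
    cases.groupby(["training", "materials", "leadership"])["improved_practice"]
    .agg(n_cases="size", consistency="mean")
    .reset_index()
)
print(truth_table)
# Configurations recurring with consistency 1.0 (here: training AND
# materials) support inferences of the form 'if you do x, y is likely',
# while keeping each case intact as a configuration of conditions.
```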

CONCLUSIONS: GOOD PRACTICE IN CONTEMPORARY COMPARATIVE EDUCATION

In the light of the above discussion, and based on the principles that underpin it, I would like to conclude by offering some advice to less experienced comparative researchers undertaking qualitative studies. Reflexivity is not the answer to everything, but it matters in all research, especially qualitative research, and it is of special value in comparative studies, where cultural difference can be foregrounded and where ethnocentrism is a risk. It is important as a researcher to question one's own understandings of culture and context to avoid judging or essentialising. As part of this, reflecting on the extent of one's own hybridity can help in understanding diversity at scale and complexity within individuals. Acknowledging power relations is a further aspect of this process. Even as a student, one's status as, for example, an educated young man from the Global North can set up relations that have ethical implications as well as implications for the validity of findings. It is also important to take a reflexive approach to the question of 'whose knowledge?' As noted above, in comparative studies there is almost always a degree of outsider-ness. To what extent are you an outsider, and what assumptions are you carrying with you as such? Combining insider and outsider perspectives is an excellent way to strengthen qualitative studies' capacity to capture lived experience and subjectivities – and it is also a great way to learn.

With qualitative research, time matters for a number of reasons. Be prepared to work inductively and with open frames for a sustained period as you negotiate research questions and methods and then come to terms with data. Premature closure – for example, in how you pose the 'problem' behind your research, or in the kinds of questions you will ask in interviews – will turn the researcher gaze in directions that may cloud judgement. Time in the field is also important, to gain trust and to learn and practise communication patterns, as well as to understand the vertical and transversal axes of what is being studied. In terms of analysis, another way to avoid premature closure is to analyse data in the language in which they were collected, rather than translating them first. Datasets from different contexts in the same language might on one level seem more comparable, and in multilingual teams where data are being shared, the non-translation strategy might not be ideal. In all cases, some translation will eventually be necessary for presenting the findings. However, where feasible, sustaining engagement with the data in the form in which they were originally communicated will help to avoid the loss of illocutionary meaning or the connotations of the original terms.

Some standard advice for comparative educationists regarding units of analysis, functional, structural and cultural equivalences, and insider/outsider perspectives remains applicable to both quantitative and qualitative studies. However, the importance of some of these considerations is amplified in qualitative research. New methodological advances, along with a changing world and new ways of seeing it, demand new ways of working and new ways of thinking about the relationship between culture, education, and research. Finally, we should not underestimate how the richness of qualitative research – in combination with quantitative research, or alone – can inform policy and practice. PISA scores and other cross-national 'big data' only tell part of the story, and it is a rigid and limited one. Even causality is not out of reach for qualitative researchers, although it may not be the starting point for comparative research concerned with understanding the complexities of human experience in context.

REFERENCES

Alexander, R. J. (2000). Culture and pedagogy: International comparisons in primary education. Oxford: Blackwell. Aman, R. (2014). Why interculturidad is not interculturality. Cultural Studies, 29(2), 205–228. Ball, S. J., Maguire, M., & Braun, A. (2012). How schools do policy: Policy enactment in secondary schools. London: Routledge. Bartlett, L., & Vavrus, F. (2017). Rethinking case study research: A comparative approach. New York: Routledge. Bray, M., & Thomas, R. (1995). Levels of comparison in educational studies: Different insights from different literatures and the value of multilevel analysis. Harvard Educational Review, 65(3), 472–490.


Carney, S. (2008). Negotiating policy in an age of globalisation: Exploring educational 'policyscapes' in Denmark, Nepal and China. Comparative Education Review, 53(1), 63–88. Crossley, M., & Vulliamy, G. (Eds.) (1997). Qualitative educational research in developing countries: Current perspectives. London: Routledge (originally Garland). Gautherin, J. (1993). Marc-Antoine Jullien ('Jullien de Paris') (1775–1848). Prospects: The Quarterly Review of Comparative Education, XXIII(3/4), 757–773. Paris: UNESCO, International Bureau of Education. Hans, N. (2011 [1949]). Comparative education: A study of educational factors and traditions. London: Routledge. Hofstede, G. (2001). Culture's consequences (second edition). London: Sage. Hufton, N., & Elliott, J. (2000). Motivation to learn: The pedagogical nexus in the Russian school. Some implications for transnational research and policy borrowing. Educational Studies, 26(1), 115–136. Jullien de Paris, Marc-Antoine (1817). Esquisse et vues préliminaires d'un ouvrage sur l'éducation comparée, et séries de questions sur l'éducation. Paris. Milligan, L. O. (2016). Insider–outsider–inbetweener? Researcher positioning, participative methods and cross-cultural educational research. In M. Crossley, L. Arthur, & E. McNess (Eds.), Revisiting insider/outsider research in comparative and international education (pp. 131–143). Oxford: Symposium. Phillips, D., & Schweisfurth, M. (2014). Comparative and international education: An introduction to theory, method and practice (second edition). London: Bloomsbury. Ragin, C. (2014 [1987]). The comparative method: Moving beyond qualitative and quantitative strategies. Oakland, CA: University of California Press. Robertson, S. L., & Dale, R. (2008). Researching education in a globalising era: Beyond methodological nationalism, methodological statism, methodological educationism and spatial fetishism. In J. Resnick (Ed.), The production of educational knowledge in the global era. Rotterdam: Sense.

268

THE SAGE HANDBOOK OF COMPARATIVE STUDIES IN EDUCATION

Robinson-Pant, A. (2000). Why eat green cucumbers at the time of dying? Women’s literacy and development in Nepal. ­Hamburg: UNESCO Institute for Education. Schweisfurth, M. (2014). Among the comparativists: Ethnographic perspectives. Comparative Education, 50(1), 102–111. Street, B. (1984). Literacy in theory and practice. New York: Cambridge University Press. Takayama, K., Sriprakash, A., & Connell, R. (2016). Towards a postcolonial comparative and international education. Comparative Education Review, 61.

Torrance, H. (Ed.) (2010). Qualitative research methods in education. London: Sage. Tsing, A. (2005). Friction: An ethnography of global connections. Princeton, NJ: Princeton University Press. Watkins, D., & Biggs, J. (Eds.) (1996). The ­Chinese learner: Cultural, psychological and contextual influences. Hong Kong: ­Comparative Education Research Centre. Winmer, A., & Schiler, N. G. (2002). Methodological nationalism and beyond: Nation state building, migration and the social sciences. Global Networks, 2(4), 301–334.

15
Methodological Challenges in Conducting International Research on Teaching Quality Using Standardized Observations
Anna-Katharina Praetorius, Wida Rogh, Courtney Bell and Eckhard Klieme

The interest in studying education at an international level has grown considerably over recent decades (Suter, this volume), often with a focus on similarities and differences in educational structures (e.g., comprehensive versus between-school tracking systems; see Dupriez, Dumay, & Vause, 2008) or in the knowledge and skills of students in different educational systems (e.g., OECD, 2016). Initiated by studies such as TIMSS Video 1995 and 1999 (e.g., Stigler, Gonzales, Kawanaka, Knoll, & Serrano, 1999), one can also observe an increased interest in assessing educational processes across countries, particularly in how teaching and teaching quality vary across countries (see also Paine, Blömeke, & Aydarova, 2016). This is highly relevant as the 'cultural dimension of teaching infuses the organization of classroom

space, the use of time, classroom interactions, discourse and the use of language, classroom activities, and the decision teachers make about how and what to teach when to whom’ (Paine et al., 2016, p. 732). Focusing on these cultural differences by comparing countries is assumed to be helpful as a basis for educational policy decisions in the participating countries as well as for more research-oriented reasons, such as understanding a single educational system more deeply by, for example, identifying similarities in norms and practices within a country, or examining issues of variation and generalizability in both teaching practices and teaching effectiveness (e.g., Blömeke & Paine, 2008; Caro, Lenkeit, & Kyriakides, 2016; Creemers, 2006; Shimizu & Kaur, 2013). Different approaches have been developed for analyzing teaching quality internationally,

among them teacher surveys (e.g., TALIS: Vieluf, Kaplan, Klieme, & Bayer, 2012) and student surveys (e.g., PISA: Scherer, Nilsen, & Jansen, 2016; see also Sälzer & Prenzel, this volume; van de Vijver, Jude, & Kuger, this volume), a range of qualitative approaches (e.g., LeTendre, Hofer, & Shimizu, 2003), as well as standardized observations (e.g., TIMSS Video 1995; see Stigler et al., 1999). Standardized observations have the advantage that they are implemented by professionals or research assistants who are trained to carry out ratings in an objective, reliable, and valid way (see, e.g., Bell et al., 2014; Hill & Grossman, 2013; Praetorius, Lenske, & Helmke, 2012). Due to training and certification, such observers or raters have seen a broad range of classrooms as a basis for comparison and judgment, and they are neither part of the teaching nor do they have an ongoing relationship with any of the classroom participants. Their ratings of teaching quality are thus often considered preferable to other measurement approaches, such as teacher or student surveys (Clare, Valdés, Pascal, & Steinberg, 2001; Helmke, 2009; Pianta & Hamre, 2009) that are completed by participants in the teaching. These advantages multiply at the international level, because training supports the development of a single common understanding among raters from all participating countries (cf. the International Test Commission's (ITC) guideline 3; ITC, 2018), which is not the case for students and teachers. There are, however, considerable methodological challenges in conducting such standardized observations reliably and validly at an international level (Paine et al., 2016). The current chapter provides an overview of these challenges and how they are dealt with in international observation-based1 studies on teaching quality. We start by providing an overview of major international studies on teaching quality using standardized observations (Section 2). We then focus on the conceptual and methodological

challenges of carrying out observation-based research on teaching quality internationally (Section 3) before drawing conclusions regarding future international observation-based studies on teaching quality (Section 4).

OVERVIEW OF INTERNATIONAL STUDIES ON TEACHING QUALITY USING STANDARDIZED OBSERVATIONS

While there is a lot of international research on teachers, only a few studies focus on teaching itself (Schleicher, 2011). The number of international studies that use standardized observation-based measures of teaching quality is even smaller (see also Paine et al., 2016). Table 15.1 presents an overview of major studies; the nature and variety of these empirical studies are described further in the text. The two studies that have received the most international attention among politicians, researchers, and the public are certainly the two TIMSS Video studies of 1995 and 1999. In these cross-sectional studies, eighth-grade mathematics and science instruction was investigated. The studies were conducted by an international team headed by Jim Stigler from the University of California, Los Angeles, under contract with the United States' National Center for Education Statistics (NCES). TIMSS Video 1995 covered lower secondary education in mathematics in three countries (Germany, Japan and the United States). TIMSS Video 1999 covered eighth-grade mathematics and science teaching in seven countries (Australia, the Czech Republic, Hong Kong, Japan, the Netherlands, Switzerland, and the United States). The studies involved videotaping and analyzing teaching practices in 50–100 classrooms per country in TIMSS 1995 and 50–140 classrooms in TIMSS 1999.

Table 15.1  Overview of major international observation-based studies on teaching quality

TIMSS Video 1995 (Third International Mathematics and Science Video Study)
  Main references: Stigler et al. (1999); Stigler & Hiebert (1999)
  School year: 1994–1995
  Countries: USA, Japan, Germany
  Data collection mode: Video
  Selection: Random selection
  Classes per country: 100 in Germany, 50 in Japan, 81 in the United States
  Lessons total: 231 lessons
  Lessons per teacher: 1
  Grade level: 8th grade
  Subject: Mathematics
  Content selection: None
  Student assessment: None

TIMSS Video 1999 (Trends in Mathematics and Science Video Study)
  Main references: Hiebert et al. (2003); Roth et al. (2006)
  School year: 1998–1999
  Countries: Australia, Czech Republic, Hong Kong, Japan (b), Netherlands, Switzerland, USA
  Data collection mode: Video
  Selection: Random selection
  Classes per country: for mathematics, 87 in Australia, 100 in the Czech Republic, 100 in Hong Kong, 50 in Japan (b), 78 in the Netherlands, 140 in Switzerland, 83 in the USA
  Lessons total: 638 lessons in mathematics; 439 lessons in science
  Lessons per teacher: 1
  Grade level: 8th grade
  Subject: Mathematics, Science
  Content selection: None
  Student assessment: None

Pythagoras (Pythagoras Study)
  Main references: Klieme et al. (2009); Lipowsky et al. (2009)
  School year: 2002–2003
  Countries: Germany, Switzerland
  Data collection mode: Video
  Selection: Voluntary participation
  Classes per country: 20 in Germany, 19 in Switzerland
  Lessons total: 195 lessons
  Lessons per teacher: 5
  Grade level: Germany: 9th grade; Switzerland: 8th grade
  Subject: Mathematics
  Content selection: Pythagorean theorem; algebraic word problems
  Student assessment: Pre-post design

LPS (Learner's Perspective Study)
  Main references: Clarke et al. (2006a); Clarke et al. (2006b)
  School year: 1999–2001
  Countries: Australia, China, Czech Republic, Germany, Israel, Japan, Philippines, Singapore, South Africa, South Korea, Sweden, USA
  Data collection mode: Video
  Selection: Purposeful sample of locally defined 'competent teachers'
  Classes per country: 3
  Lessons total: 390 lessons
  Lessons per teacher: 10
  Grade level: 8th grade
  Subject: Mathematics
  Content selection: Sweden, China, and South Korea: fixed lesson on algebra
  Student assessment: Student work

QuIP (Quality of Instruction in Physics)
  Main references: Fischer et al. (2014); Neumann et al. (2010); Geller et al. (2008)
  School year: 2008–2009
  Countries: Finland, Germany, Switzerland
  Data collection mode: Video
  Selection: Voluntary participation
  Classes per country: 47 in Germany, 25 in Finland, 31 in Switzerland
  Lessons total: 103 classes (one double lesson per class)
  Lessons per teacher: 1
  Grade level: Finland: 9th grade; Germany, Switzerland: 10th grade
  Subject: Physics
  Content selection: Electrical energy and power
  Student assessment: Pre-post design

ISTOF (International System for Teacher Observation and Feedback)
  Main references: Teddlie et al. (2006); Muijs et al. (2018)
  School year: Varies (a)
  Countries: Argentina, Belarus, Belgium (Flanders), Brazil, Canada, Chile, China, Cyprus, Denmark, Finland, Germany, India, Ireland, Japan, Malaysia, the Netherlands, Nigeria, Norway, South Africa, Taiwan, Turkey, UK, USA
  Data collection mode: Live
  Selection: Varies (a)
  Classes per country: Varies (a)
  Lessons total: Varies (a)
  Lessons per teacher: Varies (a)
  Grade level: Varies (a)
  Subject: Varies (a)
  Content selection: Varies (a)
  Student assessment: Varies (a)

TALIS Video (Teaching and Learning International Survey Video Study)
  Main references: OECD (2018)
  School year: 2017–2018
  Countries: Chile, Colombia, England, Germany, Japan, Madrid (Spain), Mexico, Shanghai (China)
  Data collection mode: Video
  Selection: Random selection
  Classes per country: Expected to be 85
  Lessons total: Expected to be 1360 lessons
  Lessons per teacher: 2
  Grade level: Lower secondary
  Subject: Mathematics
  Content selection: Quadratic equations
  Student assessment: Pre-post design

Notes: (a) The aim of the ISTOF study was to develop an international instrument. The instrument has since been used in several studies with different foci and samples. (b) Although Japan did not participate in the TIMSS 1999 Video Study, the Japanese mathematics data collected as part of the TIMSS 1995 Video Study were reanalyzed for the TIMSS 1999 Video Study.

The longitudinal Pythagoras study was based on insights from the TIMSS Video studies and focused on investigating the impact of mathematics instruction on students' cognitive and motivational outcomes in Germany and the German-speaking part of Switzerland. It was conducted by the German Institute for International Educational Research (DIPF) and the University of Zurich, and covered 20 ninth-grade classrooms in Germany and 19 eighth-grade classrooms in Switzerland (Klieme, Pauli, & Reusser, 2009). Two topics were focused on: the introduction to the Pythagorean theorem and word problems in algebra. With a stronger focus on the learners and the unfolding of instruction over lessons, the Learner's Perspective Study (LPS) was initiated by David Clarke, Christine Keitel, and Yoshinori Shimizu to examine eighth-grade mathematics instruction in 12 countries (Clarke et al., 2006a; Clarke, Keitel, & Shimizu, 2006b). Based on the methodology and research design initially developed by David Clarke in Australia (Clarke, 2001), the video study was conducted by research teams in the participating countries from Europe, Asia, and Africa (Clarke et al., 2006a). It is argued that the combination of participating countries allows for a 'good representation to different European and Asian educational traditions, affluent and less affluent school systems, and mono-cultural and multi-cultural societies' (Clarke et al., 2006b, p. 2). In total, three excellent teachers per country were videotaped over the course of 10 lessons. In order to gain deeper information on instructional quality and student learning in physics, research teams in several countries adapted and refined the TIMSS Video methodology. Building on national precursor studies (e.g., the IPN study in Germany: Seidel et al., 2006; the Bern Video Study in Switzerland: Labudde, 2002), the international QuIP study (Fischer, Labudde, Neumann, & Viiri, 2014) on teaching in physics was conducted. The study covered physics instruction in Finland, Germany (North Rhine-Westphalia), and Switzerland (German-speaking region). The countries were selected based on specific

differences between the countries (i.e., differences in student achievement in PISA, differences in school systems regarding tracking, and country size). Unlike TIMSS Video, but similar to the Pythagoras study, QuIP used a longitudinal design with pre- and post-tests. The topic of the video-recorded lessons was fixed to an instructional unit on electricity in ninth and tenth grade. In each class, one introductory double lesson (90 minutes) on the relation between electrical energy and power was video-recorded. The lessons were recorded either as a whole on one day (90 minutes) or on two different days (45 minutes each), depending on the countries' and schools' timetables (Fischer et al., 2014). The QuIP study covered videotaping in 25–47 classrooms per country. In Germany, recruitment started with a representative list of 60 schools from one German state, stratified by school type, but several schools refused to participate and others were added; Germany therefore ended up with a non-representative sample of 47 volunteering schools. The Swiss and Finnish samples were also recruited on a voluntary basis within certain regions. While the aforementioned studies are based on video observations, other studies used live observations to investigate teaching and learning instead. Some of these studies have used the International System for Teacher Observation and Feedback (ISTOF), a protocol developed by an international group of researchers, practitioners, and education advisers from 20 countries (Teddlie, Creemers, Kyriakides, Muijs, & Yu, 2006). The group included experts from East Asia, Africa, Europe, and North and South America, taking into account a diversity of cultural perspectives and settings (Muijs et al., 2018) and preventing an ethnocentric dominance of established educational effectiveness factors (Reynolds, 2006). Although ISTOF was developed to be used for comparative purposes, most studies so far have used ISTOF in a single context (e.g., Spain: De la Iglesia Mayol & Roselló Ramon, 2018; for exceptions see Ko, 2010; Miao, Reynolds,

Harris, & Jones, 2015). The majority of studies have used ISTOF in the evaluation of Anglo-Saxon teacher programs such as Inspiring Teaching (Sammons, Kington, Lindorff-Vijayendran, & Ortega, 2014). Up to 80 classroom observations have been reported in these studies (e.g., Day et al., 2008; Devine, Fahie, & McGillicuddy, 2013; Ko, 2010; Sammons et al., 2014), while sometimes only a few teachers have been involved (e.g., four teachers in Ko, 2010). A more recent study in the context of international video studies is the longitudinal Teaching and Learning International Survey (TALIS) Video study, scheduled to be completed in 2020 (see OECD, 2018). Initiated by the OECD and carried out by an international consortium (RAND, Educational Testing Service, and DIPF), the longitudinal study focuses on mathematics instruction in lower secondary classrooms in eight educational systems, namely Chile, Colombia, England, Germany, Japan, Madrid (Spain), Mexico, and Shanghai (China). The sampling design required stratified random samples of schools, with one teacher from each of the 85 participating schools in each educational system's replication of the study's design. The mathematics classes taught by each participating teacher were reviewed for eligibility (i.e., a class was eligible if the focal topic of the study was taught) and one class per teacher was selected to be recorded for two lessons. The study focuses on the instructional unit of quadratic equations.

CHALLENGES OF CARRYING OUT INTERNATIONAL OBSERVATION-BASED RATINGS OF TEACHING QUALITY

The fundamental idea of international research on teaching is to describe, understand, and explain the variety of ways teaching occurs across different systems. This requires, however, comparable measures

across countries – something that cannot be taken for granted and requires many careful decisions and related actions (see also Hoelscher, 2017). The challenges connected to these decisions and actions can be categorized using the 5 Ws and 1 H often referred to in journalism: Why? What? Where? Who? When? How? A large number of challenges relate to these questions; the most pivotal ones are listed in Table 15.2. While all of the challenges in the table apply to national as well as international observation-based research, they are amplified at the international level, as flaws in certain steps of the research process limit international comparability. Given space considerations, in the following we focus on the categories that are particularly challenging in international observation-based research (italicized in Table 15.2) and present examples of how international observation-based studies have dealt with them.

The 'Why'

Carrying out international observation-based research on teaching quality is complicated and cost-intensive. Clear and strong arguments must therefore be made regarding how international perspectives will add significantly to the insights national studies can reveal. Publications on the studies listed in Table 15.1 most frequently named goals such as (a) describing how teaching looks in different countries (e.g., TIMSS Video 1999: Stigler & Hiebert, 1999; TIMSS Video 1995: Stigler et al., 1999; LPS: Clarke, Keitel et al., 2006), (b) investigating the impact of different dimensions of teaching quality on student learning (e.g., Pythagoras: Klieme et al., 2009; QuIP: Fischer et al., 2014; TALIS Video: OECD, 2018), (c) the need to develop internationally valid observation instruments to prevent the use of highly context-specific observational instruments in other contexts (e.g., ISTOF: Muijs et al., 2018), and (d) testing and improving video-based methods with respect to reliability, validity, and feasibility (e.g., TALIS Video: OECD, 2017; see also TIMSS Video 1995: Stigler et al., 1999).

Table 15.2  Overview of pivotal challenges in conducting international studies on teaching

Why?
  • Reasons for the need of international studies
  • Purpose of the study (e.g., policy versus research orientation)

What?
  • Differences in curriculum and teacher education
  • Subject, topic, and lesson selection
  • Conceptualization of teaching quality

Where?
  • Country selection
  • Country region selection
  • School type(s) selection
  • School selection

Who?
  • Teachers
  • Students

When?
  • Dates of the study
  • Distribution of data collection (e.g., across the school year)

How?
  • Data collection mode (live versus video-based)
  • Rater selection
  • Ensuring comparability

Note: The italicized challenges are specifically relevant for international studies using observations.

One could argue that these arguments are rather vague and do not entirely explain the need for international studies; we therefore need more critical discussion of the specific advantages of international observation-based studies on teaching quality. Whatever the aim of a specific study, it is important to ensure that the study design and the interpretation of results are closely aligned. For example, the 1995 TIMSS Video study was meant to describe cultural 'scripts' of mathematics teaching in three countries. Nonetheless, many recipients drew conclusions about the extent to which these patterns of teaching were more or less effective. This can be classified as an ecological fallacy, because no data on student achievement were available for the classrooms involved in the video study. Conclusions regarding effectiveness were based on the assumption that teaching quality would be better in countries that ranked higher in the TIMSS 1995 achievement study. Thus, Stigler and Hiebert (1999) focused on the Japanese teaching model because of Japan's student success in

the achievement study. Later, they designed the TIMSS 1999 video study to test whether teaching quality is really associated with student achievement (Stigler & Hiebert, 2009). In fact, the new study showed that no specific teaching method or type of task was common across the six high-achieving countries included (i.e., all countries other than the US). As Stigler and Hiebert (2009, p. 34) concluded, the common denominator of teaching across all six countries was a more abstract principle, namely the 'engagement of students in active struggle with core mathematics concepts and procedures'. Imitating the techniques of teaching in other cultures therefore does not seem to be the way to improve teaching quality. Furthermore, as we usually do not have controlled experimental studies in comparative educational research, one needs to be cautious with causal statements. This is particularly the case with cross-sectional data, where only relations between variables are investigated. But it also applies to longitudinal data, because causality can only be tested to a limited degree (see also Raudenbush & Bryk, 2002) due to the complex interplay of influencing variables. Some authors therefore suggest

using international educational data instead for generating hypotheses that can then be tested within a selected country (e.g., Porter & Gamoran, 2002; Raudenbush & Kim, 2002). Interpreting international findings on teaching quality with respect to the aims named above is additionally complicated by the fact that transporting instructional insights from one country to another does not work well (e.g., Elliott & Phuong-Mai, 2008; Schweisfurth, 2002). A quote from Sadler (1979) illustrates this challenge with a powerful metaphor:

We cannot wander at pleasure among the educational systems of the world, like a child strolling through a garden, and pick off a flower from one bush and some leaves from another, and then expect that if we stick what we have gathered into the soil at home, we shall have a living plant. (p. 49)

The 'What'

Teaching is highly complex, comprising a wide range of different aspects that occur in varied settings, thereby making it necessary to focus an international study on specific parts of it. Two especially relevant decisions for international observation-based studies on teaching quality concern (a) which school subject(s), topic(s), and lesson(s) are focused on, and (b) how teaching quality is conceptualized.

Subject, topic, and lesson selection

Teaching depends heavily on the content taught (Hill & Grossman, 2013). Deciding on the subject, the specific content to be observed, and the number of lessons therefore shapes what is observed to a large degree, especially at an international level. Observation-based studies on teaching quality could, in principle, be conducted in every school subject available. Even among national studies, however, mathematics is overrepresented (Klieme & Rakoczy, 2008). This may be due to the well-defined

goal structure in mathematics and the clarity of teaching processes (see also Najvar, Najvarová, & Janík, 2009), the dominance of cognitive learning (as opposed, for example, to value education or aesthetics), the existence of a stable curriculum, the high policy relevance of mathematics in most countries, the advanced level of research in mathematics education – or perhaps just the fact that research had focused on mathematics before. As can be seen from Table 15.1, the dominance of mathematics can also be observed at the international level; five of the seven reported studies focused on mathematics. As teaching might differ considerably across subjects, it is unclear to what extent the international observation-based findings we currently have regarding teaching quality transfer to other subjects. But even within a single subject, a large variety of topics is covered, requiring researchers to decide whether they want to capture a broad range of topics or to focus on one specific topic for all participating classes and countries (see also Table 15.1: three studies focused on a single topic, in one study some participating countries decided to fix the topic, and three studies did not determine the topics of the videotaped lessons). Strong arguments exist for and against choosing a specific topic for international observation-based studies on teaching quality. Referring to TIMSS Video 1995, Stigler and colleagues (1999), for example, argued that 'although at first blush it may seem desirable to sample particular topics in the curriculum in order to make comparisons more valid, in practice this is virtually impossible. Especially across cultures, teachers may define topics so differently that the resulting samples become less rather than more comparable. Randomization appears to be the most practical approach to insuring the comparability of samples' (p. 7). For TALIS Video, Klieme et al. (2018) argued, however, that opportunity-to-learn effects are large and that, when topic is not controlled systematically, opportunities

to learn, rather than teaching practices, may shape the relationships between teaching and learning (see also Cogan & Schmidt, 2015). Therefore, a focus on teaching quality within a single topic clarifies cross-country comparisons, given the wide range of topics and the variation from lesson to lesson in studies that allow topics to vary within subjects (The Bill and Melinda Gates Foundation, 2012). They argued further that focusing on a single topic also allows for better alignment between measures of teaching quality and student outcome measures, thereby making observational measures more 'instructionally sensitive' (Popham, 2007); this again enhances the power to detect relationships between teaching quality and student outcomes. The authors nevertheless also reported on how difficult it was to determine the topic in TALIS Video (Klieme et al., 2018). The topic was selected collaboratively with the participating countries and was one of only three topics that met the eligibility requirements (e.g., included in the national curriculum around the age of 15). Observation-based measures of teaching quality are costly and therefore also require a decision regarding how many lessons per teacher to observe. The decision on the number of lessons is critical, as different aspects of teaching may be differently visible depending on the lessons chosen to be observed. This is especially concerning when curricula and teaching units are structured differently across education systems. National observation-based studies suggest that for some teaching quality aspects a single lesson is sufficient for a stable estimate of participating teachers, whereas for others up to nine lessons are required (see, for example, Hill, Charalambous, & Kraft, 2012; Praetorius, Pauli, Reusser, Rakoczy, & Klieme, 2014). As is visible from Table 15.1, many international observation-based studies on teaching quality cover only one or two lessons per teacher, suggesting that findings at the teacher level may be undermined by less than desirable levels of reliability. Stigler and

colleagues (1999), however, correctly note that for research focused on country-level findings, stability at the teacher level is not required. Independent of the question of how many lessons are needed for a stable teacher estimate, Clarke and colleagues (2006) questioned whether lessons are the appropriate unit of analysis for international observation-based studies on teaching quality; in the LPS study, teachers varied the structure of their lessons considerably depending on the topic. The authors therefore concluded that lesson events might be a more appropriate analysis level than the teacher or country level.
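The relationship between the number of observed lessons and the stability of teacher-level scores can be made concrete with the Spearman-Brown prophecy formula, a standard projection in this measurement literature. The following minimal Python sketch is illustrative only: the single-lesson reliabilities are hypothetical values, not estimates taken from the studies cited above.

```python
# Illustrative sketch: projecting the reliability of a teacher-level score
# as a function of the number of observed lessons (Spearman-Brown formula).
# The single-lesson reliabilities below are hypothetical, not estimates
# taken from the studies discussed in this chapter.

def spearman_brown(r1: float, n_lessons: int) -> float:
    """Projected reliability of a score averaged over n_lessons lessons."""
    return (n_lessons * r1) / (1 + (n_lessons - 1) * r1)

def lessons_needed(r1: float, target: float) -> int:
    """Smallest number of lessons whose average reaches the target reliability."""
    n = 1
    while spearman_brown(r1, n) < target:
        n += 1
    return n

if __name__ == "__main__":
    for r1 in (0.25, 0.45, 0.70):  # hypothetical single-lesson reliabilities
        print(f"single-lesson r = {r1:.2f}: "
              f"2 lessons -> {spearman_brown(r1, 2):.2f}, "
              f"lessons needed for 0.80 -> {lessons_needed(r1, 0.80)}")
```

Under these assumptions, an aspect with low single-lesson reliability requires on the order of a dozen lessons to reach a conventional threshold of .80, which illustrates why designs with one or two lessons per teacher mainly support country-level rather than teacher-level conclusions.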

Conceptualization of teaching quality

Ideally, international observation-based studies on teaching quality would be based on an internationally shared theory of teaching quality that could then be tested empirically, simultaneously taking into account which aspects can actually be assessed through observation. Such a theory, however, does not exist. This is highly problematic for international research on teaching: 'Lacking good theories … I fear that much of what passes as cross-national comparison will be based on hunch, myth, and uninformed secondary data analysis, rather than carefully crafted cross-national theories of education' (Rowan, 2002, p. 345). What does exist is a large number of conceptualizations of teaching quality developed in certain cultural settings (for overviews see, for example, Capella, Aber, & Kim, 2016; Charalambous & Praetorius, 2018; Scheerens, Luyten, Steen, & Luyten-de Thouars, 2007). These conceptualizations differ considerably across but also within cultural settings. For example, even within settings, conceptualizations vary in the surrounding conditions to which they refer (e.g., class size or joint lesson planning), the teaching goals prioritized, the extent to which they are subject-specific, the grain size with which teaching is described, and the specific wording used to describe teaching. This diversity

makes international observation-based studies on teaching quality complicated, because each study needs to develop an agreed-upon conceptualization (see, for example, Praetorius et al., 2018) that balances the need for validity within each educational system and comparability across educational systems (Clarke, Wang, Xu, & Aizikovitsh-Udi, 2012). Further, specific practices may be common in one culture but uncommon in another – necessitating teaching quality definitions broad enough to accommodate related, but somewhat variable, practices across cultures. Considering the approaches taken to developing a conceptualization of teaching quality in international observation-based studies, two main strategies can be distinguished, which may also be used in combination: a top-down approach (i.e., using already existing conceptualizations, models, and literature) and a bottom-up approach (i.e., using the data material generated within the study). Top-down approaches carry out the conceptualization of teaching quality at the beginning of a study, prior to data collection. The kind of material gathered for developing the conceptualization is diverse, ranging from prominent local conceptualizations in the participating countries (e.g., TALIS Video) to expert opinions from all participating countries (e.g., ISTOF). Materials are combined in various ways that can be located on a continuum – from combining only the teaching quality aspects all countries agree upon to combining all aspects mentioned by the participating countries. Both ends of the continuum have advantages and disadvantages. Using only the intersection of all participating countries' views allows focusing on the teaching aspects that are seen as more or less equally important in all countries; at the same time, the conceptualization may become rather narrow. Covering all aspects, at the other end, helps to prevent omitting important teaching aspects, but may lead to the inclusion of aspects which do not play a significant role

in some educational systems, or are not interpreted similarly across systems. The ISTOF study is an example of this top-down type of development. In this study, an elaborate process was used to come up with an internationally valid conceptualization of teaching quality. Criticizing previously developed US instruments for a lack of international validity, ISTOF explicitly aimed at taking into account a broader range of cultural contexts (Teddlie et al., 2006). Using an iterative, multi-step, internet-based Delphi process, a panel of international experts discussed factors related to effective teaching and learning, coming up with 11 international components. These were then checked for their suitability for observation-based assessment and accordingly reduced to seven components. Bottom-up approaches are based on lesson material gathered during a study. The conceptualization work is therefore done after data collection and requires videotaped lessons instead of live observations. Based on watching lesson material, common categories are developed on which the lessons can be compared. An advantage of bottom-up approaches is that interesting similarities and differences can be identified that would otherwise go unnoticed. Also, studies often cannot measure everything they want to, and researchers must make choices about what to leave out and how to combine similar constructs. Building a definition of teaching quality from country data allows researchers to make such decisions according to the variation in teaching that exists in the participating countries. This may make it more likely that scales eventually developed from such a bottom-up conceptualization will adequately capture the variation that exists across the participating countries, rather than variation theorized in advance. The disadvantage is that, unless paired with other conceptualizations of teaching quality, such a bottom-up approach may be bound to the dataset and the specific countries participating in the study. Additionally, due to the huge amount

of information contained in videotaped lessons, theoretically important issues may remain underrepresented. Such an approach may also simply reflect the researchers' idiosyncratic teaching quality preferences, rather than a systematic understanding of the empirical research. In the TIMSS Video 1995 study, such a bottom-up approach was partly used.2 In this study, a list of important aspects (i.e., the Professional Standards for Teaching Mathematics) developed by a US professional board was used as a starting point, but it was then revised based on an analysis of field trial videos from all three participating countries (USA, Germany, and Japan), resulting in codes that allowed a better understanding of cultural differences (Stigler et al., 1999). In developing these codes, it was decided from the beginning to focus on the actions of the teachers as well as the mathematical content of the lesson (see Jacobs, Hollingsworth, & Givvin, 2007), both aspects that can be assessed via observation.

The 'Where'

In order to learn as much as possible from international comparisons, the selection of countries in international studies on teaching quality – the 'where' – needs to be based on theoretical considerations derived from careful consideration of the goals of the study (the 'why') as well as what, specifically, is being measured (the 'what'). Without a well-founded set of participating countries, it is impossible to draw convincing hypotheses about similarities and differences in teaching quality. If sufficient theoretical evidence for selecting countries does not exist, van de Vijver and Leung (1997) suggest including three or more groups in an exploratory manner. Looking at the countries selected for empirical studies in general, W.E.I.R.D. samples (i.e., western, educated, industrialized, rich, and democratic; Henrich, Heine, & Norenzayan, 2010) tend to be

overrepresented, leading to skewed and unrepresentative results. This problem is multiplied in international studies when drawing conclusions about general patterns across countries. Imagine, for example, countries in which there are insufficient resources for teaching (e.g., not enough rooms, no blackboard, etc.) compared to countries where resources exist abundantly. Investigating only countries with sufficient resources can easily lead to the (wrong) conclusion that resources do not matter for teaching. As Table 15.1 shows, the problem of W.E.I.R.D. samples also exists for international observation-based studies on teaching quality. The distorted selection of countries in such studies results from the fact that selection is usually based on existing research collaborations (e.g., Pythagoras study: Klieme et al., 2009), the interest of individual researchers in participating in a study (e.g., ISTOF), the willingness and capability of countries to pay for participation in a study (e.g., TALIS Video), and/or the purposeful selection of countries with high student achievement in addition to the main country of interest (e.g., TIMSS Video studies). Future international observation-based studies on teaching quality therefore need to reflect more critically on the selection of countries in order to draw as many meaningful conclusions as possible. In selecting the population of countries, we also need to consider that we usually interpret differences among educational systems as cross-cultural differences, not taking into account that national borders are not the same as ethnic, cultural, and social boundaries (Hanke & Boer, 2016; Øyen, 1990). Addressing or managing this problem in quantitative international studies remains a challenge. That challenge may not be solvable, but at a minimum, researchers can clearly identify their samples so that others might subsequently understand the ways in which findings are specific to certain types of countries or sub-populations within and across countries.

The 'When'

Dates of the study

Thinking about the timing of data collection in international observation-based studies on teaching quality, the first issue that comes to mind is the year(s) in which data collection took place. Based on analyses of instruction in the 18th and 19th centuries, it is clear that teaching changes over time (e.g., Ackerberg-Hastings, 2014). To what extent this also affects the way we can interpret findings from international observation-based studies on teaching quality that are more than 20 years old (e.g., TIMSS Video 1995) is an open question that should be addressed empirically by comparing teaching quality across countries over time as well. Claims about teaching in any country or across countries should be understood as claims that are bound to the timeframe in which the teaching and learning was sampled and, therefore, subject to change as teaching changes over time.

Distribution of data collection

Not only the year(s) of data collection but also the distribution of data collection across the school year (over months, weekdays, and times of day) might play a role in the interpretation of teaching quality in different countries. The quality of classroom management may be lower, for example, at the beginning of the school year, when norms are still being established. It is an open empirical question whether the spread needs to be even across countries or whether it does not make much of a difference that, for example, German and US videos in TIMSS Video 1995 were taped over an eight-month period starting at the beginning of the school year, whereas the videos in Japan were taped within four months at the end of the school year (see Stigler et al., 1999). Finally, the 'when' question plays a role not only for observational data collection

but also for the collection of data on student learning: we are usually interested in the lasting effects of teaching quality on student learning. International observation-based studies on teaching quality, however, measure only short-term effects of teaching on student outcomes, if any (see also Table 15.1). This is highly problematic, as we cannot necessarily assume that short-term and long-term effects are the same.

The 'How'

The largest challenges obviously relate to how the observations are carried out. The most important ones for international studies can be categorized under mode of data collection, rater selection, and ensuring comparability.

Mode of data collection

Observations of teacher and student behavior can be carried out live or based on video recordings. Video recordings are often preferred (see, for example, Fischer & Neumann, 2012; Jacobs et al., 2007; Janík & Seidel, 2009) due to (a) technical features (e.g., replay options for recorded lessons; multiple cameras) that should help to enhance reliability, (b) the fact that focusing on multiple teaching variables simultaneously is too complex for a single live observation, (c) the option that video recordings can be re-used to guide professional development or to carry out additional data analyses that may have little to do with the original data collection purposes, and (d) the advantage that videos concretely show what instruction looks like in specific contexts, which is particularly interesting from an international perspective. Whether videos are entirely advantageous compared to live observations is, however, an open question. Research on the extent to which live as compared to video-based observations make a difference is rare. At least the study by Casabianca and colleagues (2013) indicates that if one is not interested in trend analyses

over time, the observation mode does not make a significant difference with respect to inferences about teaching. If one opts for video-based observations, camera issues are the next challenge to deal with: how many cameras, and where should they be focused? As can be seen from Table 15.1, there is no consensus on the number of cameras (it varies from one to three). What is quite clear, however, is the focus of the cameras. If there is only one camera, it usually follows the actions of the teacher while aiming to cover a significant part of the students (e.g., TIMSS Video 1995; TALIS Video). Two cameras also allow for a whole-class camera (e.g., TIMSS Video 1999, Pythagoras, and QuIP), and with three cameras it is additionally possible to follow individual students (e.g., LPS). Only what is captured on the videos can be used for the rater judgments. Training camera operators consistently across participating countries and incorporating control procedures are therefore very important (e.g., Stigler et al., 1999). In TALIS Video, for example, the international consortium worked iteratively with educational systems over almost nine months to establish shared data quality standards for video, audio, transcripts, and subtitles. In all cases, countries submitted videos they felt met the quality standards, and the international consortium then annotated those videos in order to further discuss shared quality standards. Countries then captured additional video data, and the international consortium worked iteratively with them to build the capacity to capture video that would enable raters to score the lessons accurately and reliably. In most educational systems, this iteration took between three and five videos. A decision for video data collection raises yet another challenge: the protection of participants' data has gained increasing importance over the last decade in many countries. This is especially true for video data, as these are considered personally identifiable information for which

particularly strict data protection rules apply, especially in the European Union as well as in the US. Conducting an international video study therefore requires compliance with all the regulations of the participating countries (e.g., on how to deal with students without consent) as well as meeting the sometimes considerably different needs and norms of countries (e.g., on how to report the securing of data protection to students, parents, and teachers).

Rater selection

One of the unique characteristics of observation-based research is that the data generated do not come directly from the participants, as they do, for example, in surveys. In observation studies, the data come from raters who watch and interpret the behavior of teachers and students. Several studies have shown that neither watching nor interpreting behavior with respect to teaching quality can always be done in the reliable and valid ways we would prefer (e.g., Hill, Charalambous, & Kraft, 2012; Praetorius et al., 2012). Human cognition is error-prone – even when raters are trained properly (see also the next section). Raters also seem to differ in the extent to which they provide reliable and valid ratings (e.g., The Bill and Melinda Gates Foundation, 2012; Pietsch & Tosana, 2008; Praetorius, 2014). Selecting suitable raters is therefore crucial for observation-based research. Characteristics of raters that may support the quality of their ratings include their experience with teaching in general as well as in the specific subject and content area of the study, their alignment with the particular views of teaching and learning inherent in the observation protocol, and their capability to follow the rules in the training material instead of their subjective theories of good instruction (e.g., Cash, Hamre, Pianta, & Myers, 2012; Lawson & Cruz, 2017). For international studies, raters additionally need sufficient fluency in the international communication language.

It is unclear whether raters in international observation-based studies are indeed sufficiently suited to carrying out the ratings, because these details are often missing in publications. Sufficient inter-rater agreement indicates that raters might be suited, but it unfortunately only reveals information on reliability, not on validity. For international studies, an additional complication is cultural background, which also strongly influences how raters perceive and interpret teaching behavior (e.g., Miller & Zhou, 2007). One could argue that raters should therefore ideally be assigned to rate the lessons from their own countries, as they know best how to interpret certain behavior; this is the approach chosen in most studies so far (e.g., TIMSS Video 1995 and 1999, TALIS Video, ISTOF). Another argument could, however, be that it is important to have a crossed design in which raters also rate other countries' lessons, as this allows estimating the extent to which cultural observation biases play a role in their judgments (see van de Grift, Chun, Maulana, Lee, & Helms-Lorenz, 2016). One of the only studies in which such a crossed design has been implemented was conducted by Clausen, Reusser, and Klieme (2003). The authors had raters re-rate German and Swiss video data from the TIMSS Video studies; they could show that German and Swiss German raters did not differ significantly in their ratings. As these two cultural contexts were, however, rather similar, further studies are needed to test for such effects in more diverse contexts.
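To make the notion of inter-rater agreement concrete, the following minimal Python sketch computes Cohen's kappa for two raters scoring the same ten lessons on a 4-point scale. The ratings are fabricated for illustration and are not taken from any study discussed here; as noted above, such statistics speak to reliability only, not to validity.

```python
# Illustrative sketch: agreement between two raters scoring the same ten
# lessons on a 4-point teaching quality scale. The ratings are fabricated
# for illustration and do not come from any study discussed in this chapter.
from sklearn.metrics import cohen_kappa_score

rater_a = [3, 2, 4, 3, 1, 2, 3, 4, 2, 3]
rater_b = [3, 3, 4, 2, 1, 2, 3, 3, 2, 4]

# Unweighted kappa treats every disagreement alike; quadratically weighted
# kappa penalizes large discrepancies (1 vs 4) more than adjacent ones
# (3 vs 4), which usually suits ordinal observation scales better.
print("kappa:", round(cohen_kappa_score(rater_a, rater_b), 2))
print("weighted kappa:", round(cohen_kappa_score(rater_a, rater_b, weights="quadratic"), 2))
```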

Ensuring comparability

Comparability issues relate to the observational instrument(s) used, the rater training, and the data collection itself. One of the fundamental prerequisites for conducting international research on teaching quality is achieving sufficient comparability of the observational instruments across educational systems. This not only requires a shared conceptual understanding of teaching quality across educational systems (see

above), but also a congruent understanding of the indicators used for measuring the concepts of interest – which is highly challenging in a multicultural and multilingual context. To achieve such congruence, several issues need to be taken into account, starting with the selection of items (e.g., that selected items are relevant to all educational systems involved) as well as their defined scale points and observational foci (e.g., that a focus on either the frequency or the quality of teaching aspects carries an equal meaning across educational systems). The linguistic appropriateness of selected items also requires special attention in international studies (e.g., were items translated literally or adapted to the local contexts? Does changing the wording also change the meaning of an item?), requiring several revision cycles in some cases (van de Vijver & Leung, 1997; Phillips & Schweisfurth, 2014). For the collaborative work on item generation and instrument development, all material, including the observed lessons themselves, has to be translated or subtitled and transcribed, where applicable. To prevent meaning shifts during this procedure, a laborious, cyclical quality assurance process has been established in some international observation-based studies on teaching quality (e.g., ISTOF: Muijs et al., 2018), including translations from English into the host language as well as independent back translations. Once the observational instrument is ready to be used, comparability efforts regarding its implementation need to be undertaken (Bell & Jones, 2018; Hill, Charalambous, Blazar, McGinn, Kraft, Beisiegel et al., 2012; Hill, Charalambous, & Kraft, 2012). Training materials, the training itself, and the certification requirements need to be implemented in a way that ensures that all observers have a similar understanding of the constructs to be rated. As international studies usually require a large number of raters, decisions need to be made regarding whether the training takes place for all raters in one place (e.g., TIMSS Video 1999) or whether a

train-the-trainer model is used, which requires master raters from all countries to be trained in one place, who then train the raters in their own country (e.g., TALIS Video). Not only initial training but also continuous international monitoring of raters is needed to avoid rater drift over time. This can be achieved through double scoring and the use of calibration and/or validation videos (ITC, 2018; McClellan, Atkinson, & Danielson, 2012).

Collecting observational data requires external people and/or video cameras to be present during teaching. The extent to which the presence of a camera changes instruction (so-called reactivity effects) has rarely been investigated, thus potentially limiting the validity of observational research on instruction. Initial evidence from a small UK sample (using student and teacher perceptions as well as teacher eye tracking) indicates that the presence of technical recording tools does not seem to considerably impact teaching quality, but it does impact the emotions teachers and students experience in the classroom (see Praetorius, McIntyre, & Klassen, 2017). Another study using video diaries also indicates that individual student behavior might be impacted by the presence of a video camera during instruction, partly depending on the students' social background (see Noyes, 2004). Studies on how this looks internationally are based solely on teacher reports (see, e.g., TIMSS Video 1995: Stigler et al., 1999). In TIMSS Video 1995, for example, Japanese teachers reported being more nervous than their German and US colleagues. The teacher-perceived quality of the videotaped lesson also differed from the lessons they usually teach for a considerable proportion of teachers (around 40% of teachers in Japan compared with around 20% in Germany and the US).
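As one concrete version of the double-scoring practice mentioned above, the following minimal Python sketch tracks a rater's exact agreement with master-coded calibration segments across several months and flags when recalibration might be needed. All scores and the 80% threshold are hypothetical choices for illustration, not values from any study or guideline cited in this chapter.

```python
# Illustrative sketch: monitoring rater drift against master-coded calibration
# segments. All scores and the flagging threshold are hypothetical.

def exact_agreement(rater_scores, master_scores):
    """Share of calibration segments scored identically to the master codes."""
    matches = sum(r == m for r, m in zip(rater_scores, master_scores))
    return matches / len(master_scores)

master = [3, 2, 4, 3, 2, 1, 3, 4]          # master codes for one calibration video
monthly_submissions = {
    "month 1": [3, 2, 4, 3, 2, 1, 3, 4],   # hypothetical rater submissions
    "month 2": [3, 2, 4, 2, 2, 1, 3, 4],
    "month 3": [2, 3, 4, 2, 1, 1, 3, 3],   # drifting away from the master codes
}

THRESHOLD = 0.80  # hypothetical recalibration cut-off
for month, scores in monthly_submissions.items():
    agreement = exact_agreement(scores, master)
    status = "recalibrate" if agreement < THRESHOLD else "ok"
    print(f"{month}: agreement = {agreement:.2f} ({status})")
```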

Once the data are available, one needs to check whether all the actions taken have indeed led to comparable measures across countries (i.e., whether they support a reasonable degree of measurement invariance). Although this is crucial to any comparative study (Braeken & Blömeke, 2016; Miller, Mont, Maitland, Altman, & Madans, 2011; van de Vijver & Leung, 1997), we are aware of only one international observation-based study that has actually tested measurement invariance for measures of teaching quality (see van de Grift et al., 2016), comparing the Netherlands with Korea. The authors could show scalar measurement invariance for their measures, allowing for comparisons of regression coefficients as well as means across both countries. As questions regarding the best approaches to calculating various tests of measurement invariance evolve, methodological advances will play a critical role in addressing these issues in the future.
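The logic of such invariance testing can be illustrated with the usual hierarchy of nested multi-group models (configural, metric, scalar). The Python sketch below demonstrates only the sequence of chi-square difference tests; the fit statistics are invented placeholders that, in practice, would come from multi-group CFA software rather than from any study cited above.

```python
# Illustrative sketch of the measurement invariance testing hierarchy.
# The chi-square values and degrees of freedom are invented placeholders;
# in practice they would be produced by multi-group CFA software.
from scipy.stats import chi2

fits = {  # (chi-square, degrees of freedom) for increasingly constrained models
    "configural": (310.4, 160),  # same factor structure in all countries
    "metric":     (322.9, 172),  # factor loadings constrained to be equal
    "scalar":     (338.1, 184),  # loadings and intercepts constrained to be equal
}

labels = list(fits)
for weaker, stronger in zip(labels, labels[1:]):
    d_chi2 = fits[stronger][0] - fits[weaker][0]
    d_df = fits[stronger][1] - fits[weaker][1]
    p = chi2.sf(d_chi2, d_df)  # does the added constraint significantly worsen fit?
    print(f"{weaker} -> {stronger}: delta chi2 = {d_chi2:.1f}, delta df = {d_df}, p = {p:.3f}")
```

Scalar invariance (equal loadings and intercepts) is the level that licenses comparisons of means across countries, which is why the van de Grift et al. (2016) finding is consequential for cross-country interpretation.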

CONCLUSION

Taking an international perspective on issues of teaching quality is often assumed to be 'an important and necessary part of the quest to understand and improve the science, art or craft of teaching' (Alexander, 1999, p. 149). The potential of such research is indeed large, as can be seen, for example, from the TIMSS studies: according to Paine and colleagues (2016), these studies have led to curriculum changes in many countries. To have such effects, however, the challenges described need to be dealt with as effectively as possible. Summing up the current evidence on challenges in conducting international educational research on teaching quality, two aspects are especially striking. First, although we know about the challenges in theory, they are often not sufficiently taken into account when carrying out empirical studies (see also Hoelscher, 2017). Second, for most of the challenges, scholars argue that we need more systematic and theory-driven decisions in the future (Elliott & Phuong-Mai, 2008; Hoelscher, 2017; Øyen, 1990; Rowan, 2002). Establishing strict standards for training, calibrating, validating, and monitoring raters would help address the reliability and validity challenges associated with observational ratings. Thoroughly

developed hypotheses on differences among educational systems, as well as optimal study designs and analytical approaches, could address the criticism of a lack of theory-driven research and of unreliable findings. It should be self-evident that international study results need to be interpreted with caution, taking all the challenges and restrictions mentioned into account (e.g., do not treat non-representative findings as representing a certain educational system; do not talk about teaching variables affecting student outcomes if the design is cross-sectional; do not compare means if measurement invariance is not given). This is, however, not always the case in a competitive global environment heavily driven by public debates and political interests. As international observation-based research on teaching is a rather new enterprise, we should place more emphasis on developing stricter standards for design and interpretation in the future.

Notes

1 The term 'observation-based' is used in this chapter solely with respect to standardized observations.
2 In TIMSS Video 1995 and 1999, not only bottom-up but also top-down strategies were used.


16
The Measurement and Use of Socioeconomic Status in Educational Research
J. Douglas Willms and Lucia Tramonte

I. DEFINITION OF SOCIOECONOMIC STATUS

Sociologists studying social stratification have been concerned with how individuals and families vary in their status of origin and how they access and control material and social resources. Definitions of socioeconomic status (SES) generally refer to the relative position of a family or individual on an hierarchical social structure, based on their access to, or control over, wealth, prestige, and power (Mueller & Parcel, 1981). SES is treated as a social outcome to be explained by the social and political forces that maintain social status from one generation to the next, and by the strategies people use to occupy certain positions in a social hierarchy.

For at least nine decades, operational definitions of SES have been based primarily on the 'big three' – family income, educational attainment, and the occupational status of the head of the household (National Center for Education Statistics, 2012). However, sociologists tended to favour occupation-based measures of SES as the most reliable and valid. Their arguments were that occupation in industrialized societies largely determined salaries and wages and that it was an indicator of authority and control over people and resources. Therefore, occupation was viewed as the best indicator of status and prestige (Blau & Duncan, 1967). Moreover, family income was not considered a viable indicator of SES as it varied considerably in the short term for many families (Goldberger, 1989).

Several researchers have called for extending the common measures of SES to include social and cultural capital, arguing that the success of dominant groups in maintaining their economic and political power depends on their strategic use of social and cultural capital (Collins, 1971; Lareau & Weininger, 2003). Tramonte and Willms (2010) distinguished between two forms of cultural capital – one that is static, associated with 'highbrow activities' and cultural practices, and one that is dynamic, associated with the cultural interactions between children and their parents. A similar argument can be made to include neighbourhood and school-level variables in a composite SES variable, as living in a high-status neighbourhood or attending a high-status school affirms and supports middle- and upper-class social values and behavioural norms.

Social scientists in other fields, including child development, education, population health and psychology, have attempted to define and measure SES as an explanatory variable to account for variability in educational, health and other social outcomes. For example, Green (1970) defined SES as 'the relative position of a person, family, or neighborhood in an hierarchy which maximally reflects differences in health behavior' (p. 816). Researchers in these fields recognized the need to incorporate other factors in a measure of SES. They integrated occupational status with measures of home possessions, number of rooms in the home, extracurricular and cultural activities, parents' educational attainment, and father's occupation (Sims, 1927). Ganzeboom, De Graaf, and Treiman (1992) developed a model-based approach in which occupational status mediates the relationship between education and income.

Studies in the field of education placed greater emphasis on parental education. This stemmed from empirical observations of the school performance of children from different social milieus, which found that students who performed poorly in school or on achievement tests came from families where parents were less educated, had lower incomes, and worked in less prestigious jobs. For example, a meta-analysis of studies conducted between 1990 and 2000 found that SES was measured primarily by categories of parental educational attainment, and only secondarily by parental occupational status, family income and home resources (Sirin, 2005).

assessments and national assessments. The next section discusses two different approaches that underlie the development of SES measures, one which views SES as a stand-alone construct that may or may not be related to various outcomes and another which identifies the components of SES based on their relationship to a particular outcome. Section III discusses the measurement of SES, especially as it pertains to large-scale international studies, and introduces five methodological issues: the key informant, the level of aggregation, measurement invariance, strategies to set cut-points for assessing poverty, and the handling of missing data. Section IV discusses the use of SES in large-scale international studies. The chapter concludes with a discussion of the implications of the measurement and use of SES for educational policy and research.

II. REFLECTIVE VERSUS FORMATIVE APPROACHES

Edwards and Bagozzi (2000) described two different models of the nature and direction of the causal relationship between constructs and measures. A reflective approach assumes that an underlying construct causes various observable indicators. For example, self-esteem may cause students' responses to various Likert questions pertaining to their feelings of self-worth (Bollen, 2002). Most constructs used in the social sciences, including SES, are based on a reflective approach. With this approach, one can use techniques such as factor analysis to determine the relative weighting assigned to various indicators, and common psychometric approaches such as item response theory (IRT) to determine which indicators maximally contribute to the reliability of the measured construct. This is the approach taken with the Programme for International Student Assessment (PISA) measure of SES, referred to as economic, social, and cultural status (ESCS). It is comprised of three factors: the occupational status of the parent with the highest occupational status, the educational attainment of the parent with the highest educational attainment, and an index of home possessions, which is considered a proxy for family income. Factor analysis is used to extract the first principal component, which is standardized on the Organization for Economic Co-operation and Development (OECD) population of countries as the measure of ESCS.

Public health research generally adopts a reflective approach as well, with the goal of explaining how levels of inequality and variation in social context affect health outcomes. Oakes and Rossi (2003) provided a model which includes three latent constructs comprising SES: material, human, and social capital. In their model, these three constructs cause various observable indicators or scale items. Their argument is that SES measures for public health need to capture more of the social context than those that include only income, education and occupational position.

A formative approach to the measurement of SES selects its indicators based on a presumed causal relationship to a particular outcome. It assumes that the observable indicators cause or affect the latent variable. For example, one can use multivariate regression techniques to identify potential indicators of SES that are strongly related to students' performance on a mathematics test. The SES measure would include those measures that had large and statistically significant regression weights. These weights would be used to determine the relative weighting for each indicator. Those advocating this approach maintain that SES itself derives its meaning from its relationship to valued outcomes. However, others argue that the formative approach is circular: if one chooses variables that predict the outcome, develops a composite measure of SES comprised of those predictor variables, and then uses SES to describe its relationship to the outcome, little is gained by having the SES measure (National Center for Education Statistics, 2012). Moreover, in a study that includes multiple outcome measures, the formative approach would call for multiple measures of SES.
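To make the reflective construction concrete, the sketch below builds an ESCS-like index from three indicators by standardizing them, extracting the first principal component, and rescaling the result to a mean of zero and a standard deviation of one. This is a minimal illustration on synthetic data, not the official OECD scaling procedure; the variable names and the toy correlations are assumptions made for the example.

```python
# A minimal sketch of a reflective SES index, modeled loosely on ESCS:
# standardize three indicators, take the first principal component,
# and rescale to mean 0, SD 1. Synthetic data for illustration only.
import numpy as np

rng = np.random.default_rng(42)
n = 1000
occ = rng.normal(45, 15, n)                      # ISEI-like occupational status
edu = 0.2 * occ + rng.normal(4, 2, n)            # years of parental education
homepos = 0.01 * occ + 0.1 * edu + rng.normal(0, 1, n)   # home-possessions index

X = np.column_stack([occ, edu, homepos])
Z = (X - X.mean(axis=0)) / X.std(axis=0)         # standardize indicators

# First principal component = eigenvector of the correlation matrix
# with the largest eigenvalue (eigh returns eigenvalues in ascending order).
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
w = eigvecs[:, -1]
if w.sum() < 0:                                  # orient loadings positively
    w = -w
ses = Z @ w
ses = (ses - ses.mean()) / ses.std()             # mean 0, SD 1

print("loadings:", np.round(w, 2))
print("share of variance explained:", round(eigvals[-1] / eigvals.sum(), 2))
```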

III. THE MEASUREMENT OF SES

The measurement process entails the assignment of numbers to categories of 'real-world' observations. Generally, an 'instrument', such as a set of survey questions, is used to collect data on what we observe in the real world. These data are assumed to be related to some latent or unobserved construct, which exists only as part of a theory (Wilson, 2005). In our case, the latent construct is SES, and we engage in the measurement process because we wish to make certain decisions based on people's scores on the underlying construct.

Establishing validity for the use of SES requires one to specify the sequence of inferences and assumptions that takes one from the observations derived from the survey questions to the proposed interpretation and uses of the SES measure (Kane, 2013). Thereafter, one needs to validate the sequence of inferences, providing the reasoning underlying the inferences and assumptions, along with documentation of the procedures and in some cases empirical evidence to back up the claims (Kane, 2006). This chapter does not attempt to do this for any measure of SES. It does, however, consider the first few steps in making an interpretative argument and sets out some of the most important uses of SES in large-scale studies.

The first inference involved in the development of an SES measure is scoring, which entails making assumptions about the relevant weights to assign to each observation or indicator of SES. Given the variation in definitions of SES, the literature is replete with differing measures of SES, each with its own scoring rules. We maintain that the best place to start is with the 'big three' (Duncan, Featherman, & Duncan, 1972; National Center for Education Statistics, 2012) – income, occupation, and education – and strive to develop a scoring procedure that is ultimately defensible for the intended purposes or decisions to be made based on the composite measure.

The second inference requires one to make a case that the sample of observations adequately represents the universe of observations that represent the latent construct, SES. Researchers in the field of child development have striven to unpack SES; that is, identify the activities that parents bring to bear on children's development that are associated with the correlation between SES and children's developmental outcomes. They have stressed the importance of parents' investments in their children's human capital (Becker, 1964) and access to cultural capital that children need to be successful at school (Farkas, 2003; Levin & Belfield, 2002).

The challenge in validating the interpretative argument is to ensure one has a clear, coherent definition of SES. The challenges associated with providing empirical evidence to establish the validity of a measure of SES are compounded by several methodological issues. These are discussed below.

Key Informant

In most educational studies, information for the SES variables is collected with contextual questionnaires administered to students. Most youth do not know their family income (Entwisle & Astone, 1994); therefore, in the large international studies, the student questionnaires seldom include questions about family income. Instead, most studies include a measure of home possessions. To some extent, measures of home possessions stand as a proxy for family income, but they are also viewed as possessions that support a child's educational development. Having books in the home or a quiet place to study, for example, are related to children's academic performance. Thus, the measures of home possessions typically comprise a mix of items that reflect both a formative and a reflective approach.

Researchers have questioned whether students provide valid responses regarding their parents' education. Generally, they have found that students' reports of parental education have relatively low correlations with parents' self-reports, compared with their reports of parents' occupations. They also tend to have more missing data (Lien, Friestad, & Klepp, 2001; Looker, 1989; Schulz, 2005).

The measurement of parental occupation can also be problematic. In PISA, students are asked two open-ended questions: 'What is your mother's (or father's) main job? (e.g., school teacher, kitchen-hand, sales manager) Please write in the job title.' and 'What does your mother (or father) do in her (or his) main job? (e.g., teaches high school students, helps the cook prepare meals in a restaurant, manages a sales team) Please use a sentence to describe the kind of work she (or he) does or did in that job.' The responses are coded to a four-digit code established by the International Labour Office (2012). The resulting codes are then mapped to the international socioeconomic index of occupational status (ISEI) (Ganzeboom & Treiman, 2003). This procedure yields reliable data for most students; however, it is problematic for students with low literacy skills. Also, the approach does not provide a level of status for stay-at-home parents who are working as the primary caregiver in the home (Tramonte & Willms, 2018).
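The coding pipeline described above can be sketched schematically: free-text job descriptions are assigned four-digit ISCO-08 codes, which are then looked up in a code-to-score conversion table. The ISCO codes below are real occupational categories, but the ISEI scores attached to them are illustrative placeholders, not values from the Ganzeboom and Treiman (2003) conversion table.

```python
# A schematic sketch of the occupation-coding step: four-digit ISCO-08
# codes are mapped to ISEI status scores via a lookup table.
# The scores here are placeholders for illustration only.
isco_to_isei = {
    "2341": 70,   # primary school teachers (placeholder score)
    "9412": 15,   # kitchen helpers (placeholder score)
    "1221": 60,   # sales and marketing managers (placeholder score)
}

def isei_score(isco_code):
    """Return the ISEI score for a four-digit ISCO-08 code,
    or None when the response could not be coded."""
    return isco_to_isei.get(isco_code)

print(isei_score("2341"))   # 70
print(isei_score("0000"))   # None -> treated as missing downstream
```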

Level of Aggregation

SES is often measured at various levels of aggregation, including individuals, families, neighbourhoods, schools, communities and census units (e.g., postal codes and census tracts). Conceptually, one can envisage that the measurement process would place units above the individual level, such as schools, on a status hierarchy. This is typically the case in student surveys when the SES measure pertains to the SES of a student's family. However, at higher levels of aggregation, such as the school level, researchers usually aggregate individual-level SES data to a higher level and treat the new variable as a contextual variable. For example, the mean SES of a school is commonly used, but not as an hierarchy of schools in the same way as SES at the individual level is conceived. As a contextual variable, it may be that other aggregates, such as the percentage of students with very low SES, are better suited for the research question or intended use of the data (Willms, 2010).
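As a brief illustration of these aggregation choices, the sketch below computes both a school-mean SES and the percentage of low-SES students in each school. The data and the −1.0 low-SES threshold are invented for the example.

```python
# A minimal sketch contrasting two school-level aggregates of student SES:
# the school mean versus the percentage of students below a low-SES cut.
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
df = pd.DataFrame({
    "school": rng.integers(1, 21, 2000),   # 20 schools
    "ses": rng.normal(0, 1, 2000),
})

school_stats = df.groupby("school")["ses"].agg(
    mean_ses="mean",
    pct_low_ses=lambda s: 100 * (s < -1.0).mean(),   # % below the cut
)
print(school_stats.head())
```

Which aggregate is preferable depends on the research question: the mean describes the school's overall intake, while the percentage of very low-SES students is more sensitive to concentrations of disadvantage.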

Measurement Invariance

For large-scale national studies and comparative international studies such as PISA and the Trends in International Mathematics and Science Study (TIMSS), we would like to have a single measure of SES that is consistent across sub-populations – for students in rural and urban settings, for immigrant and non-immigrant students, for students in low-, middle- and high-income countries, and so on. This would facilitate interpretation and reporting and enable the analyst to readily make comparisons among sub-populations and countries. Wu, Li, and Zumbo (2007) describe measurement invariance: 'An observed score is said to be measurement invariant if a person's probability of an observed score does not depend on his/her group membership, conditional on the true score. That is, respondents from different groups, but with the same true score, will have the same observed score' (p. 2).

Achieving measurement invariance for a measure of SES is challenging because many of the plausible indicators are not measurement invariant. For example, owning livestock can be a good indicator of wealth in rural areas, especially in low-income countries, but it is not a good indicator in urban areas. A further challenge is to identify indicators that stretch the SES scale at the lower end. For example, having a level of education 'at or below primary education' is not fine-grained enough to capture differences in status and opportunities among people considered to be living in poverty.

In developing the measure of SES for PISA for Development (PISA-D),1 Tramonte and Willms (2018) considered indicators of low SES that had been used in a number of national and international large-scale comparative studies of students' educational outcomes and adult literacy.2 For each potential indicator, they examined the extent of measurement variance using a statistical technique called differential item functioning (DIF). The aim was to identify items that had low measurement variance (i.e., low DIF) among countries. The selected indicators were then used in the pilot study for PISA-D. When the pilot study data were available, they examined DIF for the relevant indicators. Figure 16.1 provides an example. The probability of the presence of a finished floor in a dwelling had relatively small DIF among the PISA-D countries, while having piped water varied considerably. The main questionnaire for PISA-D includes a small number of SES indicators that are relatively invariant across countries. These include, for example, the composition of the floor in the home, whether the family had to share a toilet facility, or had a table for family meals.
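One common way to screen a binary indicator such as the finished-floor item for DIF is a logistic regression of the item response on the overall SES score plus a group (e.g., country) indicator: a sizeable group coefficient, conditional on the score, flags DIF. The sketch below illustrates the idea on synthetic data; it is not the specific procedure used in the PISA-D analyses.

```python
# A minimal logistic-regression DIF screen for one binary SES indicator:
# regress the item response on the total score and a country indicator.
# Synthetic data; a real screen would loop over items and country pairs.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2000
country = rng.integers(0, 2, n)        # 0/1: two countries
theta = rng.normal(0, 1, n)            # overall SES score (stand-in)

# Item built with DIF: easier to endorse in country 1 at the same SES level.
p = 1 / (1 + np.exp(-(0.5 + 1.2 * theta + 0.8 * country)))
item = rng.binomial(1, p)

X = sm.add_constant(np.column_stack([theta, country]))
fit = sm.Logit(item, X).fit(disp=0)
print(fit.params)   # a sizeable coefficient on `country` indicates DIF
```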

Figure 16.1  Differential item function curves for two indicators of SES

Setting One or More Cut-points to Define Poverty

For PISA-D, Tramonte and Willms (2018) considered two approaches for setting one or more cut-points to define poverty: a 'population-based approach' and a 'scale-based approach'. A population-based approach examines the distribution of scores in the full population and sets the cut-point based on some criteria, such as a percentile or the presence of a certain number of home possessions. For example, one could examine the distribution of SES scores for a country and set the cut-off for poverty at the 15th percentile. A problem with this approach is that the prevalence of families living in poverty is predetermined – in this case 15% – and it does not vary over time or across countries. Given this, one cannot examine trends over time or compare countries.

A scale-based approach sets the cut-point based on the item difficulty of particular items or on the scale's response categories. For example, in their analysis of national data on school achievement in Brazil, Willms, Tramonte, Duarte, and Bos (2012) used IRT to estimate the difficulty associated with a number of SES indicators. They found that having a washing machine had a scaled score of −0.465. Twenty-nine percent of the population had scaled scores on the SES scale that were less than or equal to −0.465. These students were considered to be living in poverty.

One can also appeal to one or more outcomes to help choose a defensible cut-point. For example, 2,500 grams is the cut-point for considering children to be of low birth weight (LBW). To a large extent, this cut-point was derived from research showing
the relationship of LBW to various problems that children with birth weights below 2,500 grams are prone to experience, such as infections and respiratory, neurological, and gastro-intestinal problems. Similarly, The Learning Bar’s measures of anxiety and depression (The Learning Bar Inc., 2018) each have two cut-points to establish moderate and severe levels of anxiety and depression. The cut-points were established by considering the comorbidity of anxiety and depression with health-related measures of vulnerability using data from the National Longitudinal Study of Children and Youth (NLSCY) (Bagnell, Tramonte, & Willms, 2008). For some outcomes the relationship with SES is weaker at higher levels of SES. This has been the case for some health outcomes in the United States; after a certain level of SES, the returns to increasing SES are marginal (Epelbaum, 1990). Willms (2006) framed this phenomenon as ‘the hypothesis of diminishing returns’, which holds that there are weaker effects of SES on social outcomes above some SES threshold. One might predict, for example, that above a certain level of SES there would be little or no increase in students’ academic achievement associated with SES. However, Willms and Somers (2001) found that in several Latin
American countries there were increasing returns for SES; that is, the relationships between children's reading and mathematics scores and parental education were curvilinear, with a weak outcome–SES relationship at low levels of SES and a stronger relationship at high levels of SES. They suggested that there may be some minimum level of parental education necessary for children to benefit from elementary schooling in low- and middle-income countries. Thus, there may be a 'tipping point' at which children can maximally benefit from rising levels of SES.

If one adopts a formative approach to the choice and weighting of variables in an SES measure, then it may be that the indicators affecting schooling outcomes at very low levels of SES differ from the indicators affecting schooling outcomes at high levels of SES. One might also hypothesize that in the least economically developed countries, the SES indicators that affect performance at the lower end of the scale are strongly predictive of whether children make the critical transition from learning-to-read to reading-to-learn. This opens up the possibility of having two distinct measures of SES, one that includes indicators relevant to children who are living in poverty, and another that includes indicators for children whose SES is low-to-moderate, but above the 'tipping point'.
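The two cut-point strategies can be contrasted in a few lines of code. In the sketch below, which uses synthetic SES scores, the population-based cut fixes the poverty rate at 15% by construction, while the scale-based cut (anchored here at the −0.465 washing-machine difficulty from the Brazilian example above) lets the rate vary across populations and over time.

```python
# A minimal sketch of the two cut-point strategies: a population-based cut
# at the 15th percentile versus a scale-based cut anchored at an item
# difficulty. SES scores are synthetic.
import numpy as np

rng = np.random.default_rng(1)
ses = rng.normal(0, 1, 10_000)

pop_cut = np.percentile(ses, 15)   # population-based: prevalence fixed at 15%
scale_cut = -0.465                 # scale-based: anchored to item difficulty

print("population-based cut:", round(pop_cut, 3))
print("poverty rate under scale-based cut:",
      round(100 * (ses <= scale_cut).mean(), 1), "%")
```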

Missing Data

The scoring procedures for calculating SES at the individual level require one to devise a strategy to estimate SES for families when only one parent is present. In most studies, a large percentage of students are missing data for their fathers. In PISA, for example, data are collected for the mother's and father's educational attainment, measured in years of education; the mother's and father's occupation, which is converted to a status index; and a measure of home possessions, considered a proxy for economic, social, and cultural capital. The problem of missing data for fathers, or in some cases for mothers, is circumvented by calculating the highest level of education and the highest level of occupation of the parents living in the home. If only one parent is present, the score for that parent is used. The model assumes that it is the highest level of parental SES that is the most relevant for the intended interpretations and uses of the SES measure. Empirical evidence to support this assumption would strengthen the validity of this scoring procedure.

Taken together, these methodological issues can raise questions about the validity of any measure of SES used in large-scale international studies. Our view is that the same issues arise in the development of any construct used in studies based on students' reports. The discerning researcher must consider the interpretations and uses of the construct and ask whether any of the particular methodological issues pose a threat to its validity (Hannum, Liu, & Alvarado-Urbana, 2017; Rutkowski & Rutkowski, 2010). The next section discusses the common uses of SES in large-scale monitoring efforts.
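In code, the 'highest parent' scoring rule described in this subsection reduces to a missing-aware maximum across the two parent variables (the same logic applies to occupational status). A minimal sketch, with invented column names and data:

```python
# The "highest parent" rule: take the maximum of the two parents' values;
# when one parent is missing, the available parent's value is used.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "mother_yrs_ed": [12, 16, np.nan, 10],
    "father_yrs_ed": [14, np.nan, 11, np.nan],
})

# pandas' max with skipna=True (the default) ignores a missing parent,
# so single-parent records keep the available score.
df["highest_yrs_ed"] = df[["mother_yrs_ed", "father_yrs_ed"]].max(axis=1)
print(df)
```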

IV. THE USES OF SES IN LARGE-SCALE INTERNATIONAL STUDIES

Educational monitoring systems can provide information on a school system's progress in meeting its educational goals. They can enable administrators to set realistic, attainable goals, and assess progress during a specified time frame. Monitoring systems can also provide information on strategies that aim to achieve educational goals. Most strategies pertain to how financial or personnel resources are allocated or involve changing a structural feature of schooling that may constrain progress in meeting educational goals. Structural features refer to long-standing practices of schools and school systems concerning how schooling is organized, what is taught, and how instruction is delivered. The age when children start school, the grade levels at each stage of schooling, the formal curriculum, and the language of instruction are examples of structural features. Monitoring systems can also establish an infrastructure for research that affords evidence on whether a particular strategy or intervention is effective.

Measures of SES can play a role in educational monitoring in at least four ways:

1  By providing a context for setting attainable goals and measuring progress towards them;
2  In assessing the equality of outcomes and the equity of provision among advantaged and disadvantaged groups;
3  For considering the potential of interventions aimed at improving student outcomes and reducing inequalities; and
4  By strengthening the validity of research studies aimed at assessing the impact of strategies and interventions.

Each of these roles is discussed below.

Figure 16.2  Socioeconomic gradient for Peru

Setting Realistic, Attainable Goals and Measuring Progress

Socioeconomic gradients

One of the key 'tools of the trade' for characterizing the educational performance of a schooling system is the socioeconomic gradient or 'learning bar' (Willms, 2018). The socioeconomic gradient is derived from a regression of an outcome on SES and SES-squared. It depicts the relationship between a schooling outcome and socioeconomic status for individuals in a specific jurisdiction, such as a school, a province or state, or a country (Willms, 2006, 2018).3

Figure 16.2 shows the socioeconomic gradient for reading for Peru, based on PISA data for 2015. The scale for the Y-axis on the left side is the continuous scale for reading performance, which has a mean of 500 and a standard deviation of 100 for all students in participating OECD countries. The scale for the Y-axis on the right side describes the six levels of reading achievement used in PISA. The X-axis is the PISA measure of SES, which was scaled to have a mean of zero and a standard deviation of one for the OECD countries. The gradient line is drawn from the 10th to the 90th percentiles of the SES scores. For Peru, the 10th and 90th percentiles for SES are −2.56 and 0.67 respectively. Therefore, 80% of the Peruvian students have SES scores in this range. The graph also includes a scatterplot of students' reading and SES scores for a sample of 5,000 students. The gradient can be summarized with three components:

The level of the gradient is defined as the expected score on the outcome measure for a student with a specified level of SES. In this example, SES was standardized on the OECD mean and standard deviation. The level for Peru is 437.8, which indicates that a Peruvian student whose SES equals the OECD average has an expected reading score of 437.8.

The slope of the gradient indicates the extent of inequality attributable to SES. Steeper gradients indicate greater inequality, while more gradual gradients indicate less inequality.4 The slope for the Peruvian gradient is 36.5 points on the PISA reading proficiency scale. This result indicates that for each one standard deviation increase in SES, the expected score increases by 36.5 points.

The strength of the gradient indicates the proportion of the variation in the outcome that is attributable to SES. Generally, one uses R-squared as the measure of strength, which in this case is 0.252, indicating that 25.2% of the variation in reading scores is attributable to SES.

The goal for most jurisdictions is to improve overall performance and reduce inequalities, or to 'raise and level the learning bar'. The gradient can be seen as a first step in characterizing the performance of a school system. Thus, it is useful even if its slope is shallow and its strength is weak. It can be used to set realistic goals, defined in terms of effect sizes for both the level of the outcome and the extent of inequality. For example, a reasonable goal for a three-year period might be to improve average scores by 0.10 of a standard deviation while reducing the size of the gap in the outcome between the low- and high-performing sub-populations by 0.05 of a standard deviation.
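A gradient of this kind can be estimated with ordinary least squares. The sketch below regresses synthetic reading scores on SES and SES-squared and reports the three components: the level (the expected score at the OECD-average SES of zero), the slope, and the strength (R-squared). The data are invented and only roughly PISA-scaled.

```python
# A minimal sketch of estimating a socioeconomic gradient:
# regress reading on SES and SES-squared, then report level, slope, strength.
import numpy as np

rng = np.random.default_rng(3)
n = 5000
ses = rng.normal(0, 1, n)
reading = 470 + 35 * ses - 2 * ses**2 + rng.normal(0, 80, n)

X = np.column_stack([np.ones(n), ses, ses**2])
beta, *_ = np.linalg.lstsq(X, reading, rcond=None)
fitted = X @ beta
r2 = 1 - ((reading - fitted) ** 2).sum() / ((reading - reading.mean()) ** 2).sum()

print(f"level (SES = 0): {beta[0]:.1f}")
print(f"slope at SES = 0: {beta[1]:.1f}")   # points per 1 SD of SES
print(f"strength (R^2): {r2:.3f}")
```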

School profiles

Another useful tool is the school profile (Willms, 2018). It is simply a scatterplot of the schools' average outcome scores and their average SES scores. Figure 16.3 provides the school profile for reading for Peru, based on PISA data for 2015. Each dot represents a school, with the size of the dots proportional to the school enrolment. The dots can also have different colours or shapes to represent different types of schools, such as public and private schools.

School profiles give an indication of the extent to which students are segregated into different types of schools, based on their outcome scores or their SES. Some school systems select students into different types of schools based on their ability or achievement at particular stages in their school career. These school systems tend to be vertically segregated (Willms, 2010). In most school systems, students with differing levels of SES are also separated into different schools. Some of this separation occurs through residential segregation in large cities, and when there are marked differences in SES between urban and rural schools. The presence of a strong private sector also contributes to separation based on SES. This is called horizontal segregation (Willms, 2010). The proportion of variance in the outcome measure that is between schools is an indicator of vertical segregation; the proportion of variation in SES that is between schools is an indicator of horizontal segregation (Willms, 2006).


Figure 16.3  School profile for Peru
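The two segregation indicators just defined can be approximated with simple variance decompositions: the share of outcome variance lying between schools (vertical segregation) and the share of SES variance lying between schools (horizontal segregation). A minimal sketch with synthetic data follows; a full analysis would use a multilevel model.

```python
# Variance-decomposition approximations of vertical and horizontal
# segregation: the between-school share of outcome and SES variance.
import numpy as np
import pandas as pd

def between_share(df: pd.DataFrame, var: str, group: str = "school") -> float:
    """Proportion of total variance in `var` lying between group means."""
    grand = df[var].mean()
    counts = df.groupby(group)[var].count()
    means = df.groupby(group)[var].mean()
    between = (counts * (means - grand) ** 2).sum() / len(df)
    return between / df[var].var(ddof=0)

rng = np.random.default_rng(5)
school = rng.integers(0, 50, 5000)
school_effect = rng.normal(0, 0.6, 50)
ses = school_effect[school] + rng.normal(0, 1, 5000)
reading = 500 + 30 * ses + 40 * school_effect[school] + rng.normal(0, 80, 5000)
df = pd.DataFrame({"school": school, "ses": ses, "reading": reading})

print("vertical segregation:", round(between_share(df, "reading"), 2))
print("horizontal segregation:", round(between_share(df, "ses"), 2))
```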

Assessing Equality of Outcomes and Equity of Provision

Equality refers to differences among sub-populations in the distribution of their educational outcomes, while equity refers to differences among sub-populations in their access to the resources and schooling processes that affect schooling outcomes (Willms, 2015). Measures of SES and poverty are useful for assessing the equality of outcomes and equity of provision. The distinction is characterized by the path model shown in Figure 16.4 (modified from Willms et al., 2012). Equality refers to differences in student outcomes, such as academic achievement associated with SES, or differences in student outcomes between high- and low-SES students, while equity refers to differences in students' access to key measures of educational provision, such as quality instruction. SES gradients can be used for this purpose. When the outcome measure and the variable denoting the student sub-population are dichotomous variables, and one is concerned, for example, with the extent to which there is equality and equity in learning for students living in poverty, the estimation of relative risk and population attributable risk – two statistics commonly used by epidemiologists – can provide a useful summary for a range of outcomes and across multiple jurisdictions (e.g., see Nonoyama-Tarumi & Willms, 2010).

Figure 16.4  A model for assessing equality, equity and school effects
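For a dichotomous outcome (e.g., low reading proficiency) and a dichotomous poverty indicator, the relative risk and population attributable risk named above reduce to simple proportions. A minimal sketch with invented counts:

```python
# Relative risk (RR) and population attributable risk (PAR) for a
# dichotomous outcome and a dichotomous exposure. Counts are invented.
exposed_cases, exposed_n = 300, 1000        # poor students who are low performers
unexposed_cases, unexposed_n = 400, 4000    # non-poor students who are low performers

p_exposed = exposed_cases / exposed_n            # risk among the poor: 0.30
p_unexposed = unexposed_cases / unexposed_n      # risk among the non-poor: 0.10
p_total = (exposed_cases + unexposed_cases) / (exposed_n + unexposed_n)

rr = p_exposed / p_unexposed                     # relative risk
par = (p_total - p_unexposed) / p_total          # population attributable risk

print(f"RR = {rr:.2f}")     # 3.00: poor students are 3x as likely to be low performers
print(f"PAR = {par:.2f}")   # share of low performance attributable to poverty
```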

Identifying Interventions to Raise and Level the 'Learning Bar'

Willms (2006, 2018) describes five types of intervention that can be implemented by a jurisdiction. The relationships between student outcomes and SES, which are depicted with the socioeconomic gradients and school profiles, can help discern which type of intervention or combination of interventions is most likely to raise and level the learning bar. The five types are as follows.

Universal interventions

A universal intervention strives to improve the outcomes of all students in a jurisdiction. Curriculum reforms, reducing class size, changing the age of entry into kindergarten, or increasing the time spent on reading instruction are all universal interventions, as they are targeted towards all students, irrespective of their SES.

Performance-targeted interventions

An intervention targeted towards students with low levels of performance on an outcome is a performance-targeted intervention. For example, the Early Years Evaluation is used in several countries to identify the developmental skills of children aged 3–6 years as they prepare for and make the transition to formal schooling (The Learning Bar Inc., 2011). The data collected are used to classify students' learning needs into three groups, based on their scores in five domains. The classification provides teachers with an indication of the type and amount of support required for each child.


As with universal interventions, the intervention does not depend on the students' SES; only their scores on the assessment determine which students receive additional support. A performance-targeted intervention can also be at the school level; that is, it may provide extra support or services to low-performing schools, irrespective of their SES.

Risk-targeted interventions

A risk-targeted intervention aims to provide additional support or resources for children deemed to be vulnerable. In this case, SES is often used as a measure of vulnerability. The distinction between a risk-targeted and a performance-targeted intervention is that a risk-targeted intervention selects and intervenes with children who are deemed at risk, rather than those who have already been identified as having a poor developmental outcome. A Head-Start preschool programme for children from low-income families is a good example of a risk-targeted intervention. Risk-targeted interventions can also be implemented at the school level; for example, one might introduce a new reading programme into low-SES schools.

Compensatory interventions

A compensatory intervention provides additional educational resources to students from low-SES backgrounds or students deemed to be at risk for other reasons. Like risk-targeted interventions, compensatory interventions target children who are considered to be at risk for one or more developmental outcomes. However, compensatory interventions are not focused on directly improving a particular outcome, such as reading or mathematics skills. Instead, they strive to improve the child's socioeconomic circumstances with a view to improving all their educational outcomes. Providing free breakfast or lunch programmes or free textbooks for low-SES students are compensatory interventions.


Inclusive interventions

An inclusive intervention strives to integrate marginalized or disadvantaged children into mainstream schools. SES can play a role in inclusive interventions as well, as such interventions aim to reduce horizontal segregation. The redrawing of school catchment boundaries in a school district with the goal of reducing SES segregation is an example.

This framework is especially important in the analysis of data from large-scale international studies, as it can shift the focus away from the ranking of countries towards using data to inform national policies (Willms, 2018).

Strengthening the Validity of Experiments

The data collected in an educational monitoring system, which includes measures of SES, can be used to design strong experiments to assess the effects of particular interventions. SES can be used to define a stratum for stratified random sampling (Särndal, Swensson, & Wretman, 1992). The population of potential participants is divided into SES groups (e.g., based on quintiles of SES) and then random samples are collected from each stratum. The mean SES of schools can also be used to define strata, depending on the nature of the experiment. SES can also be used in a randomized block design. For example, students can be divided into SES groups and then students within each SES group are randomly assigned to treatment conditions associated with the intervention (Matts & Lachin, 1988).

Using one or both of these approaches allows one to obtain more precise estimates of an intervention effect for students with differing levels of SES. Moreover, it allows one to discern whether there are treatment-by-SES interactions; that is, whether the effect of an intervention differs for students depending on their SES.
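Both designs are straightforward to implement. The sketch below forms SES quintiles and randomly assigns half of each quintile to treatment, the randomized block design described above; the data are synthetic.

```python
# A minimal sketch of a randomized block design on SES: form quintiles,
# then assign half of each quintile to the treatment arm at random.
import numpy as np
import pandas as pd

rng = np.random.default_rng(11)
df = pd.DataFrame({"ses": rng.normal(0, 1, 1000)})
df["ses_quintile"] = pd.qcut(df["ses"], 5, labels=False)

df["arm"] = "control"
for _, idx in df.groupby("ses_quintile").groups.items():
    treated = rng.choice(idx, size=len(idx) // 2, replace=False)
    df.loc[treated, "arm"] = "treatment"

print(pd.crosstab(df["ses_quintile"], df["arm"]))   # balanced within quintiles
```

Blocking guarantees balance on SES across arms and supports the treatment-by-SES comparisons mentioned above.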

V. IMPLICATIONS FOR EDUCATIONAL POLICY AND RESEARCH

Validity

When developing a new measure of SES, we advocate the use of Kane's (2006) approach to establish a framework that links the SES measure to its intended uses. We considered four uses for educational policy: providing a context for setting attainable goals and measuring progress towards them; assessing the equality of outcomes and the equity of provision among advantaged and disadvantaged groups; determining the potential of interventions aimed at improving student outcomes and reducing inequalities; and strengthening the validity of research studies aimed at assessing the impact of strategies and interventions.

The choice of factors to include in a measure is only one step in the interpretative argument. Unfortunately, there is no 'gold standard', other than to say that occupational position, level of education, and family income are used most often. These factors are faithful to the Mueller and Parcel (1981) definition of SES as the relative position of a family or individual on an hierarchical social structure, based on their access to, or control over, wealth, prestige, and power. If the intent is to use a reflective approach, this is a defensible way to begin. Thereafter, depending on the educational outcome of interest, one can include other factors, such as measures of social and cultural capital.

The scoring procedure, which requires one to specify weights for each factor, can take many forms. At an aggregate level, for a school or community, for example, the SES scores do not vary substantially with differing weighting schemes. Using a principal component factor analysis provides a defensible strategy.

Further steps in establishing validity would depend more directly on the use of SES for making policy decisions. One might ask questions such as: Does the measure provide
results that are consistent with other economic measures, such as gross national income? Can we make inferences about a wide range of educational outcomes? Are results consistent across a range of contexts? Do they vary over time or with the age of the students? Some of these inferences can be validated with further analyses of the data, and others may require separate validation studies.

CONCLUSIONS AND RECOMMENDATIONS

SES is a summary measure that is used in several different ways in making educational policy decisions. Most of these decisions tend to be relatively low stakes in the sense that they do not involve making decisions about individuals that directly affect their life course. The exceptions are SES-targeted and compensatory interventions, as they determine which children receive extra support or resources. For those kinds of decisions, we recommend conducting analyses using the constituent components of SES and adopting a more formative approach with a particular outcome in mind. Also, we recommend setting the cut-point quite high, such that more students receive the intervention. For most types of educational intervention that aim to improve student outcomes, the extra resources expended for students who are marginally vulnerable are not wasted.

In large-scale international studies, when the goal is to explain variation in students' outcome scores, the use of the constituent components of SES is almost always preferable to using a summary measure of SES. The separate SES factors, in a regression analysis for example, can be used alongside other measures, such as gender, ethnicity, and various aspects of social and cultural capital. An exception to this rule is that in multilevel analyses, when one is including aggregate measures at the school or community level in the analysis to assess


composition effects: the correlations among the constituent SES factors tend to be quite high at the aggregate level, such that there are problems of multicollinearity. In such analyses it is practical to use an aggregate measure of SES at the school or community level. Many research questions involve making comparisons among a large number of jurisdictions, such as countries or regions within a country, across an array of outcome measures, or examining changes in the level of an outcome over time. In those cases, we recommend using a summary measure of SES as it enables one to readily identify patterns in the results. This is perhaps the real strength of using SES in large-scale international studies, as they enable one to assess inequalities in outcomes and inequities in the provision of classroom and school resources.
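A minimal sketch of the multilevel case discussed above (hypothetical data and variable names, not the authors' code): the constituent SES factors enter at the student level, while a single school-mean SES aggregate carries the composition effect, with a random intercept for schools.

```python
# Minimal sketch (hypothetical data): constituent SES factors at the student
# level plus one aggregate SES measure at the school level, avoiding the
# multicollinearity that separately aggregated factors would introduce.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("students.csv")  # hypothetical file; 'ses' scored as above

# Aggregate the summary SES index once per school (the composition term).
df["school_mean_ses"] = df.groupby("school_id")["ses"].transform("mean")

# Random-intercept multilevel model: students nested within schools.
model = smf.mixedlm(
    "reading ~ occupation_status + parent_education_years"
    " + log_family_income + school_mean_ses",
    data=df,
    groups=df["school_id"],
)
print(model.fit().summary())
```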

ACKNOWLEDGEMENTS

The authors are grateful to Danielle Durepos, Beth Fairbairn and Alma Lopez for their comments on earlier drafts of this chapter.

Notes

1 In 2014, the OECD embarked on an ambitious study called PISA for Development (PISA-D). Its aim is to increase the relevance of PISA assessments in low- and middle-income countries by extending and modifying the achievement tests such that more accurate estimates of performance can be obtained for students scoring at the lower end of the scales. Also, the contextual questionnaires were enhanced to better capture the factors that are relevant in low- and middle-income countries, as well as capture data on the lower levels of education and lower levels of income and wealth. Moreover, the countries participating in PISA-D called for a measure of poverty as it pertains to children's development.
2 These included the Literacy Assessment and Monitoring Programme (LAMP), Laboratorio Latinoamericano de Evaluación de la Calidad de la Educación study (LLECE), the Multiple Indicators Cluster Survey (MICS), Programme d'analyse des



systèmes éducatifs de la CONFEMEN (PASEC), and Southern and Eastern Africa Consortium for Monitoring Educational Quality (SACMEQ).
3 The discussion of the socioeconomic gradient and school profile is based on Learning Divides: Using Monitoring Data to Inform Educational Policy (Willms, 2018).
4 The analysis involved a regression of reading scores on SES and SES-squared. In this case, the curvilinear component associated with SES-squared was very small and not statistically significant. When the relationship is curvilinear, one can report the slope when SES is zero.
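The gradient analysis described in note 4 can be sketched as follows; the data file and variable names are hypothetical.

```python
# Minimal sketch (hypothetical data): fitting a socioeconomic gradient with a
# quadratic term. The coefficient on `ses` is the slope of the gradient at
# SES = 0, which is the value to report when the relationship is curvilinear.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("students.csv")  # hypothetical file
fit = smf.ols("reading ~ ses + I(ses**2)", data=df).fit()

# Inspect whether the curvilinear (SES-squared) component is significant.
print(fit.summary())
```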

REFERENCES

Bagnell, A., Tramonte, L., & Willms, J. D. (2008). The prevalence of significant mental health problems among Canadian youth and their comorbidity with cognitive and health problems. Ottawa: Human Resources and Skills Development Canada.
Becker, G. (1964). Human capital. New York: National Bureau of Economic Research.
Blau, P. M., & Duncan, O. D. (1967). The American occupational structure. New York: Wiley and Sons.
Bollen, K. A. (2002). Latent variables in psychology and the social sciences. Annual Review of Psychology, 53(1), 605–634.
Collins, R. (1971). Functional and conflict theories of educational stratification. American Sociological Review, 36, 1002–1019.
Duncan, O. D., Featherman, D. L., & Duncan, B. (1972). Socioeconomic background and achievement. New York: Seminar Press.
Edwards, J. R., & Bagozzi, R. P. (2000). On the nature and direction of the relationship between constructs and measures. Psychological Methods, 5, 155–174.
Entwisle, D. R., & Astone, N. M. (1994). Some practical guidelines for measuring youth's race/ethnicity and socioeconomic status. Child Development, 65, 1521–1540.
Epelbaum, M. (1990). Sociomonetary patterns and specifications. Social Science Research, 19(4), 322–347.
Farkas, G. (2003). Cognitive skills and noncognitive traits and behaviors in stratification processes. Annual Review of Sociology, 29, 541–562.

Ganzeboom, H. B. G., De Graaf, P. M., & Treiman, D. J. (1992). A standard international socio-economic index of occupational status. Social Science Research, 21(1), 1–56.
Ganzeboom, H. B. G., & Treiman, D. J. (2003). Three internationally standardised measures for comparative research on occupational status. In J. H. P. Hoffmeyer-Zlotnik & C. Wolf (Eds.), Advances in cross-national comparison: A European working book for demographic and socio-economic variables (pp. 159–193). New York: Kluwer Academic Press.
Goldberger, A. S. (1989). Economic and mechanical models of intergenerational transmission. The American Economic Review, 79, 504–513.
Green, L. W. (1970). Manual for scoring socioeconomic status for research on health behavior. Public Health Reports, 85, 815–827.
Hannum, E., Liu, R., & Alvarado-Urbana, A. (2017). Evolving approaches to the study of childhood poverty and education. Comparative Education, 53(1), 81–114.
International Labour Office (2012). International standard classification of occupations, ISCO-08. Retrieved from: www.ilo.org/public/english/bureau/stat/isco/ (accessed November 2016).
Kane, M. (2006). Validation. In R. Brennan (Ed.), Educational measurement (4th ed., pp. 17–64). Washington, DC: ACE-NCME.
Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50(1), 1–73.
Lareau, A. M., & Weininger, E. B. (2003). Cultural capital in educational research: A critical assessment. Theory and Society, 32, 567–606.
Levin, H. M., & Belfield, C. R. (2002). Families as contractual partners in education. UCLA Law Review, 49, 1799–1824.
Lien, N., Friestad, C., & Klepp, K.-I. (2001). Adolescents' proxy reports of parents' socioeconomic status: How valid are they? Journal of Epidemiology & Community Health, 55(10), 731–737. doi:10.1136/jech.55.10.731
Looker, E. D. (1989). Accuracy of proxy reports of parental status characteristics. Sociology of Education, 62(4), 257–276. doi:10.2307/2112830


Matts, J., & Lachin, J. (1988). Properties of permuted-block randomization in clinical trials. Controlled Clinical Trials, 9, 327–344.
Mueller, C. W., & Parcel, T. L. (1981). Measures of socioeconomic status: Alternatives and recommendations. Child Development, 52, 13–30.
National Center for Education Statistics (2012). Improving the measurement of socioeconomic status for the national assessment of educational progress: A theoretical foundation. Retrieved from: http://nces.ed.gov/nationsreportcard/pdf/researchcenter/Socioeconomic_Factors.pdf
Nonoyama-Tarumi, Y., & Willms, J. D. (2010). The relative and absolute risks of disadvantaged family background and low levels of school resources on student literacy. Economics of Education Review, 29(2), 214–224.
Oakes, J. M., & Rossi, P. H. (2003). The measurement of SES in health research: Current practice and steps toward a new approach. Social Science and Medicine, 56, 769–784.
Rutkowski, L., & Rutkowski, D. (2010). Getting it 'better': The importance of improving background questionnaires in international large-scale assessment. Journal of Curriculum Studies, 42(3), 411–430. doi:10.1080/00220272.2010.487546
Särndal, C. E., Swensson, B., & Wretman, J. (1992). Model assisted survey sampling. New York: Springer.
Schulz, W. (2005). Measuring the socioeconomic background of students and its effect on achievement on PISA 2000 and PISA 2003. Online submission. San Francisco, CA: ERIC. http://eric.ed.gov/?id=ED493510
Sims, V. M. (1927). The measurement of socioeconomic status. Bloomington, IL: Public School Printing Co.
Sirin, S. R. (2005). Socioeconomic status and academic achievement: A meta-analytic review of research 1990–2000. Review of Educational Research, 75(3), 417–453.
The Learning Bar Inc. (2011). Early Years Evaluation [Measurement instrument]. Retrieved from www.thelearningbar.com/


The Learning Bar Inc. (2018). www.thelearningbar.com/
Tramonte, L., & Willms, J. D. (2010). Cultural capital and its effects on education outcomes. Economics of Education Review, 29(2), 200–213.
Tramonte, L., & Willms, J. D. (2018). New measures for comparative studies of low- and middle-income countries. Manuscript submitted for publication.
Willms, J. D. (2006). Learning divides: Ten policy questions about the performance and equity of schools and schooling systems. Report prepared for UNESCO Institute for Statistics.
Willms, J. D. (2010). School composition and contextual effects on student outcomes. Teachers College Record, 112(4), 1008–1037.
Willms, J. D. (2015). Equality, equity and educational prosperity. Report prepared for UNESCO.
Willms, J. D. (2018). Learning divides: Using monitoring data to inform educational policy. Montreal: UNESCO Institute for Statistics.
Willms, J. D., & Somers, M.-A. (2001). Family, classroom and school effects on children's educational outcomes in Latin America. International Journal of School Effectiveness and Improvement, 12(4), 409–445.
Willms, J. D., Tramonte, L., Duarte, J., & Bos, S. (2012). Assessing educational equality and equity with large-scale assessment data: Brazil as a case study. Washington, DC: Inter-American Development Bank.
Wilson, M. (2005). Constructing measures: An item response modeling approach. Mahwah, NJ: Lawrence Erlbaum.
Wu, A. D., Li, Z., & Zumbo, B. D. (2007). Decoding the meaning of factorial invariance and updating the practice of multi-group confirmatory factor analysis: A demonstration with TIMSS data. Practical Assessment, Research & Evaluation, 12(3), 1–26.


PART IV

Lessons from International Comparisons of Student Behavior


17 Early Childhood Care and Education in the Era of Sustainable Development: Balancing Local and Global Priorities

Abbie Raikes, Dawn Davis and Anna Burton

Early childhood care and education (ECCE) is emerging as a critical strategy for addressing lifelong inequities in learning and well-being. Scientific findings and backing from global funders have greatly accelerated the integration of ECCE into country systems. The push forward on ECCE also highlights tensions that arise as countries interpret and apply global themes to a national context. Below we discuss the global forces influencing the design and implementation of the Sustainable Development Goals for ECCE, and we provide three examples of the balance between global and local influences: modalities of ECCE, the design of child development and service quality standards, and measurement. We conclude with future directions, including the importance of global and local partnerships in supporting the next generation of ECCE implementation.

THE SUSTAINABLE DEVELOPMENT GOALS: CONTEXT AND IMPLICATIONS FOR ECCE

The start of the Sustainable Development Goals (SDGs) signals a shift in global expectations for children's education, with ambitious goals for increased participation and achievement through post-secondary education. With new emphasis on equity and access to education for all children, the agenda builds on previous successes in expanding access to education while also addressing the extreme disparities in educational outcomes, such as achievement of proficiency in reading and mathematics. These disparities divide children by race, region, family income, and gender, and are notable in all parts of the world (Global Education Monitoring Report, 2017a). It is estimated



that 43% of children age 5 and under in low- and middle-income countries are at risk of not meeting their developmental potential due to stunting and extreme poverty (Black et al., 2017). A key strategy for closing the gap in learning is to focus on early childhood development, which has great promise in reducing lifelong inequities in learning, health and overall well-being (Black et al., 2017). Children's development prior to school is critical for learning once school begins. Acknowledgement of the importance of early childhood development was evident in previous global development agendas such as Education for All (UNESCO, 2000) and in the Millennium Development Goals (MDGs), where emphasis was placed on reducing undernutrition and child mortality. The SDGs built on these previous commitments by including early childhood development in Goal 4, which focuses on global education. Goal 4, Target 4.2 states that all children should start school having experienced quality early childhood education and services so that development is 'on track' at the start of formal schooling. Although early childhood development is mentioned in Goal 4, it is integral to achievement of the SDGs' primary goals of sustainable development and a more equitable world. Through support of young children's development, goals on lifelong health and educational outcomes become easier to achieve (Raikes, Yoshikawa, Britto, & Iruka, 2017), because early childhood development is both dependent upon and contributes to lifelong health, nutrition, and educational outcomes (Black et al., 2017). The inclusion of early childhood development in the SDGs was heralded as a significant achievement for young children and their families, especially the most disadvantaged (Daelmans et al., 2017). Now that the goals have been ratified, attention is turning to implementation, and to ensuring that countries follow through on commitments made to achieve the SDGs.

Below, we offer a description of the SDGs and their impact on ECCE. Next, we address the balance of global and local influence on the implementation of the SDGs through the lens of three elements of ECCE systems: modalities of ECCE; the design of child development and service quality standards; and measurement.

The Purpose and Structure of the SDGs

The SDGs are intended to coordinate across global organizations, including funding agencies, multilateral organizations and non-governmental organizations, and to encourage action at the country level through the prioritization of actions to promote equity and by creating accountability through reporting on goals. The breadth and complexity of the SDGs far exceed those of previous global initiatives, such as the MDGs and Education for All (Raikes et al., 2017), and outlining an innovative new agenda is also likely to pose implementation challenges for many countries. Although all 193 countries signed on to the SDGs, there is no formal process for ensuring accountability with the global agenda. Instead, countries are encouraged to take part and to share successes and challenges with the global community. Given the wide and open-ended structure of the SDGs, how will country action be encouraged? One powerful mechanism for encouraging action is the designation and reporting of common metrics to measure progress. Each of the 169 targets has been assigned at least one indicator for monitoring global progress; some of these indicators are fully developed at present, but many require further development to agree on indicator definitions and expand data collection to include all countries (United Nations Statistical Commission, 2017). A focus on equity necessitates the prioritization of globally-comparable measurement; without being able to reliably compare between and within countries, it


is not possible to fully track equity in education. Global indicators ideally reflect widespread consensus on globally-relevant definitions of each target and its translation into measurement. In this way, debates on measurement outline fundamental questions regarding the appropriate balance between globally-relevant concepts and indicators and local interpretation. While previous global agendas also focused on education, the lead-up to the SDGs created a more coordinated and far-reaching message on the urgency of investing in education and measuring results. Efforts such as the Learning Metrics Task Force (LMTF, 2013), initiated prior to the SDG adoption, demonstrated the process and appeal of articulating shared agendas to drive progress in measurement across organizations. Since the adoption of the SDGs, the Global Alliance to Monitor Learning, hosted by the UNESCO Institute for Statistics, has initiated work on creating globally-comparable indicators of learning to measure development across countries. The intention is to create new global indicators of learning within the next 3–5 years. The globally-comparable indicators for Target 4.2 include both an indicator of the 'percent of children developmentally on track' in health, psychosocial and cognitive development, and the percent of children with access to organized learning. Indicators of early health and nutrition status are included in other targets.
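As a purely illustrative sketch of what computing such a population-level indicator can involve, consider the following; the items, domains, and on-track rule here are hypothetical assumptions, not the official Target 4.2 definition.

```python
# Hypothetical sketch of a 'percent of children developmentally on track'
# indicator computed from caregiver-reported survey items. The domains,
# item names, and on-track rule are illustrative assumptions only.
import pandas as pd

survey = pd.read_csv("household_survey.csv")  # hypothetical file

# Hypothetical binary (0/1) items grouped into three domains.
domains = {
    "health": ["can_walk_unaided", "rarely_ill"],
    "psychosocial": ["gets_along_with_peers", "manages_emotions"],
    "cognitive": ["recognizes_numbers", "follows_instructions"],
}

# Illustrative rule: on track in a domain if at least half its items are met,
# and developmentally on track overall if on track in a majority of domains.
on_track = pd.DataFrame({
    name: survey[items].mean(axis=1) >= 0.5 for name, items in domains.items()
})
survey["on_track"] = on_track.sum(axis=1) >= 2

print("percent on track:", round(100 * survey["on_track"].mean(), 1))
# An equity breakdown of the kind the SDGs emphasize (hypothetical column).
print(survey.groupby("wealth_quintile")["on_track"].mean())
```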

ECCE's Placement in the SDGs

How will this new education agenda and the introduction of many new globally-comparable indicators influence local education practice? Relevant across all areas of education, this question is important for the design and implementation of early childhood care and education (ECCE). Although global measurement is just one point of implementation of the SDGs, measurement works to create a common language, by articulating definitions of what optimal child development and learning prior to school means across countries. But at present, there is no global definition of what 'developmentally on track' means for children prior to the start of school, even as global standards for learning in primary grades emerge (e.g., UNESCO Institute for Statistics, 2016), and especially when looking across high- and low-income countries (Gove & Black, 2016). The scientific basis of child development has moved forward considerably since Education for All, and there are clearly common patterns and influences on how children develop that can serve as the basis for global action and country-level implementation (Black et al., 2017). However, as countries move into SDG implementation, the question of how universally-applicable science translates into country-level early child development (ECD) programmes and policies is not yet answered.

In all, the SDGs present a new opportunity for early childhood development to be prioritized as a primary goal for investment, but also present challenges to the field of early childhood stakeholders, experts and researchers to better define the balance between universal and locally-specific elements of early childhood policies, programmes and measurement. In ECCE, this balance manifests in ways unique from other goals. First, a global push for ECCE can result in more cohesive and unified advocacy and increased funding. At the same time, this push may also increase pressure to standardize the content, structure and modalities of ECCE to satisfy the expectations of global funders and government ministers. Such standardization increases the extent to which the constructs are comparable across countries, but it may result in imposing a degree of commonality that reverberates through early childhood systems in unexpected ways. Second, the placement of ECCE within the broader education goal of the SDGs sets the stage for building early childhood programmes through the education sector; yet protecting and nurturing young children requires involvement from many different sectors, including health,



water, sanitation, and hygiene, social protection and nutrition. Third, ECCE that can be effective in ensuring children are 'developmentally on track' comes in many forms and modalities, including home visiting for children and families beginning before or at birth, community-based preschool settings, and formal pre-primary settings; focusing on education alone can mask the diversity of effective approaches. The wide range of early childhood services reflects the need for integration across health, nutrition and education services, and presents unique considerations for effectively implementing the SDGs. Because children's development can be supported through several types of programme (Britto et al., 2017; Rao, Sun, Chen, & Ip, 2017), the breadth and diversity in early childhood interventions is considerably larger than the formal pre-primary settings that often come to mind in relation to 'school readiness'. Fourth, understanding the cultural influences on how ECCE is defined and delivered is critical for its success (Shonkoff, Radner & Foote, 2017); what works in one setting may not work in another. These tensions should be placed against the realities of ECCE. Many countries have early childhood care and education in place now, but lack the infrastructure to achieve equity by ensuring access to quality ECCE for all children, and access to pre-primary education is strongly tied to family income in many countries (UNESCO, 2017a). Integrating health, nutrition and education services requires building systems that work across sectors (Kagan & Kauerz, 2012). There are several elements to address as part of early childhood systems, including defining standards for early childhood care and education and defining a curriculum, training teachers, implementing monitoring systems, supporting early childhood programmes through financing and management, and measuring the effectiveness of programmes and interventions (Rossiter, 2016). There is presently great diversity in how these themes

are addressed in various countries, and a mix of types of provision, including private, community-based and public preschools and care facilities. Within the context of the SDGs and their new priorities for widespread implementation of ECCE and globally-comparable measurement, this chapter outlines three examples of how global and local priorities may be balanced in the next era of early childhood education: first, the role of standards within ECCE systems, given the breadth of modalities and the necessity of integrating health and nutrition programmes; second, cultural and contextual influences on the definition and implementation of ECCE; and finally, measurement, or tracking progress towards goals. Before outlining these issues, an overview of ECCE at the start of the SDGs is provided.

STATUS OF EARLY CHILDHOOD CARE AND EDUCATION AT THE START OF THE SDGs

In nearly every country, young children are frequently cared for and educated outside parental care, beginning at birth and extending throughout the early childhood years. Enrolment in pre-primary education has grown substantially over the last decade and is estimated at almost 80% of all children entering primary school (UNESCO, 2017a). This often includes exposure to both education and care. Because more than 60% of all mothers worldwide work outside the home, there is a great need for out-of-home care for young children (Samman et al., 2016). At most, half of these young children participate in early childhood care and education for a few hours a day (Samman et al., 2016), and little is known about how these children spend the rest of their time. Most at risk are the 35 million young children in low- and middle-income countries who are left alone for several hours a day (Samman et al., 2016); care by family or by other young


children is another common solution. Taken overall, the high number of children who are left alone, are cared for by other children, or do not have access to any pre-primary education demonstrates that many countries are presently failing to provide young children with safe, stimulating environments that support their development. The poor conditions experienced by millions of children should be placed within the context of the tremendous growth in robust science on early childhood development, leading to a deeper understanding of the universal mechanisms that influence child development. Over the last decade, neurological science has continued to document the importance of early childhood development in setting trajectories for health and learning throughout life (Engle et al., 2011; Grantham-McGregor et al., 2007; Richter et al., 2017). The science of neurobiological development has served as one mechanism for identifying the universally-relevant elements of child development. For example, a strong body of evidence demonstrates the biological, neurological and cognitive implications of exposing young children to prolonged stress through adversity (Shonkoff et al., 2012). These universal mechanisms lend weight to the idea of a global science of early childhood development and justify country action to provide holistic programmes, with special emphasis on reaching those children most at risk. Exposure to risk, however, is just one part of the picture. Child, family and community resilience stems from strong relationships between and among children and their caregivers, and ECCE can also promote resilience. Although there is variation in how children are cared for by culture and context, there are many common themes in what characterizes strong, resilient families and communities (Masten, 2014), including a focus on promoting emotionally-nurturing connections between children and their caregivers and reliance on a cultural belief system; for children, a biological predisposition to

environmental responsiveness also plays a role. Similarly, children everywhere benefit from quality ECCE, although the potential benefits may be especially pronounced for some children. Overall, evidence of the effectiveness of various types of intervention has continued to grow, with strong evidence outlining the importance of preschool for children's cognitive development in low- and middle-income countries (Rao et al., 2017) as well as in high-income countries, and on the role of home visiting and other programmes in promoting children's development even in the poorest of conditions. The effects of early childhood intervention have been reported to be greater for children experiencing greater adversity in some countries (e.g., Alderman & Vegas, 2011; Burger, 2010; Heckman & Masterov, 2007; Naudeau, Martinez, Premand, & Filmer, 2011), a pattern that may be especially pronounced for pre-primary education in low- and middle-income countries versus higher-income countries (Rao et al., 2017). Children living in low- and middle-income countries are more likely to have poorer scores on cognitive and social-emotional development tasks than their peers in higher-income countries, and the difference in scores can be largely attributed to poor environments and a lack of access to early childhood education (McCoy et al., 2017). For example, in Vietnam, there has been an increase in the number of children enrolled in preschool, but resources to support learning in the home have not increased over the past years (Giang et al., 2016), which can lead to muted effects of preschool on learning. In sum, there are universal patterns in child development, but diversity in the types of risk children experience in various parts of the world (Black et al., 2017), in the environmental conditions that influence development, and in children's responsiveness to interventions. While the global goal of ensuring all children are prepared to learn was deemed relevant across countries through the United Nations process, there is variation in when and how developmental achievements



are manifested (McCoy et al., 2017). For example, the Millennium Cohort Study found delays in the gross motor development of white infants compared to non-white infants that were unexplained by family income, as well as differences in fine motor development attributed to socio-economic status (SES) (Kelly, Sacker, Schoon, & Nazroo, 2006). Documentation of these types of inequities underscores the importance of developing a global measurement system that captures key developmental milestones while also considering and adapting to the cultural intricacies of each country. Although the SDGs encourage coordinated global action in early childhood development, the text that accompanies the SDGs acknowledges the importance of country-level viewpoints to define and measure goals (de la Mothe Karoubi & Espey, 2016), lending weight to the idea that the goals can serve as a heuristic, to be further clarified and defined within country settings.

ECCE SYSTEMS AS A VEHICLE FOR IMPLEMENTATION

How global influences will affect local settings can be understood by dissecting the early childhood systems that serve as the vehicle for implementation and looking across the web of influences on ECCE systems using a systems approach. An early childhood system refers to a set of coordinated and interwoven services and supports that ensures children and families receive timely and appropriate services, typically a mix of government and private provision. Kagan and Roth (2017) suggest examining ECCE systems through the lens of systems theory (Laszlo, 1996; von Bertalanffy, 1968), along with new institutional theory and complex adaptive theory (Joachim & May, 2010), with a focus on three macro-level system outcomes: quality of programmes and supports, equity, and sustainability. Drawing

from the ecological model stating that different levels of society work together to ensure that the child has the supports necessary for healthy development (Bronfenbrenner, 1977), the idea of ECCE 'systems' communicates the importance of many layers of integrated action to achieve quality ECCE. The wide range of services required for healthy development and the many organizations involved in delivering them have made early childhood a splintered field from the start (Kagan & Kauerz, 2012). Increased investments in ECCE in the USA, for example, have necessitated the development of early childhood systems that can provide coherence across services and populations of children and their families (Kagan & Kauerz, 2012), especially through provision of two-generation models that support both children and families (Yazejian et al., 2017). The idea of 'early childhood systems' has been applied to global discussions of scaling up effective programmes for early childhood (Richter et al., 2017). In Turkey (Gören Niron, 2013) and in Latin America (Schady, 2015), it is seen as a way of conceptualizing the roles of government, non-profit and private sectors in early childhood provision. The health and strength of these systems is critical for the design and delivery of effective ECCE services. The World Bank has developed a framework for conceptualizing early childhood systems across countries through the Systems Approach for Better Education (SABER; World Bank, 2016). Results of the Systems Approach for Better Education-Early Childhood Development (SABER-ECD) have been used to identify stages of systems building in early childhood care and education across a range of low- and middle-income countries. While the 'glue' for these systems can vary by country, a common focus on child outcomes is often one uniting element across programmes and policies (e.g., StriveTogether; Grossman, Lombard, & Fisher, 2014). However, some have posited that an emphasis on 'outcomes' in early childhood development can come


at the expense of a focus on processes, or the importance of a strong system that can holistically address the needs of children and families through high-quality ECCE staff (Pearson et al., 2017). This tension is likely to intensify in coming years, as measurement of child outcomes is explicitly outlined as part of the SDGs. Results to date have suggested common themes in early childhood systems across many countries: for example, the challenges of developing and implementing policies across health, nutrition, social protection and education sectors, and under-developed monitoring and evaluation systems that provide inconsistent information on quality and access to early childhood services (World Bank, 2016). The earlier these systems are in place, the more effective they will be for children. For example, children who received nutrition supplements prior to 18 months showed higher levels of cognitive development than their counterparts once reaching pre-primary school (Ip et al., 2017). Despite the value of integrated systems, market-based countries, including high-income countries, may be especially prone to investing in a wide range of services without any overarching system in place to ensure coordination (Halfon et al., 2009). For countries that are successful in building systems, some of the success may be attributable to underlying cultural beliefs about equity; for example, Sweden often tops charts of comparisons between countries on early childhood development, perhaps reflecting beliefs about the responsibility of government to help individuals achieve their potential (Bremberg, 2009). Comparisons across countries are made more difficult by the diversity in beliefs about the purpose and role of government in early childhood systems. The effects of early childhood programmes vary between developed and developing countries (e.g., Rao et al., 2017), demonstrating the importance of noting the role of context on both systems and the impact of programmes.

Three elements of early childhood systems will be highlighted here as the basis for understanding points of similarity and difference in early childhood systems across countries: (1) modalities of ECCE; (2) the standards that governments set to regulate the quality and availability of ECCE; and (3) the measurement that governments use to monitor quality and children’s development within countries. These three elements are also important to consider in the context of measuring and monitoring the quality and equity within ECCE and the SDGs.

Modalities of Early Childhood Care and Education

ECCE starts before birth, by ensuring access to health care and nutrition and supporting parents as children's first teachers, and continues through to the age of 8 years. ECCE is delivered through a variety of mechanisms, including parent education, home visiting, community-based programmes, and formal, school-based pre-primary settings. In almost every country, the mix of types of ECCE provision reflects a range of goals for young children's experiences and development, from basic 'care' that does not explicitly focus on education to formal pre-primary settings that may be run through education systems (Kagan, Engle, & Britto, 2005). The range of options reflects the underlying need for early childhood care and education that arises when child care is needed. The high percentage of women working across the world has led to arrangements to care for children that include family (including other young children) as caregivers, informal or community-based provision and, more recently, increased investments in formal, school-based pre-primary settings. The diversity in early childhood systems is a notable contrast from primary and secondary education, which has a more consistent structure across countries. Most countries have a mix of public and private provision of ECCE, and in some



countries, more children attend private than public settings, thus influencing the degree of government involvement in ECCE. Terminology also varies, with the same terms referring to more than one type of provision. For example, 'pre-primary education' can cover provision from birth in some countries, while in others it refers only to the one or two years of education immediately preceding enrolment in formal education (Kagan, Engle, & Britto, 2005). How this diversity plays out in the context of the new global education agenda is yet to be fully determined. Many countries invest in early childhood to address issues of equity in learning, but if some types of care offer considerably lower quality and fewer learning opportunities than other types of care, then equity is not promoted through early childhood (and inequities could be even further exacerbated). In many countries, children most at risk may be those who typically receive the lowest-quality services (e.g., Biersteker, Dawes, Hendricks, & Tredoux, 2016). This differential access to quality care, with higher-income families accessing high-quality care, can result in further inequity and widen the achievement gap in later years. In sum, the range of providers, modalities and goals for early childhood programmes can provide responsiveness to family needs, but can also lead to chaotic, incoherent systems that are difficult to manage and evaluate for effectiveness (Kagan & Kauerz, 2012). The lack of emphasis on building integrated systems that incorporate many types of care and take into account home as well as school environments diminishes the possible positive impacts of ECCE interventions on children's development (Vargas-Barón, 2015).

ECCE 'Standards' to Guide Delivery of Services

To provide some degree of coherence across early childhood systems and to emphasize the holistic nature of young children's development, several efforts have been initiated to support countries in developing standards

that support the holistic development of young children. Standards can refer to basic regulations, such as health and safety standards, that comprise the groundwork for licensing in many countries; standards designed to ensure a certain level of quality, such as minimum standards for reliance on a curriculum; and workforce standards such as teacher education and professional development. Standards and regulations can also be the starting point for creating professional development systems, and for ensuring that all children receive care and education that meets basic standards for quality and safety (Kagan, Britto, & Engle, 2005). Notable among these efforts are the 'Early Learning Development Standards' (ELDS), which outline a process for countries to develop locally-relevant standards for early childhood education that can serve as a unifying force across types of provision. Kagan, Britto, & Engle (2005) report three common themes across countries participating in the ELDS: first, growing interest and commitment to early childhood education programmes; second, recognition that access to services alone is not enough to ensure children benefit, and that attention to quality and equity is critical; and third, fragmentation of services inhibits countries' ability to reach their goals on early childhood. Institution of standards can help bring coherence to early childhood systems by introducing a common set of goals to guide implementation and quality monitoring.

Child development and learning standards

While countries may share an overall goal of promoting equity through access to quality ECCE, there is also diversity in the specific goals for young children's development that emerge between and within countries. As countries more intentionally articulate and formulate goals for early childhood systems, the commonalities and differences in how healthy development is defined for young children become more apparent. Existing


research demonstrates a wide diversity in goals for early childhood education as articulated by teachers, parents and stakeholders (Prochner, Cleghorn, Kirova, & Massing, 2014). The goals reflect a balance between universal notions of early childhood education and care, which serve as the basis for the SDGs and other global initiatives, and the local interpretations and attitudes about early childhood that are central to effective teaching in a given context. For example, LeVine (2003) describes two distinct notions of early childhood care and education: the 'pediatric', which focuses on health, survival, and well-being, and the 'pedagogical', which focuses on education and encouragement of young children's development (Prochner et al., 2014). These two perspectives on early childhood care and education receive inconsistent attention in the global literature on quality in early childhood settings, for example, yet are likely to have a strong influence on how early childhood care and education is defined and delivered in various settings. For example, teachers may need to adopt different strategies for engaging parents when parents are not members of the dominant culture and therefore do not necessarily share key assumptions on how and why young children should be educated (Chan & Ritchie, 2016). Teachers might also focus on supporting caregivers' understanding of child development and encourage them to take part in their children's development (Rao et al., 2014). The Early Learning Development Standards process, initiated by early childhood experts in the United States and with global expansion supported by UNICEF, provides an example of how universal elements of child development can be integrated with country perspectives to develop locally-relevant standards for child development. The content of the ELDS also provides a window into the degree of commonality across standards in many countries, a starting point for evaluating the feasibility of global measurement. In the East Asia region, for example,

each country tailored the standards to fit its purposes, resulting in shared constructs of social/emotional, physical, cognitive and language development, but not necessarily containing the same items. As well, some countries included moral/spiritual development and creativity as important domains of learning (Miyahara & Meyers, 2008), and included specific elements of cultural knowledge that are unique to each country. When completed, the ELDS are intended to inform teacher training, curricula and screening tools to identify developmental delays in young children (Lokuketagoda, Thalagala, Fonseka, & Tran, 2016). Analyses of common content areas in the ELDS in the United States revealed commonality in views of child development across states (Scott-Little, Kagan, & Frelow, 2006). A similar exercise between countries participating in the UNICEF ELDS initiative would yield important insights on the extent to which there are common content areas across countries as well.

Service quality standards

Standards are also frequently set for the achievement of basic levels of quality in ECCE, which could be especially important to guarantee quality provision for disadvantaged children. Standards may address basic health and safety, licensing or registration of programmes, and education or training standards for caregivers, or go beyond basic levels to include standards related to structural elements (such as materials and space, parent handbooks and policies) and process (such as use of a curriculum, quality of interactions, etc.). Children's participation in ECCE only leads to positive outcomes if programmes are of good quality (Britto et al., 2011; UNESCO, 2017b). While the SDGs include 'quality' as part of Target 4.2, the quality of ECCE is not consistently addressed in many countries (Anderson, Raikes, Kosaraju, & Solano, 2017; UNESCO, 2017b). Quality in ECCE has been defined as attention to several



levels, beginning with community alignment around the goals and purpose of early childhood programmes, support for infrastructure and personnel, and, most critically, emphasis on interactions between children, caregivers, families and staff (Boller et al., 2011). Quality is also defined by programmes' abilities to holistically address the needs of children and families. To date, cross-country comparisons of quality have often focused on indicators such as teacher–child ratios, teacher qualifications and the percentage of children with access to pre-primary education (UNESCO, 2017b). Yet these indicators do not apply to all types of ECCE and show inconsistent associations with quality (Neuman, Josephson, & Chua, 2015), suggesting that more in-depth and process-oriented indicators of quality may be required to fully capture quality in ECCE. Ideally, standards are informed by contextually-grounded research on what matters most for children's learning in ECCE settings – said another way, what types of environment are most critical for helping children achieve desired outcomes. In early research focused on aspects of quality associated with children's development, distinctions were made between 'structural' quality, which refers to the physical environment, teacher qualifications and access to materials, and 'process' quality, which refers to the interactions and learning experiences children have at the school or facility. Structural quality promotes child outcomes through its effect on caregiver–child interactions, or process quality, which in turn predicts child outcomes in the United States (NICHD Early Child Care Research Network, 2002). Across 10 countries, structural quality indicators such as teacher–child ratios were inconsistent predictors of children's development (Montie et al., 2006). In sum, evidence has generally supported the conclusion that process quality, or caregiver–child interaction, is the more reliable predictor of child outcomes. In keeping with science on the interconnected nature of young children's

development, as noted above, a recent review found that comprehensive interventions that address health, nutrition and education lead to the greatest impact on children's development when compared to programmes focused on one element alone (Rao et al., 2017). The elements of pedagogy that may be most strongly associated with positive impacts on children's learning and development include providing stimulating activities and experiences for young children, ultimately increasing their primary school readiness (Moore, Akhter, & Aboud, 2008). Likewise, providing a rich and stimulating environment can help to increase children's cognitive outcomes as compared to their peers (Aboud & Hossain, 2011). Ultimately, experiences that help to engage children and also stimulate learning help children make developmental gains (Kagitcibasi et al., 2009). Research examining cross-country gains through comprehensive early childhood education programmes is necessary to strengthen evidence on a global level, and to further define which elements of ECCE can be especially powerful influences on child development. Findings on pedagogical and environmental quality in ECCE point in three main directions. First, the quality of pre-primary settings matters for child development. Second, however, even some degree of provision may improve children's development, especially in situations of extreme adversity (e.g., Rao et al., 2012; Rao et al., 2017), although the specific components of this provision have not yet been fully defined in research. Third, indicators of quality show somewhat inconsistent associations with child development across countries, suggesting that context and local conditions are likely to have a relatively strong impact on what is most critical for child outcomes. Although there is not a common definition of quality, studies from many countries have demonstrated that gradations of quality, defined by such indicators as teacher training, resources invested, and ratios in pre-primary settings, are associated with improved child outcomes both in the


short and long term (Araujo, Carneiro, Cruz-Aguayo, & Schady, 2016; Araujo, Dormal, & Schady, 2017). These indicators of quality in preschool have been associated with better PISA test performance across developed countries in some studies (Schütz, 2009), but not all (Montie et al., 2006). The diversity in types of setting, goals for child development and context for programmes may make it difficult to generate global conclusions on quality. Training of the ECCE workforce can encourage alignment around goals for ECCE by introducing ECCE professionals to common curricula or pedagogical methods. A recent review found three main categories of ECCE providers: certified health professionals, certified education professionals and non-certified para-professionals. Across countries, the ECCE field suffers from a lack of trained professionals, and ECCE professionals also may have low status (as cited in Chiparange & Saruchera, 2016; Neuman, Josephson, & Chua, 2015; Pearson et al., 2017; Rodríguez, Banda, & Namakhoma, 2015; Sun, Rao, & Pearson, 2015). Moreover, although ECCE is most effective when delivered in an integrated fashion, few professionals have opportunities to receive training across sectors. In many countries, especially low- and middle-income countries, training opportunities for early childhood professionals are not widely available (see Pearson et al., 2017, for a review), and even in countries with training requirements for ECCE professionals, experiences are far from uniform (Early & Winton, 2001). Consistent and effective training may be a route towards increasing coherence in early childhood systems, but its reach could be limited. As well, the risk that training will inadvertently discourage the expression of cultural values and perspectives should also be acknowledged. Looking across countries, there are common elements in early childhood systems that resonate with the global dialogue on ECCE. These

common elements include similarities in the types of ECCE that are available, the demands on working parents, and the reliance on standards to bring coherence to a diverse and complex set of priorities and goals for ECCE. At the same time, the diversity in the types of ECCE and the various goals that stakeholders have for ECCE create a complex landscape in most countries, with overall themes in common across countries but perhaps little that is directly analogous between countries. In many ways, the global agenda can set a frame for the discussion of ECCE, but both theoretical work and empirical findings suggest there is no substitute for local dialogue on the goals and priorities for ECCE, which in turn should help inform country responses to global calls for action. The SDGs can lead the complex dialogues that countries are having around ECCE and the creation of frameworks for ensuring equity, with room to adapt and to measure outcomes based on the unique needs of each context.

Global and Local Balance in Measurement

Countries are encouraged to participate in the SDGs through several mechanisms, including measurement. As part of the SDGs, indicators have been identified to track progress towards goals (Raikes et al., 2017). A small set of these indicators are 'global': they are intended to be collected by all countries and to be comparable across countries. National and regional indicators are also part of the monitoring framework. They are not intended to produce globally-comparable data, but instead to encourage national and regional action by producing a wider set of locally-relevant indicators (UNESCO Institute for Statistics, 2016). Internationally, the ability to track and monitor gains in learning between and within countries with globally-comparable data is a key lever of the SDGs (Raikes et al., 2017). There are several challenges inherent in



producing globally-comparable data on early childhood development. Perhaps the most central is the underlying assumption that there are aspects of early childhood development that are universally applicable and relevant and can be accurately measured across countries (UNESCO, 2017b). Another core assumption is that early education systems are similar in their approaches and content, in order to make meaningful comparisons on the extent to which equity is evident between and within countries. To produce relevant and meaningful globally-comparable data on early childhood development, it is necessary to share assumptions across stakeholders and experts on what 'developmental potential' looks like at the start of school, and what supports children need to achieve their developmental potential. Discussions on the global measurement of early childhood development began several years ago (Landers & Kagitcibasi, 1990) and identified several key issues, including: the difficulty of adapting norms across countries; ensuring cultural applicability; and the importance of discriminating whether measures are intended to screen children, monitor across populations, or evaluate programmes. Thirty years after the dialogue began, several tensions associated with defining and producing globally-comparable data on early childhood development are still apparent (Raikes, 2017). Most centrally, because there is no globally-agreed definition of 'developmentally on track' for children at the start of school, there is little consensus on which to base a measure. As described above, there are variations in the content and structure of the standards that govern curricula and quality in ECCE, leading to wide variation in the ages at which children are exposed to ideas and activities that could be translated into measurement. At the same time, since the discussions on global measurement began, the scientific study of child development has advanced enough to outline universal elements of child development, especially

indicators of neurological development, which can presumably serve as the basis for global measurement. When used for national monitoring, the content and structure of tests should be suited for policy-making in relation to early childhood development, adding another dimension of policy relevance to early childhood assessments. Any measure used across a population must be short and efficient, especially if it is intended for use in low- or middle-income countries where resources for early childhood programmes (and particularly for evaluation and monitoring) are scarce. Measures intended for individual assessment must also be kept separate from those intended to produce population-level data (Rao & Pearson, 2015). The focus of the assessment must be made clear, because each type is structured in a different way and yields different information. Is creating a global measure of early childhood development a good idea? Evaluating global measurement of early childhood development begins with two basic steps: first, determining whether there are common constructs that are relevant across countries; and second, ensuring that the measurement used to assess those constructs is culturally sensitive and policy-relevant (UNESCO, 2017b). Studying the effect of measurement and data on country policies to promote equity is another important component. There is a small but growing body of research to date that empirically examines the feasibility and technical strength of global early childhood measurement (Raikes et al., 2017). Taken overall, results suggest some degree of comparability between countries. The Multiple Indicator Cluster Surveys Early Child Development Index (MICS ECDI; Loizillon et al., 2017), for example, demonstrates weak alignment with national standards and unclear relevance to child development in all countries (McCoy et al., 2016), but also shows some evidence of construct validity, as demonstrated by associations with parenting and other predictors of child development


(Frongillo, Kulkarni, Basnet, & de Castro, 2017). Regional assessments developed for use in Asia (the East Asia-Pacific Early Child Development Scales; Rao et al., 2017) and Latin America (the Regional Project on Child Development Indicators (PRIDI); Inter-American Development Bank, 2015) show strong psychometric properties when used across countries. Some work indicates that many short tests may produce valid and reliable data (Rubio-Codina et al., 2016), with some domains, such as motor development, easier to assess than others. There is also wide diversity in the purposes for which measures are developed, with some measures being more appropriate for population-level monitoring than others, such as those used to screen children for developmental delays and/or other diagnostic purposes. However, the emphasis on globally-comparable data risks masking the diversity in early childhood development across countries, as well as the range of approaches and types of interventions that can promote children's development. Substantial work documents the cultural influences on young children's development and the goals and nature of caregiving (e.g., Rogoff, 2003). Consequently, there is also diversity in how programmes are designed and delivered, and no clear set of standards for what should be included in pre-primary curricula. Although there is clear agreement that children should develop basic learning competencies by the end of primary school, the pathways for achieving those outcomes, beginning in early childhood, are left open to country preferences and goals. To explore and address the tensions between global and local measurement of early childhood development and quality in learning environments, the Measuring Early Learning Quality and Outcomes (MELQO) project (UNESCO, 2017b) convened measurement experts and stakeholders to identify common constructs and develop measurement tools that could serve as a starting point for national adaptation. The members of the consortium concluded that

child development may have elements that can reliably be measured across countries, but they were less confident that quality could be defined and measured consistently across countries. Based on an integration of existing measures, MELQO's work led to open-source modules for measuring child development and the quality of learning environments that have been used in several low- and middle-income countries.1 Developed as a suite of tools, the MELQO modules include both parent/teacher and direct assessments of child development and learning for children in pre-primary or at the start of primary school, observational modules to assess the quality of learning environments, and teacher/director surveys. Modules reflect the consensus among expert groups on the key constructs that are thought to be globally relevant, along with guidelines for national adaptation. The resulting scales were then tested in several countries, along with documentation of an extensive national adaptation process intended to ensure that both global and local priorities for children's development and the quality of learning environments were measured. Experiences using these scales to date suggest several themes. First, there is a good deal of concordance between 'global' definitions of child development and learning at the start of school and national standards, but there are notable exceptions in areas such as cultural and environmental knowledge and moral and spiritual development; these are often included in national standards but are quite difficult to measure consistently and are not appropriate for global comparability. Second, while national standards may outline goals and expectations for quality in children's learning environments, in many classrooms basic levels of quality are not consistently reached. Finally, in many countries the capacity for sustained and reliable measurement of early childhood development and of the quality of learning environments is not yet present. An investment in research capacity, coupled with investments in ministry capacity for ongoing


measurement, is required to create locally-relevant measures that can inform progress towards the SDGs.

CONCLUSIONS

With the start of the SDGs, the emphasis on early childhood development has the potential to create strong, effective early childhood systems that can ensure the lifelong wellbeing of millions of children. While the global agenda can set the stage for country action and has identified several areas in which countries should invest, it is important to focus first on the needs of the countries themselves, which may or may not point in the direction of 'global' solutions. This is perhaps especially important for low- and middle-income countries, which depend on donors to develop and fund early childhood systems and may therefore face incentives to adopt specific policies and programmes. Future areas for research and action include the following. First, across all countries, but especially in low- and middle-income countries, capacity for basic research on child development and for applied research on high-quality educational environments should be a high priority for investment (Raikes et al., 2017). While international efforts can be a starting point for setting standards and developing measurement systems, locally-relevant research is needed to inform the development of these systems and to ensure that the standards and measurement are focused on the most critical elements of children's development. This investment in local capacity helps to ensure sustainability and continuity in efforts to increase the quality of, and access to, ECCE programmes. Local researchers and partners are best positioned to identify and address areas of particular need and concern within the country. Often there are individuals working to expand ECCE, but with limited experience or knowledge of how to do so in a systematic and effective way.

Second, efforts should be made to build communities of practice across countries that can contribute to a ground-up, shared dialogue on the most important elements of child development and find points of connection across countries, both in creating workable ECCE systems and in supporting implementation across modalities. Investing in local partners to address needs and sustain progress towards these goals is much easier when those partners can draw on the support and guidance of others who have been through a similar process and faced similar challenges. In this way, the approach becomes more systematic and lays the foundation for globally-comparable procedures and outcomes. Finally, the question of what a 'global agenda' means should be bidirectional: the experiences of countries participating in the SDGs should be documented and shared with global organizations repeatedly throughout the next 15 years, so that there is greater clarity on all sides as to the value and the challenges of global education agendas. ECCE development does not occur in a vacuum; it requires the engagement of many stakeholders, partners and collaborators across sectors, agencies and countries. Moreover, connections between high-, middle- and low-income countries are needed to encourage dialogue on these issues, and to avoid the tendency to group countries' experiences based on income alone. As the SDGs unfold, several initiatives have been developed to help address these priorities. One necessary step is to align global definitions of ECCE with realities on the ground. When viewing early childhood development through the lens of education systems, there is often an emphasis on formal, pre-primary settings that 'educate' young children. This division is reflected in the International Standard Classification of Education, which categorizes early childhood services in a way that specifically differentiates programmes that 'care' for young children from those that 'educate' them


(UNESCO Institute for Statistics, 2011). As a result, there is more global information available on participation in formal, pre-primary systems (e.g., UNESCO Institute for Statistics, 2016) than on other types of ECCE. While this distinction may be useful for policymakers, from a child's point of view it is less meaningful, in that care and education are inextricably linked in early childhood development. Moreover, children in many parts of the world still do not have access to formal pre-primary education, despite increases in investment, meaning that millions of children experience environments that blend care and education (or, in many cases, inadequately address both). Second, efforts should be made to include all types of ECCE in global measurement. Reflecting the importance of all kinds of early childhood development services for young children, UNICEF's MICS survey (Loizillon et al., 2017) includes a question on participation in any kind of early childhood development programme. Even with this broader definition, however, disparities in access are notable across and within countries. The global education agenda should prioritize expanding definitions of ECCE to include informal learning environments. Moreover, it should develop these definitions in response to the mix of programmes and priorities that parents articulate within different countries. A promising development is the inclusion of stimulating and supportive home environments in the measurement agenda for the SDGs. Together, the wealth of existing science and the momentum to implement early childhood programmes and policies can yield significant progress for young children. Realizing this potential requires striking a delicate balance between global action and local implementation, and a new kind of engagement and cooperation between researchers, implementers and policymakers. While it is challenging to work across such diversity in viewpoints and experiences, the resulting work is likely to take the science and practice

of ECCE to a new level, from which many children and families will benefit.

Note

1  Tools are available at: ecdmeasure.org

REFERENCES

Aboud, F. E., & Hossain, K. (2011). The impact of preprimary school on primary school achievement in Bangladesh. Early Childhood Research Quarterly, 26(2), 237–246. doi:10.1016/j.ecresq.2010.07.001
Alderman, H. (Ed.) (2011). No small matter: The impact of poverty, shocks, and human capital investments in early childhood development. Washington, DC: The World Bank.
Anderson, K., Raikes, A., Kosaraju, S., & Solano, A. (2017). National early childhood care and education quality monitoring systems. Washington, DC: Brookings Institution. Retrieved from https://www.brookings.edu/research/national-early-childhood-care-and-education-quality-monitoring-systems/
Araujo, M. C., Carneiro, P., Cruz-Aguayo, Y., & Schady, N. (2016). Teacher quality and learning outcomes in kindergarten. The Quarterly Journal of Economics, 131(3), 1415–1453.
Araujo, M. C., Dormal, M., & Schady, N. (2017). Child care quality and child development. Washington, DC: Inter-American Development Bank.
Biersteker, L., Dawes, A., Hendricks, L., & Tredoux, C. (2016). Center-based early childhood care and education program quality: A South African study. Early Childhood Research Quarterly, 36, 334–344.
Black, M. M., Walker, S. P., Fernald, L. C., Andersen, C. T., DiGirolamo, A. M., Lu, C., & Devercelli, A. E. (2017). Early childhood development coming of age: Science through the life course. The Lancet, 389(10064), 77–90.
Bremberg, S. (2009). A perfect 10: Why Sweden comes out on top in early child development programming. Paediatrics & Child Health, 14(10), 677–680.
Britto, P. R., Lye, S. J., Proulx, K., Yousafzai, A. K., Matthews, S. G., Vaivada, T., & MacMillan, H. (2017). Nurturing care: Promoting early childhood development. The Lancet, 389(10064), 91–102.
Britto, P. R., Yoshikawa, H., & Boller, K. (2011). Quality of early childhood development programs in global contexts: Rationale for investment, conceptual framework and implications for equity. Social Policy Report, 25(2).
Bronfenbrenner, U. (1977). Toward an experimental ecology of human development. American Psychologist, 32(7), 513.
Burger, K. (2010). How does early childhood care and education affect cognitive development? An international review of the effects of early interventions for children from different social backgrounds. Early Childhood Research Quarterly, 25(2), 140–165.
Chan, A., & Ritchie, J. (2016). Parents, participation, partnership: Problematising New Zealand early childhood education. Contemporary Issues in Early Childhood, 17(3), 289–303.
Chiparange, G. V., & Saruchera, K. (2016). Pre-school education: Unpacking dilemmas and challenges experienced by caregivers – a case of private sectors in Mutare urban-Zimbabwe. Journal of Education and Practice, 7(13), 129–141.
Daelmans, B., Darmstadt, G. L., Lombardi, J., Black, M. M., Britto, P. R., Lye, S., & Richter, L. M. (2017). Early childhood development: The foundation of sustainable development. The Lancet, 389(10064), 9–11.
de la Mothe Karoubi, E., & Espey, J. (2016). Indicators and a monitoring framework for Financing for Development (FfD). Retrieved from http://unsdsn.org/wp-content/uploads/2016/03/Final-FfD-Follow-up-and-review-paper.pdf
Early, D. M., & Winton, P. J. (2001). Preparing the workforce: Early childhood teacher preparation at 2- and 4-year institutions of higher education. Early Childhood Research Quarterly, 16(3), 285–306.
Engle, P. L., Fernald, L. C., Alderman, H., Behrman, J., O'Gara, C., Yousafzai, A., & Iltus, S. (2011). Strategies for reducing inequalities and improving developmental outcomes for young children in low-income and middle-income countries. The Lancet, 378(9799), 1339–1353.
Frongillo, E. A., Kulkarni, S., Basnet, S., & de Castro, F. (2017). Family care behaviors and early childhood development in low- and middle-income countries. Journal of Child and Family Studies, 26(11), 3036–3044. doi:10.1007/s10826-017-0816-3
Giang, K. B., Oh, J., Kien, V. D., Hoat, L. N., Choi, S., Lee, C. O., & Van Minh, H. (2016). Changes and inequalities in early birth registration and childhood care and education in Vietnam: Findings from the Multiple Indicator Cluster Surveys, 2006 and 2011. Global Health Action, 9(1), 29470.
Gören Niron, D. (2013). An integrated system of early childhood education and care governance in Turkey: Views of policy-makers, practitioners, and academics. Early Years, 33(4), 367–379.
Gove, A., & Black, M. M. (2016). Measurement of early childhood development and learning under the Sustainable Development Goals. Journal of Human Development and Capabilities, 17(4), 599–605.
Grantham-McGregor, S., Cheung, Y. B., Cueto, S., Glewwe, P., Richter, L., Strupp, B., & International Child Development Steering Group (2007). Developmental potential in the first 5 years for children in developing countries. The Lancet, 369(9555), 60–70.
Grossman, A. S., Lombard, A., & Fisher, N. (2014). StriveTogether: Reinventing the local education ecosystem. Harvard Business School Case, 314–031.
Halfon, N., Russ, S., Oberklaid, F., Bertrand, J., & Eisenstadt, N. (2009). An international comparison of early childhood initiatives: From services to systems. Journal of Developmental & Behavioral Pediatrics, 30(5), 471–473.
Heckman, J. J., & Masterov, D. V. (2007). The productivity argument for investing in young children. Applied Economic Perspectives and Policy, 29(3), 446–493.
Inter-American Development Bank (2015). Regional Project on Child Development Indicators (PRIDI). Retrieved from https://webimages.iadb.org/education/Instrumentos/Marco_Conceptual.pdf
Ip, P., Ho, F. K. W., Rao, N., Sun, J., Young, M. E., Chow, C. B., … & Hon, K. L. (2017). Impact of nutritional supplements on cognitive development of children in developing countries: A meta-analysis. Scientific Reports, 7.
Jochim, A. E., & May, P. J. (2010). Beyond subsystems: Policy regimes and governance. Policy Studies Journal, 38(2), 303–327.
Kagan, S. L., Britto, P. R., & Engle, P. (2005). Early learning standards: What can America learn? What can America teach? Phi Delta Kappan, 87(3), 205–208.
Kagan, S. L., & Kauerz, K. (Eds.) (2012). Early childhood systems: Transforming early learning. New York: Teachers College Press.
Kagan, S. L., & Roth, J. L. (2017). Transforming early childhood systems for future generations: Obligations and opportunities. International Journal of Early Childhood, 49(2), 137–154.
Kagitcibasi, C., Sunar, D., Bekman, S., Baydar, N., & Cemalcilar, Z. (2009). Continuing effects of early enrichment in adult life: The Turkish Early Enrichment Project 22 years later. Journal of Applied Developmental Psychology, 30(6), 764–779. doi:10.1016/j.appdev.2009.05.003
Kelly, Y., Sacker, A., Schoon, I., & Nazroo, J. (2006). Ethnic differences in achievement of developmental milestones by 9 months of age: The Millennium Cohort Study. Developmental Medicine & Child Neurology, 48(10), 825–830.
Landers, C., & Kagitcibasi, C. (1990). Measuring the psychosocial development of young children: The Innocenti Technical Workshop. Florence, Italy. Retrieved from www.ecdgroup.com/download/aa1mpdyi.pdf
Laszlo, E. (1996). The systems view of the world: A holistic vision for our time. Cresskill, NJ: Hampton Press.
Learning Metrics Task Force (2013). Toward universal learning: Recommendations from the Learning Metrics Task Force. Montreal and Washington, DC: UNESCO Institute for Statistics and Center for Universal Education at Brookings. Retrieved from https://www.brookings.edu/wp-content/uploads/2016/06/LTMF-RecommendationsReportfinalweb.pdf
LeVine, R. A. (2003). Childhood socialization: Comparative studies of parenting, learning and educational change. Hong Kong: University of Hong Kong.
Loizillon, A., Petrowski, N., Britto, P., & Cappa, C. (2017). Development of the early childhood development index in MICS surveys. New York, NY: UNICEF.
Lokuketagoda, B. U., Thalagala, N., Fonseka, P., & Tran, T. (2016). Early development standards for children aged 2 to 12 months in a low-income setting. SAGE Open, 6(4), 2158244016673128.
Masten, A. S. (2014). Global perspectives on resilience in children and youth. Child Development, 85(1), 6–20.
McCoy, D. C., Peet, E. D., Ezzati, M., Danaei, G., Black, M. M., Sudfeld, C. R., … & Fink, G. (2016). Early childhood developmental status in low- and middle-income countries: National, regional, and global prevalence estimates using predictive modeling. PLoS Medicine, 13(6), e1002034.
McCoy, D. C., Peet, E. D., Ezzati, M., Danaei, G., Black, M. M., Sudfeld, C. R., … & Fink, G. (2017). Correction: Early childhood developmental status in low- and middle-income countries: National, regional, and global prevalence estimates using predictive modelling. PLoS Medicine, 14(1), e1002233.
Miyahara, J., & Meyers, C. (2008). Early learning and development standards in East Asia and the Pacific: Experiences from eight countries. International Journal of Early Childhood, 40(2), 17–31.
Montie, J. E., Xiang, Z., & Schweinhart, L. J. (2006). Preschool experience in 10 countries: Cognitive and language performance at age 7. Early Childhood Research Quarterly, 21(3), 313–331.
Moore, A. C., Akhter, S., & Aboud, F. E. (2008). Evaluating an improved quality preschool program in rural Bangladesh. International Journal of Educational Development, 28(2), 118–131. doi:10.1016/j.ijedudev.2007.05.003
Naudeau, S., Martinez, S., Premand, P., & Filmer, D. (2011). Cognitive development among young children in low-income countries. In H. Alderman (Ed.), No small matter: The impact of poverty, shocks, and human capital investments in early childhood development (pp. 9–50). Washington, DC: The World Bank.
Neuman, M. J., Josephson, K., & Chua, P. G. (2015). A review of the literature: Early childhood care and education (ECCE) personnel in low- and middle-income countries. Early Childhood Care and Education Working Papers Series 4. Paris: UNESCO. Retrieved from https://unesdoc.unesco.org/ark:/48223/pf0000234988
NICHD Early Child Care Research Network (2002). Child-care structure → process → outcome: Direct and indirect effects of child-care quality on young children's development. Psychological Science, 13(3), 199–206.
Pearson, E., Hendry, H., Rao, N., Aboud, F., Horton, C., Siraj, I., Raikes, A., & Miyahara, J. (2017). Reaching expert consensus on training different cadres in delivering early childhood development at scale in low-resource contexts: Technical report. London: Department for International Development, UK.
Prochner, L., Cleghorn, A. A., Kirova, A., & Massing, C. (2014). Culture and practice in early childhood teacher education: A comparative and qualitative study. International Journal of Multidisciplinary Comparative Studies, 1(1), 18–34.
Raikes, A. (2017). Measuring of early childhood development. European Journal of Education, 52(4), 511–522.
Raikes, A., Yoshikawa, H., Britto, P. R., & Iruka, I. (2017). Children, youth and developmental science in the 2015–2030 global Sustainable Development Goals. Social Policy Report, 30(3).
Rao, N., Sun, J., Pearson, V., Pearson, E., Liu, H., Constas, M. A., & Engle, P. L. (2012). Is something better than nothing? An evaluation of early childhood programs in Cambodia. Child Development, 83(3), 864–876.
Rao, N., & Pearson, E. (2015). Assessment of child development across cultures. Assessment and Development Matters, 7(3), 7–9.
Rao, N., Sun, J., Chen, E. E., & Ip, P. (2017). Effectiveness of early childhood interventions in promoting cognitive development in developing countries: A systematic review and meta-analysis. Hong Kong Journal of Paediatrics (New Series), 22(1), 14–25.
Rao, N., Sun, J., Ng, M., Becher, Y., Lee, D., Ip, P., & Bacon-Shone, J. (2014). Validation, finalization and adaptation of the East Asia-Pacific Early Childhood Development Scales (EAP-ECDS). Bangkok: UNICEF, East Asia and Pacific Regional Office.
Richter, L. M., Daelmans, B., Lombardi, J., Heymann, J., Boo, F. L., Behrman, J. R., & Bhutta, Z. A. (2017). Investing in the foundation of sustainable development: Pathways to scale up for early childhood development. The Lancet, 389(10064), 103–118.
Rodríguez, D. C., Banda, H., & Namakhoma, I. (2015). Integrated community case management in Malawi: An analysis of innovation and institutional characteristics for policy adoption. Health Policy and Planning, 30(suppl. 2), 74–83.
Rogoff, B. (2003). The cultural nature of human development. Oxford: Oxford University Press.
Rossiter, J. (2016). Scaling up access to quality early education in Ethiopia: Guidance from international experience. Young Lives Policy Paper 8.
Rubio-Codina, M., Araujo, M. C., Attanasio, O., Muñoz, P., & Grantham-McGregor, S. (2016). Concurrent validity and feasibility of short tests currently used to measure early childhood development in large scale studies. PLoS One, 11(8), e0160962.
Samman, E., Presler-Marshall, E., Jones, N., Bhatkal, T., Melamed, C., Stavropoulou, M., & Wallace, J. (2016). Women's work: Mothers, children and the global childcare crisis, a summary. Retrieved from https://bernardvanleer.org/app/uploads/2016/07/Early-Childhood-Matters-2016_12.pdf
Schady, N. (2015). The early years: Child well-being and the role of public policy. New York: Springer.
Schütz, G. (2009). Does the quality of pre-primary education pay off in secondary school? An international comparison using PISA 2003. Ifo Working Paper No. 68. Munich: Ifo Institute – Leibniz Institute for Economic Research at the University of Munich.
Scott-Little, C., Kagan, S. L., & Frelow, V. S. (2006). Conceptualization of readiness and the content of early learning standards: The intersection of policy and research? Early Childhood Research Quarterly, 21(2), 153–173.
Shonkoff, J. P., Richter, L., van der Gaag, J., & Bhutta, Z. A. (2012). An integrated scientific framework for child survival and early childhood development. Pediatrics, 129(2), e460–e472. doi:10.1542/peds.2011-0366
Shonkoff, J. P., Radner, J. M., & Foote, N. (2017). Expanding the evidence base to drive more productive early childhood investment. The Lancet, 389(10064), 14–16.
Sun, J., Rao, N., & Pearson, E. (2015). Policies and strategies to enhance the quality of early childhood educators. Background paper for EFA Global Monitoring Report. Paris: UNESCO. Retrieved from https://unesdoc.unesco.org/ark:/48223/pf0000232453
UNESCO (2000). Education for all: Global synthesis. Paris: UNESCO. Retrieved from https://unesdoc.unesco.org/ark:/48223/pf0000120058
UNESCO (2017a). Accountability in education: Meeting our commitments; Global education monitoring report, 2017/8. Paris: UNESCO. Retrieved from http://unesdoc.unesco.org/images/0025/002593/259338e.pdf
UNESCO (2017b). Overview of MELQO – Measuring Early Learning Quality and Outcomes. Washington, DC: UNESCO.
UNESCO Institute for Statistics (2011). International standard classification of education. Paris: UNESCO. Retrieved from http://uis.unesco.org/sites/default/files/documents/international-standard-classification-of-education-isced-2011-en.pdf
UNESCO Institute for Statistics (2016). The cost of not assessing learning outcomes. Paris: UNESCO. Retrieved from http://uis.unesco.org/sites/default/files/documents/the-cost-of-not-assessing-learning-outcomes-2016-en_0.pdf
United Nations Statistical Commission (2017). Retrieved from https://undocs.org/A/RES/71/313
Vargas-Barón, E. (2015). Policies on early care and education: Their evolution and some findings. Paris: UNESCO. Retrieved from http://www.education2030-africa.org/images/talent/Atelier_melqo/Policies_on_early_childhood_care_and_education_-_their_evolution_and_some_impacts.pdf
von Bertalanffy, L. (1968). General system theory: Foundations, development, applications. New York: Braziller.
World Bank (2016). SABER: Early childhood development. Washington, DC: World Bank Global Education Practice. Retrieved from http://wbgfiles.worldbank.org/documents/hdn/ed/saber/supporting_doc/brief/SABER_ECD_Brief.pdf
Yazejian, N., Bryant, D. M., Hans, S., Horm, D., St Clair, L., File, N., & Burchinal, M. (2017). Child and parenting outcomes after one year of Educare. Child Development, 88(5), 1671–1688.

18  Equity of Access to Pre-Primary Education and Long-Term Benefits: A Cross-Country Analysis
Gerard Ferrer-Esteban, Larry E. Suter and Monica Mincu

INTRODUCTION

An early investment in education is likely to yield greater returns to society than a late investment (Heckman, Stixrud, & Urzua, 2006). One of the most comprehensive and articulated approaches comes from cross-disciplinary research spanning economics, developmental psychology and neurobiology. Those studies reveal that a series of common principles could explain the effects of the early schooling environment on the development of human abilities (Knudsen et al., 2006). Early childhood education environments are a good place to foster the social and cognitive development of children, an experience with a unique influence on the development of skills and on the brain's maturation (Knudsen et al., 2006). Starting from this cross-disciplinary framework, this chapter aims to understand, through cross-country data, to what extent formal schooling received during early childhood makes a significant difference for later school

achievement. Specifically, we explore how equitable access to early childhood education is, as well as the long-term associations between preschool experience and students' cognitive outcomes. First, we explore the students' background factors associated with access to pre-primary education, that is, to what extent some key characteristics of the family background are associated with greater or lesser access to pre-primary education. Our study confirms that the probability of access to pre-primary education is higher among students of high socio-economic status. In addition, this probability varies depending on the immigrant background of students with low socio-economic status. Second, we analyze to what extent attending pre-primary education is associated with obtaining better academic outcomes in subsequent levels of the school system, specifically among 15-year-old students, who are typically enrolled in lower secondary education. Our results confirm that attendance


at pre-primary education is significantly associated with learning, especially for those students with low socio-economic status. All students benefit from having attended, but students with low socio-economic status seem to benefit the most, even if they attended for a short period of time. In the following paragraphs we review the main results of the research literature on the impact of early childhood education on cognitive and non-cognitive outcomes, the conditions of implementation of effective pre-primary education programs, and the effects of early environments for disadvantaged students. Then we address the main research questions, the method and empirical strategies, the results and the key conclusions. Finally, we discuss the results in terms of policy implications.

BACKGROUND

In this section we present the main empirical evidence on the impact of early childhood education on both the cognitive and the social, non-cognitive outcomes of students. We also present the main conclusions of the research on the influence of early experiences on the cognitive and social development of the most disadvantaged students, referring to students from less well-off families and students with an immigrant background. The research literature on the effects of participating in pre-primary education confirms that children who attended early childhood education before entering primary education have higher achievement later in school. Evidence from correlational analyses of cross-sectional surveys indicates that pre-primary education is associated with better results and that the relationship is significant and robust. With data from the PISA study, the OECD confirms that 15-year-old students who attended early childhood education tend to obtain better results than those who did not attend, once the socio-economic status of the


students is accounted for (OECD, 2013c). In a complementary way, the Education Endowment Foundation (EEF) estimates, after reviewing numerous meta-analyses on pre-primary education, that exposure to early childhood education interventions or programs could provide up to five months of additional academic progress (Education Endowment Foundation, 2018). Indeed, among the outcomes studied, the impact of pre-primary education appears largest for the academic success of adolescents, compared with other cognitive or social, non-cognitive outcomes. Nonetheless, early childhood education services have also been robustly associated with the development of children's social skills (Camilli, Vargas, Ryan, & Barnett, 2010). For instance, a significant number of experimental studies have addressed the impact of early childhood development programs on non-cognitive and social outcomes, using a comprehensive set of adolescent outcomes. Early childhood education programs have a long-term impact on non-cognitive outcomes such as social deviance, social participation, cognitive development, involvement in criminal justice, family well-being, and social-emotional development (Manning, Homel, & Smith, 2010). It is also important to pay attention to the conditions under which pre-primary education is provided and how these conditions relate to its impact on student ability. This is relevant because many countries have a tradition of differentiated provision, with different emphases on the type and approach of early childhood education. As regards the conditions that may favor the positive impact of pre-primary education programs, experimental research indicates that the quality of early educational experiences affects the level of benefits reported. In fact, aspects such as the training of teachers and the quality of the interaction between teachers and children seem to have more impact than the length of provision or the eventual change of the physical learning environment (Education Endowment Foundation, 2018). For example, some studies have provided evidence that high-quality education programs


during the preschool years generate positive results in language and mathematics between the ages of 7 and 11 years, in comparison with children who did not attend preschool or attended low-quality preschool education (Sylva et al., 2013). Another crucial factor to consider when evaluating the influence of pre-primary education is the differential access of students according to their socio-economic background, as well as the differential effects that exposure to pre-primary education has on children's cognitive and non-cognitive outcomes. One of the conclusions of the OECD is that enrollment rates are higher among students who come from wealthy families than among students from disadvantaged backgrounds (OECD, 2013b). A number of studies have analyzed the effects of early childhood education on disadvantaged children. Those studies show how exposure to pre-primary education may promote the attainment of all children, but especially of those with disadvantaged backgrounds (Belfield, Nores, Barnett, & Schweinhart, 2006). Using the EPPSE (Effective Pre-school, Primary and Secondary Education) database from England, Hall et al. (2013) found that attending pre-primary education could be a protective factor for socially vulnerable students, to the extent that the early childhood educational service focuses on promoting language and 'soft skills'. In the same sense, the EEF, in its synthesis of the main experimental research findings, points out that attending early childhood education programs and interventions is particularly beneficial for children who come from socially disadvantaged families (Education Endowment Foundation, 2018). Nevertheless, the experimental research also points out that the positive impact of early childhood education programs tends to fade over time, which means that the role of these programs in reducing inequalities in subsequent years is rather limited (Education Endowment Foundation, 2018).

Another social group that in many countries is in a situation of inequality, showing numerous shortfalls in learning as well as in access to early childhood education, is that of immigrant students. In 2012, across OECD countries, an average of 69% of immigrant children were enrolled in early childhood education programs, about seven percentage points less than native children (OECD & EU, 2015). This percentage contrasts with the attendance rates observed in countries where pre-primary education is free: there, the rate is around 90%, and the differences in access with respect to native children are negligible (OECD & EU, 2015). Apart from differences in access to pre-primary education, the literature also points out that, depending on the country, there are significant learning gaps between first-generation, second-generation, and native children. A study of 10 countries drawing on three large international assessments found that immigrant children tended to achieve lower scores (Schnepf, 2004). In general, one of the factors that explains learning differences, particularly in the USA and the UK, is the language spoken at home and skills in the host country's language (OECD, 2006). The same result has been found by other research focused on analyzing differences in language performance as well as in other cognitive and behavioral domains. One key finding is that the main gap between immigrant and native students arises in linguistic skills, especially when the language spoken at home differs from that of the host country (Washbrook et al., 2012). The analysis presented in this chapter re-examines this finding using survey data from 15-year-old students who provided retrospective reports of their childhood school attendance.


RESEARCH OBJECTIVES


The research objectives are focused on the analysis of access to, and outcomes of, attending pre-primary education. First, we want to explore the students' background factors associated with access to pre-primary education, that is, to what extent some key characteristics of the family context are associated with greater or lesser access to pre-primary education. Second, we want to analyze the extent to which attending pre-primary education is associated with obtaining better academic outcomes in subsequent levels of the school system, specifically among 15-year-old students, who are typically enrolled in lower secondary education.


DATA AND METHOD

Data

According to the two main research objectives, analyses are carried out using one or more waves of the PISA study, the international survey coordinated by the Organization for Economic Cooperation and Development (OECD), which began in 2000 and is repeated in three-year cycles. The main objective of the PISA study is to collect comparable information on the outcomes in reading, science and mathematics of 15-year-old students. In all the waves, the main competences evaluated have been reading comprehension, mathematics and scientific competence. In each edition, one of these areas becomes the main evaluation area and is analyzed in depth: reading comprehension in the 2000 and 2009 waves; mathematics in 2003 and 2012; scientific competence in 2006 and 2015. The main advantage of this study is that it provides rich information about students and their socio-economic and cultural context, as well as the school they attend. All this information can be linked to their academic performance to draw conclusions in terms of efficiency and equity in school systems.
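As a concrete illustration of this data step, the sketch below shows one way the attendance variables used later could be derived from a PISA 2012 student file with pandas. The file name is hypothetical, and the assumption that question ST05Q01 encodes pre-primary attendance as 1 = no, 2 = one year or less, 3 = more than one year should be checked against the official PISA 2012 codebook.

```python
import pandas as pd

# Hypothetical extract of the PISA 2012 student file; the file name is an
# assumption for illustration. ST05Q01 is assumed to be the pre-primary
# attendance question (1 = no, 2 = yes, one year or less, 3 = yes, more
# than one year) -- verify against the official PISA 2012 codebook.
df = pd.read_csv("pisa2012_student_extract.csv")

# Three-category duration variable used by the multinomial models below.
df["preprim_time"] = df["ST05Q01"].map(
    {1: "none", 2: "one_year_or_less", 3: "more_than_one_year"}
)
# Binary attendance flag used by the logistic models below.
df["preprimary"] = (df["ST05Q01"] > 1).astype(int)

# Restrict to OECD countries, as in the chapter's analyses
# (CNT is the PISA country code; the list of codes is omitted here).
# df = df[df["CNT"].isin(oecd_codes)]
```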

EMPIRICAL STRATEGIES

Strategy 1: Background Factors Related to Access to Pre-Primary Education

To explore the relationships between access to pre-primary education and students' background factors, logistic regression models with a binary response variable and multinomial logistic models with a nominal response variable were estimated using the 2012 wave of the PISA study.1 The aim of the models is to determine to what extent certain background factors are associated with access to pre-primary education. Since we are estimating parameters of models that predict an event in the students' past, we make a strong assumption that, between 3 and 15 years of age, some individual characteristics – such as immigrant background or socio-economic status (SES)2 – are invariable or almost invariable over time. The baseline equation for the logistic regression models is the following:

\[
\mathrm{logit}\{\Pr(\mathit{preprimary}_{isc} = 1 \mid X_{isc})\} = \alpha + \gamma_1 \mathit{ESCS}^{c}_{i,s} + \gamma_2 \mathit{immig}^{c}_{i,s} + \gamma_3 F^{c}_{i,s} + \gamma_4 S^{c}_{s} + \gamma_5 \left(\mathit{ESCS}^{c}_{i,s} \times \mathit{immig}^{c}_{i,s}\right) + \delta_c + \varepsilon^{c}_{i,s} \qquad (1)
\]

where preprimary_{isc} is the response variable indicating whether student i, in school s, in country c, attended pre-primary education. In the logistic models, the response variable takes two values: participation in pre-primary education (value 1) versus non-participation (value 0). It is also relevant to examine possible interactions between the family background factors associated with an increase or decrease in the probability of attending pre-primary education. The aim here is to establish whether access differs between students with different immigrant backgrounds (first-generation students, second-generation students and native students) depending on the socio-economic background of the family. Given the nonlinear nature of the logistic analysis, results are interpreted in two ways: odds ratios and average marginal effects. The odds ratio measures the odds of y = 1 (attendance at pre-primary education) relative to y = 0 (non-attendance), that is, p, the probability that y = 1, divided by 1 − p, the probability that y = 0. The average marginal effects, expressed as percentages, indicate how an increase in x is associated with an increase or decrease in the probability of y = 1 (attendance). For the independent variables, marginal effects are interpreted relative to the reference (base) value in the case of dummy variables, and per 1-unit change in the scale of continuous explanatory factors.
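A minimal sketch of how a specification like equation (1) could be estimated and interpreted with statsmodels, assuming a prepared analysis file as above; all column names are hypothetical, the full set of school-level controls is omitted for brevity, and the chapter's individual-level weights are not applied:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("pisa2012_students.csv")  # hypothetical analysis file

# Equation (1): ESCS, immigrant background, their interaction, individual
# controls and country fixed effects (school-level controls omitted here).
logit = smf.logit(
    "preprimary ~ escs * C(immig) + female + repeated_grade + C(country)",
    data=df,
)
# Robust standard errors clustered by school, as in Table 18.1.
res = logit.fit(cov_type="cluster", cov_kwds={"groups": df["school_id"]})

print(np.exp(res.params))                        # odds ratios
print(res.get_margeff(at="overall").summary())   # average marginal effects
```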

Alternatively, we performed multinomial logistic models to explore the strength of the factors associated with the probability of attending pre-primary education for either short or long intervals. The baseline equation for the multinomial logistic regression models is the following:

\[
\mathrm{mlogit}\{\Pr(\mathit{preprim\_time}_{isc} = j \mid W_{isc})\} = \alpha + \gamma_1 \mathit{ESCS}^{c}_{i,s} + \gamma_2 \mathit{immig}^{c}_{i,s} + \gamma_3 F^{c}_{i,s} + \gamma_4 S^{c}_{s} + \gamma_5 \left(\mathit{ESCS}^{c}_{i,s} \times \mathit{immig}^{c}_{i,s}\right) + \delta_c + \varepsilon^{c}_{i,s} \qquad (2)
\]

where preprim_time_{isc} indicates whether the student attended early childhood education for one year or less, for more than a year, or never. Here we take advantage of the fact that the 15-year-old students reported in the PISA student questionnaire how long they attended pre-primary education. Multinomial logistic models are characterized by nominal response variables. In this case, the response variable has three categories (j = 1, 2, 3): non-attendance (value 1), attendance for one year or less (value 2), and attendance for more than one year (value 3). For an appropriate identification of the model, one of the three sets of coefficients is arbitrarily normalized to 0 (so there are j − 1 sets of estimated coefficients). The other coefficients are interpreted with reference to the base outcome. We explored the probability of attending pre-primary education for one year or less (β^(2)) relative to not attending (β^(1) = 0), and of attending for more than one year (β^(3)) relative to not attending (β^(1) = 0).
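Spelled out, this normalization gives the standard multinomial-logit response probabilities, with the non-attendance coefficients fixed at zero:

\[
\Pr(\mathit{preprim\_time}_{isc} = j \mid W_{isc}) = \frac{\exp\left(W_{isc}' \beta^{(j)}\right)}{\sum_{k=1}^{3} \exp\left(W_{isc}' \beta^{(k)}\right)}, \qquad \beta^{(1)} = 0,
\]

so that exp(β^(j)) can be read as a relative-risk ratio of outcome j versus non-attendance. In practice such a model can be estimated with standard routines (e.g., statsmodels' mnlogit), mirroring the logistic sketch above.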

Strategy 2: Participation in Pre-Primary Education and Academic Performance

The second empirical strategy aims to analyze the relationship between participation in pre-primary education and the long-term cognitive outcomes of students. Different models have been estimated in which the students' academic scores are regressed on the level of participation in pre-primary education, holding constant relevant factors at the individual and school level that may be involved in this association. The objective of these models is to observe variations in students' academic performance associated with variations in levels of participation in early childhood education. The education function used to explore the relationship between performance and participation in early childhood education is expressed in the following baseline equation:

\[
\mathit{performance}^{c}_{i,s} = \alpha + \gamma_1 \mathit{Preprimary1}^{c}_{i,s} + \gamma_2 \mathit{Preprimary2}^{c}_{i,s} + \gamma_3 F^{c}_{i,s} + \gamma_4 S^{c}_{s} + \delta_c + \varepsilon^{c}_{i,s} \qquad (3)
\]

where performance^{c}_{i,s} is the response variable indicating the performance of student i in school s and country c in the PISA tests. Student performance is here a function of several characteristics at school


and individual level, plus a country fixed effect. In the case of continuous factors, the coefficients are interpreted as score variations per one-point increase in the explanatory variable. In the case of categorical factors, the coefficients indicate score variations associated with a category with respect to a reference category.
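As a sketch of equation (3) under the same hypothetical column names (and, as a simplification, using a single score column rather than averaging over PISA's five plausible values with student weights):

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("pisa2012_students.csv")  # hypothetical analysis file

# Equation (3): two attendance dummies (non-attendance is the omitted
# base category), student controls (F), school controls (S) and country
# fixed effects (delta_c).
ols = smf.ols(
    "math_score ~ preprim_1yr_or_less + preprim_more_1yr"
    " + escs + C(immig) + female + repeated_grade"
    " + public_school + school_size + C(location) + school_escs_mean"
    " + C(country)",
    data=df,
)
res = ols.fit(cov_type="cluster", cov_kwds={"groups": df["school_id"]})
print(res.params[["preprim_1yr_or_less", "preprim_more_1yr"]])
```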

EXPLANATORY AND CONTROL VARIABLES

As the explanatory variables of interest in equation (2), we used the index of socio-economic and cultural status (ESCS), provided in the OECD PISA dataset, and the information regarding the immigrant background of the students. In equation (3), both variables were used as control variables. Regarding the variable for immigrant students, we consider its inclusion in the function, as both an explanatory and a control factor, relevant insofar as these students belong to typically disadvantaged social groups which, at early ages, are likely to show important deficits, especially in language skills (Washbrook et al., 2012). In fact, early childhood education has been described as crucial for reducing this gap, thanks both to free access (OECD & EU, 2015) and to exposure to early childhood education itself (Crul & Schneider, 2009). On the other hand, we use the ESCS index because it allows us to grasp the complexity of the social and family reality of the students. For instance, it helps us avoid underestimating the effects of the students' social origin, which can happen when working with only one component of social origin (Bukodi & Goldthorpe, 2013). The ESCS index for PISA 2012 included an index of home possessions (comprising items related to family wealth, cultural possessions and home educational resources, as well as books in the home), the highest parental occupation, and the highest parental education expressed as years of schooling (OECD, 2014).


However, we are also aware that the PISA dataset covers a great variety of cultural and social, national and subnational realities. The meaning of the ESCS index is very likely different from country to country, and PISA currently includes a number of low-income countries where the majority of the population has low status, so results could be heavily weighted by such populations. In addition, concerns have been raised about the current index, including highly variable reliability by country, poor model-to-data consistency on several subscales, and poor cultural comparability (Rutkowski & Rutkowski, 2013). For these reasons, we decided to include only the OECD countries in the analyses. We also included the interaction terms (ESCS^{c}_{i,s} × immig^{c}_{i,s}) to explore to what extent the probability of attending pre-primary education by immigrant background depends on the socio-economic background of the student's family (equation (2)). As the explanatory variables of interest in equation (3), we used two dummy variables capturing how long the student attended early childhood education: Preprimary1^{c}_{i,s} represents having attended early childhood education for a year or less, while Preprimary2^{c}_{i,s} refers to having attended for more than a year. Including both variables in the function means that the coefficients are interpreted as differences with respect to the base category, which in this case is not having participated in early childhood education. To attenuate the risk of omitted-variable bias, we control for several variables. We included the vector F^{c}_{i,s}, which covers a series of individual and family characteristics of the student. Specifically, it includes socio-demographic variables such as the ESCS index (as a control factor in equation (3)), immigrant background (as a control factor in equation (3)), sex (being a female), and having repeated a grade.


As for the school-level control factors, we include the vector S^{c}_{s}, which covers a set of school-level characteristics. Here we account for whether the school is publicly funded, the school size, the school's location, a proxy of school quality (aggregated performance), and the aggregated index of the students' ESCS. Because the survey includes a large number of countries, we included country fixed effects (δ_c) in all the specifications. Country fixed effects are expected to capture any structural, cultural and systemic differences between the OECD countries.

RESULTS

Background Factors Related to Access to Pre-Primary Education

As a general trend across countries, students with a high socio-economic background are up to three times more likely to access pre-primary education than students with low socio-economic status (Table 18.1). Likewise, students with medium-to-high socio-economic status show double the probability of access to pre-primary education compared with students from less well-off families. In terms of marginal effects, students with medium-to-high status and those with high status have, respectively, a 3% and 5% higher probability of attending pre-primary education than disadvantaged students. Regarding immigrant status, we see significant differences in access between first-generation and native students, but not between native and second-generation students. After accounting for family socio-economic background, native students are almost three times more likely to access pre-primary education than first-generation students who arrived in the host country under 6 years of age.
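Since each odds ratio in Table 18.1 is simply the exponentiated logit coefficient, the headline multiples cited here can be checked directly:

\[
e^{1.055} \approx 2.87 \ \text{(high vs. low SES)}, \qquad 1/e^{-0.992} = 1/0.371 \approx 2.7 \ \text{(native vs. first-generation, arrival under 6)}.
\]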

As regards first-generation students who arrived in the host country aged 6 or more years, our estimates could be read as a valid indication of the origin country's influence on the relationships examined, although we cannot go deeper because we lack information on the country of origin. A significant result in this sense is that students who arrived from other countries aged more than 6 years had a very low probability of having accessed early childhood education services in their country of origin. Specifically, the probability of having attended pre-primary education among first-generation students who spent their early childhood in their country of birth is up to eight times lower than that of native students.

Relationship between Background and Duration of Attendance in Early Childhood Education

It is also relevant to re-examine the results discussed above separating students according to how long they attended early childhood education. Figure 18.1 shows results from multinomial logistic models in which we explore the probability of attending pre-primary education for a year or less versus not attending, and for more than a year versus not attending, depending on immigrant background and family social status. As an average result across countries, we observe first that the greatest divide between social groups occurs when comparing attendance for long periods of time with not attending. For example, as shown in Figure 18.1, the most noticeable difference between students with high status and students from disadvantaged families lies in the probability of accessing pre-primary education programs for more than a year. Specifically, students from wealthier families are up to four times more likely to be exposed to long periods of early childhood education. A similar phenomenon occurs between native students and first-generation students


Table 18.1  Probability of students attending pre-primary education: coefficients, odds ratios and marginal effects

                                                     (1) Coefficients    (2) Odds ratio     (3) Marginal effects
Second-generation students                            0.21 (0.13)         1.233 (0.16)        0.009 (0.01)
First-generation students (arrival < 6 years old)    −0.992*** (0.16)     0.371*** (0.06)    −0.044*** (0.01)
First-generation students (arrival ≥ 6 years old)    −1.897*** (0.09)     0.150*** (0.01)    −0.084*** (0.0)
Low-mid SES                                           0.335*** (0.04)     1.398*** (0.06)     0.015*** (0.0)
Mid-high SES                                          0.676*** (0.05)     1.966*** (0.10)     0.030*** (0.0)
High SES                                              1.055*** (0.08)     2.871*** (0.23)     0.047*** (0.0)
School-level controls                                 Yes                 Yes                 Yes
Country fixed-effects                                 Yes                 Yes                 Yes
Observations                                          290,116             290,116             290,116
Pseudo R2                                             0.423               0.423               0.423

Source: Prepared by authors using OECD-PISA 2012.
Note: Results from logistic regression models. Robust standard errors (clustered by school) in parentheses. * p < 0.10, ** p < 0.05, *** p < 0.01. In addition to the variables in the table, individual controls include sex, having repeated a grade and academic performance in the PISA tests. At the school level, the following variables were accounted for: school size, private ownership, school location and social composition. Country fixed effects are included to control for structural, systemic and cultural differences between countries. Weights at the individual level. Missing values are imputed with missing dummies.

when we refer to programs lasting more than a year. Looking at the results shown in Figure 18.1, first-generation students who arrived in the host country at the age of entering pre-primary education have a probability of attending up to three times lower than that of native students. This difference is not observed for second-generation students: there are no significant differences between native and second-generation students in access to pre-primary education for more than one year.

Interaction Effects between Students' SES and Immigrant Background in Explaining Access to Pre-Primary Education

Immigrant status interacts with the socio-economic status of students in explaining the probability of accessing pre-primary education. The first relevant result shown in Figures 18.2 and 18.3 is the same as indicated above: the probability of access to pre-primary education increases along with


[Figure 18.1 here: bar chart of odds ratios for attending pre-primary education for one year or less and for more than one year (ISCED 0; reference: not attending), by immigrant background (ref. native students: second-generation and first-generation students) and by students' SES (ref. low SES: low-mid, mid-high and high SES). Plotted odds ratios range from 0.3 (first-generation students, more than one year) to 3.8 (high SES, more than one year).]

Figure 18.1  Probability of students attending pre-primary education (odds ratios) Source: Prepared by authors using OECD-PISA 2012. Note: Results from logistic regression models (odds ratio). Robust standard errors (clustered by school) in parentheses. Columns with dotted lines indicate non-significant odds ratios. In addition to the variables in the graph, individual controls include sex, having repeated a grade and academic performance in the PISA tests. At the school level, the following variables were accounted for: school size, private ownership, school location and social composition. Country fixed effects are included to control for structural, systemic and cultural differences between countries. Weights at the individual level. Missing values are imputed with missing dummies.

socio-economic status. The second relevant result is that this probability varies depending on the immigrant background of students. Figure 18.2 shows the discrete differences in the probability of accessing pre-primary education between native and second-generation students, which vary according to socio-economic level. As Figure 18.2a shows, the probability of accessing pre-primary education is higher for second-generation students with low social status than for native students with the same socio-economic background. Figure 18.2b, in turn, shows the differences in probabilities: the differences in the probability of accessing early childhood education between native and second-generation students are statistically significant only among students from less well-off family backgrounds.

In Figure 18.3 we show the corresponding discrete differences for native versus first-generation students. Here, too, the probability varies according to socio-economic level (Figure 18.3a), but in this case native students have a higher probability of accessing pre-primary education regardless of their background. In fact, there is always a significant difference between a native student's probability of accessing early childhood education and that of a first-generation immigrant student (Figure 18.3b). Differences are therefore significant both between native and first-generation students from well-off families, and between native and first-generation students from disadvantaged families.


Figures 18.2a and 18.2b  Probability of students attending pre-primary education by immigrant background and SES: native versus second-generation students Note: Results from logistic regression models (categorical by continuous interaction, marginal effects). Robust standard errors (clustered by school) in parentheses. The range plot is delimited by the upper and lower confidence bounds. The horizontal line at the 0 value indicates no difference in the probability. In addition to the variables in the graph, individual controls include sex, having repeated a grade and academic performance in the PISA tests. At the school level, the following variables were accounted for: school size, private ownership, school location and social composition. Country fixed effects are included to control for structural, systemic and cultural differences between countries. Weights at the individual level. Missing values are imputed with missing dummies.

Figure 18.3a and 18.3b  Probability of students attending pre-primary education by immigrant background and SES: native versus first-generation students Note: Results from logistic regression models (categorical by continuous interaction, marginal effects). Robust standard errors (clustered by school) in parentheses. The range plot is delimited by the upper and lower confidence bounds. The horizontal line at the 0 value indicates no difference in the probability. In addition to the variables in the graph, individual controls include sex, having repeated a grade and academic performance in the PISA tests. At the school level, the following variables were accounted for: school size, private ownership, school location and social composition. Country fixed effects are included to control for structural, systemic and cultural differences between countries. Weights at the individual level. Missing values are imputed with missing dummies.
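In the spirit of Figures 18.2a–18.3b, predicted attendance probabilities across the SES range can be recovered from the fitted logit of the earlier sketch; the grid below is hypothetical in all its column names (including the placeholder country code), and `res` refers to that earlier fitted model.

```python
import numpy as np
import pandas as pd

# Predicted attendance probabilities across the ESCS range for native
# versus second-generation students, holding other covariates fixed.
grid = pd.DataFrame({
    "escs": np.tile(np.linspace(-3, 3, 25), 2),
    "immig": np.repeat(["native", "second_generation"], 25),
    "female": 1,
    "repeated_grade": 0,
    "country": "AAA",  # placeholder: must be a code present in the data
})
grid["p_attend"] = res.predict(grid)
print(grid.pivot(index="escs", columns="immig", values="p_attend").head())
```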

Participation in Pre-Primary Education and Academic Performance

The second research objective is to analyze to what extent attending pre-primary education is associated with obtaining better academic outcomes, and to establish whether the benefits of attendance depend on the socio-economic status of the students. The first analysis consists of observing the differences in scores in mathematics, reading and science of 15-year-old students according to whether they attended pre-primary education. The main differences observed are between those who attended pre-primary education for more than one year and those who did not attend. As shown in Figure 18.4, the performance differences oscillate between 8 and 9 points in all the tested disciplines. Having attended pre-primary education for one year or less also shows a positive effect with respect to students who did not attend, especially in the case of reading, in which there is no significant difference from having attended more than one year. The cases of science and mathematics are especially remarkable: the difference in scores between not attending and having received early education for more than one year is almost twice the difference between not attending and participating in pre-primary education for one year or less.

The question we ask ourselves at this point is who benefits the most from participating in early education programs. We have carried out subsample regressions according to the socio-economic status of the students, applying the conditional models separately to students from high and low socio-economic backgrounds in two subject areas, mathematics and reading. In Table 18.2, the coefficients indicate the difference in scores between having attended pre-primary education for a year or less, and having attended for more than a year, with respect to non-attendance, which is the omitted category.
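As an illustration of this subsample strategy (not the authors' code), the following sketch estimates the Table 18.2 specification separately by SES group; the column names are hypothetical and the control set is abbreviated relative to the table note.

```python
# Hypothetical sketch of the Table 18.2 subsample regressions: OLS of PISA
# scores on pre-primary attendance, estimated separately for low- and
# high-SES students, with school-clustered standard errors.
import statsmodels.formula.api as smf

for ses_group in ["low", "high"]:
    sub = df[df["ses_group"] == ses_group]
    ols = smf.ols(
        "score ~ C(preprim, Treatment(reference='none'))"
        " + sex + immig + repeated + C(country)",
        data=sub,
    )
    res = ols.fit(cov_type="cluster", cov_kwds={"groups": sub["school_id"]})
    # The two attendance coefficients are score gaps relative to students
    # who never attended pre-primary education (the omitted category).
    print(ses_group, res.params.filter(like="preprim"))
```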

[Figure 18.4 bar chart (reference: students not attending pre-primary education). Science: +8.3 for more than one year, +5.4 for one year or less. Reading: +8.8 for more than one year, +7.9 for one year or less. Mathematics: +8.3 for more than one year, +4.7 for one year or less. Horizontal axis: difference in performance, 0 to 10.]

Figure 18.4  Cross-country analysis: difference in students' performance according to attendance in pre-primary education Source: Prepared by authors using OECD-PISA 2012. Note: Results from OLS models. Robust standard errors (clustered by school) in parentheses. Columns with dotted lines indicate non-significant coefficients. In addition to the variables in the graph, individual controls include sex, immigrant background, SES and having repeated a grade. At the school level, the following variables were accounted for: school size, private ownership, school location and social composition. Country fixed effects are included to control for structural, systemic and cultural differences between countries. Weights at the individual level. Missing values are imputed with missing dummies.
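The notes' phrase 'missing values are imputed with missing dummies' refers to a standard device that can be sketched in a few lines of pandas; the column names here are hypothetical.

```python
# Hypothetical sketch of missing-dummy imputation for one control variable.
import pandas as pd

df["escs_missing"] = df["escs"].isna().astype(int)  # 1 if SES is missing
df["escs_filled"] = df["escs"].fillna(0)            # constant fill for missings
# Both columns are then entered as regressors, so students with missing SES
# are retained in the estimation sample instead of being dropped listwise.
```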


Table 18.2  Cross-country analysis: difference in students' performance according to attendance in pre-primary education by sub-samples (low versus high SES students)

Subject       SES level           Attendance for one year   Attendance for more than   Observations   R2
                                  or less                   one year
                                  (ref. non-attendance)     (ref. non-attendance)
Mathematics   Low SES students    7.4*** (2.63)             9.67*** (2.27)             73,209         0.476
Mathematics   High SES students   1.47 (3.23)               8.37** (3.41)              72,012         0.470
Reading       Low SES students    10.81*** (2.63)           9.29*** (2.35)             73,209         0.427
Reading       High SES students   4.43 (3.09)               9.04*** (3.25)             72,012         0.408

Source: Prepared by authors using OECD-PISA 2012. Note: Results from OLS models. Robust standard errors (clustered by school) in parentheses. * p < 0.10, ** p < 0.05, *** p < 0.01. In addition to the variables in the table, individual controls include sex, immigrant background, having repeated a grade. At the school level, the following variables were accounted for: school size, private ownership, school location and social composition. Country fixed effects are included to control for structural, systemic and cultural differences between countries. Weights at the individual level. Missing values are imputed with missing dummies.

The first noteworthy result confirms the trend already observed: receiving early childhood education for more than one year has a significant and positive impact, regardless of the social origin of the students. The second relevant result refers to having received pre-primary education for a short period of time. In this case, the observed differences diverge between students of high and low socio-economic background. Specifically, the results show that students with low socio-economic status are more sensitive to participation in early childhood education: for them, having attended pre-primary education for more than one year matters, but simply having received it, even for a short period of time, seems equally important. In contrast, for students who come from socially advantaged environments, having received early childhood education interventions for one year or less shows no significant effect; it is the same null effect observed among high-status students who never attended pre-primary education.

CONCLUSIONS AND POLICY IMPLICATIONS

The first relevant conclusion we have obtained is that access to early childhood education differs greatly according to the students' family background. The probability that children from well-off families access pre-primary education programs also increases with the duration for which these programs are attended. In contrast, children from disadvantaged families are not only less likely to attend pre-primary education programs for more than a year, but they are also less likely to be exposed to these programs for short periods of time. Moreover, this differential access is not only observed along the socio-economic gradient, but also relates to the families' immigrant background. First-generation students are systematically less likely to access early childhood education than native students, while second-generation students with a low socio-economic status have a higher probability of accessing pre-primary education than native students. Only in the case of second-generation students with a medium or high socio-economic and cultural status are there no significant differences in access compared to native students.


The second conclusion worth reporting is the robust relationship between pre-primary education and benefits in terms of learning. Although we are fully aware that, when using PISA cross-sectional data, it is not appropriate to talk of causal effects, we have obtained robust correlations between attending early childhood education and academic results after holding constant a large part of the systematic differences observed at the individual, school and country level. This cross-sectional relationship is so robust that it begs to be tested with a longitudinal follow-up survey of students. We also observe that the length of time for which children are exposed to pre-primary education makes a difference. We observe systematically, in all the tested subjects (mathematics, reading and science), that children participating in early childhood education for more than one year obtain higher results, relative to not attending, than children attending for one year or less. When, at this point, we ask who benefits the most, we see that there are no significant differences between students from wealthy families and students from disadvantaged backgrounds when they are exposed for long periods of time: both groups of students benefit equally from participating in early childhood education programs and interventions for more than a year. In contrast, there are significant differences when participation in early childhood education lasts only one year or less. Students from advantaged backgrounds show no benefit from having attended pre-primary education for short periods of time relative to never having attended; instead, we see a significant benefit for students who come from less enriched environments in economic, social and cultural terms. All in all, our results make Torsten Husén's words still relevant today (Husén, 1972).

Although it is essential to continue moving towards formal equality of access to pre-primary education, regardless of the social origin of the students, it is not enough. The results confirm the need to adopt what Husén calls a more radical conception of equal educational opportunities. This means that early childhood education institutions must contribute to achieving higher levels of equality of school attainment for all students. Only in this way can the gaps generated in the environments of children from disadvantaged families be compensated for. In this same sense, the investigations recently carried out from economic, neurobiological and behavioral perspectives are relevant. The main conclusion of interdisciplinary research on pre-primary education indicates that the most effective way to strengthen the future workforce is to invest in the formal and informal learning environments of disadvantaged children during their early childhood, since early experiences have a determining influence on cognitive and social development (Knudsen et al., 2006).

Notes

1  The 2015 wave of PISA could not be used because there were substantial changes in the formulation of the question regarding attendance at pre-primary education. In that edition, students were asked about their age when they joined pre-primary education, with the option to answer 'I do not remember'. The main problem is that this option left an excessive amount of missing values across OECD countries (13%, added to the existing 5% of missing values). In addition, the fact that the questionnaire only asked about the age of entrance to pre-primary education means that the duration of attendance cannot be inferred accurately.

2  Although intergenerational social mobility is particularly high in some countries, such as Australia, Canada and the Nordic countries (OECD, 2010), substantial change in socio-economic status happens beyond post-compulsory education, when young people obtain higher returns to education in terms of qualified jobs, earnings and other non-monetary returns.


REFERENCES

Belfield, C. R., Nores, M., Barnett, W. S., & Schweinhart, L. (2006). The High/Scope Perry Preschool program: Cost-benefit analysis using data from the age-40 follow-up. Journal of Human Resources, 41(1), 162–190.
Bukodi, E., & Goldthorpe, J. H. (2013). Decomposing 'social origins': The effects of parents' class, status, and education on the educational attainment of their children. European Sociological Review, 29(5), 1024–1039.
Camilli, G., Vargas, S., Ryan, S., & Barnett, W. S. (2010). Meta-analysis of the effects of early education interventions on cognitive and social development. Teachers College Record, 112(3), 579–620.
Crul, M., & Schneider, J. (2009). The Second Generation in Europe: Education and the Transition to the Labor Market. Washington, DC: Migration Policy Institute. Retrieved from: www.migrationpolicy.org/research/second-generation-europe-education-and-transition-labor-market (accessed 26 August 2016).
Education Endowment Foundation (2018). Early Years Interventions: Teaching & Learning Toolkit. London: Education Endowment Foundation.
Hall, J., Sylva, K., Sammons, P., Melhuish, E., Siraj-Blatchford, I., & Taggart, B. (2013). Can pre-school protect young children's cognitive and social development? Variation by center quality and duration of attendance. School Effectiveness and School Improvement: An International Journal of Research, Policy and Practice, 24, 155–176.
Heckman, J. J., Stixrud, J., & Urzua, S. (2006). The effects of cognitive and noncognitive abilities on labor market outcomes and social behavior. Journal of Labor Economics, 24(3), 411–482.
Husén, T. (1972). Social Background and Educational Career: Research Perspectives on Equality of Educational Opportunity. Paris: OECD Publishing.
Knudsen, E. I., Heckman, J. J., Cameron, J. L., & Shonkoff, J. P. (2006). Economic, neurobiological, and behavioral perspectives on building America's future workforce. Proceedings of the National Academy of Sciences, 103(27), 10155–10162.
Manning, M., Homel, R., & Smith, C. (2010). A meta-analysis of the effects of early developmental prevention programs in at-risk populations on non-health outcomes in adolescence. Children and Youth Services Review, 32, 506–519.
OECD (2006). Where Immigrant Students Succeed: A Comparative Review of Performance and Engagement in PISA 2003. Paris: OECD Publishing.
OECD (2010). A family affair: Intergenerational social mobility across OECD countries. In Economic Policy Reforms: Going for Growth. Paris: OECD Publishing.
OECD (2013a). PISA 2012 Database. Retrieved from: http://www.oecd.org/pisa/data/ (accessed 2 July 2018).
OECD (2013b). PISA 2012 Results, Excellence through Equity: Giving Every Student the Chance to Succeed (Vol. II). Paris: OECD Publishing.
OECD (2013c). PISA 2012 Results: What Makes Schools Successful? Resources, Policies and Practices (Vol. IV). Paris: OECD Publishing.
OECD (2014). PISA 2012 Technical Report. Paris: OECD Publishing.
OECD & EU (2015). Indicators of Immigrant Integration 2015: Settling In. Paris: OECD Publishing. Retrieved from: www.oecd-ilibrary.org/social-issues-migration-health/indicators-of-immigrant-integration-2015-settling-in_9789264234024-en (accessed 26 October 2018).
Rutkowski, D., & Rutkowski, L. (2013). Measuring socio-economic background in PISA: One size might not fit all. Research in Comparative and International Education, 8(3), 259–278.
Schnepf, S. V. (2004). How Different are Immigrants? A Cross-Country and Cross-Survey Analysis of Educational Achievement. IZA Discussion Paper No. 1398. Retrieved from: ftp.iza.org/dp1398.pdf (accessed 16 July 2018).
Sylva, K., Sammons, P., Chan, L. S., Melhuish, E., Siraj-Blatchford, I., & Taggart, B. (2013). The effects of early experiences at home and pre-school on gains in English and mathematics in primary school: A multilevel study in England. Zeitschrift für Erziehungswissenschaft, 16, 277–301.
Washbrook, E., Waldfogel, J., Bradbury, B., Corak, M., & Ghanghro, A. A. (2012). The development of young children of immigrants in Australia, Canada, the United Kingdom, and the United States. Child Development, 83(5), 1591–1607.


APPENDIX

[Figure 18.5 bar chart (vertical axis: change in the predicted probabilities, -2% to 5%). Immigrant background (ref. Native): second-generation +1.5%, first-generation -1.1%. Students' SES (ref. Low SES): low-mid SES +3.1%, mid-high SES +4.2%, high SES +4.7%.]

Figure 18.5  Probability of students attending pre-primary education: average marginal effects Source: Prepared by authors using OECD-PISA 2012. Note: Results from logistic regression models (marginal effects). Columns with dotted lines indicate non-significant marginal effects. In addition to the variables in the graph, individual controls include sex, having repeated a grade and academic performance in the PISA tests. At the school level, the following variables were accounted for: school size, private ownership, school location and social composition. To control for the structural, systemic and cultural differences between countries, country fixed effects are included. Weights at the individual level. Missing values are imputed with missing dummies.
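For Figure 18.5, the quantities plotted are average marginal effects. The following is a sketch of how they might be computed, under the same hypothetical naming as before and without the authors' full set of controls and weights.

```python
# Hypothetical sketch of the average marginal effects in Figure 18.5 (not
# the authors' code): logit of attendance on immigrant background and SES
# bands, with effects for indicator variables computed as discrete changes.
import statsmodels.formula.api as smf

logit = smf.logit(
    "preprimary ~ C(immig) + C(ses_band) + sex + repeated + C(country)",
    data=df,
)
res = logit.fit(cov_type="cluster", cov_kwds={"groups": df["school_id"]})

# at='overall' averages the marginal effects over the sample; dummy=True
# reports discrete changes (0 -> 1) for indicator variables.
print(res.get_margeff(at="overall", dummy=True).summary())
```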


Table 18.3  Probability of students attending pre-primary education: coefficients and odds ratios

                                 Coefficient (SE)    Odds ratio (SE)
ISCED 0: For one year or less
  Second-generation students      0.445*** (0.14)     1.560*** (0.21)
  First-generation students      -0.493*** (0.19)     0.611*** (0.12)
  Low-mid SES                     0.220*** (0.05)     1.246*** (0.06)
  Mid-high SES                    0.448*** (0.06)     1.565*** (0.09)
  High SES                        0.708*** (0.09)     2.030*** (0.19)
ISCED 0: For more than one year
  Second-generation students      0.196 (0.14)        1.217 (0.17)
  First-generation students      -1.146*** (0.16)     0.318*** (0.05)
  Low-mid SES                     0.400*** (0.04)     1.492*** (0.07)
  Mid-high SES                    0.815*** (0.06)     2.259*** (0.13)
  High SES                        1.339*** (0.09)     3.814*** (0.34)
School-level controls: Yes. Country fixed effects: Yes. Observations: 290,140. Pseudo R2: 0.254.

Source: Prepared by authors using OECD-PISA 2012. Note: Results from multinomial logistic models. Robust standard errors (clustered by school) in parentheses. * p < 0.10, ** p < 0.05, *** p < 0.01. In addition to the variables in the table, individual controls include sex, having repeated a grade and academic performance in the PISA tests. At the school level, the following variables were accounted for: school size, private ownership, school location and social composition. Country fixed effects are included to control for structural, systemic and cultural differences between countries. Weights at the individual level. Missing values are imputed with missing dummies.
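The coefficient and odds-ratio columns of Table 18.3 carry the same information: each odds ratio is the exponential of the corresponding coefficient (which is also why the first-generation coefficients must be negative, given odds ratios below one). A quick check in Python:

```python
# Odds ratio = exp(coefficient): verifying two cells of Table 18.3.
import math

print(math.exp(0.445))   # ~1.560: second-generation students, one year or less
print(math.exp(-1.146))  # ~0.318: first-generation students, more than one year
```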

19 Primary Education Curricula across the World: Qualitative and Quantitative Methodology in International Comparison
Dominic Wyse and Jake Anders

Provision for children from age 5 to age 11 is a vital phase in formal education settings worldwide. During this primary education phase (also called elementary education in some countries) children should make the journey from acquiring basic understanding to developing the bases for ways of thinking that will be used throughout their lives. Primary education is also an essential preparation for secondary education. The international importance of primary education is recognized by the United Nations Educational, Scientific and Cultural Organization (UNESCO) as part of their ‘17 goals to transform our world: Goal 4: Ensure inclusive and quality education for all and promote lifelong learning’ (UNESCO, 2017, online). However, in a report in 2017 it was found that ‘more than 387 million children of primary school age (about 6 to 11 years old) and 230 million adolescents of lower secondary school age (about 12 to 14 years old)’ were not achieving minimum proficiency levels in reading and mathematics (UNESCO, 2017, p. 2).

The idea of proficiency in any subject area is strongly related to the curriculum: to the activities that children encounter in their classrooms, to the organization of learning throughout schools, and hence to the specification of countries' national curricula. The curriculum that children experience in formal education settings is one of the fundamental elements that determine the nature of their learning. Reading and mathematics are important aspects of any primary curriculum, and are subject to the most common comparative tests, but the nature of the whole curriculum, its aims and areas/school subjects, has to be taken into account when seeking to understand the effectiveness of national curricula. This chapter begins by exploring definitions of curriculum and linking these with the place of knowledge in relation to curriculum studies. The chapter reveals that although knowledge is a prime focus of the curriculum studies field, it is far from clear how this might guide the most appropriate content for modern curricula in the range of cultural contexts that are represented by different countries. Having established a theoretical frame, the main part of the chapter features an analysis of the traditions and methodology of qualitative comparison of countries' curricula compared with quantitative measures of different curricula. One of the most impressive qualitative comparative studies of recent years is used as the basis for a detailed comparison of findings and methods for international comparison. This section of the chapter leads to reflections on the representation of oral language development as part of national curricula and classroom practice, as identified in this seminal study. Qualitative research methodology to explore primary curricula, which underpins this study, is then compared with quantitative methodology represented by international comparative work. This section of the chapter features some examples of curriculum issues, such as the teaching of reading, that are represented in the questions that are asked in surveys that are part of the methodology. The examples chosen to illustrate curriculum comparison and its methodology are all representations of the complexities of how knowledge can be selected and represented in curricula. For example, the presence, or lack of presence, of oral language in curricula is compared with the ways in which the teaching of reading (which draws on oral language competence) is represented. Strengths and limitations of both methodologies are considered with the aim of suggesting stronger designs in future. The chapter concludes with some suggestions for future research possibilities for the comparison of countries' curricula.

CURRICULUM AND KNOWLEDGE

Definitions of curriculum differ according to the disciplinary context, and by the scope and scale of the conception of curriculum. For example, from a sociological perspective, curriculum has been defined as 'the principle by which units of time and their contents are brought into special relationship with each other' (Bernstein, 1971, p. 48). From an educational perspective, curriculum can be '[w]hat is intended to be taught and learned overall (the planned curriculum); what is taught (the curriculum as enacted); what is learned (the curriculum as experienced)' (Alexander, 2010, p. 250). The sociological definition, the principles of the relationship between units of time, is succinct but does not account for the place of pedagogy, including the teacher–pupil interaction that is an essential part of curriculum. The educational definition, with its distinction between curricula as intended, enacted and experienced, is helpful, although enactment and experience are so closely related that the utility of this definition for analysis purposes is limited. As a result, the focus in this chapter is on curriculum as 'planned human activity intended to achieve learning in formal educational settings' (Wyse, Hayward, & Pandya, 2016, p. 4).

Curriculum studies, as a field of the discipline of education, has established the importance of questions about knowledge in curricula. This importance is underlined by the idea that the specification of knowledge, realized through classroom activity, is a curriculum's main function. As early as the 19th century, Herbert Spencer asked, 'What knowledge is of most worth?' Putting aside the dubious introductory comparisons with indigenous people's customs at the beginning of Spencer's book, it is remarkable, in view of the time of publication, to see Spencer's explicit attention to rationales for knowledge in the curriculum:

If there needs any further evidence of the rude, undeveloped character of our education, we have it in the fact that the comparative worths of different kinds of knowledge have been as yet scarcely even discussed – much less discussed in a methodic way with definite results. Not only is it that no standard of relative values has yet been agreed upon; [sic] but the existence of any such standard has not yet been conceived in any clear manner. And not only is it that the existence of any such standard has not been clearly conceived; but the need for it seems to have been scarcely even felt. (Spencer, n.d. [1860, p. 11], semicolons and their spacing as in original text)

However, the contribution of curriculum studies has in the last decade or so been characterized as in 'crisis'. The reasons for the crisis have at least three possible roots:

1  A perception of domination by 'empiricists' (who are typified as basing their claims for particular curricula on international testing and its analyses) and 'post-conceptualists' (with an emphasis on personalizing concepts derived from poststructuralism and German existentialism) at the expense of 'traditionalist work based on close relations with teachers' work and/or curriculum development' (Hopman, 2010, online).
2  Theoretical neglect of access to knowledge as part of the curriculum. And more particularly, 'the reluctance of curriculum theory, at least since Hirst and Peters (1970), to address epistemological issues concerning questions of the truth, and reliability of different forms of knowledge and how such issues have both philosophical and sociological dimension' (Young, 2013, p. 103).
3  The crisis has also more recently been located in the nature of political control of the curriculum and, in particular, politicians' perceptions of risk associated with international league tables of pupil testing, which have led to greater intervention in national curricula. The pressures caused by high-stakes testing have been seen as a recent manifestation of performativity (Wyse, Hayward, & Pandya, 2016).

In its original conception, the French philosopher Lyotard saw performativity as revolving around the perceived need for two sets of 'skills' (in relation to university education rather than schools and early years settings). One set of skills, he argued, was needed for selling on the world market through the prioritization of 'particular subjects to support growth in demand for high and middle management executives; the other set of skills was those needed for maintenance of internal cohesion, a strategy that pushes out other aims of university education such as those built on emancipatory narratives' (Lyotard, 1984, p. 50). This performativity brought 'inevitable disorders … in the curriculum, student supervision and testing, and pedagogy, not to mention its sociopolitical repercussions' (ibid.). Lyotard concluded that in a performativity culture, educational institutions are therefore subordinated to the ruling powers. Although Lyotard's analysis was applied to universities, we can see a possible link with schooling. The significance of selling on a world market can be seen in globalization trends (typified by global networks with growing influence on education policy; Ball, 2012); in the finance and influence of international comparative testing; and the associated influence of international testing on subjects in national curricula such as literacy and mathematics. Greater control over curricula, and increasingly pedagogy, can be used to maintain government ideology (Lyotard's 'internal cohesion') at the expense of student, teacher, and local authority control of education (the emancipatory narrative). And the disorders in curriculum, and socio-political repercussions, can be seen in the criticisms evident in academic analyses, particularly from the qualitative comparative tradition (for a very strongly worded example, see Alexander's (2011) paper, 'Evidence, rhetoric and collateral damage: The problematic pursuit of "world class" standards').

The idea of ownership of the curriculum (Lyotard's emancipatory narrative) is in some ways in tension with current emphases on knowledge in curricula (Young, 2008), and hence a central problem for the curriculum studies field. Johan Muller, who has worked with Michael Young, suggested a possible way forward:

It seems that Wyse et al.'s view that knowledge is 'both constructed and real' (2014: 5) was right after all. Quite how to establish the reality of 'powerful knowledge' while acknowledging its social roots remains a challenge in 2014 as it was in Mannheim's day. What is undeniably underway is a sort of rapprochement, but it remains a work in progress. (Muller, 2016, p. 103)


The idea of knowledge being both constructed and real was a reference to the editorial of the special issue of the BERA Curriculum Journal (Wyse, et  al., 2014), which highlighted Biesta’s 2014 award-winning paper addressing the idea of transactions, and the idea of knowledge being both constructed and real. In addition to recognizing the contribution made by Wyse et al., Muller’s suggestion about rapprochement was built on a series of other broad points. One part of his argument was a reminder of the significance of the work of Basil Bernstein, which he carried out while at London’s Institute of Education (since 2015 part of University College London – UCL). Bernstein was critical of some theories of cultural reproduction that had emerged from France. He argued that their conception of education as a ‘carrier of power relations’ such as ‘class, patriarchy, race’ (Bernstein, 1996, p. 4) resulted in a lack of attention to ‘internal analysis of the structure of the [pedagogic] discourse itself, and it is the structure of the discourse, the logic of this discourse, which provides the means whereby external power relations can be carried by it’ (ibid.).1 However, Bernstein may not have paid due attention to Lyotard’s depiction of the legitimation of education through performativity, which included the ‘transmission’ of an established body of knowledge. This led Lyotard to a series of pragmatic questions: ‘Who transmits learning? What is transmitted? To whom? Through what medium? In what form? With what effect?’ (1984, p. 48). It is true that Lyotard’s analysis portrayed education as a carrier of power relations, such as forces of performativity, yet Lyotard did appear to address some of the discourse features as well. For curriculum theorists, Bernstein’s distinction between classification as ‘the degree of boundary maintenance between contents’ (1971, p. 46) and framing as ‘the degree of control teacher and pupil possess over the selection, organization and pacing of the knowledge transmitted and received in the pedagogical relationship’ (ibid.) has been significant.


Where framing is strong, the transmitter has explicit control over selection, sequence, pacing, criteria and the social base. Where framing is weak, the acquirer has more apparent control (I want to stress apparent) over the communication and its social base. (Bernstein, 1996, p. 12)

For example, a subjects-based curriculum (which Bernstein called a 'collection code' curriculum) has strong classification, whereas a theme-based curriculum (an 'integrated code' curriculum) has weak classification. Bernstein identified 'some reasons for a movement towards the institutionalizing of integrated codes of the weak classification and framing (teacher and taught above the level of the primary school)' (Bernstein, 1971, p. 66, italics in original). In national curriculum terms, weak classification might be realized as a thematic curriculum (such as the International Baccalaureate (IB)) or an aims-based curriculum (Reiss & White, 2013), and weak framing might entail strong control at the teacher–pupil level rather than the governmental level. Bernstein's reasons for recommending a move towards an integrated code curriculum included recognition that higher levels of thinking were increasingly differentiated; that more flexibility was required in the labour force, hence students needed empowerment to pursue their interests within this wide range; and that there was a need for more egalitarian education, not least to make sense of major societal problems related to power and control. Bernstein's thinking focused on ways to 'declassify and so alter power structures and principles of control; in so doing to unfreeze the structuring of knowledge and to change the boundaries of consciousness' (Bernstein, 1971, p. 67). He theorized that only a select few pupils/students are normally allowed access to 'relaxed frames', in other words, a state of empowerment for these pupils to 'create endless new realities' as part of the understanding that knowledge is permeable and provisional. Bernstein's proposal seems to differ in important ways from Young's proposals for implementation of a subject-based curriculum, as can be seen in some of Young's work:


5.2. The relationship between a National Curriculum and the individual curricula of schools. A National Curriculum should limit itself to the key concepts of the core subjects and be designed in close collaboration with the subject specialists. This limit on National Curricula guarantees autonomy to individual schools and specialist subject teachers, and takes account of schools with different cultural and other resources, different histories and in different contexts (for example, schools in cities and rural areas). At the same time, it ensures a common knowledge base for all students when some may move from school to school. (Young, 2013, p. 110)

Young's emphasis on 'core subjects' (in schools) and 'subject specialists' (presumably in universities) represents strong classification, and although governmental limits on the content specification of national curricula might enable 'autonomy' for individual schools, and might appear to represent weak framing, the pressure to ensure the place of the core subjects (in systems where national testing is 'high stakes') is likely to be at the expense of both pupil and teacher autonomy and at the expense of subjects beyond the core (Wyse & Torrance, 2009).

INTERNATIONAL COMPARISON AND PRIMARY EDUCATION CURRICULA

The beginning of international comparison of countries' education systems and curricula is often attributed to Marc-Antoine Jullien (1775–1848). In his 'Plan and Preliminary View for a Work on Comparative Education' (Fraser, 1964), originally published in French, Jullien argued for the importance of comparison of the education systems in Europe in order to improve them and contribute to societal avoidance of the horrors of future wars. When viewed in relation to current debates about international comparison of education systems, Jullien's suggestions were visionary:

In order for educational science to keep up, spread, and perfect itself, it, like other sciences, requires many nations at the same time to interest themselves in it and practice it together. Competition becomes useful to those very ones who would at first think to see in it an obstacle to their interests. A wise and well-informed politician discovers in the development and prosperity of other nations a means of prosperity for his own country. (Fraser, 1964, p. 37)

Jullien's emphasis on 'science', 'competition', 'prosperity', and the link with politics, has strong resonances with contemporary debates about international comparison of education systems. Jullien's plan included specifications for surveying education systems in different countries. See, for example, some of Jullien's categories of questions:

FIRST SERIES [of questions]
A. PRIMARY AND COMMON EDUCATION
Primary schools or elementary and common
Directors
Students
Physical education and gymnastics
Moral and religious education
Intellectual education and knowledge
Domestic and common education, as it is related to public education
Primary and common education, as it is related to secondary education or to the second stage, or with the intentions of children
General considerations and various questions. (Fraser, 1964, p. 50)

In category 6, Intellectual Education, Jullien's suggested questions include these:

91. How does one conduct from the cradle, the first education of senses and organs? With what objects is care taken to surround children, to exercise them to see, touch, hear, taste, feel? What are the first exercises of observation and language?

92. At what age are children usually taught to read, write, count, and what method is considered the easiest.

93. What are the aims of education which the children usually receive in primary school? (Does one limit oneself in the majority of these schools to reading, writing, arithmetic? Or does one also give a few elementary ideas of grammar, singing, geometrical drawing, geometry, and land surveying, applied mechanics, geography and history of the country, anatomy of the human body, practical hygiene, natural history applied to the study of land products most useful to men? All the elements of these sciences, as essentials to each individual in all conditions and circumstances of life, would seem to have to form a part of a complete system of primary and common education, perfectly appropriate to the true needs of man in our present state of civilization.) (Fraser, 1964, p. 63)

The appreciation of the risk of a preoccupation with a narrow curriculum of the basics of reading, writing and arithmetic (the 3 Rs) predates modern debates, and Jullien also raised the importance of the holistic aims of primary education. In more recent theoretical work, and in the qualitative tradition, various studies can be seen as arising out of Jullien's foundations, though not necessarily related to his appeal to science. For example, using curriculum as the unit of analysis, Forestier and Adamson (2017) root their critique of PISA in Jullien's ideas in their review of comparative studies. Their critique notes the dangers of a narrow focus on literacy and mathematics; the need for a holistic and contextual approach to comparison (such as that taken by Alexander (2000)); and the need for an investigative orientation to comparison versus the evaluative and formative. However, although the paper is focused on curriculum, it does not cite key work in the curriculum studies field, preferring to draw on a wider comparative tradition. In another evaluation of the impact of PISA, Carvallo and Costa (2015) review six papers that summarized nations' responses to PISA results as follows: in France the focus was on what counts as legitimate knowledge in the curriculum for the functioning of education; in Hungary the PISA findings were used as a master narrative for policy; in Francophone Belgium the programme contributed to moves from a regulated to a deregulated state; and in Portugal it was used as a national evaluation tool and to legitimate government policies. So, it can be seen that outcomes generated as a result of PISA are used to fulfil different purposes: to legitimize policy; to manage the policy agenda; to develop secondary research; and to support the development of domestic regulatory instruments.

Perhaps the most significant study of primary education in the qualitative comparative tradition is the Five Cultures project (Alexander, 2000). The ambition of this study is clear from the title of the work, which sought to establish robust evidence about the culture and education systems of five nations. Indeed, it is made clear in the work that the context of the research was comparison of 'five continents'. The main publication resulting from the work is a very extensive and impressively multi-layered account, addressing several significant levels of education systems, from documentary analysis of national policy down to observations of teacher–pupil interactions and classroom practice. The work is also explicit about the risks of generalization from such studies. However, having exhaustively and memorably portrayed the five countries, the book singles out one particular finding to conclude definitively that education in England is deficient in comparison with some of the other countries:

Further, the prescribed English language curriculum for the primary stage – and indeed the secondary as well – makes far less of the development of 'spoken' language than do the equivalent statements in France and Russia. … This difference in emphasis is really quite striking, and it manifests itself in the characteristically episodic lesson structures with their relatively fast interactive and cognitive pace that I described earlier [in France and Russia]. The quality and power of children's spoken language gain immeasurably from this approach, as one would expect. Further, there is no evidence that the development of children's reading and writing are in any way disadvantaged as a result. On the contrary, the relationship and function of each seem to be better understood. (Alexander, 2000, p. 565)

This conclusion is preceded by an outline of the place and nature of oral language, or Speaking and Listening, in the different versions of England's national curriculum. Alexander appropriately, in our view, identifies a deprioritization of oral language in favour of written language over the different versions of England's national curriculum since 1988, a view warranted by documentary analysis of national curriculum texts. The argument is that, with regard to oral language, the emphasis in the programmes of study was reduced between 1988 (when England's first national curriculum was established) and the national curriculum of the year 2000. Although Alexander's original work could not cover this, the process of deprioritization of oral language was significantly accelerated in England's national curriculum of 2014 (Wyse, Jones, Bradford, & Wolpert, 2018, in press). Evidence for Alexander's conclusion can be seen as linked with his coverage of previously published empirical studies carried out in England, for example the work of Maurice Galton and his team in relation to teacher–pupil interaction. But if similar empirical work has been carried out in the other four countries in the Five Cultures study, this is not cited. Nor was any kind of systematic review of studies on oral language in different countries carried out, although this is understandable in view of the already large-scale nature of the Five Cultures project.

With regard to knowledge in the curriculum, language, particularly spoken language, is a good test of theories of curriculum, knowledge and comparative methodology because language is fundamental to all learning and, as a result, occupies a complex space in national curricula. It is known that oral language is fundamental to acquiring reading and writing. But mother tongue oral language is acquired naturally by nearly all children as a result of the innate characteristics of human beings, supported by the interaction of significant others in the child's life, such as parents/guardians (Goswami, 2008). There are some facets of oral language that can be seen as 'knowledge' in relation to the definition offered at the beginning of this chapter, for example, the acquisition of vocabulary, the special use of the voice in drama, and sensitivity to formality in social contexts, but oral language does not sit easily in relation to Young's idea of 'a common knowledge base', nor even to Bernstein's idea of the classification and framing of subjects. This is because oral language is predominantly the vehicle by which the curriculum is delivered, rather than an area of subject content. As such, there is an argument, contrary to Alexander's, that oral language might not need the same extent of prescribed curriculum content as reading and writing, while at the same time recognizing its fundamental link with literacy and the necessity for oral language to have an appropriate specification in curricula, not least in relation to standard forms of language such as standard English (for the complexities of the development, including the historical development, of standard English, see Wyse, 2017). Alexander continues with his main conclusion about oral language:

Close analysis of all the videotapes and transcripts from the Five Cultures project – which of course far exceed the few examples contained in Chapter 16 – force me unambiguously to the conclusion that in English primary classrooms, although much may be made of the importance of talk in learning, and a great deal of talking goes on, its function is seen as primarily social rather than cognitive, and as 'helpful' to learning rather than as fundamental to it. (Alexander, 2000, p. 566)

The conclusions from the Five Cultures study include a statement suggesting causation, not just correlation: 'The quality and power of children's spoken language gain immeasurably from this approach, as one would expect' (Alexander, 2000, p. 565). And, as was seen in the quote above, the conclusions include a generalization at the whole-country level: 'in English primary classrooms, although much may be made of the importance of talk in learning, and a great deal of talking goes on, its function is seen as primarily social rather than cognitive, and as "helpful" to learning rather than as fundamental to it' (2000, p. 566, underline added). In order to provide evidence to support the first conclusion, 'quality and power' would need to be defined, and then a robust measurement of oral language established. For the second conclusion, first and foremost, the very broad and interpretable concepts of 'social' and 'cognitive' would need to be clearly specified. Then, as the emphasis is on 'perceptions' of the function of oral language, a sufficient number of people reporting their perceptions would need to be established. What is more, in the case of both conclusions, the methodological requirements to establish causal and generalizable conclusions would need to apply in each of the countries that are part of the study. This raises questions about the methodology of such studies.

A key parameter for the Five Cultures study, and generally for quantitative and qualitative research comparing different countries, is sampling. The first sampling decision made was of course in relation to the five countries. Paradoxically, in view of Alexander's criticisms of quantitative comparative work, the choice of countries is based first on numbers: five countries is better than two, it is argued, because it avoids a tendency to polarization, and five is better than three because this avoids the 'Goldilocks effect' (2000, p. 44). Ultimately, though, the selection of countries was made because 'the countries offer similarities, contrasts, and intriguing connections' (2000, p. 44). But surely any selection of countries, from two to 33 or more, would offer such connections? Perhaps the selection is most accurately described as a 'convenience sample' or a 'non-probability sample', a sampling choice that can be argued to be acceptable, but this is not stated explicitly, nor are specific reasons given why, for example, India and not Pakistan (in relation to Britain's colonial past), or Russia and not China (in relation to globalization, colonial legacy and evolution). If there were particular contacts and/or networks, or other practical reasons, that facilitated the selection of countries, it would have been helpful to know these.

The argument that comparison of education systems requires inclusion of observations of classroom practice is well made in Culture and Pedagogy, but the sampling of schools and classrooms in the Five Cultures study is not entirely clear in some respects. It appears that 30 schools, divided between the five countries in the study, were the basis for the comparison of lessons: 'The Five Cultures data include material from fieldwork in thirty school buildings in five countries and from an additional sixty or so English schools [data from previous studies by Alexander]' (Alexander, 2000, p. 177). The argument is made that by analysing schools, national policy and histories of education in the different countries, the problems of generalizing from a small sample can be avoided. But in order to make a robust claim, in relation to a whole country, about perceptions of talk as primarily social or primarily cognitive, many would argue that a nationally representative sample of people would be required to report their perceptions. Although transcriptions of teacher–pupil interaction are a well-regarded means to assess one important aspect of classroom pedagogy (and too often not part of the methodology of studies comparing education systems), the number of teachers (and pupils) involved in the sampling is again important.

The number of teaching sessions that were observed, annotated and recorded was as follows: France 20, India 19, Russia 33, United States 19, England 75 (60 from preceding projects together with 15 Five Cultures updates). This gave a total data set of 166 lessons. Of these, 36 (six to nine from each country) were selected for transcription and close scrutiny … although any logistical generalisations below are based on the full range from each country. (Alexander, 2000, p. 276)

Apart from the lack of contextual background to the chosen lessons (such as teachers' gender, experience, languages spoken, qualification level, or class sizes), the sampling raises questions about the extent to which generalization about emphasis on oral language in the different countries can be supported by the data. There is no information about how the selection was made for 'close scrutiny', nor about the data analysis techniques used to arrive at the findings. For example, if 'the full range' of videos of lessons was used, how did the analysis account for transcribed lessons versus those that remained only in video form? Or, how was 'fast pacing' defined and measured? This is a theme discussed at greater length by Praetorius et al. (this volume). An alternative approach to the methodology could have included a more explicit account of the rationale for sampling. For example, in relation to the sampling of countries, a different approach could have included more careful matching of countries by population size, main language spoken, and geography in order to sharpen the comparison of policies and classroom practice. In relation to the sampling of schools, these might have been selected using a form of stratified random sampling, as sketched below, in order to mitigate some threats to validity that are part of other means of selection, and to try to match some of the school contexts across different countries. And, as we argue at the end of this chapter, a more radical approach could have been to build on evidence from large-scale international comparative quantitative work with in-depth qualitative work, or the reverse. Important findings from in-depth analysis of the contexts and cultures in different countries in comparison with other countries have been established. However, the challenge of generalization remains, not least when seeking to establish an evidence base for the development of national curricula. Having examined a particularly notable qualitative study, we now turn to quantitative work that compares countries and curricula.
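To make the suggested alternative concrete, the sketch below draws a stratified random sample of schools; the frame schools and the strata columns (country, location, ownership) are hypothetical.

```python
# Hypothetical sketch of stratified random sampling of schools: draw schools
# at random within country-by-location-by-ownership cells.
import pandas as pd

def stratified_sample(frame: pd.DataFrame, strata: list,
                      n_per_stratum: int, seed: int = 42) -> pd.DataFrame:
    # Sample up to n_per_stratum schools within each stratum cell.
    return (
        frame.groupby(strata, group_keys=False)
             .apply(lambda g: g.sample(min(n_per_stratum, len(g)),
                                       random_state=seed))
    )

sample = stratified_sample(schools, ["country", "location", "ownership"],
                           n_per_stratum=2)
```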

QUANTITATIVE COMPARISON OF THE FIVE COUNTRIES

As documented extensively elsewhere, including in this volume, internationally comparative quantitative analysis in education has grown rapidly over recent years, particularly because of the increasing availability of datasets designed explicitly for this purpose. The most prominent of these are the OECD's Programme for International Student Assessment (PISA), the Trends in International Mathematics and Science Study (TIMSS) and the Progress in International Reading Literacy Study (PIRLS). Although these studies are primarily focused on student achievement (albeit in different ways), and have their flaws, we believe they have the potential to contribute to an understanding of differences between primary curricula across the world. Although, as will be seen, direct comparison of the place of oral language in different curricula is not possible using the quantitative datasets, we carry out some illustrative comparison using the countries included in the Five Cultures study. It is worth noting at the outset that one of the benefits of quantitative enquiry is to allow for analysis of a wider range of countries than is feasible through qualitative work. Nevertheless, decisions to compare a tightly defined set of countries available in internationally comparative data are common, for example comparisons of economically developed Anglophone countries (e.g., Washbrook et al., 2012; Jerrim et al., 2016), because of the cultural elements, particularly language, that such countries share. As language or languages are such an important cultural element of any country or region, and such an important part of the curriculum at primary education level, the comparison of countries with the same dominant language can allow for more meaningful comparison. Such decisions will, of course, depend on the specific research question. All of the countries featured in the Five Cultures study have participated in PIRLS or TIMSS in recent years, with the exception of India.2 This presents an opportunity to explore primary curricula in these countries that will complement analysis such as that discussed above, through use of quantitative data collected as part of these studies since 2003. The most prominent element of the large-scale quantitative studies is the overall country rankings that are produced each time the results are published. As we demonstrated earlier, these publications have increasingly attracted different kinds of responses from governments. With regard to the four Five Cultures countries included in PIRLS 2016, their rank order is shown in Table 19.1.


Table 19.1  Extract from PIRLS 2016 Reading Achievement Distribution

Country              Rank   Reading score   Standard error (SE)
Russian Federation      1   581             (2.2)↑
England                10   559             (1.9)↑
United States          14   549             (3.1)↑
France                 33   511             (2.2)↑

Source: IEA (2016a). Reading scores are relative to the PIRLS Scale Centrepoint of 500, located at the mean of the combined international achievement distribution, and scaled such that 100 points corresponds to the standard deviation of the distribution.
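Rank-order statements of the kind discussed below (for example, that the relative positions of these countries are all statistically significant) rest on comparing a difference in country means to its combined standard error. The following is a simplified sketch using the published figures in Table 19.1; the operational PIRLS procedure is more involved, working with plausible values and the survey design rather than this plain two-sample z statistic.

```python
# Simplified significance check for a difference between two country means,
# using the published PIRLS means and standard errors from Table 19.1.
import math

def z_stat(mean_a: float, se_a: float, mean_b: float, se_b: float) -> float:
    # Independent-samples z statistic for the difference in means.
    return (mean_a - mean_b) / math.sqrt(se_a**2 + se_b**2)

# Russian Federation (581, SE 2.2) versus England (559, SE 1.9):
print(z_stat(581, 2.2, 559, 1.9))  # ~7.6, far beyond the 1.96 threshold
```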

As you will recall, Alexander argued that in Russia and in France,

The PIRLS data have some relevance to Alexander’s comparisons of the countries. Although in 2016 Russian students performed well in the reading assessments, students in France did not. And in spite of the neglect of oral language, England’s ranking improved in comparison to the previous PIRLS assessment round in 2011. At the very least, the PIRLS data suggest that whether reading is disadvantaged (or advantaged) as a result of the approach to oral language in the curriculum, as Alexander claimed, is a moot point. As with the critique of the qualitative methodology that we offered above, there are issues about the methodology of PIRLS that need to be considered. For example, questions are often raised about the statistical significance of any comparison between two or more countries. The PIRLS reports address this, and in relation to the four countries, the relative positions

are all statistically significant (IEA, TIMSS, & PIRLS International Study Centre, 2016b). In addition, the upward arrows in Table 19.1 indicate that the countries were all assessed as above the ‘centrepoint of the PIRLS scale’, which was located in 2001 at 500, the mean of the combined achievement distribution. Another important methodological issue is the validity and reliability of the test as a measure of reading ability. Comparability across languages is carefully addressed by the designers of PIRLS, but many would argue that being a reader in the fullest sense is a culturally-specific activity. Another common criticism is that paper-based tests tend towards low-level, short-answer questions which cannot evaluate more sophisticated forms of comprehension. It is true that the test format means that answers are in the main short, but in addition to multiple-choice items the tests do require students to write some answers in open-response formats. And the questions that appear later in the tests are designed to assess aspects such as ‘Process: Interpret and Integrate Ideas and Information’. One example in the 2016 tests required students to explain the significance, in the story to be comprehended, of the turn of phrase ‘at the top of the pecking order’ (IEA, TIMSS, & PIRLS International Study Centre, 2016c, p. 353) in relation to the main character Macy’s clever plan to save the hen from predators. In addition to the quantitative test data, and the extensive methodological publications, there are further sources of data that could be used for comparison of different countries. In each cycle, PIRLS/TIMSS country administrators appointed by the international
organizers are asked to complete a questionnaire regarding the curriculum arrangements in their jurisdiction (for example, for TIMSS 2011, Mullis et al., 2012a; and for PIRLS 2011, Mullis et al., 2012b). Similarly, in each cycle, questionnaires are completed by school leaders and teachers in the stratified random sample of schools in which pupils take the PIRLS and TIMSS tests (to which we return below). The national-level data include a core of questions on the nature of the curriculum that have remained constant, or at least very similar, across the years, capturing: the existence of a national curriculum; specificity of the curriculum, e.g., goals, processes, or materials; the form in which the curriculum is made available; whether the curriculum prescribes the percentage of instructional time devoted to this part of the curriculum; and the way in which the curriculum implementation is evaluated. These concepts link, albeit imperfectly, with theoretical constructs underpinning curricula, as discussed above. This allows for comparison of how these differ between the five countries at any given point in time, along with the extent to which we observe differential changes in these countries over the successive rounds of TIMSS (every four years) and PIRLS (every five years). It is important to be aware of the imperfect nature of these measures and the way in which they are collected. For example, it is difficult for a national administrator to provide a consistent report for their jurisdiction where policy is highly federalized; similarly, country administrators of TIMSS and PIRLS change over time and, even where the organization does not change, personnel changes could introduce differences in question interpretation. Nevertheless, we argue that national-level comparative data on curriculum arrangements bring some advantages (as well as disadvantages) compared to work that has attempted to consider country-level policy using data collected from school leaders (e.g., OECD, 2011) or from more ad-hoc consideration of national

governments’ policy documents. Given the complexity of the concepts under consideration, it is a distinct advantage to use this deliberately internationally-comparative data in which a common language (English) is used, aiding the comparability of responses to the questions posed to country administrators. However, we should not overlook the continued presence of some degree of translation-interpretation issues when comparing between countries with differing languages or understandings of specific concepts (which could, of course, differ between countries sharing a language). A noticeable trend in critiques of international comparative work has been to raise questions about the comparison of countries where different dominant languages are present (e.g., Hilton, 2006). The lack of attention to oral language (particularly in relation to England when compared with the four other countries) is also evident in the information that is available through quantitative international comparisons. Oral language is left out of the scope of PIRLS (as suggested by its name, the ‘Progress in International Reading Literacy Study’), potentially because it is a difficult, time-consuming and costly aspect to assess, compared with the testing of reading, for example. In the absence of specific data about oral language, we have selected reading as the specific area of the curriculum to explore. Reading as an area in the curriculum does have significant connections with oral language because oral language is the basis for the development of literacy. What is more, reading for pleasure is an aspect that is individual to the learner but also a concern of teachers and school systems, just as oral language is. Table 19.2 shows a decrease in the proportion of these countries that report a major emphasis on reading for pleasure in their primary reading curriculum (from more than half to below 40%), along with an increase in the proportion of countries reporting little or

Table 19.2  Percent of PIRLS 2006 and PIRLS 2011 countries reporting levels of emphasis on reading for pleasure

Emphasis on reading for pleasure     2006     2011
Total percent                       100.0    100.0
Little or no emphasis                 9.1     15.2
Some emphasis                        33.3     45.5
Major emphasis                       57.6     39.4

Notes: Data from curriculum questionnaires completed by countries participating in PIRLS 2006 and PIRLS 2011. N=33. Reporting column percentages.

Table 19.3  Transition matrix of PIRLS 2006 and PIRLS 2011 of levels of emphasis on reading for pleasure

Emphasis in 2006          Total    Little or no emphasis (2011)    Some emphasis (2011)    Major emphasis (2011)
Total                     100      15.2                            45.5                    39.4
Little or no emphasis     100       0.0                           100.0                     0.0
Some emphasis             100      18.2                            45.5                    36.4
Major emphasis            100      15.8                            36.8                    47.4

Notes: Data from curriculum questionnaires completed by countries participating in PIRLS 2006 and PIRLS 2011. N=33. Reporting row percentages.

no emphasis on this aspect (from under 10% to more than 15%); taken together, these suggest a reduction in the emphasis on reading for pleasure in primary curricula across the world (to the extent that the world is well represented by the available countries). We may consider the dynamics underlying these aggregate trends with a transition matrix (Table 19.3), which reports, for each level of emphasis on reading for pleasure reported in the PIRLS survey of 2006, the percentage of countries reporting each level in 2011. While we reiterate the caveats that we emphasise above regarding the interpretation of these data, we think this is a meaningful basis for further investigation of trends that may otherwise go unnoticed. For example, none of the countries which said they had ‘little or no emphasis’ on reading for pleasure in their primary curriculum in 2006 still reported that this was the case by 2011. This raises the question of whether such a phenomenon might represent a significant change in curriculum development internationally. However, since the group reporting ‘little or no emphasis’ in 2006 represents only three countries (n=33), we should not simply draw this as a conclusion. Instead, it raises a hypothesis that could be explored with (a) subsequent rounds of data, in which we can see what becomes of the countries that moved into this category in 2011, and/or (b) case studies on the dynamics of this shift in policy in these three countries. The question about reading for pleasure also prompts questions about the extent to which reading for pleasure is a focus of classroom interaction, research questions that are amenable to qualitative and quantitative methods (e.g., pupil surveys). The focus on a specific area of the curriculum, the teaching of reading, can be extended to the education system level by looking at another area covered in the questionnaires: the use of inspection. This is linked with performativity, and hence with control exerted on schools by such mechanisms as those discussed earlier in this chapter.

Table 19.4  Percentage of PIRLS 2006 and PIRLS 2011 countries where inspection is used to assess implementation of primary reading curriculum at each time point

Inspection                      2006     2011
Total percent                  100      100
Not used to assess reading      36.4     27.3
Used to assess reading          63.6     72.7

Notes: Data from curriculum questionnaires completed by countries participating in PIRLS 2006 and PIRLS 2011. N=33. Reporting column percentages.

Table 19.5  Transition matrix of PIRLS 2006 and PIRLS 2011 of inspection use to assess implementation of primary reading curriculum

Inspection use in 2006    Total percent    Not used (2011)    Used (2011)
Total                     100              27.3               72.7
Not used                  100              58.3               41.7
Used                      100               9.5               90.5

Notes: Data from curriculum questionnaires completed by countries participating in PIRLS 2006 and PIRLS 2011. N=33. Reporting row percentages.

Specifically, we analyse the changes between the 2006 and 2011 rounds in the proportion of countries reporting that inspection is used to assess implementation of primary reading curricula. For the purposes of illustration, we use the full sample available across both rounds, rather than restricting it to the five countries discussed above; however, in using such analyses to address research questions, the issue of an appropriate sample of countries would be important to consider. The cross-sectional percentages reporting that inspection is used to this end in their country are reported in Table 19.4, providing an illustration of the prevalence of this practice at each time point and suggesting an aggregate increase in the proportion using inspection among this sample. Again, more information on the changes underlying these aggregate shifts may be explored using a transition matrix (Table 19.5). This reports the percentage of those who did not use inspection in 2006 who have (a) continued not to do so and (b) started doing so, along with the percentage of those who did use inspection in 2006 who have (c) stopped doing so and (d) continued to do so. This analysis of changes suggests that the overall increase in the proportion of sampled countries using inspection to assess implementation of primary reading curricula is not a one-way street. While some countries that were not using inspection in 2006 were doing so by 2011 (42%), it is also the case that a proportion of those who were using inspection in 2006 reported that they had stopped doing so by 2011 (just under 10%). While we again emphasize the limitations inherent in these data, differential change of this kind raises the hypothesis that education policy may not be best characterized by simple policy convergence (Bieber, 2016) but, rather, by more complex dynamics that may be of further interest to explore (Jakobi & Teltemann, 2011). It is also possible to use these data to explore curriculum implementation. TIMSS (but not PIRLS) asks questions on which areas the curriculum covers, not only at the national level, but also to teachers of pupils participating in the associated attainment
tests. These questions link directly to the areas tested as part of the TIMSS attainment tests. Adopting the definition of curriculum in three elements – an intended curriculum (at the national level), an implemented curriculum (what is taught in classrooms) and an attained curriculum (what students learn) (Mullis et al., 2009) – the presence of data on these issues at all three levels potentially allows for the exploration of relationships between the levels. We acknowledge, however, that these questions have been less stable over time, which poses significant challenges for analyses using these data (Suter, 2017). The school- and teacher-level data in TIMSS also provide important details of contextual factors at each of these levels. At school level, these include details of school size, socioeconomic context, and management practices. At the teacher level, these include details of teacher age, experience, qualification, and job satisfaction. In addition, the TIMSS teacher questionnaire includes teachers’ reports of the areas of the curriculum they implement, corresponding to the questions asked in the national curriculum questionnaire. These contextual factors would have been relevant to the system-level analyses in the Five Cultures study. In this section, we have highlighted some potential uses of quantitative data and analysis to explore cross-national differences in primary curricula. This approach does not capture the same richness as the in-depth qualitative inquiry discussed above; we do not pretend otherwise. Nevertheless, it can provide opportunities for developing unexplored hypotheses, additional insights, and valuable context (for example, from more representative samples and/or wider ranges of countries than is feasible when conducting in-depth qualitative work alone).
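
Computationally, the transition matrices in Tables 19.3 and 19.5 are simple cross-tabulations of country-level categorical responses across two survey rounds, with each row normalised to percentages. As a minimal sketch of how such an analysis might be reproduced – assuming the pandas library, and using invented illustrative data rather than the actual PIRLS questionnaire files – the following Python fragment computes a row-percentage transition matrix:

    import pandas as pd

    # Illustrative (invented) country-level responses on the emphasis given to
    # reading for pleasure, coded from two rounds of curriculum questionnaires.
    levels = ['little or no emphasis', 'some emphasis', 'major emphasis']
    data = pd.DataFrame({
        'emph_2006': ['little or no emphasis', 'some emphasis', 'major emphasis',
                      'major emphasis', 'some emphasis', 'major emphasis'],
        'emph_2011': ['some emphasis', 'little or no emphasis', 'major emphasis',
                      'some emphasis', 'some emphasis', 'major emphasis'],
    })

    # Cross-tabulate 2006 against 2011 responses and normalise each row to
    # percentages, as in the 'row percentages' noted under Tables 19.3 and 19.5.
    transition = pd.crosstab(data['emph_2006'], data['emph_2011'],
                             normalize='index') * 100
    transition = transition.reindex(index=levels, columns=levels, fill_value=0)
    print(transition.round(1))

The same few lines, applied to the inspection-use variable, would reproduce the structure of Table 19.5; the analytical work, as emphasised above, lies in interpreting such matrices cautiously given the small number of countries involved.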

DISCUSSION AND CONCLUSIONS

Most research that engages with curriculum as an object of study is either explicitly located in
theories of knowledge, or links implicitly with theories of knowledge through its selection of curriculum elements for focus, for example, the differing emphases on oral language versus written language reviewed in this chapter. Our analysis of qualitative and quantitative methods to examine national curricula in primary education suggests that there may be a way forward in relation to Muller’s question about knowledge that is both ‘constructed’ and ‘real’. Rapprochement in curriculum studies may be extended through methodologies that seek to constructively use large-scale datasets in comparison with in-depth empirical enquiry within countries. With regard to knowledge in the curriculum, powerful or otherwise, rapprochement may also lie in the ways in which both so-called powerful and non-powerful aspects might be represented, balanced and enacted as a result of curriculum specifications. One of the historical features of comparative work that Jullien identified is in need of renewed attention through international comparative work. Aims for education are fundamental to any nation’s education system, and as a result require value judgements to be made with degrees of democratic involvement of stakeholders. The question of whether performativity pressures are leading to homogenization of national curriculum aims is an important one: if that is found to be the case, then it is possible that the democratic involvement of countries’ citizens in the development of national aims is being replaced by uncritical acceptance that patterns of similar aims in jurisdictions scoring highly in international comparative testing are a sufficient warrant for such aims. The methodology of qualitative comparisons allows for significant depth, including some of the daily interactions of pupils and teachers as their curricula are enacted. There is also a significant tradition of critical attention to national curriculum texts on the basis of their inherent logic, or lack of logic, and the complex differences between texts and the practice in schools. Yet the problem of generalization remains a real one for
qualitative work, in spite of the significant depth and theorization that is a feature of the best work. The methodology of quantitative comparisons allows for significant breadth. This breadth, including large-scale testing of pupil achievement, does allow for generalizations on the basis of statistical analyses. However, too often the careful caveats expressed by the authors of these large-scale studies are ignored by politicians as they seek to justify their policies on the basis of correlations between a country’s position in world rankings and selected policies. Inevitably, the breadth results in a lack of depth in some areas, for example a lack of attention to oral language, and to some of the many cultural, political and historical aspects that are a defining feature of work in the qualitative traditions. In conclusion, on the basis of our analysis of the methodology of some of the most notable qualitative and quantitative work, the limitations of both strongly suggest the need for studies that combine quantitative and qualitative comparative methodology. Our work in this chapter suggests that oral language should be attended to in large-scale comparisons. This would mean using the findings from in-depth qualitative work, such as the Five Cultures study, as the basis for changing large-scale quantitative work. It would also be possible to build change on the basis of some of the findings of quantitative comparison by undertaking in-depth enquiry, for example to examine the validity of claims about changes in motivation for reading. These kinds of mixed-methods combinations have great potential to add original insights to the considerable findings already established separately through the different traditions. The urgency for this kind of new work on primary education curricula is perhaps most starkly underlined by the worryingly unequal access to high-quality primary education, including the lack of evidence-informed exemplary curriculum, pedagogy and assessment, that is a feature of so many of the education systems of the world today.

Notes

1  It is important to note here that Bernstein’s interest in the pedagogic discourse of curriculum was distinct from his preoccupation with the language of pupils, in particular his dubious articulation of restricted and elaborated codes.
2  Part of India, Himachal Pradesh, participated in 2009 PISA, and Tamil Nadu and Himachal Pradesh participated in 2015 PISA; however, we do not believe either participated in PIRLS or TIMSS.

REFERENCES

Alexander, R. (2000). Culture and Pedagogy: International Comparisons in Primary Education. Oxford: Blackwell.
Alexander, R. (Ed.) (2010). Children, their World, their Education: Final Report and Recommendations of the Cambridge Primary Review. London: Routledge.
Alexander, R. (2011). Evidence, rhetoric and collateral damage: The problematic pursuit of ‘world class’ standards. Cambridge Journal of Education, 41(3), 265–286.
Ball, S. (2012). Global Education Inc.: New Policy Networks and the Neo-Liberal Imaginary. London: Routledge.
Bernstein, B. (1971). Class, Codes and Control. Volume 1: Theoretical Studies towards a Sociology of Language. London: Routledge and Kegan Paul.
Bernstein, B. (1996). Pedagogy, Symbolic Control and Identity. London: Taylor & Francis.
Bieber, T. (2016). Soft Governance, International Organizations and Education Policy Convergence. London: Palgrave Macmillan.
Biesta, G. (2014). Pragmatising the curriculum: Bringing knowledge back into the curriculum conversation, but via pragmatism. The Curriculum Journal, 25(1), 29–49.
Carvalho, L. M., & Costa, E. (2015). Seeing education with one’s own eyes and through PISA lenses: Considerations of the reception of PISA in European countries. Discourse: Studies in the Cultural Politics of Education, 36(5), 638–646.
Forestier, K., & Adamson, B. (2017). A critique of PISA and what Jullien’s plan might offer. Compare: A Journal of Comparative and International Education, 47(3), 359–373.
Fraser, S. (1964). Jullien’s Plan for Comparative Education 1816–1817. New York: Teachers College, Columbia University, Bureau of Publications.
Goswami, U. (2008). Cognitive Development: The Learning Brain. Hove: Psychology Press.
Hilton, M. (2006). Measuring standards in primary English: Issues of validity and accountability with respect to PIRLS and National Curriculum test scores. British Educational Research Journal, 32(6), 817–837.
Hopman, S. (2010). When the Battle’s Lost and Won: Some Observations concerning ‘Whatever Happened to Curriculum Theory’. Stirling, Scotland: University of Stirling.
IEA, TIMSS, & PIRLS International Study Centre (2016a). Student Achievement: Distribution of Reading Achievement. Chestnut Hill, MA: IEA, TIMSS, & PIRLS International Study Centre. Retrieved from: http://timssandpirls.bc.edu/pirls2016/international-results/pirls/student-achievement/pirls-achievement-results/
IEA, TIMSS, & PIRLS International Study Centre (2016b). Student Achievement: Multiple Comparisons of Average Reading Achievement. Chestnut Hill, MA: IEA, TIMSS, & PIRLS International Study Centre. Retrieved from: http://timssandpirls.bc.edu/pirls2016/international-results/pirls/student-achievement/multiple-comparisons-of-reading-achievement/
IEA, TIMSS, & PIRLS International Study Centre (2016c). Appendix H: Restricted Use Passages, Questions, and Scoring Guides. Chestnut Hill, MA: IEA, TIMSS, & PIRLS International Study Centre.
Jakobi, A., & Teltemann, J. (2011). Convergence in education policy? A quantitative analysis of policy change and stability in OECD countries. Compare: A Journal of Comparative and International Education, 41(5), 579–595. doi: 10.1080/03057925.2011.566442
Jerrim, J., Parker, P., Chmielewski, A. K., & Anders, J. (2016). Private schooling, educational transitions and early labour market outcomes: Evidence from three Anglophone countries. European Sociological Review, 32(1), 280–294.
Lyotard, J.-F. (1984). The Postmodern Condition: A Report on Knowledge (G. Bennington & B. Massumi, Trans.). Manchester: Manchester University Press.
Muller, J. (2016). Knowledge and the curriculum in the sociology of knowledge. In D. Wyse, L. Hayward, & J. Pandya (Eds.), The SAGE Handbook of Curriculum, Pedagogy and Assessment. London: Sage.
Mullis, I. V. S., Martin, M. O., Ruddock, G. J., O’Sullivan, C. Y., & Preuschoff, C. (2009). TIMSS 2011 Assessment Frameworks. Chestnut Hill, MA: TIMSS & PIRLS International Study Center, Boston College.
Mullis, I. V. S., Martin, M. O., Minnich, C. A., Stanco, G. M., Arora, A., Centurino, V. A. S., & Castle, C. E. (2012a). TIMSS 2011 Encyclopedia: Education Policy and Curriculum in Mathematics and Science (Vols 1 and 2). Chestnut Hill, MA: TIMSS & PIRLS International Study Center, Boston College.
Mullis, I. V. S., Martin, M. O., Minnich, C. A., Drucker, K. T., & Ragan, M. A. (2012b). PIRLS 2011 Encyclopedia: Education Policy and Curriculum in Reading (Vols 1 and 2). Chestnut Hill, MA: TIMSS & PIRLS International Study Center, Boston College.
OECD (2011). School Autonomy and Accountability: Are They Related to Student Performance? PISA in Focus. Paris: OECD Publishing.
Reiss, M., & White, J. (2013). An Aims-Based Curriculum: The Significance of Human Flourishing for Schools. London: IOE Press.
Spencer, H. (n.d. [1860]). Education: Intellectual, Moral and Physical. New York: Hurst & Company.
Suter, L. E. (2017). How international studies contributed to educational theory and methods through measurement of opportunity to learn mathematics. Journal of Comparative and International Education, 12, 1–24.
UNESCO (2017). Sustainable Development Goals. Paris: UNESCO. Available from: www.un.org/sustainabledevelopment/education/
Washbrook, E., Waldfogel, J., Bradbury, B., Corak, M., & Ghanghro, A. (2012). The development of young children of immigrants in Australia, Canada, the United Kingdom and the United States. Child Development, 83, 1591–1607.
Wyse, D. (2017). How Writing Works: From the Invention of the Alphabet to the Rise of Social Media. Cambridge: Cambridge University Press.
Wyse, D., Hayward, L., Higgins, S., & Livingston, K. (2014). Editorial: Creating curricula: aims, knowledge, and control. A special edition of the Curriculum Journal. The Curriculum Journal, 25(1), 2–6.
Wyse, D., Hayward, L., & Pandya, J. (2016). Introduction: Curriculum and its message systems – from crisis to rapprochement. In D. Wyse, L. Hayward, & J. Pandya (Eds.), The SAGE Handbook of Curriculum, Pedagogy and Assessment. London: Sage.
Wyse, D., Jones, R., Bradford, H., & Wolpert, M. A. (2018). Teaching English, Language and Literacy (4th ed.). London: Routledge.
Wyse, D., & Torrance, H. (2009). The development and consequences of national curriculum assessment for primary education in England. Educational Research, 51(2), 213–228.
Young, M. (2008). Bringing Knowledge Back In: From Social Constructivism to Social Realism in the Sociology of Education. London: Routledge.
Young, M. (2013). Overcoming the crisis in curriculum theory: A knowledge-based approach. Journal of Curriculum Studies, 45(2), 101–118.

20 Outside-School-Time Activities and Shadow Education

Siyuan Feng and Mark Bray

INTRODUCTION

Especially during school vacations, but even during term time, children and youths spend more time outside classrooms than inside them. In many societies, significant proportions of this time are devoted to organized activities that supplement and complement schooling. The possibilities for learning are increased by the availability of libraries, museums, and after-school study programs. Some outside-school-time (OST) activities are commonly called shadow education on the grounds that they mimic school curricula (Aurini, Davies, & Dierkes, 2013; Bray, 1999, 2009). Other forms elaborate on school curricula and/or operate in free-standing ways, such as for religious instruction (OECD, 2011; Suter, 2016). Considerable variations are evident around the world in the scale, nature and impact of these activities. Becher and Trowler (2001) used the metaphor of tribes and territories to describe academic communities and their research foci,

and the metaphor is as applicable to OST research as to other fields. Although OST is a widely accepted concept which receives research attention, other vocabularies are also used, with different implications for different groups of researchers within and across countries. The terms bring their own conceptual orientations and practical implications. OST is a common term in the United States, where research interest in the field originated in concern about criminal and other undesirable activities of juveniles with excessive free time. US federal government investment in OST activities increased significantly after the mid-1990s (Afterschool Alliance, 2015, p. 1), and the foci for many researchers evolved from prevention of the negative consequences of unsupervised OST to evaluation of the outcomes of structured OST programs (Mahoney, Larson, Eccles, & Lord, 2005; Mahoney, Vandell, Simpkins, & Zarrett, 2009). The term ‘extracurricular’ is also in widespread use. Compared to OST, which
encompasses the broadest range of activities outside regular school hours, the concept of extracurricular activities implies some links to the main course of study. An alternative vocabulary is ‘co-curricular’, which implies an even tighter link to regular school programs. Often these activities are conducted during regular school hours, but not always. Singapore’s Ministry of Education views co-curricular activities as an integral part of holistic education, and requires every secondary student to take part in co-curricular activities organized by schools or some other approved bodies (Singapore, Ministry of Education, 2014). In a very different context, in Ethiopia, co-curricular activities have been promoted not only by the government but also by international agencies (UNICEF, 2014). In the United States, the concept of OST activities also embraces supplemental education. In the early 2000s, the No Child Left Behind (NCLB) policy of the Bush administration propelled development of after-school programs to improve students’ test performance (Zimmer, Hamilton, & Christina, 2010). The US Department of Education (2012) has defined supplemental educational services as:

Free extra academic help, such as tutoring or remedial help, that is provided to students in subjects such as reading, language arts, and math. This extra help can be provided before or after school, on weekends, or in the summer.

In other parts of the world, concepts of supplemental education may have different meanings, and in particular may not be free of charge (see e.g. Harnisch, 1994; Mori & Baker, 2010). This observation underlines the need for researchers investigating concepts across national and cultural boundaries to check the ways that terms are used in different settings. In the United States, much literature has focused on non-profit programs organized by public or non-governmental bodies, and academic programs provided by profit-making commercial agencies have rarely been included in studies under the OST label.

In Asian and other societies in which large proportions of students receive tutoring from private providers, OST research can rarely ignore the shadow forms of private supplementary tutoring (Bray & Lykins, 2012). In contrast to the US Department of Education’s definition of supplemental education, private supplementary tutoring is usually defined to encompass fee-charging extra academic help, for remedial and non-remedial purposes, provided in school academic subjects and beyond during outside-school time. Another group of researchers, most of whom are based in Europe, uses the term ‘extended education’ and views OST activities as an ‘extension or supplementation of traditional educational institutions’ (Stecher et al., 2013, p. 3). The notion of extended education as defined by this group includes after-school programmes, all-day schools (e.g. in Germany), and private supplementary tutoring (Böhm-Kasper, Dizinger, & Gausling, 2016; Stecher & Maschke, 2013; Stecher et al., 2013). The above-mentioned list of terms addresses only those in the English language, and other languages bring further nuances and ambiguities (Bray, Kwo, & Jokić, 2015, pp. 4–6). These are challenges for comparative education researchers but also opportunities, because the terms reflect different contexts, conceptualizations and communities. With such factors in mind, the following sections commence with literature on the broad concept of OST activities before turning to the literature on shadow education, which has developed more recently and to some extent in parallel. To fit the purposes of this Handbook, each section comments on the challenges for research.

THE NATURE AND IMPLICATIONS OF OST EDUCATIONAL ACTIVITIES

Definitions and Scope

This section of the chapter follows the definition of OST activities given by Noam and Shah (2014, pp. 200–201), namely:

programs that offer activities that may or may not align with school curricula; that focus on youth development and enriching learning activities; and that can take place in a school setting, local community center, or museum on weekdays, weekends or during the summer.

This definition is broad not only in the content but also in the location of activities. In addition to considering community centres and museums, it includes activities that are outside school time but nevertheless on school premises. It does not seem to consider commercial enterprises or homes, which may be the location of much shadow education. The definition explicitly includes the summer, which means the long school vacation for many students in countries that have such a season. By implication, it can also include other school vacations and public holidays. Also, by implied definition, the duration of outside school time is determined by the duration of inside school time. A 2011 survey by the Organisation for Economic Co-operation and Development (OECD) showed that Polish students aged 12–14 were expected to spend fewer than 5,000 hours at school, while their Italian counterparts spent over 8,000 hours. Banks et al. (2007, p. 9) suggested that school students in the United States spent only 18.5% of their waking hours in formal learning environments; but Taiwanese high school students may spend an average of 9.5 hours at school during term-time weekdays, thus comprising a much higher proportion (Liang, 2016). Again, however, the boundaries of outside school time and inside school time may be blurred by regulations on what activities are permitted on school premises. Making another forward link to the discussion on shadow education, in many countries public-school teachers provide private supplementary lessons to their own students. In most countries, teachers are forbidden to provide this instruction on school premises, but in Cambodia, for example, it is a standard practice (Bray, Kobakhidze, Liu, & Zhang, 2016). Concerning the content of OST activities, students who spend most of their outside
school time enjoying leisure activities may have different developmental outcomes from those who spend most of their time in extracurricular academic training (Mahoney et al., 2005). The types of activities of children and adolescents are also influenced by cultural factors. Larson and Verma (1999) noted that East Asian students are more likely to spend outside school time in activities that assist school performance, while North American adolescents are more likely to engage in leisure activities.

OST Activities and Personal Development

The above-cited definition by Noam and Shah (2014) focused on programs. By implication, these are structured, and indeed that is made explicit in a related definition by Little, Wimer, and Weiss (2008, p. 2), who specified that structured programmes should provide ‘supervised activities intentionally designed’ for student learning and development. By contrast, many OST activities are loosely organized or self-directed. These activities include casual socializing, play, games, and internet surfing; instrumental activities, such as household chores and part-time paid employment; developmental activities, such as completing homework and developing hobbies and interests; and self-directed informal learning (Kleiber & Powell, 2005). A few empirical studies have compared structured and unstructured OST activities. In terms of problem-behaviour prevention, studies by Mahoney and colleagues suggested that structured OST activities were usually more effective than unstructured ones (see also Mahoney, Schweder, & Stattin, 2002). Similarly, research on the academic effectiveness of OST activities has suggested that structured programs are usually more important to students’ learning and developmental outcomes. In a study of students in Grades 6–12 in the United States, Cooper, Valentine, Nye, and Lindsay (1999) reported that higher academic
performance was positively associated with more time in structured after-school and extracurricular activities, and negatively associated with unstructured television viewing. Some OST programs aim to reduce the negative impacts of unsupervised activities (Mahoney et al., 2005). Juvenile criminologists Farrington and Welsh (2008, p. 142) examined OST pro-social opportunities that reduce young people’s exposure to possible negative influences from peers. Much of course depends on how well the OST activities are organized, but in general empirical evidence suggests that students’ participation in OST activities reduces the likelihood of aggressive or antisocial behaviour, alcohol and drug abuse, delinquency, and teenage pregnancy (Allen, Philliber, Herrling, & Kuperminc, 1997; Gottfredson, Gerstenblith, Soulé, Womer, & Lu, 2004). In Mahoney’s (2000) longitudinal study of antisocial patterns among at-risk adolescents, participation in OST activities organized by schools was associated with reduced rates of dropout and future arrests. Another research focus is the association of structured OST activities with adolescents’ psychosocial development. In similar ways that unstructured leisure activities are relevant to psychosocial well-being (Caldwell & Witt, 2011), participation in structured OST activities may be related to the relief of negative emotions (Mahoney et al., 2005). In their longitudinal study of a high school OST programme in the United States, Barber, Eccles, and Stone (2001) reported reduced levels of worry and social isolation, and enhanced self-esteem. Research also suggests that participation in organized OST programs can be related to high levels of self-efficacy and motivation (Duda & Ntoumanis, 2005). A number of studies have also revealed how development of social identity is linked to structured activities (Catalano, Berglund, Ryan, Lonczak, & Hawkins, 2004; Jones & Deutsch, 2012; Luehmann & Markowitz, 2007; McIntosh, Metz, & Youniss, 2005). Although much existing research has stressed

the beneficial psychological outcomes of OST activities, impacts can differ depending on content and duration. In South Korea, where education is highly competitive, Hong et al. (2011) found that time spent in extracurricular education in excess of four hours per day was correlated with increased depressive symptoms among the 761 first graders surveyed. One final highlighted research focus is the influence of OST activities on students’ educational attainment and achievement. The educational gains of structured OST programmes may include reduction of school dropout (Mahoney & Cairns, 1997; McNeal, 1995), improvement in academic performance (Broh, 2002; Fredricks & Eccles, 2006; Won & Han, 2010), or both. O’Donnell and Kirkner (2014) examined the effects of the YMCA High School Youth Institute on the attainment and attendance of urban high school students in the United States. The authors suggested that participants in the OST program had significantly higher test scores in mathematics and English language, and fewer absences, than those who did not participate. In global comparisons of mathematics and science achievement, the durations of OST activities have also been viewed as important predictors of academic performance. Vest, Mahoney, and Simpkins (2013) studied data from the Trends in International Mathematics and Science Study (TIMSS) and identified strong correlations between the duration of OST activities and students’ performance in the international standardized mathematics and science achievement tests. Also drawing on the TIMSS data, Won and Han (2010) studied different OST activities and their relations to student performance in South Korea and the United States. The authors suggested that although reading was associated with high achievement in both countries, OST activities such as homework and sports were associated with achievement in different ways. Homework was a positive predictor of academic performance for Korean students, but it was a
negative predictor in the United States. In contrast, sports were negatively associated with achievement in South Korea and positively associated in the United States.

Researching OST Activities

Although investigating student development outside school time is not simple, increasing research interest has focused on two major goals. The first and more direct goal is to identify the influence of OST activities on student development. Studies relevant to this goal include a focus on educational gains and seek to explain how impacts are made through program design and implementation processes (Deutsch, Blyth, Kelley, Tolan, & Lerner, 2017). A second strong focus is on the role of OST activities in adolescent development; the theories developed from such conceptualization can be integrated with program design or empirical study (Tolan, 2014). To reach these goals, a variety of research designs and methods can be identified. Eccles and Templeton (2002, p. 114) listed six groups: (1) in-depth ethnographic studies of local programs; (2) cross-sectional and longitudinal surveys of youth development across diverse contexts; (3) large- and small-scale experimental evaluations of both longstanding and new programs; (4) descriptive studies of programs considered effective by the communities in which they are located; (5) meta-analyses of published articles; and (6) more traditional summative reviews of both published and unpublished reports. For empirical studies, three major methodological issues must be considered. First, researchers must be careful about program participants and their sampling. The youths being studied in OST programs differ in gender, age, ethnicity, socio-economic status, sexual orientation, and community affiliation (Eccles & Templeton, 2002). Deutsch et al. (2017, p. 55) suggested that research attention
should focus not only on participants in OST activities but also on those who do not participate. Samples are biased when they represent only the participating youths, and thus the generalizability of findings is uncertain. Second, there is a strong dominance of quantitative research methods in empirical OST studies. Much effort is focused on evaluation of after-school programmes or group-level analysis of participants using quantitative surveys. Because of heterogeneity and sampling issues, much research attention is needed on person-centred analyses of OST experiences using different or combined methodological approaches. Tolan and Deutsch (2015, p. 747) suggested that mixed-methods approaches could help to ‘elaborate and relate partial understandings into coherent and useful explanations of developmental phenomenon’. In this complex research field, the application of multiple analytic methods will help to create complementary understandings of relevant developments (Deutsch et al., 2017). Third, social and cultural contexts need to be considered carefully. Children’s and adolescents’ decisions to participate in OST activities are under the influence of ‘a combination of person and context factors’ (Mahoney et al., 2009, p. 230). In the United States, the Afterschool Alliance (2014) reported that youths from low-income households or ethnic minority groups were less likely to participate in OST programs. Context also matters in international comparisons. Sociocultural contexts may influence the types of programs that students attend, the outcomes of participation, and the populations of attendance. Suter’s (2016) study of student involvement in OST programs and science achievement found that students from the United States and Ireland attended science OST programs for both remedial and enhancement purposes, whereas students in Great Britain mostly attended OST programs for remedial reasons. These differences in the purposes of attendance contribute differently to student experiences and learning outcomes.

Figure 20.1  Key parameters in the ecology of OST activities (Mahoney et al., 2009, p. 232):
• Physical context: home, school, religious institution, community center, family residence, shopping mall, neighborhood
• Social context: solitary; peers (age mates, siblings); adults (relatives, non-relatives)
• Out-of-school activities/arrangements: after-school programs, extracurricular activities, community-based organizations, adult care (parents, relatives), self-care (in-home, out-of-home)
• Activity opportunities: passive, unstructured (television); active, unstructured (video games); active, structured (supervised sports)
• Amount of exposure: duration, breadth, intensity

On the basis of meta-analyses and reviews of empirical studies, various efforts have been made to conceptualize outside school time. In their edited book on extracurricular activities, after-school and community programs, Mahoney and colleagues provided a comprehensive overview which perceived organized OST activities as the context for youth development (Mahoney et al., 2005). This perspective is part of the conceptual shift around the beginning of the new millennium that recognized OST as an important opportunity for youth development rather than a problem to be managed (Roth, Brooks-Gunn, Murray, & Foster, 1998). Mahoney and colleagues’ vision is in line with the theoretical view in human development research which emphasizes human plasticity (Lerner, 2002, 2005). Researching OST could therefore provide grounds for positive changes to youth development. In a later work, Mahoney et al. (2009) employed bioecological theory to construct a conceptual framework for systematic analysis of OST activities. This theory views children and adolescents as ‘active and purposeful agents in the developmental process’

who interact in an ecosystem (Mahoney et al., 2009, p. 230). Factors such as the physical context of programs, social background, level and type of activity, amount of exposure, and activity opportunities constitute an ecosystem within which the youths both shape and are shaped by interrelationships (Figure 20.1).

SHADOW EDUCATION AND ITS IMPLICATIONS FOR BROADER OST STUDIES

Definitions of Shadow Education

Recent decades have brought a surge of shadow education or private supplementary tutoring across the globe (Bray & Kwo, 2014; Bray & Lykins, 2012). Increasingly, students spend considerable time on private tutoring, and the research focus of shadow education overlaps with broader OST research. The book by Whewell (1838) is among the earliest works noting the help that private tutors can provide to school students. It
was written in a European setting, but during the second half of the 20th century shadow education became particularly prominent in East Asia (Bray, 1999; Byun & Baker, 2015; Harnisch, 1994). During the present century, shadow education has rapidly expanded in other parts of Asia (Bray & Lykins, 2012; Dawson, 2010; Silova, 2009), in Africa (Antonowicz, Lesné, Stassen, & Wood, 2010; Buchmann, 2002), in Europe (Bray, 2011; Mori & Baker, 2010), in North America (Buchmann, Condron, & Roscigno, 2010; Davies & Aurini, 2004), and in South America (Cámara & Gertel, 2016; Gomes & Ventura, 2013). The expansion of private tutoring has made it an important component of many students’ experience, and shadow education itself is receiving increasing attention from both researchers and policy makers because of its implications for educational quality and equity (Byun & Baker, 2015). The concept of shadow education was first used by scholars in the 1970s. Huberman (1970, p. 19) used the term ‘shadow systems’ to indicate alternative education programs which ‘allow for a natural compensation and balance against deficiencies in the formal education system’. Court (1973, p. 331) also used ‘shadow systems’ in discussion of ‘educational enterprises outside the formal system’ that ‘may complement the formal system’. In the early 1990s, several writers employed the term ‘shadow education’ to denote the strong connection between mainstream education and students’ OST learning experience in private tutoring (George, 1992; Marimuthu et al., 1991; Stevenson & Baker, 1992). Stevenson and Baker (1992, p. 1639) defined shadow education as ‘a set of educational activities that occur outside formal schooling and are designed to enhance the student’s formal school career’. Following increasing research attention to private tutoring, a definition of shadow education was developed by Bray (1999, p. 20) and has now become a widely accepted instrument to determine the parameters of
tutoring programs (Byun & Baker, 2015). Specifically, shadow education is defined by three characteristics (rendered as a simple classification sketch after the list), namely:

• Supplementation: tutoring that addresses subjects already covered in core curricula of mainstream schools;
• Privateness: tutoring provided in exchange for a fee, and excluding free tutoring by family members or by teachers as part of their duties;
• Academic: tutoring in academic subjects, particularly languages, mathematics, and other examinable subjects. Arts, music, and sports are excluded because they are often learned for pleasure or personal development.
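
Read operationally, the three criteria form a simple conjunctive test for deciding whether a given OST activity falls within the shadow education construct. The sketch below is our own illustration in Python; the type and field names are invented for the example and are not drawn from any survey instrument:

    from dataclasses import dataclass

    @dataclass
    class OSTActivity:
        """An OST activity described on the three dimensions of Bray's (1999)
        definition. Field names are illustrative only."""
        supplements_school_curriculum: bool  # covers subjects already in mainstream curricula
        fee_charging: bool                   # provided in exchange for a fee
        academic_subject: bool               # e.g. languages or mathematics, not arts/music/sports

    def is_shadow_education(activity: OSTActivity) -> bool:
        # All three criteria must hold: supplementation, privateness, academic.
        return (activity.supplements_school_curriculum
                and activity.fee_charging
                and activity.academic_subject)

    # A paid mathematics class mirroring the school syllabus qualifies;
    # a fee-charging violin lesson does not.
    print(is_shadow_education(OSTActivity(True, True, True)))   # True
    print(is_shadow_education(OSTActivity(True, True, False)))  # False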

Methodological Challenges in Cross-National Studies of Shadow Education

Bray, Kwo, and Jokić (2015) identified several challenges in cross-national studies of shadow education. The first, again, is one of vocabulary. In the English language, terms include private tutoring, private tuition, cramming, grinds, coaching, extra lessons, and supplemental education. In Greek, parapedia literally translates as parallel education; and in Korean, hagwon would usually be translated as private schools. Beyond the parameters of shadow education set by Bray’s 1999 book and subsequent works, some blurring of boundaries can be noted (Bray, 2010; Bray & Kobakhidze, 2014). The main implication of the metaphor of shadow education is that the private forms of provision mimic what public education offers. Yet as Bray et al. (2015, p. 5) noted, private tutoring is not always a precise mimic of the public sector, and it sometimes offers courses in advance of the regular public curriculum. Bray and Kwo (2015) observed that researchers must be aware of what models of tutoring are involved in their own studies before their definitions and parameters can be communicated clearly to audiences. Factors may include the following (sketched schematically after the list):

• Scale: whether tutoring is provided on a one-to-one basis, in small groups, medium-sized classes, large classes, or mass-scale lectures (see e.g. Kwok, 2009).
• Media: whether tutoring uses face-to-face tutorials, video recordings, live internet streaming (Ventura & Jang, 2010), or other forms (Buchmann et al., 2010).
• Provision: whether tutoring is provided by individuals (e.g. independent tutors, secondary/university students, retirees, and housewives), private organizations, semi-public individuals (e.g. school teachers) (Kobakhidze, 2018), or public organizations (e.g. schools, universities, communities, and governmental services) (Byun & Baker, 2015, pp. 5–6).
• Curriculum: whether tutoring is provided in academic subjects (e.g. languages, mathematics) or non-academic ones (e.g. music, dance, and sports).
• Purpose: whether tutoring is provided for remedial reasons, as parallel courses to the school provision, in advance of mainstream instruction, or only focusing on preparation for examinations.
• Levels: whether tutoring is provided to pre-primary, primary, secondary, or post-secondary students.
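
One way to keep these dimensions explicit when coding studies for cross-national comparison is to treat each factor as a small controlled vocabulary. The sketch below is again our own Python illustration; the category values paraphrase the list above and do not represent an established coding scheme:

    from enum import Enum

    # Each dimension of tutoring provision becomes a controlled vocabulary.
    Scale = Enum('Scale', 'ONE_TO_ONE SMALL_GROUP MEDIUM_CLASS LARGE_CLASS MASS_LECTURE')
    Media = Enum('Media', 'FACE_TO_FACE VIDEO_RECORDING LIVE_STREAMING OTHER')
    Provision = Enum('Provision', 'INDIVIDUAL PRIVATE_ORGANIZATION SEMI_PUBLIC PUBLIC_ORGANIZATION')
    Curriculum = Enum('Curriculum', 'ACADEMIC NON_ACADEMIC')
    Purpose = Enum('Purpose', 'REMEDIAL PARALLEL IN_ADVANCE EXAM_PREPARATION')
    Level = Enum('Level', 'PRE_PRIMARY PRIMARY SECONDARY POST_SECONDARY')

    # A study's tutoring model is then one value per dimension, which makes the
    # definitions used in different studies explicit and directly comparable.
    study_model = {
        'scale': Scale.SMALL_GROUP,
        'media': Media.FACE_TO_FACE,
        'provision': Provision.PRIVATE_ORGANIZATION,
        'curriculum': Curriculum.ACADEMIC,
        'purpose': Purpose.EXAM_PREPARATION,
        'level': Level.SECONDARY,
    }
    print(study_model['purpose'].name)  # EXAM_PREPARATION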

The diversity of tutoring provision across countries brings challenges to measurement. Byun and Baker (2015, p. 4) noted that one common limitation in shadow education research is that many researchers ‘focused only on the effects of participation versus nonparticipation in shadow education’ and ignored the implications of the diversity of tutoring provision for measurement. In parallel, Bray and Kobakhidze (2014) noted problems arising in cross-national shadow education studies from ambiguous definitions of shadow education. Cross-national studies such as TIMSS and PISA have collected data on shadow education using a range of questions about what can be described as extra lessons, private tutoring, and out-of-school support. Problems arose from ambiguities and inconsistent phrasing of questions in the surveys, and were exacerbated by meaning lost or distorted in translation. Examining the detailed nature of provision is therefore necessary for research on and evaluation of shadow education programs.

It is difficult to determine the scale of shadow education because of the informality of much OST tutoring. Few countries have detailed regulations on shadow education (Bray & Kwo, 2014). Even within societies which regulate private tutoring, tutoring establishments may not be registered, for many reasons, including to avoid tax or government attention. Thus, tutors may hesitate to provide information; and students who receive tutoring, and their parents, may be reluctant to reveal their participation because they could feel humiliated for seeking remedial help or for unfairly gaining competitive advantages. Nevertheless, a global pattern can be identified from the international research literature. Since the beginning of the new millennium, the expansion of shadow education has been documented in almost all regions of the world (see e.g. Aurini et al., 2013; Bray, 2009; Park, Buchmann, Choi, & Merry, 2016). In highly competitive societies such as China (Zhang & Bray, 2017), Hong Kong (Wang & Bray, 2016), Japan (Dawson, 2010; Entrich, 2018), and South Korea (Kim & Lee, 2010), attending OST shadow education has become a standard part of student life. The OECD’s Programme for International Student Assessment (PISA) has also collected cross-national data on private tutoring in multiple countries. Although there were some measurement issues relevant to the definition, phrasing, and translation of the PISA questions (Bray & Kobakhidze, 2014), the study provided a basis for comparison across diverse settings at the level of secondary education. Figure 20.2 shows the percentages of sampled 15-year-old students who indicated in the 2012 PISA questionnaire that they had participated in (1) OST classes organized by a commercial company and paid for by their parents, and (2) paid or unpaid personal tutoring. This snapshot of global participation in shadow education reflects the expansion of private supplementary education worldwide. Private tutoring is prominent not only in societies such as Greece and South Korea, which have long

Figure 20.2  Percentages of 15-year-old students receiving supplementary education, 2012 (Bray, 2017, p. 475; adapted from Park et al., 2016, p. 233)

been well known for the education provisions of frontistiria and hagwon (Kassotakis & Verdis, 2013; Lee, Lee, & Jang, 2010), but also in such countries as Thailand, Brazil, Russia, and Turkey (Lao, 2014; Manzon & Areepattamannil, 2014).

Implications of Shadow Education

Consideration of shadow education within the OST context raises various social and methodological implications. First, shadow education has implications for educational equity. Unlike non-profit OST programs organized by schools, communities, and non-governmental organizations, which are free or charge fees only to cover their costs, private tutoring usually requires significant monetary payment. The commercial nature of shadow education has obvious implications for disparities across different socio-economic groups. Literature on shadow education has consistently recorded that students from higher socio-economic status (SES) families are more likely to receive more and better private tutoring than their counterparts from lower SES families (Byun & Baker, 2015). Disparities in access to OST tutoring may eventually result in disparities in students’ learning and development within regular school systems. Zhang and Bray (2017) found that in Shanghai, China, when the entrance examination to lower secondary schooling was abolished in public schools, the selective admission process went underground as schools entrusted tutorial centres to select high-performing students for admission. Shadow education may also have implications for students’ in-school learning experience. In some societies, tutoring is mostly offered by teachers, and students have little option to reject it. In the Cambodian pattern, in which teachers commonly offer tutoring to their own students after official school hours yet still on the school premises (Brehm & Silova, 2014; Dawson, 2010), teachers may present the tutoring as optional, but parents worry about their children being left behind and being discriminated against by the teachers. In some countries, tutoring may even become a substitute for regular schooling. Students may prefer tutoring over regular schooling because they perceive tutoring to be more helpful, especially with examination
preparation. In India, Pakistan and Turkey, researchers have found that students skip school to attend tutoring centres in order to prepare for competitive examinations (Bhorkar & Bray, 2018; Jilani, 2009; Tansel & Bircan, 2006). The boundaries between shadow education and mainstream schooling become blurred in such cases.

LOOKING BEYOND TRIBES AND TERRITORIES

As noted in the introduction to this chapter, various interconnected and overlapping fields exist under the umbrella of OST. Within both the broad OST arena and its sub-fields, comparative perspectives are always valuable, and much can be gained from looking beyond the boundaries of what Becher and Trowler (2001) called academic tribes and territories. Among the merits of comparative perspectives are the stimuli that can be gained from reviewing research designs and programs in other fields and countries. Russell (1997) noted that parental engagement formed a learning society in Japan, and suggested that experience from Japanese juku could shed light on how American education might develop. One challenge in the United States is the unsupervised period after school, which is associated with juvenile crime. Experiences from OST programs elsewhere involving parental engagement may help researchers to design activities. Shadow education scholars may also gain from the experience of research on OST programmes in the United States. The theories and research methods that have been developed may be adopted and adapted. For example, one challenge facing shadow education researchers concerns the weakness of instruments to determine the exact nature and magnitude of participation (Byun & Baker, 2015; Kobakhidze, 2015). Many research designs involving shadow education have used participation as an indicator, but specific aspects of participation are often
absent in such designs. OST researchers have commonly used stylized questions to capture detailed participation in diverse activities (Mahoney et al., 2009). Adapting such designs or approaches may help shadow education researchers investigate the field more effectively. Looking beyond the boundaries of tribes and territories can also help researchers and policy makers to avoid possible mistakes. It is especially important for cross-national studies to identify local OST characteristics before comparisons are made. For children from Great Britain and Thailand, OST experiences may differ because of local cultures and education policies. The cultural and geographical boundaries between different tribes and territories are also blurring in academic fields relevant to OST. In the United States, where tutoring is not as widespread as in Japan, forms of shadow education have already started to emerge (Buchmann et al., 2010); and in China, labour migration has produced many left-behind children in rural areas who may also face problems of unsupervised OST (Chang, Dong, & MacPhail, 2011). Learning from OST experiences in other societies can help countries to respond better to such challenges. For most children around the world, the end of a school day does not mark the end of the education day. From leisure-time activities to surging shadow education, OST research involves diverse fields with dynamic contexts. Future advances in OST research call for comparative perspectives that look beyond national boundaries and beyond the parameters of tribes and territories.

REFERENCES

Afterschool Alliance. (2014). America after 3PM: Afterschool Programs in Demand. Washington, DC: Author.
Afterschool Alliance. (2015). Evaluations Backgrounder: A Summary of Formal Evaluations of Afterschool Programs' Impact on Academics, Behavior, Safety and Family Life. Washington, DC: Author. Retrieved from http://afterschoolalliance.org//documents/Evaluation_Backgrounder.pdf
Allen, J. P., Philliber, S., Herrling, S., & Kuperminc, G. P. (1997). Preventing teen pregnancy and academic failure: Experimental evaluation of a developmentally based approach. Child Development, 68(4), 729–742.
Antonowicz, L., Lesné, F., Stassen, S., & Wood, J. (2010). Africa Education Watch: Good Governance Lessons for Primary Education. Berlin: Transparency International.
Aurini, J., Davies, S., & Dierkes, J. (2013). Out of the Shadows: The Global Intensification of Supplementary Education. Bingley, UK: Emerald.
Banks, J. A., Au, K. H., Ball, A. F., Bell, P., Gordon, E. W., Gutiérrez, K. D., & Zhou, M. (2007). Learning in Out of School in Diverse Environments: Life-long, Life-wide, Life-deep. Seattle, WA: The Learning in Informal and Formal Environments Center, Center for Multicultural Education, University of Washington.
Barber, B. L., Eccles, J. S., & Stone, M. R. (2001). Whatever happened to the jock, the brain, and the princess? Journal of Adolescent Research, 16(5), 429–455.
Becher, T., & Trowler, P. R. (2001). Academic Tribes and Territories: Intellectual Enquiry and the Culture of Disciplines. Buckingham, UK: Open University Press.
Bhorkar, S., & Bray, M. (2018). The expansion and roles of private tutoring in India: From supplementation to supplantation. International Journal of Educational Development, 62, 148–156.
Böhm-Kasper, O., Dizinger, V., & Gausling, P. (2016). Multiprofessional collaboration between teachers and other educational staff at German all-day schools as a characteristic of today's professionalism. International Journal for Research on Extended Education, 4(1), 29–51.
Bray, M. (1999). The Shadow Education System: Private Tutoring and its Implications for Planners. Paris: UNESCO International Institute for Educational Planning (IIEP).
Bray, M. (2009). Confronting the Shadow Education System: What Government Policies for what Private Tutoring? Paris: UNESCO International Institute for Educational Planning (IIEP).
Bray, M. (2010). Blurring boundaries: The growing visibility, evolving forms and complex implications of private supplementary tutoring. Orbis Scholae, 4(2), 61–72.
Bray, M. (2011). The Challenge of Shadow Education: Private Tutoring and its Implications for Policy Makers in the European Union. Brussels: European Commission.
Bray, M. (2017). Schooling and its supplements: Changing global patterns and implications for comparative education. Comparative Education Review, 61(3), 469–491.
Bray, M., & Kobakhidze, M. N. (2014). Measurement issues in research on shadow education: Challenges and pitfalls encountered in TIMSS and PISA. Comparative Education Review, 58(4), 590–620.
Bray, M., Kobakhidze, M. N., Liu, J., & Zhang, W. (2016). The internal dynamics of privatised public education: Fee-charging supplementary tutoring provided by teachers in Cambodia. International Journal of Educational Development, 49, 291–299.
Bray, M., & Kwo, O. (2014). Regulating Private Tutoring for Public Good: Policy Options for Supplementary Education in Asia. Hong Kong: Comparative Education Research Centre.
Bray, M., & Kwo, O. (2015). Organisational and cross-cultural issues: Learning from research approaches. In M. Bray, O. Kwo, & B. Jokić (Eds.), Researching Private Supplementary Tutoring: Methodological Lessons from Diverse Cultures (pp. 261–288). Hong Kong: Comparative Education Research Centre (CERC), The University of Hong Kong, and Dordrecht: Springer Netherlands.
Bray, M., Kwo, O., & Jokić, B. (2015). Introduction. In M. Bray, O. Kwo, & B. Jokić (Eds.), Researching Private Supplementary Tutoring: Methodological Lessons from Diverse Cultures (pp. 3–19). Hong Kong: Comparative Education Research Centre (CERC), The University of Hong Kong, and Dordrecht: Springer Netherlands.
Bray, M., & Lykins, C. (2012). Shadow Education: Private Supplementary Tutoring and Its Implications for Policy Makers in Asia. Manila & Hong Kong: Asian Development Bank and Comparative Education Research Centre.
Brehm, W. C., & Silova, I. (2014). Hidden privatization of public education in Cambodia: Equity implications of private tutoring. Journal for Educational Research Online, 6(1), 94–116.
Broh, B. A. (2002). Linking extracurricular programming to academic achievement: Who benefits and why? Sociology of Education, 75(1), 69–95.
Buchmann, C. (2002). Getting ahead in Kenya: Social capital, shadow education, and achievement. In Schooling and Social Capital in Diverse Cultures (Vol. 13, pp. 133–159). Bingley, UK: Emerald.
Buchmann, C., Condron, D. J., & Roscigno, V. J. (2010). Shadow education, American style: Test preparation, the SAT and college enrollment. Social Forces, 89(2), 435–461.
Byun, S.-Y., & Baker, D. P. (2015). Shadow education. In R. A. Scott & M. C. Buchmann (Eds.), Emerging Trends in the Social and Behavioral Sciences: An Interdisciplinary, Searchable, and Linkable Resource (pp. 1–9). Online. Hoboken, NJ: John Wiley & Sons.
Caldwell, L. L., & Witt, P. A. (2011). Leisure, recreation, and play from a developmental context. New Directions for Youth Development, 2011(130), 13–27.
Cámara, F., & Gertel, H. R. (2016). The shadow education market of a mass higher education institution in Argentina. In M. F. Astiz & M. Akiba (Eds.), The Global and the Local: Diverse Perspectives in Comparative Education (pp. 133–153). Rotterdam: Sense.
Catalano, R. F., Berglund, M. L., Ryan, J. A. M., Lonczak, H. S., & Hawkins, J. D. (2004). Positive youth development in the United States: Research findings on evaluations of positive youth development programs. The ANNALS of the American Academy of Political and Social Science, 591(1), 98–124.
Chang, H., Dong, X., & MacPhail, F. (2011). Labor migration and time use patterns of the left-behind children and elderly in rural China. World Development, 39(12), 2199–2210.
Cooper, H., Valentine, J. C., Nye, B., & Lindsay, J. J. (1999). Relationships between five after-school activities and academic achievement. Journal of Educational Psychology, 91(2), 369–378.
Court, D. (1973). Dilemmas of development: The village polytechnic movement as a shadow system of education in Kenya. Comparative Education Review, 17(3), 331–349.
Davies, S., & Aurini, J. (2004). The transformation of private tutoring: Education in a franchise form. The Canadian Journal of Sociology, 29(3), 419–438.
Dawson, W. (2010). Private tutoring and mass schooling in East Asia: Reflections of inequality in Japan, South Korea, and Cambodia. Asia Pacific Education Review, 11(1), 14–24.
Deutsch, N. L., Blyth, D. A., Kelley, J., Tolan, P. H., & Lerner, R. M. (2017). Let's talk afterschool: The promises and challenges of positive youth development for after-school research, policy, and practice. In N. L. Deutsch (Ed.), After-School Programs to Promote Positive Youth Development: Integrating Research into Practice and Policy (Vol. 1, pp. 45–68). Cham, Switzerland: Springer International.
Duda, J. L., & Ntoumanis, N. (2005). Afterschool sport for children: Implications of a task-involving motivational climate. In J. L. Mahoney, R. W. Larson, & J. S. Eccles (Eds.), Organized Activities as Contexts of Development: Extracurricular Activities, After-School and Community Programs (pp. 311–330). Mahwah, NJ: Lawrence Erlbaum Associates.
Eccles, J. S., & Templeton, J. (2002). Chapter 4: Extracurricular and other after-school activities for youth. Review of Research in Education, 26(1), 113–180.
Entrich, S. (2018). Shadow Education and Social Inequalities in Japan: Evolving Patterns and Conceptual Implications. Dordrecht: Springer Netherlands.
Farrington, D. P., & Welsh, B. C. (2008). Saving Children from a Life of Crime: Early Risk Factors and Effective Interventions. Oxford: Oxford University Press.
Fredricks, J. A., & Eccles, J. S. (2006). Is extracurricular participation associated with beneficial outcomes? Concurrent and longitudinal relations. Developmental Psychology, 42(4), 698–713.
George, C. (1992). Time to come out of the shadows. Straits Times, 4 April.
Gomes, C., & Ventura, A. (2013). Supplementary education in Brazil: Diversity and paradoxes. In Out of the Shadows: The Global Intensification of Supplementary Education (Vol. 22, pp. 129–151). Bingley, UK: Emerald.
Gottfredson, D. C., Gerstenblith, S. A., Soulé, D. A., Womer, S. C., & Lu, S. (2004). Do after school programs reduce delinquency? Prevention Science, 5(4), 253–266.
Harnisch, D. L. (1994). Supplemental education in Japan: Juku schooling and its implication. Journal of Curriculum Studies, 26(3), 323–334.
Hong, H. J., Kim, Y. S., Jon, D.-I., Soek, J. H., Hong, N., Harkavy-Friedman, J. M., & Greenhill, L. L. (2011). Mental health and extracurricular education in Korean first graders: A school-based cross-sectional study. The Journal of Clinical Psychiatry, 72(6), 861–868.
Huberman, M. (1970). Reflections on democratization of secondary and higher education. International Education Year 1970 (Vol. 4). Paris: UNESCO.
Jilani, R. (2009). Problematizing high school certificate exam in Pakistan: A washback perspective. The Reading Matrix, 9(2), 175–183.
Jones, J. N., & Deutsch, N. L. (2012). Social and identity development in an after-school program. The Journal of Early Adolescence, 33(1), 17–43.
Kassotakis, M., & Verdis, A. (2013). Shadow education in Greece: Characteristics, consequences and eradication efforts. In M. Bray, A. E. Mazawi, & R. G. Sultana (Eds.), Private Tutoring across the Mediterranean: Power Dynamics and Implications for Learning and Equity. Rotterdam: Sense Publishers.
Kim, S., & Lee, J. H. (2010). Private tutoring and demand for education in South Korea. Economic Development and Cultural Change, 58(2), 259–296.
Kleiber, D. A., & Powell, G. M. (2005). Historical change in leisure activities during afterschool hours. In J. L. Mahoney, R. W. Larson, & J. S. Eccles (Eds.), Organized Activities as Contexts of Development: Extracurricular Activities, After-School and Community Programs (pp. 23–44). Mahwah, NJ: Lawrence Erlbaum Associates.
Kobakhidze, M. N. (2015). Shadow education research through TIMSS and PIRLS: Experiences and lessons in the Republic of Georgia. In M. Bray, O. Kwo, & B. Jokić (Eds.), Researching Private Supplementary Tutoring: Methodological Lessons from Diverse Cultures (pp. 23–48). Hong Kong: Comparative Education Research Centre, The University of Hong Kong, and Dordrecht: Springer Netherlands.
Kobakhidze, M. N. (2018). Teachers as Tutors: Shadow Education Market Dynamics in Georgia. Hong Kong: Comparative Education Research Centre, The University of Hong Kong, and Dordrecht: Springer Netherlands.
Kwok, P. (2009). A cultural analysis of cram schools in Hong Kong: Impact on youth values and implications. Journal of Youth Studies [Hong Kong], 12(1), 104–114.
Lao, R. (2014). Analyzing the Thai state policy on private tutoring: The prevalence of the market discourse. Asia Pacific Journal of Education, 34(4), 476–491.
Larson, R. W., & Verma, S. (1999). How children and adolescents spend time across the world: Work, play, and developmental opportunities. Psychological Bulletin, 125(6), 701–736.
Lee, C., Lee, H., & Jang, H.-M. (2010). The history of policy responses to shadow education in South Korea: Implications for the next cycle of policy responses. Asia Pacific Education Review, 11(1), 97–108.
Lerner, R. M. (2002). Concepts and Theories of Human Development. Mahwah, NJ: Lawrence Erlbaum Associates.
Lerner, R. M. (2005). Foreword: Promoting positive youth development through community and after-school programs. In J. L. Mahoney, R. W. Larson, & J. S. Eccles (Eds.), Organized Activities as Contexts of Development: Extracurricular Activities, After-School and Community Programs (pp. ix–xii). Mahwah, NJ: Lawrence Erlbaum Associates.
Liang, Y.-l. (2016). Taiwan high school hours ranks first in the world and generates different views. Retrieved from https://international.thenewslens.com/article/37769
Little, P. M. D., Wimer, C., & Weiss, H. B. (2008). After School Programs in the 21st Century: Their Potential and What it Takes to Achieve It (Vol. 10). Cambridge, MA: Harvard Family Research Project.
Luehmann, A. L., & Markowitz, D. (2007). Science teachers' perceived benefits of an out-of-school enrichment programme: Identity needs and university affordances. International Journal of Science Education, 29(9), 1133–1161.
Mahoney, J. L. (2000). School extracurricular activity participation as a moderator in the development of antisocial patterns. Child Development, 71(2), 502–516.
Mahoney, J. L., & Cairns, R. B. (1997). Do extracurricular activities protect against early school dropout? Developmental Psychology, 33(2), 241–253.
Mahoney, J. L., Larson, R. W., Eccles, J. S., & Lord, H. (2005). Organized activities as development contexts for children and adolescents. In J. L. Mahoney, R. W. Larson, & J. S. Eccles (Eds.), Organized Activities as Contexts of Development: Extracurricular Activities, After-School and Community Programs (pp. 3–22). Mahwah, NJ: Lawrence Erlbaum Associates.
Mahoney, J. L., Schweder, A. E., & Stattin, H. (2002). Structured after-school activities as a moderator of depressed mood for adolescents with detached relations to their parents. Journal of Community Psychology, 30(1), 69–86.
Mahoney, J. L., Vandell, D. L., Simpkins, S., & Zarrett, N. (2009). Adolescent out-of-school activities. In R. Lerner & L. Steinberg (Eds.), Handbook of Adolescent Psychology (pp. 228–269). New York: John Wiley & Sons.
Manzon, M., & Areepattamannil, S. (2014). Shadow educations: Mapping the global discourse. Asia Pacific Journal of Education, 34(4), 389–402.
Marimuthu, T., Singh, J. S., Ahmad, K., Lim, H. K., Mukherjee, H., & Osman, S. (1991). Extra-School Instruction, Social Equity and Educational Quality. Singapore: International Development Research Centre.
McIntosh, H., Metz, E., & Youniss, J. (2005). Community service and identity formation in adolescents. In J. L. Mahoney, R. W. Larson, & J. S. Eccles (Eds.), Organized Activities as Contexts of Development: Extracurricular Activities, After-School and Community Programs (pp. 331–352). Mahwah, NJ: Lawrence Erlbaum Associates.
McNeal, R. B. (1995). Extracurricular activities and high school dropouts. Sociology of Education, 68(1), 62–80.
Mori, I., & Baker, D. (2010). The origin of universal shadow education: What the supplemental education phenomenon tells us about the postmodern institution of education. Asia Pacific Education Review, 11(1), 36–48.
Noam, G. G., & Shah, A. (2014). Informal science and youth development: Creating convergence in out-of-school time. Teachers College Record, 116(13), 199–218.
O'Donnell, J., & Kirkner, S. L. (2014). Effects of an out-of-school program on urban high school youth's academic performance. Journal of Community Psychology, 42(2), 176–190.
OECD. (2011). Education at a Glance 2011: OECD Indicators. Brussels: OECD Publishing.
Park, H., Buchmann, C., Choi, J., & Merry, J. J. (2016). Learning beyond the school walls: Trends and implications. Annual Review of Sociology, 42, 231–252.
Roth, J., Brooks-Gunn, J., Murray, L., & Foster, W. (1998). Promoting healthy adolescents: Synthesis of youth development program evaluations. Journal of Research on Adolescence, 8(4), 423–459.
Russell, N. U. (1997). Lessons from Japanese cram schools. In W. K. Cummings & P. G. Altbach (Eds.), The Challenge of Eastern Asian Education: Implications for America (pp. 153–171). Albany, NY: State University of New York Press.
Silova, I. (2009). Examining the scope, nature and implications of private tutoring in Central Asia. In I. Silova (Ed.), Private Supplementary Tutoring in Central Asia: New Opportunities and Burdens (pp. 69–92). Paris: UNESCO International Institute for Educational Planning (IIEP).
Singapore, Ministry of Education. (2014). A Holistic Education for Secondary School Students: LEAPS 2.0. Retrieved from www.moe.gov.sg/docs/default-source/document/education/programmes/co-curricular-activities/leaps-2.pdf
Stecher, L., & Maschke, S. (2013). Research on extended education in Germany: A general model with all-day schooling and private tutoring as two examples. International Journal for Research on Extended Education, 1(1), 31–52.
Stecher, L., Maschke, S., Klieme, E., Fischer, N., Dyson, A., Mahoney, J., & Bae, S. H. (2013). Editorial. International Journal for Research on Extended Education, 1(1), 3–4.
Stevenson, D. L., & Baker, D. P. (1992). Shadow education and allocation in formal schooling: Transition to university in Japan. American Journal of Sociology, 97(6), 1639–1657.
Suter, L. E. (2016). Outside school time: An examination of science achievement and non-cognitive characteristics of 15-year-olds in several countries. International Journal of Science Education, 38(4), 663–687.
Tansel, A., & Bircan, F. (2006). Demand for education in Turkey: A tobit analysis of private tutoring expenditures. Economics of Education Review, 25(3), 303–313.
Tolan, P. H. (2014). Future directions for positive development intervention research. Journal of Clinical Child and Adolescent Psychology, 43(4), 686–694.
Tolan, P. H., & Deutsch, N. L. (2015). Mixed methods in developmental science. In Handbook of Child Psychology and Developmental Science. New York: John Wiley & Sons.
UNICEF. (2014). Briefing Note: School Clubs. Addis Ababa: UNICEF Ethiopia.
US Department of Education. (2012). Description of Supplemental Educational Services. Retrieved from www2.ed.gov/nclb/choice/help/ses/description.html
Ventura, A., & Jang, S. (2010). Private tutoring through the internet: Globalization and offshoring. Asia Pacific Education Review, 11(1), 59–68.
Vest, A. E., Mahoney, J. L., & Simpkins, S. D. (2013). Patterns of out-of-school time use around the world: Do they help to explain international differences in mathematics and science achievement? International Journal for Research on Extended Education, 1(1), 71–85.
Wang, D., & Bray, M. (2016). When whole-person development encounters social stratification: Teachers' ambivalent attitudes towards private supplementary tutoring in Hong Kong. The Asia-Pacific Education Researcher, 25(5–6), 873–881.
Whewell, W. (1838). Of private tutors. In W. Whewell, On the Principles of English University Education (pp. 70–75). London: J. W. Parker.
Won, S. J., & Han, S. (2010). Out-of-school activities and achievement among middle school students in the U.S. and South Korea. Journal of Advanced Academics, 21(4), 628–661.
Zhang, W., & Bray, M. (2017). Micro-neoliberalism in China: Public–private interactions at the confluence of mainstream and shadow education. Journal of Education Policy, 32(1), 63–81.
Zimmer, R., Hamilton, L., & Christina, R. (2010). After-school tutoring in the context of No Child Left Behind: Effectiveness of two programs in the Pittsburgh public schools. Economics of Education Review, 29(1), 18–28.

21

Measuring Opportunity: Two Perspectives on the 'Black Box' of School Learning

Leland S. Cogan and William H. Schmidt

To establish the relevance of test items to pupils' learning opportunities is important both from the point of view of measuring achievement and from that of maintaining the goodwill of teachers whose pupils undergo the tests… (Walker, 1962, p. 63)

The school environment of a child consists of many elements, ranging from the desk he sits at to the child who sits next to him, and including the teacher who stands at the front of his class. A statistical survey can give only fragmentary evidence of this environment. (Coleman et al., 1966, p. 8)

INTRODUCTION

Educational opportunity and opportunity to learn (OTL) appear to be simple ideas, yet they can also be rather complex constructs in educational research. At their core is the concept of opportunity and the idea that what students learn or are able to do is shaped, affected by, or somehow a function of the opportunities provided. In the most proximal sense, the learning opportunities students

experience in schools are a function of the decisions a teacher makes about which things to teach, how to sequence the topics to be taught, how much time to devote to each concept, and which instructional strategies to use with students. Such decisions are made by teachers, who are the decision-makers most closely connected to students (Schmidt & McKnight, 1995; Schwille, Porter, Belli, Floden, Freeman, Knappen, Kuhs, & Schmidt, 1983). Teachers' instructional decisions are often constrained or otherwise affected, however, by more distal decision-makers, such as school and district administrators and regional, state or national policy-makers. All of these decision-makers have the potential to affect the substance, timing, and manner in which students encounter what it is they are to learn. Exactly how students encounter, interact with, and come to learn what they do has been considered the 'black box' of schooling or student learning. Distal policy-makers may not make many decisions

about what occurs in this black box, but they profoundly affect the context and quality of the black box, the teaching/learning environment of schools and the classrooms in them. Such factors as the location of schools, who attends each one, the quality and comfort of the school building and its furnishings, the quality and availability of textbooks and other instruction-related resources, as well as the quantity and quality of the instructional personnel, are all policy issues determined before any school opens its doors to students and to those selected to serve as its faculty. The two quotes that begin this chapter illustrate two perspectives or approaches to research on the black box. One focuses on classroom learning opportunities, what occurs inside the black box that has a direct effect on student learning; the other focuses on the school environment, the context surrounding the black box. These two perspectives are rooted in two different social science disciplines. Researchers with a background in sociology tend to talk about educational opportunity and to focus primarily

on the contextual determinants affecting the black box. Those with a background in psychology and human development tend to talk about student learning opportunities and to focus on the instructional, pedagogical, and academic factors defining the substance of what occurs inside the black box and how students interact with it. Given these two different perspectives on the black box, it would not be surprising to find that proponents of one may have difficulty making sense of or appreciating the ideas of the other, and vice versa. Yet the two perspectives are not mutually exclusive, and together they provide a more complete picture of the black box of learning in schools. These two perspectives are represented in the conceptual model shown in Figure 21.1. Both the contextual (SES) and the instructional (OTL) factors are considered to have an effect on student achievement or learning. Yet there is also an interplay between these factors. Some have considered this problematic, a collinearity that yields results that are difficult to interpret (e.g., Heyneman & Loxley, 1983).

Figure 21.1  Conceptual model of school learning (student achievement): Schooling (OTL) and Home and Family Background (SES) both bear on Student Achievement (Learning)

In our decades of work on international assessments, primarily of mathematics and science, this conceptual model has proved fruitful in understanding student achievement both within our own country (the USA) and a few others, as well as internationally across all participating countries (Cogan, Schmidt, & Wiley, 2001; Schmidt, Cogan, Houang, & McKnight, 2011; Schmidt, Houang, Cogan, & Solorio, 2018; Schmidt, McKnight, Houang, Wang, Wiley, & Cogan, 2001; Schmidt, Zoido, & Cogan, 2014). Although our work has focused on mathematics and science as taught in schools, there is little to indicate that this model must be restricted to these two disciplines alone. Nonetheless, our experience with international assessments of mathematics and science is the foundation upon which the substance of this chapter rests.

QUANTIFYING EDUCATION OPPORTUNITY

The field of comparative international education has long considered the social, political and economic context of schools and schooling as essential factors affecting all aspects of the education enterprise. The early history of comparative education consisted primarily of qualitatively rich descriptions of these factors. Early efforts to quantify the influence of family background on student learning led researchers to count various resources in the home, such as books, toys, and possessions, that were considered to be an indication of the family's wealth and status (Schmidt, Houang, Cogan, & Solorio, 2018). A more quantitative approach to the education enterprise took center stage in the United States as the US Congress, responding at least in part to the 1964 Civil Rights Act, commissioned a report to document the extent to which the educational experiences of white students and African-American

students and others differed. Coleman et al.'s (1966) comprehensive examination of many aspects of education cast a broad net that characterized not only aspects of local K-12 schools but the future teachers of minority students, access to and participation in higher education, and case studies of integration, along with extended examinations of important education resources, including vocational education, guidance counselors, and Project Headstart. This far-reaching examination of K-12 schools included characterizing school faculty and staff, training, experience, and certification, as well as the social environment of students, through characterizing the attitudes, motivation, and education-related goals of students along with their parents' educational expectations for them. Surveys completed by principals and teachers indicated the availability of different curriculum programs, such as commercial, college preparatory, general, or vocational. This broad-stroke characterization of the curriculum was the only attempt to obtain any indication of what occurs within the black box, yet it did reveal some differences between the educational opportunities that white and African-American students had: 97% of white students attended schools offering a college preparatory curriculum, yet this was the case for only 87% of the African-American students. Similar differences were observed for attendance at schools offering a commercial curriculum: 94% for white students; 75% for African-American students (Coleman et al., 1966). In a subsequent article, Coleman (1968) stated that '[i]n the United States … the concept of education opportunity had a special meaning which focused on equality.' This inherent interest in equality in considerations of educational opportunity carried with it a need for quantifiable measures that could support or disconfirm conclusions or claims about the equality of the educational experiences for all students. This quantification of educational opportunity to document the experiences among

different groups of students may have been a particular concern within the United States, but it was reflected in modern international comparative education studies from their beginning. International studies of student achievement always include some measures of family background (Buchmann, 2002). Studies sponsored by the International Association for the Evaluation of Educational Achievement (IEA), such as the Trends in International Mathematics and Science Study (TIMSS) and the Progress in International Reading Literacy Study (PIRLS), have students respond to questions about the language they speak in the home and the number of books in their home, as well as questions about the presence of a list of wealth-indicating and education-related possessions in their home and their parents' educational background. Very similar items exist in OECD's Programme for International Student Assessment (PISA). Unlike the IEA studies, however, PISA creates a composite socio-economic (SES) index from items addressing a student's family's economic, social, and cultural status (Schmidt, Houang, Cogan, & Solorio, 2018). Many of these factors were included in the Coleman et al. (1966) report. Table 21.1 lists some of the factors considered in that educational opportunity report. Buchmann (2002) noted three roles for SES measures in international comparative assessment studies. One is simply to serve as a statistical control in analyses in which the focus is on the relationship between other factors. A second role is as an indication of family background in general, in recognition of the continuing effects such background may have on students' attitudes towards education, motivation towards learning, their educational aspirations, and their educational outcomes. A third role in international comparisons relates to differences in the social and economic context from one country to another. This role recognizes the important effect SES factors have on educational outcomes, yet such effects are not viewed as invariant across all or any of the national and political

contexts represented by countries. Table 21.1 also includes factors considered in a more recent article making use of the 2003 TIMSS (Chudgar & Luschei, 2009). This article, written from an economic perspective, focused primarily on Buchmann's second role, relating inequalities in measures of wealth and other contextual factors to students' educational opportunity. Table 21.2 illustrates the relationship between SES components and mathematics performance at the student level as these have been measured in three international studies. The countries listed in the table are those that participated in all three studies: the 1995 TIMSS, the 2011 TIMSS, and the 2012 PISA. The number of books in the home (Books) was measured by the same question in both TIMSS studies, while PISA split the highest TIMSS category (more than 200 books) into two categories: 201–500 books and more than 500 books. In each of the three studies, parents' education (Parent Ed) was the highest level of education obtained by either parent using the standard ISCED definitions (UNESCO, 2012). The measure of wealth (Possessions) in PISA 2012 considered more items (17) than was the case for TIMSS 1995 (six) or TIMSS 2011 (five). The TIMSS SES measures are a simple summation across these three components. The PISA 2012 SES in Table 21.2 is the composite ESCS measure that includes these three components together with the highest occupational status of either parent. Despite slight differences in measurement, the correlations for these components with student mathematics performance appear rather consistent for each country, yet the strength of these correlations varies from country to country. For example, the correlation with the number of books in the home ranges in Table 21.2 from a low of 0.13 for Thailand in the 1995 TIMSS to a high of 0.52 for Hungary in PISA 2012. Similarly, the highest level of parents' education ranges from a low of 0.14 in Sweden (TIMSS 1995) to a high of 0.43 for Israel in TIMSS 2011.
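To make the measures behind Table 21.2 concrete, a minimal sketch follows (Python with pandas; all column names and values are hypothetical, and a faithful analysis would also apply sampling weights and plausible values). It forms a TIMSS-style summative SES index and computes the student-level correlations of the kind the table reports; PISA's ESCS would instead be a weighted composite of its components.

    import pandas as pd

    # Hypothetical student background and achievement records; in practice
    # these come from the TIMSS/PISA student questionnaire and score files.
    students = pd.DataFrame({
        "books":       [2, 4, 1, 3, 5, 2],   # banded number of books in the home
        "possessions": [3, 5, 2, 4, 5, 3],   # count of wealth/education-related items
        "parent_ed":   [2, 5, 1, 4, 6, 3],   # highest parental education (ISCED level)
        "math":        [420, 560, 390, 510, 590, 450],
    })

    # TIMSS-style SES: a simple summation across the three components.
    students["ses"] = students[["books", "possessions", "parent_ed"]].sum(axis=1)

    # Student-level Pearson correlations with mathematics performance,
    # one per component plus the composite (the layout of Table 21.2).
    for col in ["books", "possessions", "parent_ed", "ses"]:
        print(col, round(students[col].corr(students["math"]), 2))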


Table 21.1  Sociological perspectives on education opportunity

Project reference: Coleman et al., 1966
Research project source: Student, Teacher and Principal Questionnaires
Opportunity-related constructs – School environment:
• Types of facilities available, e.g., auditorium, cafeteria, gymnasium, laboratory
• Age of textbooks
• Number of school days
• Average number of hours in academic day
• Number of daily homework hours expected
• Curriculums offered, e.g., college preparatory, general, vocational
• Average teacher verbal score
• Encyclopedia in the home
• Parents' expectation for college
• Student's desire to finish college

Project reference: Chudgar & Luschei, 2009
Research project source: TIMSS 2003 Elementary School Student Questionnaire: Student and Family Characteristics
Opportunity-related constructs – Student variables:
• Age and gender
• Student self-confidence in math/science
• Time spent on math/science homework
• How often test language spoken at home
• Family socio-economic status (based on number of books in the home and number of possessions related to learning, e.g., dictionary, calculator, computer, desk)
• Index of school safety

Table 21.2  Student-level correlations of mathematics achievement with SES and its components in three international mathematics studies: TIMSS 1995 (T95), TIMSS 2011 (T11), and PISA 2012 (P12)

                     Books              Possessions        Parent Ed          SES
Country              T95   T11   P12    T95   T11   P12    T95   T11   P12    T95   T11   P12
Australia            0.28  0.38  0.35   0.23  0.23  0.22   0.31  0.42  0.28   0.39  0.50  0.35
Hong Kong            0.14  0.30  0.30   0.14  0.12  0.22   0.18  0.26  0.23   0.23  0.32  0.27
Hungary              0.35  0.51  0.52   0.33  0.38  0.39   0.29  0.42  0.36   0.42  0.57  0.48
Israel               0.20  0.30  0.31   0.17  0.32  0.21   0.22  0.43  0.33   0.29  0.46  0.41
Japan*               –     0.28  0.25   –     0.21  0.24   –     0.31  0.28   –     0.38  0.32
Korea                0.34  0.42  0.36   0.25  0.22  0.29   0.24  0.33  0.24   0.37  0.46  0.32
Lithuania            0.31  0.37  0.36   0.21  0.29  0.29   0.21  0.35  0.26   0.32  0.44  0.37
New Zealand          0.28  0.39  0.40   0.20  0.29  0.33   0.22  0.37  0.26   0.32  0.49  0.42
Romania              0.27  0.46  0.40   0.27  0.38  0.37   0.23  0.37  0.23   0.33  0.51  0.44
Russian Federation   0.25  0.25  0.29   0.14  0.12  0.24   0.22  0.29  0.24   0.28  0.30  0.34
Singapore            0.22  0.32  0.32   0.17  0.26  0.31   0.22  0.27  0.30   0.27  0.38  0.38
Slovenia             0.29  0.38  0.40   0.19  0.11  0.24   0.28  0.29  0.29   0.34  0.39  0.39
Sweden               0.27  0.36  0.40   0.18  0.21  0.23   0.14  0.23  0.17   0.27  0.39  0.33
Thailand             0.13  0.26  0.24   0.15  0.30  0.28   0.24  0.32  0.24   0.23  0.41  0.31
USA                  0.32  0.38  0.39   0.27  0.25  0.32   0.23  0.29  0.26   0.37  0.42  0.39

*Japan did not include any of the SES-related questions on the student background questionnaire in the 1995 TIMSS.


ASSESSING OPPORTUNITY TO LEARN (OTL)

Dissatisfied with a state of comparative education that yielded rich qualitative descriptions of educational systems, a group of university professors of educational psychology and psychometrics gathered in the late 1950s to discuss the viability of constructing quantitative measures for such comparisons. In particular, there was an interest in addressing what C. Arnold Anderson, a University of Chicago professor, described as 'the major missing link in comparative education,' which was 'the scarcity of information about the outcomes or products of educational systems', i.e., measures of student achievement (Foshay, Thorndike, Hotyat, Pidgeon, & Walker, 1962, p. 5). In 1959, the International Association for the Evaluation of Educational Achievement (IEA) was organized with the assistance of UNESCO towards the goal of examining student 'achievement against a wide background of school, home, student and societal factors in order to use the world as an educational laboratory so as to instruct policy makers at all levels about alternatives in educational organization and practice' (Travers & Westbury, 1989, p. v). One of those school factors, the very heart of schooling, was students' opportunities to learn, especially those learning opportunities relevant to whatever assessment was used in the study. The IEA broke with the comparative education tradition not only by using a quantitative student assessment and quantitative measures of students' family social, economic, and cultural background (SES), but also by creating, for the first time, some indication of the curriculum students had studied. A glance at the table of contents of a popular comparative education book published prior to the pilot study project that led to the founding of the IEA shows chapters that addressed four major factors influencing education: natural factors such as language and race; broad secular factors reflecting an education system's

‘zeitgeist’, e.g., socialism or humanism; religious factors; and a section that situates national education systems in their larger political-social-government regulatory contexts (Hans, 1949). However, this standard comparative education textbook offered no discussion of curricular structure or content. The model of potential educational experiences, in Figure 21.2, provides a conceptual map of how all the contextual and instructional, distal and proximal factors may affect the black box of classroom learning and student achievement. This model was developed from all that had been learned from the IEA studies conducted before the 1995 Third International Mathematics and Science Study (TIMSS). In particular, factors in the system characteristics and student characteristics windows identify many of the factors associated with the concept of SES. An expansion of the tripartite curriculum model articulated for the Second International Mathematics Study (SIMS), this model underscores the curriculum (OTL) focus not only through the windows labeled intended, implemented, and attained, but also through the research questions that are included as part of the model (Schmidt & Cogan, 1996; Schmidt et al., 1996). As the Walker quote at the beginning of this chapter indicates, the initial effort to peer inside the black box in the pilot project was to have teachers provide a check on the validity of the assessment used with their students (Walker, 1962). Some version of this validity determination that compares assessment items with the curriculum students have experienced has been carried out in one form or another in virtually all IEA studies. Teachers responded to the test items in the pilot study and in the First International Mathematics Study (FIMS) and in the second studies, SIMS and SISS (Husén, 1967a; Keeves, 1974; Travers & Westbury, 1989). With the advent of the 1995 TIMSS, this task was completed by a mathematics expert in each country and has become known as the Test-Curriculum Match exercise (Beaton,

Mullis, Martin, Gonzalez, Kelly, & Smith, 1996; Mullis, Martin, Foy, & Hooper, 2016).

Figure 21.2  Model of potential educational experiences

The interest in OTL, however, was broader than merely establishing the validity of the student assessment. From the pilot study onward, many of the questions that researchers wanted to explore had to do with what was going on in classrooms (Foshay et al., 1962; Husén & Postlethwaite, 1996; Keeves, 2011). This interest in the curriculum was one of the main foci of SIMS, as is evident in the title of the first report volume: The IEA Study of Mathematics I: Analysis of Mathematics Curricula (Travers & Westbury, 1989). This curricular interest, together with the continued development of new approaches to measuring classroom OTL, reached an apex in the 1995 TIMSS, which conducted quantitative surveys of the intended curricula (content standards and textbooks) and the implemented curricula (what teachers taught), all coordinated through the TIMSS Mathematics and Science Frameworks. The frameworks provided a common language system that allowed all aspects of the study to be compared and related to each other in a common metric (Schmidt et al., 1996; Schmidt, McKnight et al., 1997; Schmidt, Raizen et al., 1997; Schmidt et al., 2001). A brief

overview of how OTL has been assessed in a few IEA studies is presented in Table 21.3. From FIMS to SIMS, SISS, and TIMSS, researchers were exploring ways to assess how much emphasis or time teachers had devoted to teaching the mathematics or science deemed relevant to the student assessment (Schwille, 2011; Schmidt, Houang, Cogan, & Solorio, 2018). Conceptually, they were working from Carroll's 1963 model of school learning (see Figure 21.3). Both John Carroll and Benjamin Bloom had been involved in the early planning for what became the pilot study. Both were interested in student learning and came to focus on the essential aspect of time, the idea that no student learns something apart from having spent time learning it. Carroll's research was focused on second language learning (Carroll, 1984). Bloom built on Carroll's model to create his mastery learning model (Bloom, 1968, 1974). Psychologists had long recognized time as an essential component of learning. William James, for example, noted that a student's attention to learning could only be maintained for a certain period of time and that, for a student to learn, one had to repeatedly focus volitional attention (James, 1899).

Table 21.3  Psychological perspectives on opportunity to learn (OTL)

Project reference: Walker, 1962
Research project source: Teacher Questionnaire
Opportunity-related constructs: Home and school opportunities to learn 'the knowledge and skills required' by each item on the test

Project reference: Husén, 1967a
Research project source: First International Mathematics Study (FIMS), National Mathematics Expert Panel
Opportunity-related constructs: Rated students' exposure to each test item twice: via assessments and classroom instruction

Project reference: Husén, 1967b
Research project source: First International Mathematics Study (FIMS), Teacher Questionnaire
Opportunity-related constructs: Indicated for each test item whether all, some, or few students had 'an opportunity to learn this type of problem'

Project reference: Schmidt & Cogan, 1996
Research project source: Third International Mathematics and Science Study (TIMSS), Teacher Questionnaire
Opportunity-related constructs: Indicated the number of instructional periods teachers taught each topic set. These spanned all the grade 1–12 topics identified in the TIMSS Frameworks

Project reference: Beaton, Mullis, Martin, Gonzalez, Kelly, & Smith, 1996
Research project source: Third International Mathematics and Science Study (TIMSS), National Research Coordinator Questionnaire
Opportunity-related constructs: Indicated for each test item whether it was in the curriculum for the grade being tested

Figure 21.3  Carroll’s model of school learning (Carroll, 1963)

Thorndike’s laws of learning noted that ‘the degree of strengthening of connection will depend upon the vigor and duration as well as the frequency of its making’ (Thorndike, 1914, p. 70). In other words, making connections between new information with what one already knows constitutes learning and this requires time. Although some became distracted with ever more complex measurements of learning time, the essential nature of this time is the content in focus; something that seems evident in the writings of the early IEA researchers, Carroll and Bloom (Berliner, 1990; Schmidt, Houang, Cogan, & Solorio, 2018). The major innovation in the measurement of OTL that emerged in the 1995 TIMSS was not a more precise measurement of time but a broader focus on what was taught in the classroom. More specifically, teachers were asked to indicate how many lessons they had

taught their class from a list of mathematics (or science) topics. The list of topics was not limited to those covered by the TIMSS student assessment but encompassed the whole scope of the relevant TIMSS content framework, which had been crafted to represent all the relevant topics taught in K-12 schools from an international perspective. In this way, the time metric teachers reported revealed that they spent much time teaching topics that were neither represented on the TIMSS assessment nor necessarily represented in their country's standards for the grade they taught. Of course, this varied greatly from country to country, but the variation was strongly related to students' performance on the TIMSS assessments (Schmidt, McKnight, Cogan, Jakwerth, & Houang, 1999; Schmidt et al., 2001). This last observation suggests a third role OTL can have in international research,

a role that parallels one for SES. Both can be used as a statistical control in analyses intended to explore the relationship between two or more other factors and student achievement. In summary, OTL can also play three different analytic roles in international studies: (1) as an indicator of validity for the content of the student assessment; (2) as a focus of measurement and analysis in its own right; and (3) as a statistical control in analyses intended to explore the relationship of other related factors to student achievement. One of the challenges, however, to including OTL in analyses of student achievement is that the OTL data have most often been supplied by a teacher. Although the Carroll model is referred to as a model of school learning, its focus is on the individual student; that is, the constructs are all phrased in terms of what affects the degree of learning for an individual student. Yet in a school setting, although all students in a single classroom are exposed to the same learning opportunities, their learning is not uniform, something that would most likely be evident on any given assessment. As one might imagine, asking one teacher to provide information on the learning opportunities experienced by the students in the teacher's classroom that would explain differences in those students' achievement is rather challenging. It is a challenge from both a conceptual and a practical, measurement viewpoint. Conceptually, how can one teacher explain how what has been taught yields different results among students? Practically, how does the researcher construct an instrument for a teacher to complete – and most especially, an instrument that doesn't require hours upon hours to complete – that will capture the OTL provided to students in a way that explains their differential achievement? Nonetheless, IEA studies from the beginning have explored multiple ways of eliciting this information from teachers. The goal is always to collect information that, although it can't explain individual differences within the

classroom, when aggregated, the differences between teachers – particularly those differences between teachers from one country to another – would provide insight into differences in student achievement. Table 21.4 provides a brief window on the relationship between OTL and mathematics achievement as these have been measured in TIMSS 1995, TIMSS 2011, and PISA 2012. These correlations were observed across classrooms as this was the level at which the OTL was measured in the TIMSS studies. The PISA OTL was measured at the student level yet the correlations in Table 21.4 were constructed at the school level to be comparable to the TIMSS studies (classrooms are not sampling units in PISA). The correlations in Table 21.4 demonstrate the strength of the simple bivariate relationship between OTL and mathematics achievement, particularly when this has been measured at the student level, as it was in PISA 2012. One of the challenges in considering the two perspectives on the black box is that student background (SES) has often been measured with far greater precision than measures that reflect learning opportunities (Schmidt, Houang, Cogan, & Solorio, 2018). Consequently, the within-country corrections in Table 21.4 may not appear as large as one might expect. Nonetheless, these correlations do not capture the full story. For example, the bivariate correlation across classrooms in Japan was not significant in the 1995 TIMSS, yet Japan had one of the most coherent curricular policies among participating countries. Structural models relating various aspects of the curriculum, that is, teacher instruction time, textbook topic emphasis, and content standards emphasis, were all significantly related to student achievement gain in Japan. Further, path modeling revealed significant paths from content standards to both textbook emphasis and teacher instruction time and teacher instruction time was significantly related to student achievement gain (see Schmidt et al., 2001, Figure 8.4 on page 283). The difference in the relationship
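A minimal sketch of how the entries in Table 21.4 can be formed, under hypothetical data and variable names: attach the teacher-reported OTL (for example, lessons on the assessed topics) to each student, aggregate to the classroom – or, for PISA, the school – and correlate across those units.

    import pandas as pd

    # Hypothetical student records; 'otl_lessons' is the teacher-reported
    # number of lessons on the assessed topics, constant within a classroom.
    data = pd.DataFrame({
        "classroom":   ["c1", "c1", "c2", "c2", "c3", "c3"],
        "otl_lessons": [12, 12, 25, 25, 5, 5],
        "math":        [480, 505, 540, 560, 430, 455],
    })

    # Aggregate to the classroom level; for PISA, group by the school
    # identifier instead, since classrooms are not sampling units there.
    by_class = data.groupby("classroom").agg(
        otl=("otl_lessons", "mean"),
        math=("math", "mean"),
    )

    # Between-classroom bivariate correlation of OTL with achievement.
    print(round(by_class["otl"].corr(by_class["math"]), 2))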

Table 21.4  Classroom/school-level correlations of OTL with mathematics performance and OTL with SES in three international mathematics studies

                     OTL with Math Achievement           OTL with SES
Country              TIMSS '95  TIMSS 2011  PISA 2012    TIMSS '95  TIMSS 2011  PISA 2012
Australia            0.47       ns          0.71         0.39       ns          0.54
Hong Kong            ns         ns          ns           ns         ns          −0.20
Hungary              ns         ns          0.87         ns         ns          0.80
Israel               ns         ns          0.69         ns         ns          0.51
Japan                ns         ns          0.83         –          ns          0.75
Korea                ns         ns          0.92         ns         ns          0.79
Lithuania            ns         ns          0.68         ns         ns          0.47
New Zealand          0.22       0.35        0.71         0.28       0.32        0.58
Romania              ns         ns          0.60         ns         ns          0.60
Russian Federation   ns         0.14        0.33         ns         0.14        0.43
Singapore            0.59       0.16        0.85         0.48       0.16        0.76
Slovenia             ns         ns          0.84         ns         ns          0.75
Sweden               0.28       0.16        0.23         0.33       0.16        0.34
Thailand             −0.28      ns          0.78         −0.39      ns          0.66
USA                  0.45       0.23        0.66         0.40       0.13        0.53

Note: ns = not statistically significant; – = not available.

THE CHALLENGE: THE CUMULATIVE NATURE OF OTL

Planning and preparation for conducting research involve not only identifying the factors one wants to measure and creating the instruments that will be used to measure those factors; they also entail a detailed scheme by which all the measurements will be gathered. In education research, this includes precisely identifying which population of students one wants to survey. Although such a definition may be easily stated – for example, eighth-grade mathematics students – unless all such students are included in the survey, plans to sample the survey population can become quite complex. As is often the case in any complex enterprise, general outlines may be easily stated, but the 'devil is in the details.' How a study is planned and

carried out affects the interpretation of the results. Consequently, the design of a study is quite important. For each of the IEA mathematics and science studies, researchers spent some time making decisions about which students were of interest, how they would be sampled, and how teachers would also be involved in responding to questionnaires. Leading up to the 1995 TIMSS, Wiley and Wolfe (1992) published a brief overview of four possible study designs, explicating the implications for what could be learned from each one. They identified the major design issue for the anticipated study as the decision whether it would examine 'learning' or 'knowledge, e.g., cumulative learning'. This distinction reflects what we know about human learning. Human learning is cumulative in nature: we always learn something new on the foundation of what has been learned previously. As we apprehend new information, it is incorporated into our networked body of knowledge. The design of a study of student achievement entails the decision to focus either on

what has been learned within a specified period of time, such as during the eighth grade, or on what 15-year-old students know and are able to do, as is done in PISA. The problem, however, is that most international studies use a cross-sectional design that doesn't lend itself to measuring student learning during a specified grade but only the knowledge students have accumulated up to the time point of the assessment. As Wiley and Wolfe noted, 'Implicit in a decision to do a cross-sectional study is a focus on the cumulative learning of children' (1992, p. 297). Only a longitudinal design holds the capacity to truly measure what students have learned during any given grade/time span. A longitudinal design requires that the same set of students be assessed at two time points, one at the beginning of the grade/instructional time period and another at its end. The two assessments need not be identical, but the one administered first needs to provide an indication of the extent to which students may already know what they are expected to learn during the instructional period. This design is rarely used, as it may well double the assessment costs and requires careful tracking of students from one time point to another. SIMS defined an international longitudinal option, but only nine of the 21 countries implemented this design (Burstein, 1993). More recently, the Russian Federation took advantage of the one-year separation between the 2011 TIMSS and the 2012 PISA assessments to conduct a longitudinal sub-study with some intriguing results (Carnoy, Khavenson, Loyalka, Schmidt, & Zakharov, 2016; Schmidt, Houang, Cogan, & Solorio, 2018). For large-scale assessments, another design option Wiley and Wolfe defined was a multi-grade cross-sectional survey that can approximate the outcomes of a longitudinal design, though only for a cohort of students, not for individual students. In this design, the student population in the grade just prior to the student grade of focal interest

can be assessed to serve as a cohort-level pretest for focal-grade students. This requires that suitable prior-grade students be included at each of the sampling levels in order to appropriately model growth or learning for focal-grade students. This cohort longitudinal design was used in the 1995 TIMSS but has not been used in any international study since. Consequently, for the 1995 TIMSS we were able to model achievement gain rather than mere achievement performance. This distinction explains the difference between the nonsignificant bivariate correlation observed for Japan in Table 21.4 and the significant structural and path models predicting achievement gain noted above for Japan. More germane to the focus of this chapter, however, was the assertion by Wiley and Wolfe that not only is learning cumulative, but so too is OTL. Again, this does not seem overly surprising, as it is consistent with what we have learned about how people learn: learning experiences are most effective when sequenced in a way that leads one who knows little about something through a reasonable and sensible progression into the greater detail and complexity of the subject matter, discipline, or area to be learned. Yet this insight has not often made it into the instrumentation for measuring student OTL. As Wiley and Wolfe note, 'learning opportunities must be measured in those grades in which these experiences have contributed to cumulative achievement' (1992, p. 300). As mentioned previously, one of the innovations the 1995 TIMSS made in OTL measurement was not primarily in the time metric but in the range of OTL categories to which teachers responded. Because teachers responded to a set of topics that spanned the whole TIMSS framework, they were indicating the extent to which they had taught topics ranging from beginning elementary topics appropriate to the very early grades to topics that, at least according to the curriculum defined in their country's standards, were not expected to be taught until one or more years later.
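A sketch of the cohort-level 'gain' this multi-grade design supports, again with hypothetical data: the prior-grade sample stands in as a pretest for the focal-grade cohort, so gain is a difference in cohort means, not a measure of individual growth.

    import pandas as pd

    # Hypothetical cross-sectional samples from two adjacent grades
    # drawn in the same year within one country.
    sample = pd.DataFrame({
        "grade": [7, 7, 7, 8, 8, 8],   # grade 7 = prior cohort, grade 8 = focal cohort
        "math":  [465, 480, 452, 510, 525, 498],
    })

    means = sample.groupby("grade")["math"].mean()
    cohort_gain = means[8] - means[7]  # cohort-level, not individual, gain
    print(round(cohort_gain, 1))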


This provided a sense of the cumulative nature of the OTL students had experienced. Virtually all teachers in each country taught the topics intended for that grade and the topics that were the focus of the TIMSS assessment. However, the instructional context differed considerably, not only from one country to another but, in most countries, among teachers within the country. For example, in some classrooms the assessed topics were just being introduced to students. The amount of time on these topics could well vary, but teachers were also spending time teaching topics that were to be covered earlier. In other classrooms, the assessed topics were covered but the majority of the instruction had progressed to focus on more advanced concepts. We can imagine three distinctly different OTL experiences for the assessed topics: one in which these were being introduced to students for the first time; another in which students were spending the majority of their time studying these topics; and yet another in which instruction on the assessed topics had mostly concluded but had not moved on. In all three scenarios students had OTL for the assessed topics, yet we would not expect all to exhibit the same performance on the assessment. The breadth of the OTL measure included in the 1995 TIMSS provided a nuanced measurement that was likely to capture the difference between the three scenarios envisioned, and it has demonstrated a significant relationship to student performance on the assessment (Schmidt et al., 2001). Unfortunately, the OTL measure in subsequent TIMSS cycles has been more limited. It has been limited in terms of the time dimension, indicating only whether a topic was a focus of instruction in prior years, the current year, or a subsequent year. Yet, more importantly, the cumulative nature of OTL has not been acknowledged, as the OTL measure has been limited to the topics relevant to the assessment.

For the first time, PISA 2012 included several items on the student background questionnaire that asked students about their learning opportunities with different types of mathematics problems and a range of mathematics concepts. The list of concepts ranged from ones that students would most likely have encountered in early primary schooling to concepts students may well not yet have encountered unless they were enrolled in some of the most advanced secondary school mathematics (Cogan & Schmidt, 2015). In this way, the OTL measure that was developed reflected each student's cumulative exposure to mathematical concepts spanning the breadth of mathematics instruction from primary through secondary school. This measure has proven very fruitful in predicting student performance on the various PISA measures of mathematical literacy (Schmidt, Zoido, & Cogan, 2014; Cogan, Schmidt, & Guo, 2018). In PISA 2012, all three factors identified in Figure 21.1 were measured at the student level and, notably for our discussion here, both OTL and mathematics literacy were measured from a cumulative perspective. Thus, we were able to examine the relationship among student background (SES), student OTL, and mathematics literacy. Since the sampling of students in PISA is clustered in schools, the appropriate analysis makes use of multi-level linear regressions at each sampled level: student, school, and country. Results of this analysis across all participating countries/economies are summarized in Table 21.5. The first model examines the relationship of student background (SES) to mathematics literacy; the second examines the relationship of student OTL to mathematics literacy; the third includes both student background and OTL to predict mathematics literacy. Results from the three models revealed that '[t]he inclusion of both variables into a single model reduced the size of the student-level SES coefficient by 32%, but the positive coefficient for the student-level OTL variable was essentially the same, being reduced by only 5%' (Schmidt, Burroughs, Zoido, & Houang, 2015, p. 374).
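The structure of these three models can be illustrated with a minimal sketch. This is not the authors' code: the file and column names are hypothetical, the school- and country-level predictors are formed here as simple group means, and PISA's survey weights and plausible values are ignored for simplicity.

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical student-level extract of PISA 2012 with columns:
    # math_lit, ses, otl, school_id, country_id.
    df = pd.read_csv("pisa2012_students.csv")

    # School- and country-level predictors as group means of the student measures.
    df["school_ses"] = df.groupby("school_id")["ses"].transform("mean")
    df["school_otl"] = df.groupby("school_id")["otl"].transform("mean")
    df["country_ses"] = df.groupby("country_id")["ses"].transform("mean")
    df["country_otl"] = df.groupby("country_id")["otl"].transform("mean")

    def fit(formula):
        # Random intercepts for countries (the grouping factor) and for
        # schools nested within countries (a variance component).
        model = smf.mixedlm(formula, df, groups=df["country_id"],
                            vc_formula={"school": "0 + C(school_id)"})
        return model.fit()

    m1 = fit("math_lit ~ ses + school_ses + country_ses")  # SES alone
    m2 = fit("math_lit ~ otl + school_otl + country_otl")  # OTL alone
    m3 = fit("math_lit ~ ses + otl + school_ses + school_otl"
             " + country_ses + country_otl")               # SES and OTL
    print(m3.summary())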



Table 21.5  Three three-level models predicting PISA mathematics literacy

Variable        SES Alone    OTL Alone    SES and OTL
Intercept         487*          390*          393*
Student SES        15*                         10*
Student OTL                     44*            42*
School SES         65*                         39*
School OTL                     122*            84*
Country SES        45*                         47*
Country OTL                     48*            54*

*p < 0.05

Failure to recognize the cumulative nature of both achievement and OTL raises the issue of interpretive validity. The cross-sectional design of all current international assessments (as well as most national assessment programs) does not yield measures of learning within a given grade, yet this is often how results are portrayed in the media and interpreted by others. Messick (1989), in his treatise on validity, indicates that validity is not something inherent to the assessment but rather is a property of the inferences and actions based on it. In so far as many interpretations of an assessment are made with reference to the grade at which the test was given, this is a threat to the assessment's validity. Relational analyses that relate only a given year's OTL to an assessment of cumulative learning – as occurs in all cross-sectional designs – are not likely to yield valid conclusions. Cumulative learning must be examined in relation to cumulative OTL.

CONCLUSION

The challenge in interpreting international assessments is that, to the extent that the three factors in Figure 21.1 have been included in a study, all three are cumulative in nature. However, media treatments of such results, for example of the most recent TIMSS, miss this nuance.

Too often the attention of journalists, policy-makers, and others is focused entirely on the students and grades represented in the assessments. This is also problematic for the research community in so far as the interrelationship of OTL and SES is not recognized. Using only one of these factors without the other yields an inaccurate indication of its effect. This is of particular concern in studies that do not have a suitably comprehensive and cumulative measure of student OTL. In the resulting analyses, this limited OTL measure is then related to SES and student achievement, which are both cumulative in nature. Such a restricted OTL measure necessarily yields an underestimate of the effect of OTL on student performance. On the other hand, without any measure of OTL, the relationship of students' background to their achievement is overestimated. The concepts of educational opportunity represented by measures of SES and students' OTL are not simply two different perspectives on the black box but are profoundly interconnected. Both are required in international studies to make appropriate sense of assessments of what students have acquired in the black box of schooling.

REFERENCES

Beaton, A. E., Mullis, I. V. S., Martin, M. O., Gonzalez, E. J., Kelly, D. L., & Smith, T. A. (1996). Mathematics achievement in the middle school years: IEA's Third International Mathematics and Science Study. Chestnut Hill, MA: Center for the Study of Testing, Evaluation, and Educational Policy, Boston College.
Berliner, D. C. (1990). What's all the fuss about instructional time? In M. Ben-Peretz & R. Bromme (Eds.), The nature of time in school: Theoretical concepts, practitioner perceptions (pp. 3–35). New York: Teachers College Press.
Bloom, B. S. (1968). Learning for mastery. Evaluation Comment, 1(2). Topical Papers and Reprints, No. 1, Regional Education Laboratory for the Carolinas and Virginia. Retrieved from http://eric.ed.gov/?id=ED053419


Bloom, B. S. (1974). Time and learning. American Psychologist, 29(9), 682–688.
Buchmann, C. (2002). Measuring family background in international studies of education: Conceptual issues and methodological challenges. In A. C. Porter & A. Gamoran (Eds.), Methodological advances in cross-national surveys of educational achievement (pp. 150–197). Washington, DC: National Academies Press.
Burstein, L. (Ed.). (1993). The IEA study of mathematics III: Student growth and classroom processes (Vol. 3). Oxford: Pergamon Press.
Carnoy, M., Khavenson, T., Loyalka, P., Schmidt, W. H., & Zakharov, A. (2016). Revisiting the relationship between international assessment outcomes and educational production: Evidence from a longitudinal PISA-TIMSS sample. American Educational Research Journal, 53(4), 1054–1085. doi: 10.3102/0002831216653180
Carroll, J. B. (1963). A model of school learning. Teachers College Record, 64(8), 723–733.
Carroll, J. B. (1984). The model of school learning: Progress of an idea. In L. W. Anderson (Ed.), Time and school learning: Theory, research and practice (pp. 15–45). London: Cambridge University Press Archive.
Chudgar, A., & Luschei, T. F. (2009). National income, income inequality, and the importance of schools: A hierarchical cross-national comparison. American Educational Research Journal, 46(3), 626–658. doi: 10.3102/0002831209340043
Cogan, L. S., & Schmidt, W. H. (2015). The concept of Opportunity to Learn (OTL) in international comparisons of education. In K. Stacey & R. Turner (Eds.), Assessing mathematical literacy: The PISA experience (pp. 207–216). New York: Springer.
Cogan, L. S., Schmidt, W. H., & Guo, S. (2018). The role that mathematics plays in college- and career-readiness: Evidence from PISA. Journal of Curriculum Studies. doi: 10.1080/00220272.2018.1533998
Cogan, L. S., Schmidt, W. H., & Wiley, D. E. (2001). Who takes what math and in which track? Using TIMSS to characterize U.S. students' eighth-grade mathematics learning opportunities. Educational Evaluation and Policy Analysis, 23(4), 323–341.


Coleman, J. (1968). The concept of equality of educational opportunity. Harvard Educational Review, 38(1), 7–22.
Coleman, J. S., Campbell, E. Q., Hobson, C. J., McPartland, J., Mood, A. M., Weinfeld, F. D., & York, R. L. (1966). Equality of educational opportunity. Washington, DC: National Center for Educational Statistics.
Foshay, A. W., Thorndike, R. L., Hotyat, F., Pidgeon, D. A., & Walker, D. A. (1962). Educational achievements of thirteen-year-olds in twelve countries: Results of an international research project, 1959–61. Hamburg: UNESCO Institute for Education.
Hans, N. A. (1949). Comparative education: A study of educational factors and traditions. London: Routledge & K. Paul.
Heyneman, S. P., & Loxley, W. A. (1983). The effect of primary-school quality on academic achievement across twenty-nine high- and low-income countries. American Journal of Sociology, 88(6), 1162–1194.
Husén, T. (Ed.) (1967a). International study of achievement in mathematics: A comparison of twelve countries (Vol. I). New York: Wiley.
Husén, T. (Ed.) (1967b). International study of achievement in mathematics: A comparison of twelve countries (Vol. II). New York: Wiley.
Husén, T., & Postlethwaite, N. (1996). A brief history of the International Association for the Evaluation of Educational Achievement (IEA). Assessment in Education: Principles, Policy & Practice, 3(2), 129–141. http://doi.org/10.1080/0969594960030202
James, W. (1899). Talks to teachers on psychology: And to students on some of life's ideals. New York: H. Holt and Co.
Keeves, J. P. (1974). The IEA Science Project: Science achievement in three countries – Australia, the Federal Republic of Germany and the United States. In M. Bruderlin (Ed.), Implementation of curricula in science education (pp. 158–178). Cologne: German Commission for UNESCO.
Keeves, J. P. (2011). IEA – from the beginning in 1958 to 1990. In C. Papanastasiou, T. Plomp, & E. C. Papanastasiou (Eds.), IEA 1958–2008: 50 years of experiences and memories (Vol. 1). Amsterdam: International Association for the Evaluation of Educational Achievement.



Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). New York: Macmillan.
Mullis, I. V. S., Martin, M. O., Foy, P., & Hooper, M. (2016). TIMSS 2015: International results in mathematics and science achievement, curriculum, and instruction. Chestnut Hill, MA: TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College, and International Association for the Evaluation of Educational Achievement (IEA).
Schmidt, W. H., Burroughs, N. A., Zoido, P., & Houang, R. T. (2015). The role of schooling in perpetuating educational inequality: An international perspective. Educational Researcher, 44(7), 371–386. doi: 10.3102/0013189X15603982
Schmidt, W. H., & Cogan, L. S. (1996). Development of the TIMSS context questionnaires. In M. O. Martin & D. L. Kelly (Eds.), Third International Mathematics and Science Study technical report (Vol. I: Design and development, pp. 5-1–5-22). Chestnut Hill, MA: Boston College.
Schmidt, W. H., Cogan, L. S., Houang, R. T., & McKnight, C. C. (2011). Content coverage differences across districts/states: A persisting challenge for U.S. education policy. American Journal of Education, 117(3), 399–427.
Schmidt, W. H., Houang, R. T., Cogan, L. S., & Solorio, M. L. (2018). Schooling across the globe: What we have learned from 60 years of mathematics and science international assessments. Cambridge, UK: Cambridge University Press.
Schmidt, W. H., Jorde, D., Cogan, L. S., Barrier, E., Gonzalo, I., Moser, U., & Wolfe, R. G. (1996). Characterizing pedagogical flow: An investigation of mathematics and science teaching in six countries. London, Dordrecht, & Boston, MA: Kluwer.
Schmidt, W. H., & McKnight, C. C. (1995). Surveying educational opportunity in mathematics and science: An international perspective. Educational Evaluation and Policy Analysis, 17(3), 337–353.
Schmidt, W. H., McKnight, C., Cogan, L. S., Jakwerth, P. M., & Houang, R. T. (1999). Facing the consequences: Using TIMSS for a closer look at U.S. mathematics and science education. London, Dordrecht, & Boston, MA: Kluwer.
Schmidt, W. H., McKnight, C., Houang, R. T., Wang, H. A., Wiley, D. E., Cogan, L. S., & Wolfe, R. G. (2001). Why schools matter: A cross-national comparison of curriculum and learning. San Francisco, CA: Jossey-Bass.
Schmidt, W. H., McKnight, C., Valverde, G. A., Houang, R. T., & Wiley, D. E. (1997). Many visions, many aims. Vol. I: A cross-national investigation of curricular intentions in school mathematics. London, Dordrecht, & Boston, MA: Kluwer.
Schmidt, W. H., Raizen, S. A., Britton, E. D., Bianchi, L. J., & Wolfe, R. G. (1997). Many visions, many aims. Vol. II: A cross-national investigation of curricular intentions in school science. London, Dordrecht, & Boston, MA: Kluwer.
Schmidt, W. H., Zoido, P., & Cogan, L. S. (2014). Schooling matters: Opportunity to learn in PISA 2012. OECD Education Working Papers, No. 95. Retrieved from http://dx.doi.org/10.1787/5k3v0hldmchl-en
Schwille, J. (2011). Experiencing innovation and capacity building in IEA research, 1963–2008. In C. Papanastasiou, T. Plomp, & E. C. Papanastasiou (Eds.), IEA 1958–2008: 50 years of experiences and memories (Vol. II, pp. 627–707). Nicosia, Cyprus: Cultural Center of the Kykkos Monastery.
Schwille, J., Porter, A., Belli, G., Floden, R., Freeman, D., Knappen, L., Kuhs, T., & Schmidt, W. (1983). Teachers as policy brokers in the content of elementary school mathematics. In L. S. Shulman & G. Sykes (Eds.), Handbook of teaching and policy (pp. 370–391). New York: Longman.
Suter, L. E. (2017). How international studies contributed to educational theory and methods through measurement of opportunity to learn mathematics. Research in Comparative and International Education, 12(2), 174–197.
Thorndike, E. L. (1914). Education, a first book. New York: The Macmillan Company.
Travers, K. J., & Westbury, I. (Eds.) (1989). The IEA study of mathematics I: Analysis of mathematics curricula (Vol. 1). Oxford: Pergamon Press.
UNESCO (2012). International Standard Classification of Education (ISCED-2011). Montreal, Quebec: UNESCO Institute for Statistics.


Walker, D. A. (1962). An analysis of the reactions of Scottish teachers and pupils to items in the geography, mathematics and science tests. In A. W. Foshay, R. L. Thorndike, F. Hotyat, D. A. Pidgeon, & D. A. Walker (Eds.), Educational achievements of thirteen-year-olds in twelve countries (pp. 63–68). Hamburg: UNESCO Institute for Education.


Wiley, D. E., & Wolfe, R. G. (1992). Major survey design issues for the IEA Third International Mathematics and Science Study. Prospects, 22(3), 297–304. doi: 10.1007/BF02195952

22 What Can International Comparative Tests Tell Us about the Future Supply of Highly Skilled STEM Workers?

Larry E. Suter and Emma Smith

INTRODUCTION

Natural scientists construct meaning from observations of the natural world, mathematicians construct meaning through logical argument, and developments in engineering and technology are constructed entirely by human invention. Together, these fields have been labeled 'STEM', for Science, Technology, Engineering and Mathematics. A nation's economic prosperity may be partly dependent on the continual creation of new technical products and mechanisms: economic growth is no longer linked solely to a country's physical and natural resources but is instead driven by its human capital. In the fields of science, technology, engineering and mathematics, the strength of this human capital is directly linked to the size and quality of the workforce (Teitelbaum, 2014).

All young people must learn about advances in scientific discoveries and technological inventions through attending formal and informal educational institutions rather than from the experiences of daily living. The economic imperative that a strong national education system equates to a strong national economy has particular implications for how science (or STEM) is taught to these young people. This has resulted in the frequent conflation of science policy with science education policy, and also in long-established concerns over the quality of school science teaching, the content of the science curriculum and the perceived inadequacy of science graduates. Thus, any evidence of a growing weakness in STEM educational skills among the student population is regularly perceived to be a threat to national prosperity (e.g. National Commission on Excellence in Education, 1983).


Over the last 50 years, large-scale international comparative studies have stoked political concern about the quality of public education and, unsurprisingly given the above, the quality of public science education. But just as significantly, the information that international comparative studies can provide about student learning allows a much closer and more critical examination of student performance and attitudes towards science across a wide range of countries and educational contexts. The aim of this chapter is to achieve greater clarity about what is known from comparative research about the relationship between country characteristics and student career choices in the STEM disciplines. The chapter will focus especially on large-scale survey analyses and will address the following research questions:

• Do 15-year-old students who plan a career in science, mathematics or technology have high levels of achievement and express strong interest in science in every country?
• Are students who have high achievement and a high interest in science more likely to choose a science career in all countries?
• Do a country's geographical characteristics affect the aspirations of 15-year-olds for a scientific career?
• Do a country's economic characteristics affect the aspirations of 15-year-olds for a scientific career?
• Do students' non-cognitive attributes play a large role in aspirations for a scientific career?

We will begin by reviewing research about macro-economic and social forces in the current global economy. Second, we will briefly consider findings from social psychological studies of student learning and decision making, as well as from research into the educational practices that influence student career choice. We then conduct a re-analysis of a large-scale international comparative survey of science achievement to explore the relationship between student attainment and aspiration for a scientific career and a range of country-level factors. Finally, we will discuss the policy implications of conducting cross-national studies of the science workforce. Throughout this chapter we focus attention on describing the variety of country-level characteristics that are related to individual student decision making. This kind of analysis can inform educational policies and theories that are increasingly made and implemented at the national and international levels. Generalizations that assume one size fits all conditions can be better constructed from better descriptions of country-to-country distributions of student knowledge and decision making.

WHAT IS STEM?

First, however, it is worth pausing to consider what we mean by STEM. The acronym STEM was created at the US National Science Foundation (NSF) in 2001 to communicate the educational fields of study that prepare students for research and scientific discovery (Patton, 2013). Before 2001, the term 'SMET' (Science, Mathematics, Engineering and Technology) was used (Patton, 2013), but readers may also have come across SET (Science, Engineering and Technology), STEMM (Science, Technology, Engineering, Mathematics and Medicine) or even STEAM (Science, Technology, Engineering, Arts, Mathematics) education. The more pronounceable acronym 'STEM' has become a ubiquitous name for science and technology programs, now found in documents and policy statements by governments, international organizations and academic researchers around the world. However, the simplification of the reference to this group of disciplines may be leading researchers and policy makers to exaggerate the function of educational systems in fixing student behavior towards specific fields of knowledge. For example, the short acronym omits an important field that depends on student preparation in the sciences, namely the health sciences, which provide public services as well as conduct basic research. One goal of this chapter is to clarify the concept of student orientation to STEM by evaluating the empirical relationships of science education and student orientation to STEM learning in a large number of countries. Comparing student motivation and attainment from country to country should guide educational research towards a deeper understanding of general or specific theories of student career choice. The necessity of separating the fields of 'technology' from the natural sciences in 'STEM' to better predict student choices will also be examined. In the next section we review recent research into national differences in economic performance and student orientation towards science and scientific careers.

GLOBAL POLITICAL CONCERNS ABOUT STEM EDUCATION AND STEM RECRUITMENT

The role of science (or STEM) within a nation's education system has long been the focus of much discussion and debate. Whether the purpose of school science education is to provide scientific training in preparation for university and a scientific career, or to educate a scientifically literate population, has been contested ever since the subject was first taught in schools (e.g. Taunton Committee, 1868; The Royal Society, 2008; Jenkins & Donnelly, 2001). This dual function of school science reveals a tension between scientific literacy as a prerequisite for active citizenship in a modern society and a view of scientific knowledge as a tool for economic growth, prosperity and security. In forming education policy, it has usually been the latter purpose that has prevailed. The literature in this area is vast but has kept to the same basic reasoning for decades: that there is a shortage of highly skilled science and engineering graduates, arising in part from poor-quality science teaching in schools, which in turn puts students off scientific careers; and that these shortages are detrimental to a nation's technological and economic development.

Concerns about the supply of highly skilled STEM workers have been central to public policy on education, science and engineering in many industrialized countries for decades. In the UK and the USA, for example, the apparent poor quality of school science education, along with insufficient numbers of well-qualified teachers, has been linked to skills shortages by government and other agencies since at least the time of the Second World War (e.g. Bush, 1945; Cmd. 6824, 1946; Steelman, 1948; Smith, 2017). In the United States, the two decades following the end of the Second World War were characterized by a large volume of literature, from government, industry and academic sources, on the supply and demand of highly skilled scientists and engineers (National Science Board, 2000; Godin, 2002). The context for these concerns was provided by the launch of Sputnik by the Soviet Union in 1957, the ensuing 'space race' and the escalation of the Cold War. This was reflected in rising government investment in science and technology: for example, US expenditure on research and development across all federal departments increased more than ten-fold in the space of a decade, from $73.4 million in 1940 to $839.6 million in 1950 (National Science Foundation, 1951), with funding for education more than tripling over this period (National Science Board, 2000). The aftermath of Sputnik and the intensification of the Cold War had a huge impact on education and science policy in the USA. It succeeded in convincing the public and policy makers alike that the scientific and military challenges posed by the USSR could only be addressed by more effective scientific education and training (Dow, 1991).

Such concerns have continued with varying degrees of urgency over the following decades. For example, the election of President Ronald Reagan in 1981 coincided with a renewed focus on the state of US public education and an emphasis on reversing falling or stagnating test scores nationally, as well as poor performance on international comparative tests. In 1983 – 'the year of the reports' – nearly 50 reports totaling more than 6,000 pages voiced concern over the troubled state of American education, an outpouring of criticism overshadowing even that of the 1950s (Dow, 1991: 243). Much more recently, reports such as Rising above the Gathering Storm (National Academies of Science, 2010) brought attention both to shortages in the supply of highly skilled STEM workers and to apparent shortcomings in public STEM education. Skills-shortage concerns in the UK have followed a similar trajectory to those in the USA. Such concerns – frequently reiterated by employer organizations – have been reflected in the range and scope of initiatives, policies and reports aimed at increasing young people's participation in STEM subjects, particularly at post-compulsory levels. To take just one example, in the UK a single month in 2016 saw the publication of two major reports into the employability of STEM graduates: The Wakeham Review of STEM Degree Provision and Graduate Employability (Wakeham Review, 2016) and The Shadbolt Review of Computer Sciences Degree Accreditation and Graduate Employability (Shadbolt Review, 2016). Both closely follow the recent extensive Perkins Review of Engineering Skills (Business, Innovation and Skills, 2013) in emphasizing the economic imperative of a strong and globally competitive STEM sector while reiterating similar shortcomings in the supply and skills of the STEM workforce. While the above examples outline some of the concerns over the quality of public STEM education and the supply of well-qualified STEM workers in the USA and the UK, these challenges have global significance and are a key theme of STEM education policies in many other industrialized nations and regions (e.g. Shah & Burke, 2003; Gago et al., 2004). One consequence of global interest in national education performance has been the development of the international comparative tests of education performance that are the focus of this Handbook.

THE ECONOMY, HIGHLY SKILLED WORKERS AND STEM EDUCATION

Our nation's skills are not world class and we run the risk that this will undermine the UK's long-term prosperity … without increased skills we would condemn ourselves to a lingering decline in competitiveness, diminishing economic growth and a bleaker future for all. Achieving world class skills is the key to achieving economic success and social justice in the new global economy. (The Leitch Review of Skills, 2006: 1 and 9)

The above extract from the influential British report The Leitch Review of Skills underlines the economic imperative that improving a nation's skills will lead to increased productivity, employment, economic competitiveness and prosperity. The idea that 'education can make us all richer' (Wolf, 2008: 37) has been the primary driver behind national education policy in many industrialized nations for some time (Hanushek & Kimko, 2000; OECD, 2007, 2016a: 113; Hanushek & Woessmann, 2009; National Academies of Science, 2010). The argument from policy makers, and many economists, is that prosperous economies depend on human capital and that modern economies require learning throughout one's lifetime; therefore, preparation for this lifetime of learning must begin early. This is a foundational assumption of the PISA survey design (OECD, 2007). The rationale behind this argument is clear: people who earn the most money tend to be those who hold the highest educational qualifications, and so if everyone were to be educated to the same high level, they would all earn a similarly large amount of money. This would in turn benefit the country and make it more economically competitive in global terms. This argument is not new: ever since Adam Smith wrote The Wealth of Nations in 1776, economists have examined the relationship between measures of education and economic development (Mitch, 2005). Economists who analyze the role of education in economic development have been provided with tools for further analyses by the growth of large-scale international surveys of student achievement (Hanushek & Woessmann, 2008, 2015a). Consequently, a growing number of research papers have appeared in the past 20 years that explore the relationships between student achievement and economic development. Studies by economists have found that economic growth is higher in countries that have a population of high academic ability (Weede, 2004; Hanushek & Woessmann, 2008; Lynn & Vanhanen, 2012). The number of papers by economists in the past few years about the role of 'intelligence' shows how the availability of large-scale measurements of student and adult achievement levels has altered economic analysis and theory. The titles of papers by economists illustrate how measures of student achievement have been integrated into the analysis of country-level economies (see, for example, Hanushek & Kimko, 2000; Weede & Kamph, 2002; Schofer & Meyer, 2006; Gelade, 2008; Hanushek & Woessmann, 2008; Lynn & Meisenberg, 2010; Lynn & Vanhanen, 2012).

Research into international differences in economic growth and education provision and engagement has benefited from the data provided by large-scale student assessments such as PISA and TIMSS. These surveys have been invaluable in helping researchers identify the characteristics of education that are likely to explain variations in the rate of economic development in different national contexts, as well as in correcting misunderstandings. For example, Hanushek and Woessmann (2015b) have applied prediction models to estimate how much an economy could grow if student scores rose. The increases they estimated were large: their model – based on an analysis of over 30 countries – indicates that an increase in average student performance of one standard deviation is associated with 2% higher economic growth for each country (Hanushek & Woessmann, 2015b). Policy makers' interest in the need for improved STEM skills and the research community's interest in understanding the relationship between economic performance and educational attainment have, unsurprisingly, influenced the content and development of international comparative tests. For example, the survey instruments and analysis frameworks for the PISA surveys of science in 2006 and 2015 were designed to address some of the issues about the role of educational institutions and student characteristics that may affect science learning. Indeed, the 2015 PISA report states that policy analysts of science education in Australia, the European Union and the USA have 'expressed concern about declines in enrolment and graduation rates for science-related fields or about perceived shortages in science graduates' (OECD, 2016a: 113). The OECD authors of the 2015 PISA survey findings therefore place the international results on students seeking science careers in a context of global economic goals and individual student aspirations. Their carefully constructed review of research and analysis of the relationship between student achievement and decisions to enter a career in science illustrates some of the challenges in establishing the role of schooling in raising STEM participation:

Although economic theory links the number of scientists and engineers to innovation and growth (e.g. Aghion & Howitt, 1992; Grossmann, 2007),


the existence of such a link at the country level has been difficult to prove empirically (Jones, 1995; Aghion & Howitt, 2006). Without this proof, one is left to conclude that this link depends on contextual factors, such as the ‘distance to the frontier’ (the relative level of economic development), or that the number of scientists and engineers is a poor measure of their quality, or perhaps that, in the absence of other policy responses, increasing the number of science and engineering graduates will do little to improve competitiveness and innovation. (OECD, 2016a: 114)

The authors conclude their observations with the following:

Ultimately, in most countries, the argument for increasing the number of science graduates rests on the hope that this larger supply of human resources for science and technology will generate future economic growth, through new ideas and technologies that are yet to be invented, rather than on the anticipated and more predictable needs of the economy in the absence of structural changes (Bosworth et al., 2013; Salzman, Kuehn and Lowell, 2013). (OECD, 2016a: 114)

Thus, the OECD analysis of large-scale survey studies of education, achievement and career aspirations has added new knowledge of national and individual relationships to economic change, but the analyses have not necessarily given greater certainty about how changes in students' STEM knowledge are related to structural changes in economic development. These large-scale surveys also include measures of individual proclivities and attitudes that are derived from social-psychological theories of student motivation. In the next section we examine which theories and measurement methods of non-cognitive attributes have been adopted in the PISA surveys and review findings about whether the development of student motivation measurements has improved knowledge of science career choices. Have these measures improved understanding of national differences in student STEM career choices?


WHAT MOTIVATES STUDENTS TOWARDS STEM EDUCATION AND CAREERS?

The number of research studies about science motivation has grown extensively over time through the contributions of researchers throughout the world in the fields of science education, vocational development, and social psychology (e.g. Super, 1957, 1963, 1964, 1980; Miller & Brown, 1992; Miller & Kimmel, 2012; Wigfield et al., 2015; CareersNZ, 2017). Non-cognitive concepts and measures include attitudes towards school work, attitudes towards occupations, self-efficacy in science and mathematics, and identification with a career field. Individual attitudes are influenced during adolescence at both school and home; career decisions are thus shaped during this period of changing environments and personal experiences. These studies by science education researchers have been spurred on by the observation that students lose interest in science and mathematics as they move from elementary to secondary education (Wigfield & Eccles, 2000; Ainley, Hidi, & Berndorff, 2002; Wigfield & Eccles, 2002; Osborne, Simon, & Collins, 2003; Ainley, 2006, 2010; Osborne & Dillon, 2008; Ainley & Ainley, 2011). Other social psychological and comparative educational researchers have sought to clarify models of the processes of individual student career choices and learning habits (e.g. Betz, Hammond, & Multon, 2005; Betz, Borgen, & Harmon, 2006; Betz & Hackett, 2006; Betz & Rottinghaus, 2006; Lent et al., 2006; Betz, 2007; Wang, Eccles, & Kenny, 2013). Broad theories of motivation have been developed by researchers in social cognitive theory without a specific focus on science or mathematics (Bandura, 1997; Hidi & Harackiewicz, 2000; Hidi & Renninger, 2006; Nagengast, 2011; Riegle-Crumb, Moore, & Ramos-Wada, 2011; Sikora & Pokropek, 2012; Archer et al., 2013; Nugent et al., 2015). Two popular theories about motivation that have been applied to science careers concern the roles of extrinsic and intrinsic motivation (Ryan & Deci, 2009) and expectancy-values (Wigfield & Eccles, 2000; Wigfield et al., 2006; Wigfield, Tonks, & Klauda, 2009; Wang & Degol, 2017). Theories of student interest have also influenced the development of measurements of motivation that appear in international surveys (Krapp, 2002; Lent et al., 2006; Lent & Brown, 2006; Meisenberg & Lynn, 2011; Krapp & Prenzel, 2011).

The concepts of achievement motivation, intrinsic and extrinsic motivation, and expectancy-value theory formed the basis of the conceptual frameworks for the PISA student questionnaires in 2006 and 2015 – the years set aside for the study of science achievement (OECD, 2016b). These theories were chosen because they provide educational policy makers with concepts that might be useful in designing and measuring the efficacy of science education programs (see the model in Wigfield et al., 2006). The 2006 and 2015 PISA surveys used different definitions of student affective attributes and dispositions. The 2006 survey defined four major areas of affective attributes: interest in science, support for scientific inquiry, self-belief as science learners, and responsibility towards resources and environments. The 2015 survey instead selected five scales: interest in science topics, belief that science is instrumental for the future, feeling of joy in studying science, science self-efficacy (belief that the student is good in science), and epistemological beliefs about the nature of science (OECD, 2016b). Incorporating theories of student motivation was proposed as a significant step towards understanding, and predicting, the involvement of students in STEM careers. The wide adoption of the Wigfield and Eccles (2000) expectancy-value theory as a main guide for the 2015 PISA science survey is indicative of the influence of their theoretical and empirical studies on motivation measurement. However, few efforts have been made to validate the intended framework, individual items and resulting scales across many diverse countries (see Van de Vijver, Jude, & Kruger, this volume).

The collection and analysis of data on student motivation, achievement and career aspirations in the TIMSS and PISA surveys have enriched our understanding of the complexity of human nature in different national contexts. The surveys have provided evidence that comparisons of aggregate levels of behavior across countries may produce different, even paradoxical, relationships from those observed at the level of individuals (OECD, 2016a, p. 113; Komatsu & Rappleye, 2017). Next, further analyses of these relationships at the country and individual levels are conducted to attempt to elucidate the relationships between country-level and individual-level cognitive, non-cognitive and economic characteristics. Before we present these findings, we include a brief note on the data used in the analysis.

DATA USED IN THE ANALYSIS

The proposed analysis could be undertaken with either large-scale survey, TIMSS or PISA, because both have similar student background questions. However, TIMSS does not cover as broad a range of countries as PISA. Therefore, to limit the length of the analysis and presentation, this chapter is limited to the 2015 PISA dataset. In the future, a replication of this analysis with the TIMSS 8th-grade sample would contribute further to understanding the international conditions of STEM employment.

The OECD PISA survey is conducted every three years, and the most recent cycle is the focus of the analysis presented here. The 2015 PISA survey collected expansive indicators of science achievement, student background and educational experiences that affect student choice. The 2015 survey included a measure of student career expectations by age 30 in 68 countries and economies. The students' occupational responses were coded by the OECD into 580 occupational titles according to the International Standard Classification of Occupations (International Labour Office, 2012; OECD, 2017b). The authors reclassified 82 of these detailed occupations into STEM and related occupational categories for comparison (shown in Table 22.1). Although each cycle of PISA includes student assessments in mathematics, science and reading, in the 2006 and 2015 surveys the subject of science was the special focus. The 2015 sample included 71 countries, one city, three US states, and several autonomous regions in Spain and Italy. The following analysis is conducted on the 68 countries (or economic units) that completed responses to the items necessary for an analysis of career choice. Each country is represented by a randomized sample of 15-year-olds attending school. The sample sizes for each country range from around 3,400 in Iceland to about 23,000 in Brazil, for a total sample of 519,000 students (OECD, 2017b: 130). These sample sizes are sufficiently large for comparing whole-country differences with a minimal level of sampling error. Reports and survey data are available on the OECD website for PISA (www.oecd.org/pisa/).

Since the occupational survey question permits only one occupational response, the reported occupational choice rates may be a low estimate of the number of students who would consider a career in any specific field. The non-response rates for this item ranged from 10% to 50% across the countries. Other items selected for this analysis include achievement in science, science self-efficacy, instrumental value of science, interest level in science classes, and enjoyment of science. Country-level characteristics used for grouping countries include per-capita gross domestic product (GDP) (OECD, 2017c) and world regions (Github, 2017). The regional distribution of the countries that participated in the 2015 PISA survey is widespread, with representation in each sub-region (Table 22.2). The variation among the participating PISA countries is large enough to examine significant differences in aggregate national characteristics.
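The authors' exact recoding of the 82 detailed occupations is not reproduced here, but the general approach can be sketched as follows. The mapping below is illustrative only, keyed to ISCO-08 sub-major groups, and the file and column names are hypothetical.

    import pandas as pd

    # Illustrative mapping from ISCO-08 sub-major groups (first two digits)
    # to broad categories similar to those in Table 22.1. The authors'
    # actual recoding worked at a finer level of detail (e.g., separating
    # scientists from engineers within group 21).
    CATEGORY_BY_PREFIX = {
        "21": "Scientist/Engineer",       # science and engineering professionals
        "22": "Health professional",
        "23": "Teacher",
        "25": "Technician, software",     # ICT professionals
        "31": "Technician, engineering",  # science/engineering associate professionals
        "32": "Technician, health",
        "35": "Technician, software",     # ICT technicians
    }

    def classify(isco_code) -> str:
        return CATEGORY_BY_PREFIX.get(str(isco_code)[:2], "Other occupation")

    df = pd.read_csv("pisa2015_students.csv")  # hypothetical; 'occ_code' holds ISCO-08 codes
    df["career_group"] = df["occ_code"].map(classify)

    # Share of reported responses per category (cf. Table 22.1).
    print(df["career_group"].value_counts(normalize=True).mul(100).round(1))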

GEOGRAPHY, ECONOMY AND CAREER ASPIRATIONS

Table 22.1  Occupation aspiration by age 30 (sample size: PISA 2015)

ILO occupation group       Number of ILO occupations    PISA cases    Percentage of total reported
Total                                 577                 476,852            100
Scientist                              10                   9,443              2
Engineer                               14                  36,833             10
Technician, software                   25                  17,626              5
Technician, engineering                33                  12,161              3
Technician, health                     13                   4,258              1
Health professional                    20                  65,755             17
Teachers                               15                  28,764              7
Social science                          5                  14,188              4
Business                               71                  22,991              6
Other occupation                      371                 173,716             45
Not applicable                          8                  91,117             24



Table 22.2  Number of 2015 PISA countries/economies by ISO region and sub-region

ISO sub-region               Total    Africa    Americas    Asia    Europe    Oceania
Total                          73        2         13        20       36         2
Northern Africa                 2        2
Australia and New Zealand       2                                                2
Caribbean                       2                   2
Northern America                2                   2
South America                   7                   7
Central America                 2                   2
Central Asia                    1                             1
Eastern Asia                    6                             6
South-Eastern Asia              5                             5
Western Asia                    8                             8
Eastern Europe                  8                                      8
Northern Europe                10                                     10
Southern Europe                11                                     11
Western Europe                  7                                      7

To what extent are career aspirations for STEM careers affected by students' national environment? Under the assumptions that countries have unique features of historical, social and economic development, and that students within a geographic region are more likely to share common characteristics with each other than with students in other regions, this section groups countries according to income level (measured by GDP per capita) and average science achievement. Figure 22.1 provides context on the economic range of the countries in the analysis: it displays the number of PISA countries and economies by general economic level (average GDP per capita). The range of economies represented is very wide: about 20% of the countries have average incomes below $20,000 per capita per year, and nearly a third have per-capita incomes of $50,000 or more per year. If living standards affect students' aspirations for science occupations, the geographic diversity of the PISA 2015 countries is well suited to detect that relationship.
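The grouping behind Figure 22.1 amounts to binning countries by income; a minimal sketch with hypothetical file and column names (the bin edges follow the figure's categories):

    import pandas as pd

    countries = pd.read_csv("pisa2015_countries.csv")  # hypothetical: country, gdp_pc

    bins = [0, 10_000, 20_000, 30_000, 40_000, 50_000, 100_000, float("inf")]
    labels = ["<10K", "10-20K", "20-30K", "30-40K", "40-50K", "50-100K", "100K+"]
    countries["income_band"] = pd.cut(countries["gdp_pc"], bins=bins, labels=labels)

    # Number of PISA countries/economies in each income band (cf. Figure 22.1).
    print(countries["income_band"].value_counts().sort_index())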


Figure 22.2 displays the STEM occupational aspirations of students in the 68 PISA participating countries grouped into 13 world regions (according to Lukes' classification; Github, 2017). About 16% of 15-year-olds expressed an aspiration to be employed in a science, engineering, or computer technology occupation by the age of 30. The proportion of students selecting fields in science or engineering varies across the regions, from a low of 10% in the countries of South-East Asia to 20% in the countries of South America. Students in the four European regions are slightly below the PISA average in the proportion seeking STEM careers, whereas the USA and Canada are above average. An equal proportion of students in each region aspired to a career in a health profession field that requires training in science. The regional differences in student selection of STEM careers are associated neither with levels of GDP per capita nor with country average science achievement. The lowest levels of expectation for science and engineering careers are in countries with high GDP per capita, in Western Europe and Eastern Asia, whereas the highest expectations are in Southern Europe, Western Asia and Central America. About 20% of students in Eastern Europe and Asia expected careers in the sciences, while nearly 40% of the students in North and South America aspired to these occupations by age 30.


Figure 22.1  Number of countries in PISA 2015 by GDP per capita

Figure 22.2  Percentage of 15-year-olds who aspire to a STEM, health, or education profession, for 13 world regions (proportions of respondents in five STEM occupation groups: scientist, engineer, software technician, engineering technician, health professional)

The percentages of students in Central and South America expressing an expectation of a career in the science-engineering professions are especially high given the potential for employment within these countries.

The relationship between GDP per capita and student STEM career aspirations across countries is displayed in Figure 22.3 for STEM occupations and in Figure 22.4 for those choosing to be scientists only.


Figure 22.3  Percentage of students aspiring to any STEM occupation by age 30 by log per-capita country income

Figure 22.4  Percentage of students aspiring to a career as a scientist by country per-capita income level

These scatterplots show that student choice to seek a STEM career is not higher in countries with greater income per capita (r = .04). However, students in higher-income countries are more likely to aspire to be a research scientist (2–4% of students; Figure 22.4). Students in countries with very low or moderate per-capita income levels are the least likely to expect a career in one of the STEM occupations.

Students living in very high-income countries (which include small economies such as Qatar, Macao, Luxembourg, Singapore and the United Arab Emirates) display either very high or very low STEM aspirations compared with other countries. Thus, in general, the proportion of students choosing the selective fields of science and engineering is higher in countries with a high income per capita, but the choice of technology fields is unrelated to the economic status of a country.
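Country-level associations of this kind can be reproduced in outline as follows; this is a sketch with a hypothetical country-summary file and column names, not the authors' code.

    import numpy as np
    import pandas as pd

    # Hypothetical file: one row per country with gdp_pc, pct_stem, pct_scientist.
    c = pd.read_csv("pisa2015_country_summary.csv")
    c["log_gdp"] = np.log(c["gdp_pc"])

    # Correlations of aspiration shares with log income (cf. Figures 22.3 and 22.4).
    print(c["pct_stem"].corr(c["log_gdp"]))       # near zero for the all-STEM share
    print(c["pct_scientist"].corr(c["log_gdp"]))  # positive for the 'scientist' share

    # Least-squares fit of the form used in Figure 22.4: pct = a*ln(gdp_pc) + b.
    a, b = np.polyfit(c["log_gdp"], c["pct_scientist"], 1)
    print(f"slope = {a:.3f}, intercept = {b:.3f}")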


The observation that student aspirations for a general STEM career are not associated with national economic levels requires additional exploration. Although the economic and educational policies of Western European nations have encouraged young people to study STEM subjects (e.g. Gago et al., 2004), the PISA results indicate that students from Western Europe are less likely to aspire to a STEM career than their peers in South American countries. These geographic differences in student aspirations produce international migration patterns that are encouraged by policies such as the use of H-1B visas in the United States to encourage overseas workers to take up highly skilled temporary jobs in the STEM sector (Smith, 2017; USCIS, 2018).

SCIENCE ACHIEVEMENT – INTERNATIONAL DIFFERENCES

The PISA Sample of Countries and Their Characteristics

While differences in levels of student achievement across countries are not news in the 21st century, that was not always the case. The size of differences in student achievement between countries was unknown until reliable statistical measurements of student achievement were conducted in the 1960s (see Chapter 4, this volume, for a fuller discussion). The large-scale international comparative surveys conducted by the International Association for the Evaluation of Educational Achievement (IEA) and the OECD since 1965 have clearly documented that countries differ in levels of achievement and that their achievement levels change over time (Mullis, Martin, & Loveless, 2016). The average student achievement level for the 2015 PISA survey ranges from 330 in the Dominican Republic to 556 in Singapore, a range of two standard deviations of the student science test score (Figure 22.5).


The distribution of the average science and mathematics scores for the 68 PISA 2015 countries or economies is not a normal bell-shaped distribution. The distribution is bimodal, with fewer countries located at the middle of the distribution than at low or high levels. This distribution reflects the OECD selection process, which includes about 30 countries with lower levels of economic development in South America, Eastern Europe, Central Asia and Africa and another 30 countries in Northern and Western Europe, North America and the South Pacific (Figure 22.5). The wide range is useful for comparing the utility of the explanatory concepts of occupational career choice, since many possible combinations may be created from the survey. The PISA country dataset allows for an analysis of the influence of two country characteristics that are likely to influence student occupational choice: the income level of individuals (measured by GDP per capita) and geographic location in world regions. Although achievement levels are presented here as a static condition of a country, evidence is growing from repetitions of the TIMSS and PISA surveys that student levels of achievement in mathematics, science or reading may change over time. Recent measures of the amount of increase or decrease in the 55 countries participating in PISA in both 2006 and 2015 show that 20 countries decreased by 10% of a standard deviation while 18 countries increased by the same amount (OECD, 2016b, 2017a). In TIMSS, the 8th-grade test scores in mathematics and science increased in 11 out of 16 countries and declined in five over a 20-year period of measurement (Mullis et al., 2016). The increases occurred in less developed countries and the decreases were mostly in Western Europe. Therefore, measurable achievement levels have changed in both directions over time, by as much as a quarter of a standard deviation and by an average rank order of eight (out of 55 countries).


Figure 22.5  Number of PISA countries at intervals of achievement in mathematics and science: 2015

Future studies should examine the effect of these changes on student engagement with science and technology careers in all countries. Each of the following sections is intended to help establish whether large-scale survey measurements of individual student characteristics (attitudes and achievement in science) provide constructive concepts for improving the understanding of student career choices, especially in STEM fields.

Achievement and Career Choice

Students in countries with higher average achievement in science are more likely to choose a career in science or technology, but not in engineering (Table 22.3). However, the relationship is not a simple one, nor is it consistent across countries. This section explores the relationship of student achievement and career choice at the level of country differences and at the level of individual student characteristics. It includes career selection by age 30 in natural science (including mathematicians), engineering, technology and general STEM occupations.

The relationship between average country achievement levels and occupational choice surprisingly indicates that different STEM occupations have opposite directions of association with country achievement levels (Table 22.3). While the proportion of students choosing to be a natural scientist or a software technician is higher in countries with high general achievement levels, the students who are most likely to choose a career in an engineering technology or profession are in lower-achievement countries. Other occupations are included in the table for comparison, to illustrate that STEM occupational choices are the most likely to be influenced by country average achievement. Note also that the collection of occupations into one STEM category results in no association with country aggregate achievement. In other words, the identification of an occupation as 'STEM' (a combination of 85 occupations) has little value for predicting the relationship to other student characteristics, such as average achievement level.
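Correlations of the kind reported in Table 22.3 below can be computed from a country-level summary; a sketch, again with hypothetical column names (one aspiration-share column per occupation group):

    import pandas as pd

    c = pd.read_csv("pisa2015_country_summary.csv")  # hypothetical country-level file

    occupations = ["natural_scientist", "engineer", "software_technician",
                   "health_professional", "all_stem"]

    # Correlate each occupation's aspiration share with mean science
    # achievement across the 68 countries (cf. Table 22.3).
    corrs = c[occupations].corrwith(c["science_mean"]).sort_values(ascending=False)
    print(corrs.round(3))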


Table 22.3  Correlation coefficients between average country science achievement and proportion of students seeking careers in 11 fields: 68 PISA countries 2015

Occupation by age 30            Country-level correlation with average achievement (N=68)
Other occupation                 0.550
Natural scientist (& Math)       0.503
Software technician              0.337
Health technician                0.215
Social science                   0.186
Engineering technician           0.077
Teachers                        −0.187
All STEM                        −0.190
Business                        −0.223
Health professional             −0.511
Engineer                        −0.515
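As a concrete illustration of how coefficients like those in Table 22.3 can be produced, the sketch below aggregates student-level records to the country level and correlates the two country measures. This is not the authors' code, and the column names (country, science_score, and the 0/1 occupation flag) are hypothetical placeholders rather than the variable names in the PISA database.

```python
# Illustrative sketch of a country-level correlation as in Table 22.3
# (hypothetical column names; not the authors' code). Expects one row
# per student with a country code, a science score, and a 0/1 flag for
# expecting a given occupation group at age 30.
import pandas as pd

def country_level_correlation(students: pd.DataFrame, flag: str) -> float:
    """Correlate country mean science score with the share of students
    expecting the flagged occupation, computed across countries."""
    by_country = students.groupby("country").agg(
        mean_science=("science_score", "mean"),
        share_expecting=(flag, "mean"),  # mean of a 0/1 flag = proportion
    )
    return by_country["mean_science"].corr(by_country["share_expecting"])

# Usage with a hypothetical student-level dataset:
# r_engineer = country_level_correlation(pisa_students, "expects_engineer")
```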

Achievement and Country Income

The association between country achievement levels and career choice raises the question of the effect of country income level. Is it possible that students who perform well in science are concentrated in high-income countries, so that the connection to occupational choice is determined more by economics than by personal performance? Figures 22.6 and 22.7 address that question by showing the relationships between country economic level and student performance. Student achievement might be expected to be higher in high-income countries because of the greater availability of resources. Indeed, the fact that average student performance is associated with the economic level of the country in general is interpreted by some economists as evidence for a causal relationship between knowledge and income (Hanushek & Woessmann, 2015b). When aggregated into categories of GDP per capita (Figure 22.6), the average level of student performance in science increases with the per-capita income of the country except at the very highest income levels, which include unusual economies (Luxembourg, Macao and Qatar) that have very high income per capita and relatively low achievement levels (see also Figure 22.7).

When average country science scores are displayed in a two-dimensional scatterplot, with GDP per capita converted to a log scale (to distribute income levels more normally), the dominant trend is for student achievement levels to increase with a country's level of economic performance (Figure 22.7). However, some countries fall well off the regression line, illustrating that income is a contributing factor but not a sufficient condition for predicting country averages in science achievement. For example, students in Vietnam have a high average score and low per-capita income; the United Arab Emirates and Qatar have high income with low average scores; and the Dominican Republic has both very low scores and low income. These outliers may occur because of unique social or educational conditions that either motivate students to overcome obstacles or remove educational opportunities altogether (Schleicher, 2018). This is an example of how large-scale surveys can point to special cases that might be overlooked with other methods of research.
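The fitted line reported in Figure 22.7 can be approximated with an ordinary least-squares regression of country mean science scores on per-capita GDP. The sketch below uses a log transformation of income, consistent with the log-scaled axis of the figure; the data points are invented placeholders rather than the actual PISA country values.

```python
# Minimal sketch (invented data, not the authors' code) of the
# country-level regression behind Figure 22.7: mean science score
# regressed on the log of per-capita GDP.
import numpy as np
from scipy import stats

gdp_per_capita = np.array([6_000, 12_000, 25_000, 48_000, 95_000])  # hypothetical
mean_science = np.array([410.0, 445.0, 480.0, 505.0, 490.0])        # hypothetical

# Log-transforming income spreads out the low-income countries, as on
# the log-scaled x-axis of Figure 22.7.
fit = stats.linregress(np.log10(gdp_per_capita), mean_science)
print(f"slope={fit.slope:.1f} score points per tenfold GDP increase, "
      f"R^2={fit.rvalue ** 2:.2f}")
```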

Science Achievement and Career Aspirations

The relationships shown in Figure 22.7 suggest that regional location may play a part in explaining how student achievement and occupational choice are connected. Significant geographic differences are noticeable in Table 22.4, which displays the level of association between student achievement and occupational choice in science (including mathematics) and engineering for 13 world regions. High-achieving students are more likely to choose to be a natural scientist,


Figure 22.6  Mean science achievement by average GDP per capita: 68 PISA 2015 countries. [Bar chart; x-axis: GDP per capita categories from Under 10K to 100K+; y-axis: mean science achievement, 300 to 550.]

Figure 22.7  Average science score by log of per capita GDP. [Scatterplot of countries; x-axis: per capita GDP on a log scale, 2,000 to 200,000; y-axis: PISA science score 2015, 300 to 600; fitted line y = 0.0009x + 435, R² = 0.20.]

mainly in Western, Northern and Southern Europe (but not in Eastern Europe or other regions), whereas high-achieving students are more likely to choose engineering professions in South America and Southern Europe. Student achievement has a very low relationship with choosing an occupation in a technology field in any region, with


Table 22.4  Average correlation (across individual countries) between student career choice at age 30 and achievement in science

                                          Average within-country correlation between science achievement and occupational choice
Region                       Countries    Natural Science or Math   Engineering   Technology   Health Professional
Grand Total                      68               0.148                0.164         0.108          0.149
Northern Africa                   2               0.017                0.170         0.136          0.144
Americas                         12               0.120                0.179         0.098          0.049
  Caribbean                       2               0.121                0.135         0.123          0.097
  South America                   6               0.107                0.195         0.085          0.037
  Central America                 2               0.103                0.134         0.103         −0.023
  Northern America
Asia
  Eastern Asia                    6               0.164                0.084         0.099          0.131
  South-Eastern Asia              4               0.126                0.127         0.069          0.135
  Western Asia                    7               0.074                0.167         0.057          0.156
Europe
  Western Europe                  7               0.203                0.180         0.087          0.208
  Northern Europe                10               0.189                0.149         0.108          0.140
  Southern Europe                10               0.183                0.202         0.117          0.205
  Eastern Europe                  8               0.124                0.162         0.175          0.201
Australia and New Zealand         2               0.209                0.202         0.172          0.169

the possible exception of Eastern Europe, New Zealand and Australia. The health professional occupation is included in the table because of the high demands for science training it requires. Interestingly, high-achieving students who choose to go into health professions are found mainly in Europe.

In the final part of this section, the interrelationships of science achievement, income level and student attitudes towards science are considered together. Are students in countries with high average levels of science achievement and per-capita income more likely to have high aspirations for a career in STEM fields? A scatterplot (Figure 22.8) of STEM occupational aspirations by average country achievement illustrates that countries with high levels of science achievement do not have a high proportion of students who plan for STEM careers, nor do countries with low achievement have uniformly low STEM aspirations. Combining occupations into one STEM category has masked the very different patterns for science and engineering occupations shown earlier in Table 22.3. Countries with low levels of student achievement and high levels of science career expectations are mostly located in South America (Brazil, Colombia, Costa Rica, the Dominican Republic, Mexico and Peru) or in the Middle East and Africa (Jordan, Lebanon, Qatar and Tunisia). Countries with high achievement but low science career expectations are mostly in Eastern Asia (China, Japan, Vietnam, Hong Kong, Singapore, Macao and Taipei) or Europe (Finland and Estonia). The regional consistency of student characteristics may be more important for describing student options for STEM fields than are the general educational and economic conditions of a country.
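The region-level coefficients in Table 22.4 are averages of within-country correlations, which can be computed as in the sketch below. Each within-country coefficient is a point-biserial correlation between the continuous science score and a 0/1 career-choice flag; the column names are hypothetical and this is not the authors' code.

```python
# Illustrative sketch of the Table 22.4 style of computation
# (hypothetical column names; not the authors' code): correlate science
# scores with a 0/1 career-choice flag within each country, then
# average the coefficients across the countries of each world region.
import pandas as pd

def regional_average_correlation(students: pd.DataFrame, flag: str) -> pd.Series:
    per_country = (
        students.groupby(["region", "country"])
        .apply(lambda g: g["science_score"].corr(g[flag]))  # point-biserial r
        .rename("r")
        .reset_index()
    )
    return per_country.groupby("region")["r"].mean()

# Usage with a hypothetical student-level dataset:
# regional_average_correlation(pisa_students, "expects_engineer")
```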


Figure 22.8  Proportion of students aspiring to a STEM career by age 30 by country-level average science achievement: PISA 2015. [Scatterplot of countries labeled by ISO code; x-axis: country average science achievement, 300 to 600; y-axis: proportion seeking a STEM occupation, 0.000 to 0.400.]

Summary of Achievement and Careers

The significance of these country-to-country comparisons of PISA results is that they support the concerns of science educators that engineering and technology occupations lack appeal, especially in many European countries (Osborne, Simon, & Collins, 2003; Osborne & Dillon, 2008). Bright and ambitious students in lower-income nations may be more likely to seek social mobility through engineering and technology careers than are students in wealthier nations. The OECD summary report suggests that students in low-income countries may have unrealistic expectations for their future (OECD, 2016a: 114). Yet another possibility worth considering is that students at age 15 have not yet solidified their understanding of occupations or labor markets, so that their replies to survey questions are unreliable. Long-term longitudinal studies of students' expectations and realizations would be required to fully understand the relationships between adolescent conceptual frameworks and later career choices.

The next section explores evidence on student-reported attitudes towards science education and career aspirations.

NON-COGNITIVE MEASUREMENT AND CAREER CHOICE

This section applies the aggregate country characteristics and individual student measures to elaborate on the discussion by the OECD (2016a: 113–115) of specific career choices in STEM (natural scientist, engineer, engineering technician and software technician). The analysis combines the concepts of science achievement, attitude scales (science self-efficacy, science enjoyment, science instrumental value, science epistemology and science interest) and a measure of science activities. It re-examines whether social-psychological measurements of student attitudes and interests are useful constructs for understanding student career aspirations across countries. The review of literature on motivation conducted by the OECD (2016a) and discussed earlier in this chapter provides support for expecting the PISA attitude


scales to be positively associated with the willingness of a student to seek a science, engineering or technology career. The non-cognitive scales chosen for this analysis are:

1 Enjoyment of science: the student says they have fun working on science.
2 Epistemological beliefs: the student believes that science involves experimentation and that results can alter previously held notions.
3 Instrumental motivation: the student believes that studying science will be useful for a career.
4 Interest in broad science topics: the student shows interest in specific topics, such as the universe, the biosphere, and force and motion.
5 Science self-efficacy: the student reports knowing how science is used to solve problems of the environment, medicine, earthquakes and the solar system.
6 Index of science activities: a measure of behaviors rather than attitudes, such as buying books, watching science shows, reading science magazines and blogs, and attending science clubs.

These attitude scales measure the student's orientation and beliefs towards science. The science activity scale (activities showing an interest in science) provides a behavioral check on the reliability and meaning of the self-reported attitude items. Each scale is composed of 4–7 items that have been scaled according to Item Response Theory (IRT) models (OECD, 2016a: 113). The five attitude scales formed the first factor in a factor analysis of the 17 scales provided in the 2015 PISA results and thus appear to have common features related to occupational choice. The scales are not correlated with each other at the same level in every country. To describe the size of these differences, 3,716 correlation coefficients were computed, one for each pair of the six attitude and science activity scales for students within each of the 68 economies (14 countries did not administer the scales for epistemic beliefs, broad interest or science activities). The maximum and minimum size of


the correlation between each pair of attitude scales across the participating countries are summarized in Figure 22.9. The size of the coefficients varies across countries by 0.20 to 0.50, a sign that the non-cognitive scales do not have a common cross-national response pattern. The highest consistency was found between the scales of enjoyment, interest and instrumental value; the lowest consistency was found for science activities, epistemic beliefs and science self-efficacy. The wide variation in the strength of relationships between the scales across countries indicates that the conceptual frameworks used by PISA for the student motivation scales may not give equally fair estimates of motivation for students in one country compared with another. The associations are higher for OECD member countries than for other countries. Therefore, caution is necessary in interpreting the scales as having the same meaning in every country. The variation in the relationship of attitudes to occupational choice across countries depends on the type of occupational choice (Figure 22.10). None of the scales was significantly associated with choosing a technician occupation. Students who choose careers in natural science report higher levels of enjoyment of science and interest in general science, and are more likely to see science as instrumentally valuable for their future, than are students choosing engineering professions. The size of the relationship of the non-cognitive scales with occupational choice was approximately half that of the cognitive science achievement scale (as reported by OECD, 2016a: 113).
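The minimum and maximum values plotted in Figure 22.9 can be generated by computing, for every pair of scales, one correlation per country and then taking the extremes across countries. The sketch below is a hedged illustration of that procedure; the scale and column names are hypothetical and this is not the authors' code.

```python
# Illustrative sketch of the Figure 22.9 summary (hypothetical column
# names; not the authors' code): for each pair of the six scales,
# compute one correlation per country, then report the cross-country
# minimum and maximum for that pair.
from itertools import combinations
import pandas as pd

SCALES = ["enjoyment", "epistemic_beliefs", "instrumental",
          "broad_interest", "self_efficacy", "science_activities"]

def pairwise_minmax(students: pd.DataFrame) -> pd.DataFrame:
    rows = []
    for a, b in combinations(SCALES, 2):
        per_country = students.groupby("country").apply(
            lambda g: g[a].corr(g[b])   # one coefficient per country
        )
        rows.append({"pair": f"{a} x {b}",
                     "min": per_country.min(),
                     "max": per_country.max()})
    return pd.DataFrame(rows)
```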

Individual Versus Aggregate Analyses

Throughout this chapter, the PISA survey of student orientation towards science has been analyzed both as aggregate country


Figure 22.9  Maximum and minimum bilateral correlations between pairs of six PISA attitude scales for participating countries: PISA 2015. [Chart; x-axis: pairs of attitude scales (enjoyment, epistemological beliefs, interest, instrumental motivation, self-efficacy, science activities); y-axis: correlation coefficients between attitude pairs, −0.20 to 0.80; two series: maximum and minimum.]

indicators and as individual student-level indicators. The OECD and other researchers have noted that the aggregate and individual-level analyses often appear paradoxical (Lu & Bolt, 2015; OECD, 2016a; Komatsu & Rappleye, 2017). The association of individual student attitudes with achievement is positive in most countries (i.e. high-achieving students in science are more likely to enjoy the subject), whereas, when averaged to the country level, positive attitudes towards science are usually negatively associated with achievement (although the size of the relationship varies greatly between countries). For example, science self-efficacy is positively associated with student achievement at the individual level in 55 of 68 countries, but the aggregate value of self-efficacy is negatively associated with science achievement across the 68 countries (r = −0.24). The correlations between science achievement and enjoyment of science range from 0.09 to 0.49 for individuals within the 68 countries, whereas the country-to-country correlation of mean science achievement with mean enjoyment of science is negative (−0.58). Altogether, the aggregate country-to-country relationships between five of the six attitude scales and student achievement are negative when computed across the 68 participating countries (the correlations ranged from −0.41 to −0.54). Only the scale on science epistemological beliefs is positively associated with science achievement at the country level.

The PISA scale of science activities has implications for educational policy. Science educators often promote greater engagement with active science as a stimulus for science learning. However, the correlations


Figure 22.10  Minimum, maximum and average correlation between student choice of scientist or engineer by age 30 with six attitude scales for 68 countries. [Chart in two panels, Scientist and Engineer; rows: epistemological, instrumental, interest in science, science enjoyment, science activities, science efficacy; x-axis: correlation, −0.05 to 0.25; markers: maximum, minimum, average.]

between the number of science activities and achievement are negative in 24 countries and positive in 31 (computed by the authors but not shown here for reasons of space). Positive correlations were common in European and Asian countries, whereas negative associations were found in the Middle East (Israel, Qatar, Tunisia and the United Arab Emirates) and South America (Colombia, the Dominican Republic and Peru). These regional differences show that educational policies that actively engage students with science in order to improve their interest and knowledge may be effective in some cultures but not others.
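The sign reversal between individual-level and country-level correlations described in this section is a classic aggregation effect. The following simulation, a minimal sketch with invented parameters rather than PISA estimates, shows how the correlation can be positive within every country yet negative across country means.

```python
# Minimal simulation (invented parameters, not PISA data) of the
# attitude-achievement paradox: the student-level correlation is
# positive within every country, but the correlation of country means
# is negative because mean attitude falls as mean achievement rises.
import numpy as np

rng = np.random.default_rng(0)
summaries = []
for k in range(20):                      # 20 hypothetical countries
    mean_ach = 400 + 10 * k              # country mean achievement rises...
    mean_att = 0.8 - 0.04 * k            # ...while country mean attitude falls
    ach = rng.normal(mean_ach, 80, size=500)
    att = mean_att + 0.002 * (ach - mean_ach) + rng.normal(0, 0.3, size=500)
    r_within = np.corrcoef(ach, att)[0, 1]    # positive in each country
    summaries.append((ach.mean(), att.mean(), r_within))

means = np.array(summaries)
r_between = np.corrcoef(means[:, 0], means[:, 1])[0, 1]  # strongly negative
print(f"median within-country r: {np.median(means[:, 2]):.2f}")
print(f"between-country r of means: {r_between:.2f}")
```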

SUMMARY

The intent of this chapter is to illustrate how research conducted with recent large-scale

comparative studies (which have grown to include samples of students in a diverse set of countries) can explore macro and micro influences on student plans for entering careers in science or engineering. The analysis considered how STEM occupational choice was associated with country and student achievement levels, attitudes towards science, aggregate country income level, and world region of residence. For the analysis of STEM career choices, the authors grouped occupations into natural scientists, engineers, and technicians in computer technology and engineering. These occupations were compared with other professional-level occupations in related fields. The analysis model was intended to establish generalizations about student career choices while preserving accurate descriptions of the variation in country characteristics as much as possible. The exercise


required methods that depict distributions of a range of differences as well as generalizable relationships. Thus, individual student-level measurements were aggregated into country units, and countries were aggregated into categories of world region, per-capita GDP, student achievement level, student attitudes and career choices. Access to individual student-level characteristics in the publicly available database permitted the analysis to create measures of relationships between student characteristics at the individual level that could also be accumulated into larger country-level units. These analyses of PISA data have identified a number of paradoxes that should stimulate future research. While the relationship between aggregate levels of achievement and career orientation across countries is negative, the relationship between achievement and career orientation within most countries is positive. Such a reversal probably occurs because the distribution of student interest within some countries is weighted towards the negative end of the measurement scale even though students who have higher achievement also have higher interest in science.

Non-cognitive Scales

Although the use of non-cognitive scales of student attitudes has developed extensively over the past 50 years (Wigfield et al., 2015), our analysis of the 68 countries included in PISA 2015 found that few simple generalizations about student beliefs and behaviors are warranted. We observed differences in how attitudes relate to achievement and career choice in different regions. For example, countries in Europe and North America show strong positive relationships between achievement and the motivation scales, whereas countries in Eastern Europe, North Africa, South-East Asia, and Central and South America show weak relationships between enjoyment of science and achievement. We found that students in less developed countries are the most likely to seek careers in engineering and technology. The relationship between engaging in science activities and science career choice is low in most countries, suggesting that policies encouraging students to take science classes with higher levels of enjoyment or activity will not necessarily increase the number of students seeking STEM careers; in fact, the reverse appears to be the case. Indeed, this analysis of achievement, career plans, motivations and country-specific characteristics has uncovered a complex set of relationships in some countries that does not fit social-psychological theories of motivation. We would therefore argue that these findings should stimulate future cross-cultural research to seek new models of student motivation and career choice. Such work should include further studies of the educational and social conditions of students in countries that combine high expectations of a science career with low economic development, to understand exactly why many of the relationships between macro and micro student characteristics are 'paradoxical'.

Value of Large-scale Research Model

The content of the PISA surveys constitutes a deep dive into the knowledge, beliefs and values of half a million 15-year-olds on six continents. By design, survey research is limited by the content of frameworks of psychological and social behavior that were designed years prior to data collection. An advantage of an a priori framework is that the boundaries it provides assist in making comparisons between individuals and groups of individuals, since all respondents answer a common set of questions. The statistical comparisons provide equal measures of the size and strength of relationships. The disadvantage of the design is that the researcher must interpret the career choice decisions of students who reside in countries outside the


experience of the researcher. Also, the survey may include questions that are relevant to students in some economies but not in others, or questions relevant to the behaviors of students in some countries may have been excluded from the survey altogether. Large-scale surveys across a wide variety of economic and social conditions require continuous analysis, evaluation and feedback to the study organizers to achieve measurement procedures that are meaningful and representative of all units sampled.

The educational researchers who developed the first cross-national education surveys attempted to ignore the politics of international comparisons and to concentrate on inventing new methods for studying the causes of educational achievement (Husén, 1972). One hope of the empirical researchers who initiated large-scale surveys of education was to identify generalizable causal paths to increased quality and quantity of education (Husén, 1972). This analysis of student orientation towards careers has instead shown that international differences in average levels of achievement, attitudes and career orientations are very large and that regional differences exist but require explanation beyond the capacity of a single study. Today, extensive survey measures of student achievement for a large number of countries are easily obtained by researchers, so questions about the relationships between economic development and student knowledge and interests can be described more faithfully. The rigor of these studies' data collection methods ensures that descriptions of country and student educational activities, learning and attitudes can reliably place individual country patterns in a context of economic and social experiences around the world. These comparative studies provide opportunities to examine human behavior across a wide variety of contexts. The results from these objective survey research methods may require modification and adjustment of personal beliefs about how education affects students.


REFERENCES Aghion, P., & Howitt, P. (1992) A model of growth through creative destruction. Econometrica, 60(2), 323–351. Aghion, P., & Howitt, P. (2006) Appropriate growth policy: A unifying framework. Journal of the European Economic Association, 269–314. Ainley, M. (2006) Connecting with learning: Motivation, affect and cognition in interest processes. Educational Psychology Review, 18, 391–405. doi:10.1007/s10648-006-9033-0. Ainley, M. (2010) Interest in the dynamics of task behavior: Processes that link person and task in effective learning. In T. C. Urdan and S. A. Karabenick (Eds.), Advances in motivation and achievement: The decade ahead: Theoretical perspectives on motivation and achievement (Vol. 16, Part A). United Kingdom: Emerald Group Publishing Limited. Ainley, M., & Ainley, J. (2011) A cultural perspective on the structure of student interest in science. International Journal of Science Education, 33(1) 51–57. Ainley, M., Hidi, S., & Berndorff, D. (2002) Interest, learning and the psychological processes that mediate their relationship. Journal of Educational Psychology, 94, 545–561. Archer, L., DeWitt, J., Osborne, J., Dillon, J., Willis, B., & Wong, B. (2013) ‘Not girly, not sexy, not glamorous’: Primary school girls’ and parents’ constructions of science aspirations. Pedagogy, Culture & Society, 21(1) 171–194. Bandura, A. (1997) Self-efficacy: The exercise of control. New York: Freeman. Betz, N., & Hackett, G. (2006) Career self-­ efficacy theory: Back to the future. Journal of Career Assessment, 14(1), 3–11. Betz, N. E. (2007) Career self-efficacy: Exemplary recent research and emerging directions. Journal of Career Assessment, 15(4), 403–422. http://doi.org/10.1177/1069072707305759. Betz, N. E., & Rottinghaus, P. J. (2006) Current research on parallel measures of interests and confidence for basic dimensions of vocational activity. Journal of Career Assessment, 14, 56–76. Betz, N. E., Hammond, M., & Moulton, K. (2005) Reliability and validity of response continua for the career decision self-efficacy


scale. Journal of Career Assessment, 13, 131–149. Betz, N., Borgen, F., & Harmon, L. (2006) Vocational confidence and personality in the prediction of occupational group membership. Journal of Career Assessment, 14, 36–55. Bosworth, D. et  al. (2013) The supply of and demand for high-level STEM skills. Evidence Report, No. 77. Rotherham, UK: UK Commission for Employment and Skills. Brown, S. D., & Lent, R. W. (Eds.) (2008) Handbook of counseling psychology (4th ed.) New York: Wiley. Bush, V. (1945) Science—the endless frontier: A report to the President. Washington, DC: US Government Printing Office. Business, Innovation and Skills (2013) The Perkins Review of Engineering Skills. London: Department for Business, Innovation and Skills. BIS/13/1269. CareersNZ (2017) Website on Career Theory. Available at: www.careers.govt.nz/resources/ career-practice/career-theory-models/ Cmd. 6824. (1946) Scientific man-power. Report of a committee appointed by the Lord President of the Council (The Barlow Report) (London, HMSO). Deci, E. L., & Ryan, R. M. (2008) Facilitating optimal motivation and psychological wellbeing across life’s domains. Canadian Psychology, 49, 14–23. Dow, P. B. (1991) Schoolhouse politics, lessons from the Sputnik era. Cambridge, MA: Harvard University Press. Gago, J. M., Ziman, J., Caro, P., Constantinou, C., Davies, G., Parchmann, I., Rannikmae, M., & Sjoberg, S. (2004) Increasing human resources for science and technology in Europe: Report of the High-Level Group on Human Resources for Science and Technology in Europe. Luxembourg: European Communities. Gelade, G. A. (2008) IQ, cultural values, and the technological achievement of nations. Intelligence, 36, 711–718. Github. (2017) Lukes, ISO-3166 Countries with Regional Codes. https://github.com/lukes/ ISO-3166-Countries-with-Regional-Codes/ blob/master/all/all.csv. Godin, B. (2002) Highly qualified personnel: Should we really believe in shortages? Project on the history and sociology of science

and technology statistics. Working paper no. 15 (Montreal, CSIIC). Available online at: www.csiic.ca/pdf/Godin_15.pdf (accessed 3 March 2015). Hanushek, E. A., & Kimko, D. D. (2000) Schooling, labor-force quality, and the growth of nations. American Economic Review, 90, 1184–1208. Hanushek, E. A., & Woessmann, L. (2008) The role of cognitive skills in economic development. Journal of Economic Literature, 46(3), 607–668. Hanushek, E. A., & Woessmann, L. (2012) Do better schools lead to more growth? Cognitive skills, economic outcomes, and causation. Journal of Economic Growth, 17, 267–321. Hanushek, E. A., & Woessmann, L. (2015a) Universal basic skills: What countries stand to gain. Paris: OECD Publishing. Retrieved from http://dx.doi.org/10.1787/9789264234833-en. Hanushek, E. A., & Woessmann, L. (2015b) The knowledge capital of nations: Education and the economics of growth. Cambridge, MA: The MIT Press. Hidi, S., & Harackiewicz, J. (2000) Motivating the academically unmotivated: A critical issue for the twenty-first century. Review of Educational Research, 70, 151–180. Hidi, S., & Renninger, K. A. (2006) The four-phase model of interest development. Educational Psychologist, 41, 111–127. doi: 10.1207/s15326985ep4102_4. Husén, T. (1972) Strategies of educational innovation. The Australian Journal of Education, 16(2), 125. International Labor Office (ILO) (2012) International standard classification of occupations. ISCO-08. Geneva: ILO. Jenkins, E. W., & Donnelly, J. F. (2006) Educational reform and the take-up of science post-16. Paper presented at the Royal Society conference 'Increasing the take-up of science post-16', 16 March 2006, London. Jones, C. (1995) R & D-based models of economic growth. Journal of Political Economy, 103(4), 759–784. Komatsu, H., & Rappleye, J. (2017) A PISA paradox? An alternative theory of learning


as a possible solution for variations in PISA scores. Comparative Education Review, 61(2). Krapp, A. (2002) An educational-psychological theory of interest and its relation to SDT. In Edward Deci and Richard M. Ryan (Eds.), The handbook of self-determination research (pp. 405–426). Rochester: University of Rochester Press. Krapp, A., & Prenzel, M. (2011) Research on interest in science: Theories, methods, and findings. International Journal of Science Education, 33(1), 27–50. doi: 10.1080/09500693.2011.518645. Leitch Review of Skills (2006) Prosperity for all in the global economy: World class skills. Final Report (London, HMSO). Lent, R. W., & Brown, S. D. (2006) On conceptualizing and assessing social cognitive constructs in career research: A measurement guide. Journal of Career Assessment, 14, 12–35. Lent, R. W., Tracey, T. J. G., Brown, S. D., Soresi, S., & Nota, L. (2006) Development of interests and competency beliefs in Italian adolescents: An exploration of circumplex structure and bidirectional relationships. Journal of Counseling Psychology, 53(2), 181–191. http://doi.org/10.1037/0022-0167.53.2.181. Lu, Y., & Bolt, D. M. (2015) Examining the attitude–achievement paradox in PISA using a multilevel multidimensional IRT model for extreme response style. Large-scale Assessments in Education: An IEA-ETS Research Institute Journal, 3(2). https://doi.org/10.1186/s40536-015-0012-0. Lynn, R., & Meisenberg, G. (2010) National IQs calculated and validated for 108 nations. Intelligence, 38, 353–360. Lynn, R., & Vanhanen, T. (2012) Intelligence: A unifying construct for the social sciences. London: Ulster Institute. Meisenberg, G., & Lynn, R. (2011) Measures of human capital. Journal of Social Political and Economic Studies, 36, 421–454. Miller, J. D., & Brown, K. G. (1992) The development of career expectations by American youth. In W. Meeus et al. (Eds.), Adolescence, careers, and cultures. Berlin: Walter de Gruyter. Miller, J. D., & Brown, K. G. (1993) Persistence and career choice. In L. E. Suter (Ed.), Indicators


of science and mathematics education. Washington, DC: National Science Foundation. Miller, J. D., & Kimmel, L. G. (2012) Pathways to a STEMM profession. Peabody Journal of Education, 87(1), 26–45. doi: 10.1080/0161956X.2012.642274. Mitch, D. (2005, July 26) Education and economic growth in historical perspective. In R. Whaples (Ed.), EH.Net Encyclopedia. http://eh.net/encyclopedia/education-and-economic-growth-in-historical-perspective/. Mullis, I. V. S., Martin, M. O., & Loveless, T. (2016) 20 years of TIMSS: International trends in mathematics and science achievement, curriculum, and instruction. Chestnut Hill, MA: TIMSS & PIRLS International Study Center, Boston College. Retrieved from http://timss2015.org/timss2015/wp-content/uploads/2016/T15-20-years-of-TIMSS.pdf. Nagengast, B. et al. (2011) Who took the '×' out of expectancy-value theory? A psychological mystery, a substantive-methodological synergy, and a cross-national generalization. Psychological Science, 22(8), 1058–1066. http://dx.doi.org/10.1177/0956797611415540. National Academies of Science (2010) Rising above the gathering storm, revisited. Washington, DC. National Commission on Excellence in Education (1983) A nation at risk: The imperative for educational reform. Washington, DC: US Government Printing Office. National Science Board (2000) Science and technology policy: Past and prologue. A companion to Science and Engineering Indicators, 2000. Washington, DC: National Science Board. National Science Foundation (1951) First annual report of the National Science Foundation, 1950–1951. Washington, DC: US Government Printing Office. Nugent, G. et al. (2015) A model of factors contributing to STEM learning and career orientation. International Journal of Science Education, 37(7), 1067–1088. http://dx.doi.org/10.1080/09500693.2015.1017863.


OECD (2007) PISA 2006: Science competencies for tomorrow's world. Vol. 1: Analysis. Paris: OECD Publishing. http://dx.doi.org/10.1787/9789264040014-en. OECD (2016a) PISA 2015 results. Vol. I: Excellence and equity in education, PISA. Paris: OECD Publishing. http://dx.doi.org/10.1787/9789264266490-en. OECD (2016b) PISA 2015 assessment and analytical framework: Science, reading, mathematic and financial literacy, PISA. Paris: OECD Publishing. http://dx.doi.org/10.1787/9789264255425-en. OECD (2017a) PISA 2015 results (Volume III): Students' well-being. Paris: OECD Publishing. http://dx.doi.org/10.1787/9789264273856-en. OECD (2017b) What kind of careers in science do 15-year-old boys and girls expect for themselves? PISA in Focus, No. 69, October. Paris: OECD Publishing. Retrieved from http://www.oecd.org/pisa/pisaproducts/pisa-in-focus-all-editions.htm. OECD (2017c) PISA 2015 technical report. Paris: OECD Publishing. Osborne, J., & Dillon, J. (2008) Science education in Europe: Critical reflections. A report to the Nuffield Foundation (January). Osborne, J., Simon, S., & Collins, S. (2003) Attitudes towards science: A review of the literature and its implications. International Journal of Science Education, 25(9), 1049–1079. Patton, M. (2013) ATE had role in the naming of STEM. ATE Central Blog. https://atecentral.net/ate20/22917/ate-had-role-in-the-naming-of-stem. Riegle-Crumb, C., Moore, C., & Ramos-Wada, A. (2011) Who wants to have a career in science or math? Exploring adolescents' future aspirations by gender and race/ethnicity. Science Education, 95(3), 458–476. http://dx.doi.org/10.1002/sce.20431. Ryan, R. M., & Deci, E. L. (2009) Promoting self-determined school engagement: Motivation, learning and well-being. In K. Wentzel, A. Wigfield and D. Miele (Eds.), Handbook of motivation at school (pp. 171–195). New York: Routledge. Salzman, H., Kuehn, D., & Lowell, B. L. (2013) Current and proposed high-skilled guestworker policies discourage STEM students and grads from entering IT. Economic Policy

Institute. Retrieved from http://www.epi.org/publication/current-proposed-high-skilled-guestworker/. Schleicher, A. (2018) World class: How to build a 21st century school system. Paris: OECD Publishing. Schofer, E., & Meyer, J. W. (2006) Student achievement and national economic growth. American Journal of Education, 113, 1–29. Shadbolt Review (2016) The Shadbolt review of computer sciences degree accreditation and graduate employability. Available at https://www.gov.uk/government/publications/computer-science-degree-accreditation-and-graduate-employability-shadbolt-review (accessed 18 June 2016). Shah, C., & Burke, G. (2003) Skills shortages: Concepts, measurement and implications. Working Paper No. 52. Melbourne, Australia: Centre for the Economics of Education and Training, Monash University. Sikora, J., & Pokropek, A. (2012) Gender segregation of adolescent science career plans in 50 countries. Science Education, 96(2), 234–264. http://dx.doi.org/10.1002/sce.20479. Smith, A. (1778) An inquiry into the nature and causes of the wealth of nations (2nd ed.). London: W. Strahan; T. Cadell. https://books.google.com/books?id=PAQMAAAAYAAJ&dq=editions%3AHkmbBXCA1kcC&pg=PR1#v=onepage&q&f=true. Smith, E. (2017) Shortage or surplus? A long-term perspective on the supply of scientists and engineers in the USA and the UK. Review of Education, 5(2), 171–199. Steelman, J. R. (1948) Manpower for research. Bulletin of the Atomic Scientists, 4(2), 57–58. Super, D. E. (1957) The psychology of careers. New York: Harper & Row. Super, D. E. (1963) Towards making self-concept theory operational. In D. E. Super, R. Starishevski, N. Matlin and J. P. Jordaan (Eds.), Career development: Self-concept theory (pp. 17–31). New York: College Entrance Examination Board. Super, D. E. (1964) A developmental approach to vocational guidance: Recent theory and results. Vocational Guidance Quarterly, 13, 1–10. Super, D. E. (1980) A life-span, life-space approach to career development. Journal of Vocational Behavior, 16, 282–298. http://doi.org/10.1016/0001-8791(80)90056-1.


Teitelbaum, M. S. (2014) Falling behind? Boom, bust and the global race for scientific talent. Princeton, NJ: Princeton University Press. The Royal Society (2008) Science and mathematics education, 14–19: A 'state of the nation' report on the participation and attainment of 14–19 year olds in science and mathematics in the UK, 1996–2007. London: The Royal Society. Available at http://royalsociety.org/downloaddoc.asp?id=5698 (accessed 2 June 2009). The Taunton Committee (1868) Schools Inquiry Commission, Vol. I, Report of the commissioners, Command Paper 396 (London, HMSO). USCIS (2013) Characteristics of H1B specialty occupation workers, US Citizenship and Immigration Services. Washington, DC: US Department of Homeland Security. US Citizenship and Immigration Services (USCIS) (2018) H-1B specialty occupations, DOD cooperative research and development project workers, and fashion models. https://www.uscis.gov/working-united-states/temporary-workers/h-1b-specialty-occupations-dod-cooperative-research-and-development-project-workers-and-fashion-models. Wakeham Review (2016) Wakeham review of STEM degree provision and graduate employability. Available at https://www.gov.uk/government/publications/stem-degree-provision-and-graduate-employability-wakeham-review (accessed 8 June 2016). Wang, M., & Degol, J. L. (2017) Gender gap in science, technology, engineering, and mathematics (STEM): Current knowledge, implications for practice, policy, and future directions. Educational Psychology Review, 29(1), 119–140 (online first, 13 January 2016). http://dx.doi.org/10.1007/s10648-015-9355-x.


Wang, M., Eccles, J., & Kenney, J. (2013) Not lack of ability but more choice: Individual and gender differences in choice of careers in science, technology, engineering, and mathematics. Psychological Science, 24(5). Weede, E. (2004) Does human capital strongly affect growth rates? Yes, but only if assessed properly. Comparative Sociology, 3, 115–134. Weede, E., & Kämpf, S. (2002) The impact of intelligence and institutional improvements on economic growth. Kyklos, 55(3), 361–380. Wigfield, A., & Eccles, J. S. (2000) Expectancyvalue theory of achievement motivation. Contemporary Educational Psychology, 25(1), 68–81. http://dx.doi.org/10.1006/ ceps.1999.1015. Wigfield, A., & Eccles, J. S. (2002) The development of competence beliefs and values from childhood through adolescence. In A. Wigfield and J. S. Eccles (Eds.), Development of achievement motivation (pp. 92–120). San Diego, CA: Academic Press. Wigfield, A., Eccles, J. S., Schiefele, U., Roeser, R. W., & Davis-Kean, P. (2006) Development of achievement motivation. In N. Eisenberg, W. Damon and R. M. Lerner (Eds.), Handbook of child psychology: Social, emotional, and personality development. Hoboken, NJ: John Wiley. Wigfield, A., Eccles, J., Fredericks, J., Simpkins, S., Roeser, R. W., & Schiefele, U. (2015) Development of achievement motivation and engagement. In Richard M. Lerner (Ed.), Handbook of child psychology and developmental science. New Jersey, US: John Wiley & Sons, Inc. doi: 10.1002/9781118963418. childpsy316. Wigfield, A., Tonks, S., & Klauda, S. L. (2009) Expectancy-Value Theory. In K. R. Wentzel and A. Wigfield (Eds.), Handbook of motivation in school (pp. 55–76). New York: Routledge Taylor Francis Group. Wolf, A. (2008) Educational expansion: the worms in the apple. Economic Affairs, 25(1), 36–40.


PART V

International Comparisons of Instruction


23
Comparative Research on Teacher Learning Communities in a Global Context
Motoko Akiba, Cassandra Howard and Guodong Liang

INTRODUCTION

Around the globe, teacher learning communities (TLCs), where teachers share norms, values, and practices for a common goal of supporting student learning, have been increasingly promoted as a promising approach to systemwide improvement of instruction and student learning (Huffman et al., 2016; McLaughlin & Talbert, 2001, 2006; Wei, Darling-Hammond, Andree, Richardson, & Orphanos, 2009). As a result, approaches to TLCs, such as lesson study and Professional Learning Communities (PLCs), have spread to many countries in the last two decades (Huffman et al., 2016; Lewis & Lee, 2018; Stoll, Bolam, McMahon, Wallace, & Thomas, 2006; Vescio, Ross, & Adams, 2008). The global circulation of lesson study, for example, led to the establishment of the World Association of Lesson Studies (WALS) in 2006, with seven founding member

countries and council members representing 12 countries around the world. The number of countries represented among presenters at the WALS conference has grown from 13 in 2007 to 35 in 2017, indicating the emergence of lesson study as a global model of teacher professional development. Researchers have also documented other forms of commonly practiced TLCs, including Practitioner Inquiry in Australia and the UK (Groundwater-Smith & Dadds, 2004), Teachers’ Networks and learning circles in Singapore (Hairon & Dimmock, 2012; Huffman et al., 2016; Tripp, 2004), and Critical Friends Groups in the US (Curry, 2008; Dunne, Nave, & Lewis, 2000). While TLCs have emerged as a recent reform initiative in many countries, in other countries, collaborative teacher learning has long been a part of teachers’ professional norms. Lesson study, for example, has been practiced for over a century in Japan (Fernandez & Yoshida, 2004; Makinae, 2010) and is an integral part of teachers’


work schedule, responsibility, and career advancement system. Likewise, keli in China has been practiced since 2003, but it stems from a deep-rooted tradition of collaborative inquiry into teaching materials and curriculum, joint lesson planning, and frequent classroom observations and post-lesson discussions (Huang & Bao, 2006; Ma, 1999; Paine & Ma, 1993). Other research has also noted a collective orientation among teachers in some non-Western countries where teacher collaboration for learning is a norm (Hofstede, 2001; Oyserman, Coon, & Kemmelmeier, 2002; Smith, 2016; Wang, Li, Tan, & Lee, 2017). Learning about cross-national differences in the histories, approaches, and supporting conditions of TLCs helps researchers better understand the uniqueness of TLCs in their own countries, identify alternatives for current practice, and recommend policies to support collaborative teacher learning (Noah, 1984). Considering the central role that collaborative teacher learning may play in deepening teachers' knowledge, changing their beliefs, and improving instruction, comparative research on TLCs is a promising avenue for identifying conditions that may improve teaching and student learning. In this chapter, we propose a conceptual framework to guide comparative research on TLCs by identifying the key domains and aspects of TLC research in a global context. We discuss the elements of each domain based on findings from previous comparative and international research. We then introduce two studies we conducted with different approaches to international studies of TLCs – a quantitative cross-national study on teachers' collaborative learning activities in 32 countries and a mixed-methods comparative study of lesson study in Japan and the US – to illustrate what comparative studies may offer in comparison to single-country studies, focusing on the strengths and limitations of each study approach and how these studies complement each other to enhance our understanding of collaborative teacher

learning in a global context. First, we define and give a brief overview of TLCs.

TEACHER LEARNING COMMUNITIES

Various researchers have defined the characteristics of TLCs (Grossman, Wineburg, & Woolworth, 2001; Huffman et al., 2016; Learning Forward, n.d.; Stoll et al., 2006; Westheimer, 1998). Westheimer (1998), for example, applied social theories of community to identify five common themes – interaction and participation, interdependence, shared interests and beliefs, concern for individual and minority views, and meaningful relationships. In the Standards for Professional Learning, Learning Forward (n.d.) defined learning communities with three characteristics: commitment to continuous improvement, collective responsibility for the learning of all students, and alignment of individual, team, school, and school system goals. In the United Kingdom, Stoll et al. (2006) described five key characteristics of PLCs as shared values and vision, collective responsibility, reflective professional inquiry, collaboration, and group learning. Based on findings from an international project involving researchers from Mainland China, Hong Kong, Taiwan, Singapore, and the US, Huffman et al. (2016) defined PLCs as communities of learning in which educators collaboratively engage to foster a culture that enhances teaching and learning for all. What is common in these definitions is a vision of TLCs where teachers come together based on a shared goal and responsibility for student learning and engage in collaborative learning where reflective dialogues based on different views and ideas are valued and promoted for continuous improvement of student learning. In such communities, teachers develop meaningful relationships with one another and assume collective responsibility


for engaging in continuous professional learning about teaching and student learning. From an international and comparative perspective, however, the actual practice of TLCs varies widely across countries due to major differences in the culture of teaching (Alexander, 2000; Givvin, Hiebert, Jacobs, Hollingsworth, & Gallimore, 2005; Hiebert et al., 2005; Stigler & Hiebert, 1999) as well as the cultural role and professional status of teachers (Akiba, 2016; Akiba, Chiu, Shimizu, & Liang, 2012; LeTendre, 1994, 2000; Liang & Akiba, 2018; Tobin, Hsueh, & Karasawa, 2009; Tobin, Wu, & Davison, 1989; Vavrus & Bartlett, 2012). Therefore, it is important to understand the multiple layers of contexts at the global, national, regional, and school levels that influence and shape the nature and outcomes of TLCs. Comparative education research illuminates the differences in these contexts across national boundaries as well as global similarities to help researchers and policymakers better understand how the interaction of national contexts and global dynamics shapes education systems (Akiba, 2017; Akiba & LeTendre, 2018; Baker & LeTendre, 2005). Teacher learning is an especially fruitful area for comparative research because it allows us to leverage the globe as a site of natural experimentation to better understand the conditions and contexts that support one of the central driving forces behind improvement of teaching and student learning (Akiba, 2017). With the global popularity of TLCs, an increasing number of studies have examined the contexts, nature, and outcomes of TLCs in various national and cultural settings. Previous research has also documented how an increase in national/federal involvement during the last two decades has shaped the direction of teacher professional development and learning as a tool for implementing educational reforms and improving teacher quality (Frankham & Hiett, 2011; Hardy, Rönnerman, Furu, Salo, & Forsman, 2010; Jones, 2011; Osborn, 2007). Given increasing governmental involvement in teachers’

professional learning activities, it is important to identify the key areas of comparative research on collaborative teacher learning that may inform teacher policy and reform in various national contexts.

CONCEPTUAL FRAMEWORK

To develop our conceptual framework of TLCs in a global context, we first conducted a review of comparative and international research on TLCs. To identify relevant research, we used the key words of singular and plural forms of 'teacher learning community,' 'professional learning community,' and 'lesson study' in the Educational Resources Information Center (ERIC) database and identified a total of 460 peer-reviewed journal articles written in English that reported empirical or conceptual studies of TLCs in K-12 settings. We found that these studies were conducted in 36 countries and most were published since the late 1990s, which supports the global spread of TLCs during the last two decades. Of these countries, the largest proportion of publications focused on TLCs in the US (222 studies, 48.3%), followed by Canada (29 studies), China (26 studies), the United Kingdom (25 studies), Australia (18 studies), and Singapore (16 studies). An overwhelming majority of the studies had a single-country focus: only five studies (1.1%) focused on two or more countries, and only three of those provided comparable data from multiple countries (Hargreaves et al., 2013; Jäppinen, Leclerc, & Tubin, 2016; Webb, Vulliamy, Sarja, Hämäläinen, & Poikonen, 2009). The other two studies simply described or discussed TLCs in multiple countries without empirical data. Our synthesis revealed five domains of research on TLCs that we organized to develop our conceptual framework. As shown in Figure 23.1, these include three levels of TLC contexts (global, national, and regional/school), the nature of TLCs, and outcomes of


Figure 23.1  Conceptual framework: teacher learning communities in a global context. [Diagram linking five domains:
Global contexts: intergovernmental organizations; international assessments; international agreements, meetings and programs; teacher professional networks; human capital migration.
National contexts: professional status of teachers; cultural role of teachers; teacher workforce policies and systems (teacher recruitment, teacher education, hiring and distribution, induction, professional development, evaluation, and career advancement).
Regional and school contexts: policies and priorities (coherence and sustainability); time and resources; access to experts; decision-making process; support of collaboration, teacher leadership and ownership.
Nature of teacher learning communities: individual and group orientation; structure (meeting schedule, amount, membership, format); facilitator; goal and focus; discourse; tools/materials; networks.
Outcomes: improved teacher beliefs, knowledge and practice; improved group culture (shared value, trust, commitment); improved student learning.]

TLCs. We found that most existing studies on TLCs have focused on the nature of TLCs, outcomes, and/or regional and school contexts in a single country. In the following section, we describe our conceptual framework by introducing findings from select articles from various countries that addressed the relevant domain(s) of TLCs as well as other related comparative and international studies on teachers and teaching. We provide a broad synthesis of findings for the domains of ‘global’ and ‘national’ contexts, reflecting the general focus of this literature. For the domains of ‘regional and school contexts’

and ‘nature of TLCs’ and ‘outcomes’, we complement our synthesis with findings from specific studies that highlight the type of research belonging to these domains. Our purpose is to broadly describe what existing research has revealed about these domains and why they are important to our understanding of TLCs.

Global Contexts First, it is important to understand that global contexts have increasingly influenced national

Comparative Research on Teacher Learning Communities in a Global Context 423

teacher policies during the last two decades (Akiba, 2013, 2017; Akiba & LeTendre, 2018; LeTendre & Wiseman, 2015; Paine & Zeichner, 2012). Intergovernmental organizations such as the OECD, UNESCO, and the World Bank produced influential reports on the conditions of teachers and needs for reforming the teaching workforce that are often cited by national governments (Gomendio, 2017; OECD, 2005, 2009, 2011; UNESCO Institute for Statistics, 2006; World Bank, 2012). The importance of reforming and supporting teacher learning was also communicated in two recent OECD reports (OECD, 2016a, 2016b). In addition, a nation’s ranking in international assessments such as the Programmes for International Student Assessment (PISA) and the Trends in International Mathematics and Science Study (TIMSS) in comparison to its perceived economic competitors has been used as a rationale or ‘warrant’ for large-scale reforms targeting teachers (Akiba & Shimizu, 2012; Avalos & Assael, 2006; Blomeke, 2006; LeTendre, Baker, Akiba, & Wiseman, 2001; Wiseman, 2010), including reforming professional development policies (Jones, 2011; Osborn, 2007). International agreements, meetings, and programs such as the annual International Summit on the Teaching Profession, the Bologna Process, Education for All, and Teach for All also continue to influence the way national governments engage in educational policy and reforms involving teachers (Paine & Zeichner, 2012). All of these global forces impact a nation’s teacher workforce system, which may support or hinder teacher practice in TLCs. Furthermore, Educational International, the international teachers union (Bascia, 2018; Sinyolo, 2018), and global online networks of teachers such as eTwinning (Blazic & Verswijvel, 2018) can influence the priority and approaches to TLCs. Finally, human capital migration, including teacher migration from developing countries to developed countries with a

teacher shortage (Bartlett, 2014; Brown & Stevick, 2014; Sharma, 2013, 2018), influence both the developing countries, which lose talented teachers, and the developed countries, which invite culturally different teachers into the teacher community.

National Contexts Second, it is important to understand national contexts – the professional status and cultural role of teachers, as well as teacher workforce policies and systems – and examine how they influence regional and school contexts and the nature and outcomes of TLCs within each country. In a country where teachers enjoy a high social and professional status with a higher salary, attractive benefits, and job security, professional learning time is often embedded into teachers’ work schedule and teachers have more autonomy to shape their own learning process (Akiba, 2016; Akiba et al., 2012; Liang & Akiba, 2018). When the teaching profession is a well-respected and well-paid occupation, there is also a larger number of teacher candidates and teachers are selected and hired from the most academically talented and committed group of candidates (Byun & Park, 2018; Han, Borgonovi, & Guerriero, 2018; Park & Byun, 2015). This, in turn, influences the level of content and pedagogical content knowledge teachers bring to collective learning in TLCs and its potential for improving teaching and student learning. The cultural role of teachers, including what qualities teachers expect children to develop (Tobin et  al., 1989, 2009), the cultural approach to teaching (Alexander, 2000; Givvin et  al., 2005; Hiebert et  al., 2005; Stigler & Hiebert, 1999), teachers’ role in social-emotional development (LeTendre, 1994, 2000), and student–teacher relationships (Vavrus & Bartlett, 2012) will likely shape teachers’ approach to collaborative learning. For example, if teachers value

424

THE SAGE HANDBOOK OF COMPARATIVE STUDIES IN EDUCATION

the way students with different abilities learn from one another instead of learning through ability-based individualized tasks, those teachers may focus on how to contrast various approaches to solving a problem in a whole-class discussion instead of how to assign tasks based on student ability. On the other hand, if teachers are expected to maintain a strict hierarchical relationship with students, student-centered instruction where the teacher facilitates student learning instead of giving correct answers may not be appreciated by the parents and community members. A PLC group in this cultural context may focus on how to lead students to correct answers instead of how to allow students to express their thinking to deepen their understanding. National approaches to teacher workforce management and development also influence the quality and outcomes of TLCs either directly or indirectly through regional and school contexts. When teachers are recruited from the most academically accomplished and committed group of candidates and when teacher candidates are introduced to TLCs as part of their teacher preparation, new teachers are well prepared to contribute to TLCs. In addition, when regional educational authorities (REAs) hire and evenly distribute qualified teachers across schools, TLC groups will not suffer from the lack of experience, support, or knowledge that often occurs because young and under-qualified teachers are concentrated in high-poverty schools in many countries around the world (Akiba, LeTendre, & Scribner, 2007). Furthermore, national policies or approaches to new teacher induction, professional development, teacher evaluation, and career advancement could influence TLCs in multiple ways. If TLCs are recognized by the national or federal educational authorities and integrated into teacher development and advancement processes, new teachers are inducted and mentored through their participation in TLCs, and teacher learning and leadership are promoted via TLCs.

Regional and school contexts

Regional educational authorities (REAs) (e.g., state departments of education, regional ministries of education, local education bureaus, school districts) and schools are important local contexts of TLCs, as they shape educational policies and priorities, resource allocation, and governance structures. Specifically, professional development policies and priorities influence the coherence and sustainability of TLCs. If TLCs are not supported by REAs, teachers will be pulled towards multiple incoherent learning priorities set by the REAs and schools, unable to obtain the time and resources necessary for sustaining TLCs. Access to content and pedagogical experts, such as the knowledgeable other in lesson study (Takahashi, 2014), deepens collaborative teacher learning, but it can be costly, and investment by REAs and schools is critical for supporting teachers' rich learning opportunities in TLCs. When REAs and schools actively involve teachers in decision-making processes regarding professional development policies, priorities, and resources, teachers are able to communicate the importance of TLCs and secure the necessary support from REAs and school leaders. Finally, REAs' and schools' support of teacher collaboration, teacher leadership, and ownership is critical for the quality and sustainability of TLCs. If REAs and school leaders do not trust or support teacher leadership and ownership as a key driver of the improvement of instruction and student learning, they may support traditional top-down, short-term professional development courses or seminars instead of TLCs.

Various single-country studies have examined regional and school contexts of TLCs. In a qualitative study of two urban elementary schools in Monterrey, Mexico, Flores, Rodríguez, and García (2015) identified time constraints, due to school involvement in multiple projects and teachers' work structure, as a major challenge for creating a space of reflection via PLCs. Likewise, time and resources – including schedules, material resources, and communication conducive to collective learning – along with trusting teacher relationships and leadership support of teacher learning, were found to be associated with the organizational capacity of PLCs in a survey of 992 teachers in 76 elementary schools in the Netherlands (Sleegers, Brok, Verbiest, Moolenaar, & Daly, 2013). The importance of principal leadership for PLCs was also identified by two studies conducted in China. Based on a survey of 215 primary school teachers in Southwestern China, Zheng, Yin, Liu, and Ke (2016) found that principal leadership regarding goal setting, high expectations and trust, support of collaboration, and resource provision for instructional improvement was associated with the level of PLC, and that this relationship was mediated by trust among teachers. Sargent and Hannum (2009) also found, in their mixed-methods study of teacher surveys and interviews conducted in 71 villages in rural Gansu Province in China, that principal support of collaboration, shared decision-making, and instructional improvement was associated with collaborative lesson planning. These international studies demonstrate the important role of regional and school contexts in supporting or hindering teachers' collaborative learning through TLCs.

Nature of TLCs

A large number of the existing international studies on TLCs have focused on understanding the nature of TLCs – what goes on inside them. The nature of TLCs can be understood in terms of seven key aspects: (1) individual and group orientations, such as willingness to learn, shared professional norms and language, and trust among teachers; (2) the structure, including meeting schedules, amount of meeting time, membership of the community (e.g., grade-level team, subject-area group), and format (online or face to face); (3) the facilitator's role and orientation for guiding the group learning process; (4) the goal and focus of the group learning; (5) the nature of discourse among teachers; (6) the tools and materials available to the group to enhance teacher learning; and (7) the networks with other TLC groups and with content and pedagogical experts inside and outside the school.

For example, based on a survey of 207 teacher learning teams in Singapore, Ning, Lee, and Lee (2015) found that a team's orientation towards collectivism, defined by a prioritization of group interests over self-interest, and collegiality, defined by respect, trust, and norms of critical inquiry and improvement, predicted the level of team collaboration. Akiba, Murata, Howard, and Wilkinson (2019) found that the facilitator's focus on student thinking in lesson study groups was strongly associated with increased teacher knowledge, self-efficacy, and expectations for students. Based on video data and case study data of teacher discourse in lesson study in London, Warwick, Vrikki, Vermunt, Mercer, and Halem (2016) also found that a group focus on student outcomes allowed teachers to develop pedagogy that addresses student needs. They further found that a dialogic space characterized by supportive moves in interaction allowed teachers to agree on changes in their teaching. A synthesis of eleven studies on PLCs in the US and the UK conducted by Vescio et al. (2008) also reported the importance of a consistent focus on student learning and instructional strategies to support collaborative teacher learning.

Two studies, conducted in South Africa and the UK, found the importance of tools and materials in TLC contexts. Based on a qualitative study of one PLC in South Africa, Fataar and Feldman (2016) found that a pedagogical tool that promoted students' active participation and flexibility in pedagogical approach allowed teachers to move from a focus on keeping order and discipline to a more participatory approach in their teaching. In the UK, Wake, Swan, and Foster's qualitative study (2016) reported that a lesson study group's use of carefully designed artifacts, addressing anticipated student approaches and the nature of progression in problem-solving, allowed teachers to guide a research lesson and reflection in post-lesson discussions, facilitating situated learning in the contexts of classrooms and group learning. In addition, a qualitative study of lesson study conducted in three schools in Australia found that teachers perceived a large number of observers at the research lessons and post-lesson discussions, and the insight provided by the knowledgeable other, to be critical factors contributing to the success of lesson study (Grove, Doig, Vale, & Widjaja, 2016), indicating the importance of networks for enhancing collaborative teacher learning.

Outcomes of TLCs

Various outcomes of TLCs need to be examined in order to understand whether and how TLCs lead to positive teacher, group, and student outcomes. Effective TLCs should improve teachers' beliefs about teaching and student learning, their content and pedagogical content knowledge, and their instructional practice for providing rich learning opportunities to students. TLCs can also improve the group culture, including the development of shared norms and values, trust among group members, and group commitment to continuous learning and improvement of teaching and student learning. These individual and group changes resulting from TLC activities should, in turn, improve student learning opportunities and achievement.

Based on a survey of over 2,000 teachers in China, Sargent (2015) found that frequency of participation in PLCs was associated with teachers' use of innovative teaching methods, including open-ended questions and inquiry learning. Vanblaere and Devos (2016) also found, based on a survey of 490 teachers in Belgium, that encouraging student expression and reflective dialogue in PLCs was associated with perceived change in practice. Likewise, qualitative studies of lesson study groups found that teacher participation led to improved teaching practice with a focus on student discourse, student thinking, and questioning strategies in Australia (Gee & Whaley, 2016), and to improved use of teaching materials and learning tasks and promotion of student ownership of learning and grouping in Malaysia (Lim, Kor, & Chia, 2016). Teachers in this Malaysian study also improved their anticipation of student responses in lesson planning.

Other studies have reported improved beliefs, knowledge, and group culture, in addition to improved teaching practice, as a result of teacher participation in PLCs or lesson study. Collective learning and application through PLCs were associated with teacher commitment to supporting student learning and social integration in Hong Kong (Lee, Zhang, & Yin, 2011). In India, Kumar and Subramaniam (2015) reported that, through participation in PLCs, teachers shifted their role from reliance on textbooks to establishing connections between contexts and representations, based on their deepened knowledge of integers and increased emphasis on students' understanding. In addition to improving knowledge, teachers reported an improved group culture in qualitative studies of lesson study in the Philippines and the UK. Elementary school science teachers in Gutierez's (2016) study in the Philippines reported that lesson study helped build a collaborative and professional working environment, and secondary mathematics teachers in a study conducted by Cajkler, Wood, Norton, and Pedder (2014) reported a stronger sense of teacher community in addition to improvement in their understanding of students and of student-centered approaches.

While there is sufficient evidence on the benefits of TLCs in improving teacher beliefs, knowledge, and teaching practice as well as group culture, few empirical studies on the impact of TLCs on student learning are available outside the US or the UK. In a synthesis of PLC studies, Vescio et al. (2008) identified eight studies – seven conducted in the US and one in the UK – that examined the impacts of PLCs on student achievement growth, all of which reported improved student learning as a result of PLCs. For example, Bolam, McMahon, Stoll, Thomas, and Wallace (2005) used national data on student assessment in the UK and reported a statistically significant relationship between PLC characteristics and student achievement at both the elementary and secondary levels. Louis and Marks (1998) similarly reported that schools in the US with a higher level of PLCs had higher student achievement. A significant learning gain was also observed in PLCs where teachers engaged in structured, sustained, and supported instructional discussions on the relationship between teaching and student work in the studies conducted by Supovitz (2002) and Supovitz and Christman (2003).

Only a small number of experimental studies are available on the effectiveness of TLCs for student learning. Lewis and Perry (2014, 2017) conducted a randomized field trial of 39 lesson study groups across the US and found that teachers in experimental lesson study groups who were supported by rich mathematics resources improved their knowledge and their students' achievement more than the control group teachers. Although not specifically focused on TLCs, prior experimental studies of professional development programs that incorporated teacher collaboration in the US also found a positive impact on student achievement growth. Saunders, Goldenberg, and Gallimore (2009) conducted a quasi-experimental study of grade-level teams in 15 Title I schools and found that experimental schools that used explicit protocols focused on students' needs and how to address them instructionally improved student achievement more than control schools. Focusing on elementary science, Heller, Daehler, Wong, Shinohara, and Miratrix (2012) compared the impacts of collaboration-based professional development programs and found that Teaching Cases and Looking at Student Work, which focused on analysis of student work and classroom tasks that reveal students' conceptual understanding, improved the conceptual knowledge of both teachers and students.

In summary, prior research conducted in various national contexts points to the importance of examining these five domains of TLCs. Existing studies on TLCs have focused on the nature of TLCs, regional or school contexts, and the outcomes of TLCs, and they found that school priorities, time and resources, decision-making processes, and principal support of collaboration are important factors influencing the level and nature of TLCs. There is also convincing evidence across national contexts that TLCs can improve individual and collective teacher outcomes as well as student learning, especially when TLCs focus on student learning and instructional strategies. However, more research is needed to reveal how global and national contexts influence regional and school contexts and the nature and outcomes of TLCs. One way to address this need is through comparative research that reveals the role of global and national contexts by contrasting two or more countries. Such research could usefully inform policymakers and administrators in supporting TLCs in the midst of a global trend of adopting and adapting TLCs such as PLCs and lesson study.

ILLUSTRATION OF COMPARATIVE STUDIES ON TEACHER LEARNING COMMUNITIES

We introduce two different types of comparative research on TLCs: a quantitative study that uses a large-scale international database to understand global and cross-national patterns of teacher collaborative activities in 32 countries, and a comparative mixed-methods study of lesson study in Japan and the US. Because these two types of studies fall within the small number of comparative and cross-national studies on TLCs that examined two or more countries, they illustrate the types of studies that need to be conducted by a larger number of researchers in various regions around the globe.

Study 1: Cross-national study of teacher collaborative activities in 32 countries

International databases are a rich source for understanding global and cross-national trends in key dimensions of education because of the availability of extensive data collected from school leaders, teachers, and students. Our review of all existing international databases found three databases – the Trends in International Mathematics and Science Study (TIMSS), the Progress in International Reading Literacy Study (PIRLS), and the Teaching and Learning International Survey (TALIS) – that include survey data collected from teachers about their professional learning activities (Akiba, 2015). Among these databases, TALIS includes the most detailed survey data, gathered from 32 countries, which enabled us to address the following research questions: (1) What types of teacher collaboration are most common around the globe? and (2) How does the frequency of teacher collaboration vary across 32 countries in eight regions? This study addresses the amount and focus of TLCs – two aspects for understanding the nature of TLCs in Figure 23.1.

We conducted a secondary analysis of survey data collected from a nationally representative sample of lower secondary teachers in the 2013 TALIS dataset. Tables 23.1a and 23.1b present a comparison of the frequencies of teacher collaboration per year with four foci: (a) Discussion of students: teachers engage in discussions about the learning development of specific students; (b) Exchange materials: teachers exchange teaching materials with colleagues; (c) Common assessment: teachers work with other teachers to ensure common standards in evaluations for assessing student progress; and (d) Observation: teachers observe other teachers' classes and provide feedback. Teachers' responses to these multiple-choice questions were recoded as 0=Never, 1=Once a year or less, 3=2–4 times a year, 7.5=5–10 times a year, 18=1–3 times a month, and 36=Once a week or more, so that the number represents the frequency of engaging in each type of collaborative learning activity per year.

For the first research question, we can see from the international means that, across the 32 countries, discussion of students is the most common activity, with an international mean of 19.3 times a year, followed by exchange of materials (14.2 times a year) and establishing common assessments (12.7 times). Peer observation and feedback is the least common, at only 3.6 times per year on average across the 32 countries. All four foci are important for professional learning, but observation of teaching and student learning is an especially powerful way to discuss specific aspects of teaching and student learning. Because previous studies have identified the critical importance of observing teaching and student learning for changing teacher beliefs and practice (Clarke & Hollingsworth, 2002; Cohen & Ball, 2001; Opfer & Pedder, 2011), the lack of opportunities for teachers to observe teaching and engage in discussion of detailed aspects of teaching and student learning in many countries is concerning.

For the second research question, Tables 23.1a and 23.1b also show major cross-national variation in each of the four foci of collaboration. The frequency of discussions of the learning development of specific students varied from 4.6 times per year in Korea to 27.3 times per year in Spain. The frequencies for exchanging materials and for common assessments ranged from 8.7 times in Korea to 24.4 times in Australia, and from 4.6 times in Korea to 17.5 times in Australia, respectively. Observation of teaching with feedback, which occurs less frequently across countries, ranged from 0.8 times per year in Belgium (Flanders region) to 7.9 times per year in Abu Dhabi in the United Arab Emirates.
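To make the recoding scheme concrete, the following minimal sketch (in Python, using pandas) shows how responses of this kind could be converted into annual frequencies and aggregated into country means. The column names and the toy records are hypothetical placeholders rather than actual TALIS variable names, and the published estimates also apply TALIS sampling weights, which are omitted here for brevity.

    import pandas as pd

    # Approximate annual frequencies assigned to the TALIS response categories.
    RECODE = {
        'Never': 0,
        'Once a year or less': 1,
        '2-4 times a year': 3,
        '5-10 times a year': 7.5,
        '1-3 times a month': 18,
        'Once a week or more': 36,
    }

    def annual_frequency_means(df: pd.DataFrame, item: str) -> pd.Series:
        """Recode one collaboration item and return the mean frequency per country."""
        freq = df[item].map(RECODE)
        return freq.groupby(df['country']).mean().sort_values(ascending=False)

    # Toy teacher-level records standing in for the TALIS data file.
    toy = pd.DataFrame({
        'country': ['Spain', 'Spain', 'Korea', 'Korea'],
        'discussion_of_students': ['Once a week or more', '1-3 times a month',
                                   '2-4 times a year', 'Once a year or less'],
    })
    print(annual_frequency_means(toy, 'discussion_of_students'))

Ranked country comparisons like those in Tables 23.1a and 23.1b can then be produced by computing this mean for each of the four items and sorting countries by the resulting values.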


Table 23.1a  Comparison of teacher collaboration activities from TALIS 2013 teacher survey

Discussion of students
Rank  Country             Region  N       Average frequency per year
1     Spain               SE      3,248   27.3
2     Sweden              NE      3,159   26.7
3     Finland             NE      2,687   25.4
4     England (UK)        WE      2,342   25.0
5     Australia           AO      1,909   24.6
6     Poland              EE      3,786   24.3
7     Czech Republic      EE      3,191   24.1
8     Estonia             NE      3,043   23.5
9     France              WE      2,810   23.5
10    Romania             EE      3,244   22.9
11    Alberta (Canada)    NA      1,725   22.4
12    Norway              NE      2,758   21.8
13    Italy               SE      3,266   21.1
14    Latvia              NE      2,073   21.1
15    Bulgaria            EE      2,904   20.6
16    United States       NA      1,852   20.2
17    Israel              ME      3,208   19.3
18    Denmark             NE      1,584   19.2
19    Serbia              SE      3,781   18.6
20    Abu Dhabi (UAE)     ME      2,203   18.3
21    Portugal            SE      3,584   17.4
22    Singapore           AO      3,085   17.3
23    Malaysia            AO      2,953   17.0
24    Croatia             SE      3,595   16.8
25    Japan               AO      3,463   16.8
26    Brazil              CSA     13,117  15.4
27    Chile               CSA     1,497   14.8
28    Slovak Republic     EE      3,435   13.6
29    Netherlands         WE      1,790   12.5
30    Mexico              CSA     3,094   10.1
31    Flanders (Belgium)  WE      3,038   9.9
32    Korea               AO      2,793   4.6
      International mean                  19.3

Exchange materials
Rank  Country             Region  N       Average frequency per year
1     Australia           AO      1,913   24.4
2     England (UK)        WE      2,344   21.9
3     Norway              NE      2,773   19.7
4     Israel              ME      3,221   18.1
5     Abu Dhabi (UAE)     ME      2,207   18.1
6     Portugal            SE      3,588   18.0
7     Singapore           AO      3,093   17.3
8     Denmark             NE      1,583   16.8
9     Alberta (Canada)    NA      1,728   16.1
10    Flanders (Belgium)  WE      3,045   16.1
11    Slovak Republic     EE      3,449   15.4
12    United States       NA      1,851   15.2
13    Malaysia            AO      2,952   15.2
14    Spain               SE      3,250   14.4
15    Czech Republic      EE      3,195   13.8
16    Bulgaria            EE      2,906   13.6
17    France              WE      2,807   13.5
18    Japan               AO      3,454   13.5
19    Italy               SE      3,264   13.3
20    Chile               CSA     1,505   12.9
21    Poland              EE      3,785   12.7
22    Netherlands         WE      1,791   12.4
23    Finland             NE      2,686   11.9
24    Sweden              NE      3,151   11.5
25    Latvia              NE      2,072   10.7
26    Serbia              SE      3,778   10.5
27    Brazil              CSA     13,016  10.3
28    Mexico              CSA     3,091   10.1
29    Romania             EE      3,239   10.0
30    Estonia             NE      3,026   9.6
31    Croatia             SE      3,591   8.9
32    Korea               AO      2,811   8.7
      International mean                  14.2

Note: AO=Asia & Oceania, ME=Middle East, NE=Northern Europe, EE=Eastern Europe, WE=Western Europe, SE=Southern Europe, NA=North America, CSA=Central/South America


Table 23.1b  Comparison of teacher collaboration activities from TALIS 2013 teacher survey (continued)

Common assessments
Rank  Country             Region  N       Average frequency per year
1     Australia           AO      1,913   17.5
2     Abu Dhabi (UAE)     ME      2,201   17.4
3     Czech Republic      EE      3,186   17.3
4     Spain               SE      3,232   16.7
5     Poland              EE      3,787   16.7
6     Sweden              NE      3,163   16.3
7     Singapore           AO      3,088   15.6
8     Israel              ME      3,180   14.9
9     Norway              NE      2,738   14.6
10    England (UK)        WE      2,339   14.1
11    Portugal            SE      3,576   14.0
12    United States       NA      1,855   13.8
13    Malaysia            AO      2,954   13.5
14    Italy               SE      3,258   13.3
15    Romania             EE      3,237   13.3
16    Latvia              NE      2,068   13.2
17    Alberta (Canada)    NA      1,727   13.0
18    Estonia             NE      3,020   12.9
19    Finland             NE      2,680   12.9
20    Chile               CSA     1,496   12.4
21    Bulgaria            EE      2,852   12.3
22    Slovak Republic     EE      3,438   12.2
23    Brazil              CSA     13,038  11.7
24    Denmark             NE      1,581   10.4
25    Croatia             SE      3,586   10.3
26    Serbia              SE      3,741   10.1
27    France              WE      2,796   9.0
28    Mexico              CSA     3,080   8.9
29    Netherlands         WE      1,783   8.5
30    Flanders (Belgium)  WE      3,029   8.5
31    Japan               AO      3,454   7.7
32    Korea               AO      2,804   4.6
      International mean                  12.7

Observation
Rank  Country             Region  N       Average frequency per year
1     Abu Dhabi (UAE)     ME      2,188   7.9
2     Italy               SE      3,246   6.1
3     Denmark             NE      1,581   6.0
4     Japan               AO      3,469   5.8
5     Romania             EE      3,230   5.3
6     Norway              NE      2,757   5.1
7     England (UK)        WE      2,344   5.1
8     Singapore           AO      3,089   4.5
9     Poland              EE      3,783   4.5
10    Chile               CSA     1,512   4.4
11    Slovak Republic     EE      3,428   4.3
12    Mexico              CSA     3,073   4.2
13    Australia           AO      1,912   4.0
14    Czech Republic      EE      3,191   3.9
15    Sweden              NE      3,164   3.5
16    Latvia              NE      2,067   3.2
17    Serbia              SE      3,768   3.1
18    Korea               AO      2,791   3.1
19    Netherlands         WE      1,788   3.0
20    Malaysia            AO      2,951   3.0
21    Alberta (Canada)    NA      1,729   2.9
22    Bulgaria            EE      2,893   2.7
23    United States       NA      1,854   2.7
24    Estonia             NE      3,023   2.7
25    Portugal            SE      3,574   2.6
26    Israel              ME      3,214   2.1
27    Brazil              CSA     12,966  2.1
28    Finland             NE      2,683   1.8
29    France              WE      2,799   1.2
30    Spain               SE      3,242   1.1
31    Croatia             SE      3,573   1.0
32    Flanders (Belgium)  WE      3,042   0.8
      International mean                  3.6

Note: AO=Asia & Oceania, ME=Middle East, NE=Northern Europe, EE=Eastern Europe, WE=Western Europe, SE=Southern Europe, NA=North America, CSA=Central/South America


We further examined whether there are regional differences in the level of collaborative learning activities. Table 23.2 presents the mean frequencies of the four types of activities in eight regions: Asia and Oceania, the Middle East, Northern Europe, Eastern Europe, Western Europe, Southern Europe, North America, and Central/South America. The ANOVA of overall mean differences across the eight regions, along with post-hoc Tukey tests of the mean difference for each pair of regions reported in Table 23.2, shows a statistically significant difference overall and between most pairs of regions in the level of all four types of collaborative learning activities. The data show some notable regional patterns. Policymakers and administrators in Central or South America regions may wonder why their teachers' level of collaborative

Table 23.2  Comparison of teacher collaboration activities by regions

Discussion of students
Rank  Region                        Average frequency per year
1     Northern Europe (NE)          23.42
2     North America (NA)            21.29
3     Eastern Europe (EE)           21.14
4     Southern Europe (SE)          20.09
5     Middle East (ME)              18.90
6     Western Europe (WE)           17.75
7     Asia & Oceania (AO)           15.61
8     Central/South America (CSA)   14.40
Diff.1: F=765.2***; NE > (NA, EE) > SE > ME > WE > AO > CSA

Exchange materials
Rank  Region                        Average frequency per year
1     Middle East (ME)              18.12
2     Western Europe (WE)           16.05
3     North America (NA)            15.63
4     Asia & Oceania (AO)           15.15
5     Northern Europe (NE)          13.12
6     Eastern Europe (EE)           13.11
7     Southern Europe (SE)          12.98
8     Central/South America (CSA)   10.52
Diff.1: F=351.7***; ME > (WE, NA, AO)3 > (NE, EE, SE)2 > CSA

Common assessments
Rank  Region                        Average frequency per year
1     Middle East (ME)              15.94
2     Eastern Europe (EE)           14.44
3     Northern Europe (NE)          13.67
4     North America (NA)            13.42
5     Southern Europe (SE)          12.79
6     Asia & Oceania (AO)           11.32
7     Central/South America (CSA)   11.30
8     Western Europe (WE)           9.98
Diff.1: F=232.5***; ME > EE > (NE, NA, SE)4 > (AO, CSA)2 > WE

Observation
Rank  Region                        Average frequency per year
1     Middle East (ME)              4.45
2     Eastern Europe (EE)           4.20
3     Asia & Oceania (AO)           4.17
4     Northern Europe (NE)          3.54
5     North America (NA)            2.83
6     Southern Europe (SE)          2.76
7     Central/South America (CSA)   2.63
8     Western Europe (WE)           2.30
Diff.1: F=143.5***; (ME, EE, AO)2 > NE > (NA, SE, CSA)2 > WE

*p < .05, **p < .01, ***p < .001
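For readers who want to reproduce this kind of regional comparison, the following sketch shows a one-way ANOVA followed by Tukey's HSD post-hoc test in Python (scipy and statsmodels). The data are simulated: the group means are borrowed from the discussion-of-students column of Table 23.2 purely for illustration, while the sample sizes, variances, and variable names are invented, so the printed F statistic will not match the one reported above.

    import numpy as np
    from scipy import stats
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    rng = np.random.default_rng(0)
    regions = ['NE', 'NA', 'EE', 'SE', 'ME', 'WE', 'AO', 'CSA']
    means = [23.42, 21.29, 21.14, 20.09, 18.90, 17.75, 15.61, 14.40]

    # Simulated teacher-level annual frequencies, 200 teachers per region.
    samples = [rng.normal(loc=m, scale=5.0, size=200) for m in means]

    # Overall F-test of mean differences across the eight regions.
    f_stat, p_value = stats.f_oneway(*samples)
    print(f'F = {f_stat:.1f}, p = {p_value:.3g}')

    # Post-hoc pairwise comparisons with Tukey's HSD.
    values = np.concatenate(samples)
    labels = np.repeat(regions, 200)
    print(pairwise_tukeyhsd(values, labels))

The parenthesized groupings in the Diff. rows of Table 23.2 appear to correspond to regions whose pairwise Tukey comparisons were not statistically significant.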