
The Cost-Effectiveness of 22 Approaches for Raising Student Achievement


The Cost-Effectiveness of 22 Approaches for Raising Student Achievement

Stuart S. Yeh
University of Minnesota

INFORMATION AGE PUBLISHING, INC. Charlotte, NC • www.infoagepub.com

Library of Congress Cataloging-in-Publication Data

Yeh, Stuart S.
  The cost-effectiveness of 22 approaches for raising student achievement / Stuart S. Yeh.
    p. cm.
  Includes bibliographical references.
  ISBN 978-1-61735-402-1 (pbk.) -- ISBN 978-1-61735-403-8 (hardcover) --
  ISBN 978-1-61735-404-5 (e-book)
  1. Academic achievement--Cost effectiveness. 2. School improvement programs--Cost effectiveness.
  I. Title. II. Title: Cost-effectiveness of twenty-two approaches for raising student achievement.
  LB1062.6.Y44 2011
  371.2'07--dc22
  2011005762

Copyright © 2011 Information Age Publishing Inc. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the publisher.

Printed in the United States of America

Contents



List of Figures ... xi
List of Tables ... xiii
Preface ... xv
Acknowledgements ... xvii
Introduction ... xix

1  The Achievement Gap ... 1
     Research Regarding the Development of Self-Efficacy ... 2
     Control, Engagement, and Achievement ... 2
     Learned Helplessness ... 5
     Performance Feedback ... 7
     Autonomy and Standards ... 10
     Summary ... 10
     Rapid Assessment ... 11
     Rigorous Studies of Rapid Assessment ... 14
     Conclusion ... 16

2  The Cost-Effectiveness of Five Policies for Improving Student Achievement ... 17
     Effect Size Estimates ... 18
     Rapid Assessment ... 19
     Increased Spending ... 19
     Voucher Programs ... 19
     Charter Schools ... 21
     Increased Accountability ... 23
     Cost-Effectiveness Analysis ... 24
     Costs of Rapid Assessment ... 25
     Cost to Increase Educational Expenditure by 10 Percent ... 28
     Costs of Voucher Programs ... 28
     Costs of Charter Schools ... 32
     Costs of Accountability ... 33
     Summary ... 36
     Sensitivity Analysis I ... 37
     Sensitivity Analysis II ... 38
     Conclusion ... 38

3  The Cost-Effectiveness of Class Size Reduction ... 41
     Empirical Studies ... 42
     Meta-Analyses ... 43
     Tennessee ... 43
     California ... 44
     Wisconsin ... 44
     RAND Cost Estimates ... 46
     Summary ... 48
     Discussion ... 49

4  The Cost-Effectiveness of Comprehensive School Reform ... 53
     Evidence Regarding CSR ... 55
     Unexplained Variation in Results ... 55
     Summary ... 58
     Cost-Effectiveness of CSR ... 59
     Comparisons of Cost-Effectiveness ... 65
     Sensitivity Analysis I ... 67
     Sensitivity Analysis II ... 68
     Discussion ... 68

5  The Cost-Effectiveness of Value-Added Teacher Assessment ... 73
     Background ... 74
     Costs of Teacher Replacement ... 75
     Total Cost to Society of Gordon et al.'s (2006) Proposal ... 78
     A Less Expensive Version ... 78
     Cost-Effectiveness of Gordon et al.'s (2006) Proposal ... 79
     Conclusion ... 80

6  The Cost-Effectiveness of NBPTS Teacher Certification ... 85
     Costs of NBPTS Certification ... 89
     Effects of NBPTS Certification ... 90
     Relative Cost-Effectiveness ... 94
     Discussion ... 96

7  The Cost-Effectiveness of Raising Teacher Quality ... 99
     Need for Cost-Effectiveness Studies ... 101
     Setting a Minimum SAT Test Score ... 102
     The Impact of Employing High-Ability Teachers ... 105
     The Cost-Effectiveness of Employing High-Ability Teachers ... 106
     Sensitivity Analysis I ... 106
     Sensitivity Analysis II ... 107
     Sensitivity Analysis III ... 107
     Discussion ... 108

8  The Cost-Effectiveness of 22 Approaches for Raising Student Achievement ... 113
     Search Protocol ... 116
     Method ... 117
     Results ... 117
     Rapid Assessment ... 117
     Comprehensive School Reform ... 121
     Cross-Age Tutoring ... 122
     Computer-Assisted Instruction ... 122
     Longer School Day ... 122
     Teacher Education ... 123
     Teacher Experience ... 123
     Teacher Salary ... 123
     Summer School ... 123
     More-Rigorous Math Courses ... 123
     Value-Added Teacher Assessment ... 124
     Class Size Reduction ... 125
     Ten Percent Increase in Spending ... 126
     Full-Day Kindergarten ... 127
     Head Start ... 127
     High-Standards Exit Exam ... 127
     NBPTS Teacher Certification ... 130
     Licensure Test Scores ... 131
     High-Quality Preschool ... 132
     Additional School Year ... 133
     Voucher Programs ... 133
     Charter Schools ... 136
     Discussion ... 138

9  Shifting the Bell Curve ... 141
     Effects on Educational Attainment and Earnings ... 142
     Rapid Assessment ... 143
     Costs of Rapid Assessment ... 144
     Benefits of the Proposed Intervention ... 146
     Value of Less Crime ... 147
     Welfare Savings ... 147
     Tax Savings ... 148
     Benefit–Cost Analysis ... 149
     National Social Benefit–Cost Ratio ... 149
     Federal Treasury Benefit–Cost Ratio ... 150
     State Social Benefit–Cost Ratios ... 150
     State Fiscal Benefit–Cost Ratios ... 150
     Sensitivity Analysis I ... 151
     Sensitivity Analysis II ... 151
     Sensitivity Analysis III ... 154
     Sensitivity Analysis IV ... 154
     Sensitivity Analysis V ... 154
     Discussion ... 155

Conclusion ... 159
     Control, Engagement, and Achievement ... 161

A  Notes to Table 5.1 ... 167
B  Opportunity Cost to Society of Employing the Labor of a Worker Who Replaces a Bottom-Quartile Teacher ... 171
C  Notes to Table 8.1 ... 175

Notes ... 177
     Chapter 1 ... 177
     Chapter 2 ... 177
     Chapter 3 ... 178
     Chapter 4 ... 179
     Chapter 5 ... 181
     Chapter 6 ... 188
     Chapter 7 ... 191
     Chapter 8 ... 192

References ... 193
About the Author ... 223


List of Figures

Figure 1.1  Language arts task output without performance feedback ... 8
Figure 1.2  Language arts task output with performance feedback ... 9
Figure 8.1  Relative magnitude of 22 effectiveness–cost ratios ... 140




List of Tables

Table 2.1  Effect Sizes in Standard Deviation Units for Key Studies of Voucher Programs ... 21
Table 2.2  Effect Sizes in Standard Deviation Units for Key Studies of Charter Schools ... 23
Table 2.3  Cash Costs to Implement Reading Assessment ... 26
Table 2.4  Cash Costs to Implement Math Assessment ... 26
Table 2.5  Annual Per-Pupil Cost to Society of a Voucher Program ... 32
Table 2.6  Annual Per-Pupil Cost to Society of a Charter School Program ... 34
Table 2.7  Effect Size, Annual Cost per Student, Total Annual Cost, and Effectiveness–Cost Ratio ... 37
Table 3.1  Average Annual Cost per Enrolled Student for Every 1-Student Reduction in Class Size Down to Class Size Limits of 15, 17, and 20 Students ... 47
Table 3.2  Effectiveness–Cost Ratios for Various Class Size Reduction Effect Size and Cost Estimates ... 48
Table 4.1  Annual Costs per Student to Implement Success For All, the School Development Model, and Direct Instruction ... 62
Table 4.2  Effectiveness–Cost Ratios for CSR Models with the Strongest or Highly Promising Evidence of Effectiveness ... 64
Table 4.3  Comparison of Effect Size, Cost, and Effectiveness–Cost Ratios ... 66
Table 5.1  Cost to Society of Replacing One Teacher ... 77
Table 5.2  Comparison of Effect Sizes, Costs, and Effectiveness–Cost Ratios for Various Interventions to Raise Student Achievement ... 81
Table 6.1  Signaling Effect Sizes for NBPTS-Certified Teachers in Comparison with Teachers Who Were Never Certified by NBPTS ... 93
Table 6.2  Comparison of Effect Sizes, Costs, and Effectiveness–Cost Ratios for Various Interventions to Raise Student Achievement ... 95
Table 8.1  Comparison of Effect Sizes, Costs, and Effectiveness–Cost Ratios for Various Interventions to Raise Student Achievement ... 118
Table 9.1  Benefits and Costs of Raising Student Achievement by State ... 152

Preface

This book synthesizes research regarding factors influencing each child's healthy psychological development, academic engagement, and academic achievement. This synthesis overturns conventional notions about "what works" in improving student engagement and achievement, yet it is based on results that are arguably well-accepted among researchers. The contribution of the book is, therefore, in showing how the existing research literature fits together in a way that has not previously been understood or accepted. The challenge is to explain to the reader how well-accepted results can be organized into a new view that overturns conventional thinking. This requires tracing the details of the analyses that support this new view. I present these details in the hope that the reader will gain an appreciation for the rigor of the analyses. Skeptics will find that I have provided the essential information necessary to replicate the results. I draw upon publicly accessible sources of information and justify key analytical decisions.

The chapters of this book are adapted, with minor changes, from nine of my previously published journal articles. As a result, every analysis in the book has undergone peer review, including the cost-effectiveness analyses. Numerous analytical challenges were addressed in arriving at the proper identification and estimation of relevant costs. Readers should keep in mind that while there is uncertainty in estimating those costs, it is unlikely that the magnitude of the differences in cost-effectiveness can be entirely explained by uncertainty in the estimates.



While the majority of the book is devoted to details of the cost-effectiveness analyses that will be of interest chiefly to researchers, I hope that all readers will come away with an appreciation for the logic behind, and the basis for, the conclusion that there is an alternative way of understanding the persistence of the achievement gap and how that gap can be addressed.

Acknowledgements

I wish to thank Joseph Ritter for his contributions as co-author of the article that is adapted here as chapter 5. I also wish to thank Nicola Alexander, Edward Haertel, Henry Levin, Darrell Lewis (posthumously), Arthur Reynolds, Karen Seashore, and Judy Temple for comments on earlier drafts of the articles that are adapted here as book chapters.

Figure 1.1 is adapted from Journal of Experimental Child Psychology, 8(1), Smith, D. E. P., Brethower, D., and Cabot, R., Increasing Task Behavior in a Language Arts Program by Providing Reinforcement, p. 48, copyright 1969, with permission from Elsevier.

Figure 1.2 is adapted from Journal of Experimental Child Psychology, 8(1), Smith, D. E. P., Brethower, D., and Cabot, R., Increasing Task Behavior in a Language Arts Program by Providing Reinforcement, p. 58, copyright 1969, with permission from Elsevier.

Chapter 1 is adapted from Assessment in Education, 17(2), Yeh, S. S., Understanding and Addressing the Achievement Gap Through Individualized Instruction and Formative Assessment, pp. 169–182, copyright 2010, with permission from Taylor and Francis.

Chapter 2 is adapted from American Journal of Evaluation, 28(4), Yeh, S. S., The Cost-Effectiveness of Five Approaches for Raising Student Achievement, pp. 416–436, copyright 2007, with permission from Sage.



Chapter 3 is adapted from Educational Research Review, 4(1), Yeh, S. S., The Cost-Effectiveness of Class Size Reduction, pp. 7–15, copyright 2009, with permission from Elsevier.

Chapter 4 is adapted from Education Policy Analysis Archives, 16(13), Yeh, S. S., The Cost-Effectiveness of Comprehensive School Reform and Rapid Assessment, copyright 2008, with permission from Education Policy Analysis Archives.

Chapter 5 is adapted from Journal of Education Finance, 34(4), Yeh, S. S., & Ritter, J., The Cost-Effectiveness of Replacing the Bottom Quartile of Novice Teachers Through Value-Added Teacher Assessment, pp. 426–451, copyright 2009, with permission from the University of Illinois Press.

Chapter 6 is adapted from Evaluation Review, 34(3), Yeh, S. S., The Cost-Effectiveness of NBPTS Teacher Certification, pp. 220–241, copyright 2010, with permission from Sage.

Chapter 7 is adapted from Educational Research Review, 4(3), Yeh, S. S., The Cost-Effectiveness of Raising Teacher Quality, pp. 220–232, copyright 2009, with permission from Elsevier.

Chapter 8 is adapted from Journal of Education Finance, 36(1), Yeh, S. S., The Cost-Effectiveness of 22 Approaches for Raising Student Achievement, pp. 38–75, copyright 2010, with permission from the University of Illinois Press.

Chapter 9 is adapted from Evaluation and Program Planning, 32(1), Yeh, S. S., Shifting the Bell Curve: The Benefits and Costs of Raising Student Achievement, pp. 74–82, copyright 2009, with permission from Elsevier.

Introduction

The analyses presented in this book suggest that the most popular approaches for raising student achievement—voucher programs, charter schools, class size reduction, comprehensive school reform, increased educational accountability, and increased teacher quality—are relatively inefficient. The analyses identify a particular approach for raising student achievement that is not only more cost-effective in dollar terms, but also efficient in the sense that it does not rely on unusual investments of time to obtain results. Lacking this knowledge, teachers are currently forced to rely on extremely inefficient approaches that consume enormous amounts of time, both during the school day and across the K–12 years. The cost is experienced as reduced time to teach subjects other than math and reading, as schools resort to double periods of math, double periods of reading, and large amounts of remedial instruction that directly reduce the time available for other subjects, including science, art, and music.

This book integrates an array of research findings into a coherent narrative that aims to explain the persistence of the achievement gap between disadvantaged students and their more advantaged peers. This explanation differs from the socioeconomic and cultural explanations offered by other researchers. In essence, I argue that researchers have overlooked a pattern of research findings that points away from current beliefs regarding "what works" and toward a very specific problem underlying poor student engagement and achievement. The current failure to identify this problem is analogous to a hypothetical failure of medical professionals to recognize an epidemic disease and, instead, attribute high rates of mortality to inadequate resources, accountability, and competition. However, if the cause of illness is a specific disease, it makes little sense to increase funding indiscriminately, increase accountability for doctors and patients, increase competition, reform hospitals, stiffen the credentialing of doctors, replace the bottom quartile of doctors, and reduce overcrowding in hospitals. While an argument can be made for each of these proposals, they clearly would be less effective than an intervention based on a precise identification of the specific causative disease.

If it is true that there is a specific problem underlying poor student engagement and achievement, then the nature of the problem should be revealed through a careful review of the existing research literature. This review is provided in chapter 1. An intervention based on this diagnosis should be especially effective, measured in terms of impact per dollar—in other words, its cost-effectiveness. This evidence is provided in chapters 2 through 8. The combination of this evidence—a diagnosis based on existing research literature plus an intervention that is cost-effective—constitutes evidence regarding the validity of the proposed diagnosis. While the research literature is open to multiple interpretations, an intervention that appears to be highly cost-effective and is based on a specific interpretation of the literature lends support to the validity of that particular interpretation.
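Concretely, the comparisons throughout the book are expressed as effectiveness–cost ratios: an intervention's effect on achievement, in standard deviation units, divided by its annual cost per student (see, e.g., Table 2.7). A minimal statement of the metric, in my notation rather than the book's:

$$
\text{effectiveness–cost ratio} \;=\; \frac{\text{effect size (SD units)}}{\text{annual cost per student (dollars)}}
$$

On this metric, an intervention yielding 0.30 SD at $100 per student per year (ratio 0.0030) is ten times as cost-effective as one yielding the same 0.30 SD at $1,000 per student (ratio 0.0003); these figures are illustrative only, not the book's estimates.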

Chapter 1, "The Achievement Gap," frames the topic by reviewing key research studies suggesting an important modification of current views regarding the origin of, and effective treatments for, the academic achievement gap between disadvantaged students and their more advantaged peers. Currently, age-graded schools and group tests label students as "below" and "above" average, inadvertently demoralizing below-average students, depressing effort and achievement, and perpetuating the gap in achievement between poor students and their more affluent peers. Analysis of the research literature suggests that the psychological experience of school for both high- and low-achieving students may be altered through a structure in which instruction and assessment are individualized, students are challenged at their own levels, and each student receives objective assessment information confirming that he or she is successfully advancing to higher levels.

Chapter 2, "The Cost-Effectiveness of Five Policies for Improving Student Achievement," compares the cost-effectiveness of voucher programs, charter schools, a 10 percent increase in spending per pupil, increased educational accountability, and the implementation of rapid assessment systems in which student performance in math and reading is rapidly assessed between 2 and 5 times per week. The analysis suggests that rapid assessment is more cost-effective than the alternatives.

Chapter 3, "The Cost-Effectiveness of Class Size Reduction," compares the cost-effectiveness of class size reduction with the cost-effectiveness of rapid assessment using four alternative sets of assumptions.

Chapter 4, "The Cost-Effectiveness of Comprehensive School Reform," compares the cost-effectiveness of 29 comprehensive school reform (CSR) models with the cost-effectiveness of rapid assessment.

Chapter 5, "The Cost-Effectiveness of Value-Added Teacher Assessment," addresses the growing consensus among educational researchers that student achievement may be improved by using value-added assessments of teacher performance to identify, select, and retain high-performing teachers. The cost-effectiveness results reported in chapter 5 suggest that using value-added teacher assessments to identify and replace low-performing teachers would be an extremely costly and inefficient approach for raising student achievement.

Chapter 6, "The Cost-Effectiveness of NBPTS Teacher Certification," analyzes the cost-effectiveness of certification by the National Board for Professional Teaching Standards (NBPTS). NBPTS certification is believed by many educators to be a promising approach for identifying high-quality teachers. However, the results suggest that improving teacher quality through NBPTS certification is not an efficient approach for raising student achievement.

Chapter 7, "The Cost-Effectiveness of Raising Teacher Quality," analyzes the cost-effectiveness of raising teacher quality by requiring all teacher candidates to meet a minimum SAT score of 1000. The results suggest that this approach is inefficient because teacher salaries would need to rise by 45 percent in order to attract a sufficient number of teachers with the requisite cognitive aptitude.

Chapter 8, "The Cost-Effectiveness of 22 Approaches for Raising Student Achievement," synthesizes the results of the previous seven chapters and integrates original analyses and existing literature regarding the cost-effectiveness of 22 strategies for raising student achievement. The results suggest that rapid assessment is more efficient than the available alternatives.

Chapter 9, "Shifting the Bell Curve: The Benefits and Costs of Raising Student Achievement," estimates the social benefits and costs of implementing rapid assessment in grades 1 through 8 for all American students, and through grade 12 for the lowest-performing 40 percent of all American students. The results suggest that the intervention would increase lifetime earnings by an average of $769,000 per student, and that the resulting increases in tax revenue would allow the federal government or individual state governments to recoup the cost of the intervention.

The Conclusion summarizes the cost-effectiveness results and analyzes the pattern of results in light of the literature reviewed in chapter 1.
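Chapter 9's benefit–cost ratios take the standard discounted form: benefits (earnings gains, the value of less crime, welfare savings, and tax savings) and costs are each summed over time in present-value terms. In the sketch below the notation is mine, not the book's: B_t and C_t are the benefits and costs accruing in year t, and r is the discount rate.

$$
\text{benefit–cost ratio} \;=\; \frac{\sum_{t=0}^{T} B_t/(1+r)^t}{\sum_{t=0}^{T} C_t/(1+r)^t}
$$

A ratio greater than 1 means the intervention more than repays its cost; the chapter reports such ratios separately from national, federal, and state perspectives.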

1 The Achievement Gap

This chapter suggests an important modification of current views regarding the origin of, and effective treatments for, the academic achievement gap between disadvantaged students and their more advantaged peers. Currently, age-graded schools and group tests label students as "below" and "above" average, inadvertently demoralizing below-average students, depressing effort and achievement, and perpetuating the achievement gap between poor students and their more affluent peers. Analysis of the research literature suggests that the psychological experience of school for both high- and low-achieving students may be altered through a structure in which instruction and assessment are individualized, students are challenged at their own levels, and each student receives objective assessment information confirming that he or she is successfully advancing to higher levels. Thus, properly designed classroom assessments may have a powerful psychological effect on children in addition to providing formative information to improve instruction. This chapter reviews basic research regarding child development that is relevant for understanding effective ways of designing instruction and assessment.

The review suggests that rapid classroom assessment may improve student engagement, effort, and achievement by influencing students' perceptions of their own ability to control their academic performance, as well as by providing formative information. The review traces the process by which this is achieved and identifies the characteristics of individualized instruction and assessment that are most effective in promoting children's academic self-efficacy in a practical way. Section I synthesizes findings regarding the development of competence and learned helplessness and the factors influencing persistence and intrinsic motivation, and suggests the process through which small differences in early achievement are magnified by the current structure of schools. Section II reviews evidence suggesting that the characteristics of a specific type of individualized instruction and assessment system may be especially well suited to remediating these differences.

Research Regarding the Development of Self-Efficacy

In a classic article, White (1959) presented evidence suggesting that Darwinian natural selection, operating over millions of years, favored individuals who genetically enjoy exploration and manipulation of their environment, serving to establish this proclivity as a general tendency across the species. Consequently, this tendency appears as an instinctive, innate trait, driving behavior and learning throughout the developmental years from birth (Lewis, Sullivan, & Brooks-Gunn, 1985; Watson, 1972) through secondary school. When the environment is successfully controlled, enjoyment is expressed as a sense of efficacy or competence. Thus, humans have a need to feel competent:

    The fundamental need for competence is built into human psychophysiology and is manifest in the infant's ability to detect contingencies between the self and desired and undesired events (Watson, 1966), in the infant's joy at contingent stimulation (Finkelstein & Ramey, 1977), in the disruptive effects of noncontingency (Seligman, 1975), and in the centrality of control across the life span (Baltes & Baltes, 1986). (Skinner, Zimmer-Gembeck, Connell, Eccles, & Wellborn, 1998, p. 17)

Control, Engagement, and Achievement

Healthy psychological development is associated with the development of competence-related beliefs. Deficits in these beliefs are associated with feelings of hopelessness, depression, and, in some cases, death (Peterson, Maier, & Seligman, 1995).

Thus, it is troubling that competence-related beliefs decline during early adolescence (see Anderman & Maehr, 1994, and Eccles, Wigfield, & Schiefele, 1998, for reviews; Eccles et al., 1989; Jacobs, Lanza, Osgood, Eccles, & Wigfield, 2002; Marsh, 1989; Wigfield, Eccles, Mac Iver, Reuman, & Midgley, 1991). A longitudinal study of 1,608 New York children across grades 3 through 7 found that the average child's sense of competence and control over his or her academic environment rose to a peak in grade 5, then declined rapidly (Skinner et al., 1998). Examination of variation in individual trajectories suggests a possible explanation: individual differences in children's perceptions of control predicted individual differences in their 5-year trajectories of academic engagement:

    Children whose trajectories of perceived control remained high or even increased slightly over time also sustained their initially high levels of engagement, whereas children whose perceived control was declining from third to seventh grade showed corresponding patterns of deteriorating engagement in the classroom. (Skinner et al., 1998, p. 148)

Thus, individual differences in the development of control and capacity beliefs were especially strong predictors of corresponding trajectories of academic engagement:

    These findings suggest that individual perceptions of perceived control laid down by the third grade, especially perceptions that reflect a pessimistic view about the strategies needed to succeed in school, can shape the development of children's engagement for years to come . . . consistent with the notion of perceived control and engagement as parts of dynamic cycles in which perceived control influences engagement and is in turn influenced by the performances that result. (Skinner et al., 1998, p. 149)

Beliefs about perceived control influenced engagement, which influenced academic performance (Skinner, Wellborn, & Connell, 1990; Skinner et al., 1998). Academic performance, in turn, fed back in a reciprocal effect to influence control beliefs: "At each age, individual performance (as indexed by school grades) had an effect on perceptions of control" (Skinner et al., 1998, p. 154). Significantly, the entire sequence was triggered by early academic achievement: "Children's 5-year trajectories of control were launched by children's early academic performances, as indexed by their grades prior to the third grade" (Skinner et al., 1998, p. 150). The reciprocal relationship between academic performance and control beliefs extends into the upper grades as well.

Using panel data from a nationally representative sample of 8,802 American students, Ross and Broh (2000) found that academic achievement in 8th grade predicted an internal locus of control in 10th grade, which in turn predicted academic achievement in 12th grade. By far the strongest predictor of academic achievement in 12th grade is earlier academic achievement. Significantly, all of the effects of parental education and income are mediated by academic performance (Ross & Broh, 2000). A large-scale study involving a random sample (Brookover et al., 1978), as well as meta-analytic studies (Findley & Cooper, 1983; Kalechstein & Nowicki, 1997), confirms that children's beliefs about their degree of control over academic outcomes are strong predictors of academic achievement.

The picture that emerges is that children who start school academically and socially behind their peers begin to doubt their own abilities, undermining engagement, effort, and performance. By the 7th grade, a record of cumulative failure makes it nearly impossible to maintain a sense of control (Skinner et al., 1998). These children are especially likely to become anxious, angry, passive, apathetic, and depressed. Conversely, children who start school academically and socially ahead of their peers, and whose early efforts are met with academic success, tend to maintain confidence in their academic abilities and engagement in school, further enhancing their academic skills and achievement. These syndromes—one negative, the other positive—provide insight into the process that perpetuates differences in academic performance and potentially explains the persistence of the academic achievement gap between children from poor families and their more privileged peers.

    One of the strongest predictors of children's performance in school is individual differences in their perceived control. Reviews of hundreds of studies document robust relations between children's cognitive performance and different aspects of their perceived control . . . Experimental and intervention studies have demonstrated that the relation between control beliefs and academic functioning is a reciprocal one. Experimental studies show that changes in children's perceptions of efficacy, contingency, and control do in fact produce changes in quality and persistence of problem solving and performance on academic tasks (e.g., Diener & Dweck, 1978; Dweck, 1991; Schunk, 1991; Seligman, 1975). At the same time, intervention studies also reveal that, when children's actual academic performance is improved (e.g., through direct tuition of metacognitive skills), these changes in level of performance have a direct effect on children's self-efficacy judgments, sense of control, ability estimates, and perceived competence (e.g., Bouffard-Bouchard, 1989; Redlich, Debus, & Walker, 1986; Schunk, 1981; Schunk & Swartz, 1991). (Skinner et al., 1998, p. 1)

Thus:

    When children believe that they can exert control over success in school, they perform better on cognitive tasks. And, when children succeed in school, they are more likely to view school performance as a controllable outcome. (Skinner et al., 1990, p. 22)

Learned Helplessness

The classic work on learned helplessness provided a breakthrough in understanding the cause of the loss of control experienced by poorly performing students. Experiments demonstrated that prolonged exposure to noncontingency, where effort fails to produce positive outcomes, produced a syndrome of helpless behavior, including passivity, apathy, and depressed performance, even on tasks where regular contingencies between behavior and outcomes were restored (Hiroto & Seligman, 1975; Klein, Fencil-Morse, & Seligman, 1976; Peterson et al., 1995). If prolonged noncontingency produces helpless behavior, passivity, and apathy, and if this behavior is resistant to change, it is especially important to ensure that regular contingencies between effort and academic achievement are established early and maintained throughout the primary and secondary school years.

However, the prevailing experience for many children is a lack of contingency between their efforts and academic outcomes, leading to the passivity and apathy noted by Goodlad (1984). The ubiquitous age-graded school, the age-specific curriculum, and the nearly universal practice of administering the same reading and math tests to entire groups of students in order to compare them serve primarily to differentiate and label students who perform below average as "slow" or even "retarded," no matter how hard they try (Ames, 1992; Mac Iver, 1988; Mac Iver, Reuman, & Main, 1995; Tyack & Cuban, 1995). The experience of schooling for these students involves daily reminders that they are not up to par, that their efforts do not produce success.

Lack of contingency arises if the dominant form of student evaluation involves comparisons with other students. Butler (2006) randomly assigned 7th- and 8th-grade students to two conditions, experimentally manipulating student expectations about whether each student would be evaluated in comparison with other students or in comparison with his or her own previous performance, and found that students in the latter condition exhibited a greater degree of task orientation, higher levels of intrinsic motivation, and more performance improvement than students in the former condition.

The results suggest that when students know that performance will be judged against each student's prior performance, they hold stronger beliefs that ability is acquired through effort, they are more likely to acquire new problem-solving strategies, they have stronger intrinsic motivation, and they perform better. Thus, the practice of comparing students to each other has negative effects that can be reversed by substituting the practice of comparing each student's performance with his or her prior performance.

It is important not only to evaluate students in relation to their own previous progress, but also to individualize the tasks they are presented with so that task difficulty is matched to student ability. Lack of contingency arises if learning tasks are determined by teachers who set the level of difficulty at each group's average level of ability, resulting in tasks that are difficult and frustrating for students who fall below the group average, and if success is measured by absolute performance against this standard rather than by individual progress relative to previous performance (Mac Iver et al., 1995). Below-average students would experience less success and more failure in daily learning tasks, compared with students whose academic performance is above average. Lack of contingency may also arise if above-average students routinely achieve success regardless of their level of effort (Mac Iver et al., 1995). Thus, it is likely that the vast majority of children experience a lack of contingency between their efforts and academic outcomes. This lack of contingency may be the primary cause of the striking decline in children's intrinsic motivation for learning over the elementary and secondary years (Gottfried, Fleming, & Gottfried, 2001; Harter, 1981; Skinner et al., 1998).

Numerous studies have controlled for individual differences in achievement motivation, experimentally manipulated task difficulty, and found that persistence improves when task difficulty is individualized (Blankenship, 1992; Danner & Lonsky, 1981; Dougherty & Harbison, 2007; Drucker, Drucker, Litto, & Stevens, 1998; Harter, 1978; Kennelly, Dietz, & Benson, 1985; Nygard, 1977). Task difficulty influences task involvement and feelings of competence, which influence intrinsic motivation (Vallerand & Reid, 1984). The evidence suggests that helpless, passive, disengaged student behavior may be addressed by individualizing task difficulty and performance expectations so that each student experiences a schedule of success and failure that establishes a strong contingency between effort and success. For example, Kennelly et al. (1985) randomly assigned matched students ages 9 to 11 who had previously been identified as "helpless" to three treatment groups. Each group was assigned to solve arithmetic problems, but the difficulty of the problems varied across the groups, resulting in differential success rates: 100 percent, 76.9 percent, and 46.2 percent, respectively.

The group achieving success 76.9 percent of the time demonstrated significantly better persistence in the face of failure (Kennelly et al., 1985). Persistence, defined as unwillingness to stop performing a task, is central to the construct of motivation (Ryan & Deci, 2000). The optimal schedule involves individualizing task difficulty for each student on a daily basis so that success is achieved on roughly 77 percent of trials, while students experience failure on enough trials to feel a modest challenge to their competence. This schedule is an effective treatment for learned helplessness. In practice, it requires individualization of task difficulty and performance expectations on a daily basis.

Individualization of task difficulty should not be confused with conventional individualization of instruction, which typically involves dividing work into learning-activity packages that each student must complete. Only the rate of task completion, rather than task difficulty, is individualized, resulting in small effects overall (Bangert, Kulik, & Kulik, 1983; Hirsch, 1976; Miller, 1976; Schoen, 1976). Consistent with Kennelly et al. (1985), however, individualized systems of instruction can achieve moderate (Hartley, 1978; Kulik, 1981) or even large (Block & Burns, 1976) effect sizes when used with populations of students whose abilities are well-matched to fixed levels of task difficulty.
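To make the optimal success schedule concrete, here is one standard way such a schedule could be maintained automatically: a weighted up/down staircase in which each success nudges difficulty up a little and each failure pulls it down more, so that difficulty equilibrates where the success rate equals the target. This is a minimal illustrative sketch, not an algorithm from the book; the 77 percent target comes from Kennelly et al. (1985), while the update rule, step sizes, and simulated student are my own assumptions.

```python
import math
import random

TARGET = 0.77       # success rate reported as optimal by Kennelly et al. (1985)
STEP_DOWN = 0.20    # difficulty decrease after a failure (arbitrary units)
STEP_UP = STEP_DOWN * (1 - TARGET) / TARGET   # ~0.06; chosen so the expected
                                              # drift is zero exactly when
                                              # P(success) equals TARGET

def update_difficulty(difficulty: float, succeeded: bool) -> float:
    """Weighted up/down rule: raise difficulty slightly after each success,
    lower it more after each failure, so the process settles at the level
    where the student succeeds on about 77 percent of trials."""
    if succeeded:
        return difficulty + STEP_UP
    return max(0.0, difficulty - STEP_DOWN)

def p_success(difficulty: float, skill: float) -> float:
    """Toy student model: success probability falls as difficulty exceeds skill."""
    return 1.0 / (1.0 + math.exp(difficulty - skill))

difficulty, skill = 3.0, 5.0
outcomes = []
for _ in range(10_000):
    ok = random.random() < p_success(difficulty, skill)
    outcomes.append(ok)
    difficulty = update_difficulty(difficulty, ok)

print(f"observed success rate: {sum(outcomes) / len(outcomes):.2f}")  # ~0.77
```

In the small-step limit, only the ratio of the two step sizes matters: setting STEP_UP/STEP_DOWN equal to (1 - p)/p makes the staircase hover at the difficulty where the success probability is p.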

Performance Feedback

Significantly, the basic relationships among individualized assessment, task difficulty, task persistence, and intrinsic motivation are moderated by informational performance feedback. Objective, positive feedback tends to increase perceived competence and intrinsic motivation. As students' perceptions of their abilities in an academic subject increase, not only do they try harder, but they also enjoy the subject more (Mac Iver, Stipek, & Daniels, 1991). Subjects who received positive performance feedback later displayed higher levels of intrinsic motivation than subjects who performed at the same rate of success but did not receive the feedback (Ryan, Mims, & Koestner, 1983).

Positive effects of feedback on student engagement and achievement have been demonstrated in numerous studies dating back to the 1960s. For example, Smith, Brethower, and Cabot (1969) found that having students chart their progress significantly improved motivation and output. Figure 1.1 shows student output on language arts tasks without feedback; Figure 1.2 shows that output accelerated dramatically with feedback.

Figure 1.1  Language arts task output without performance feedback.

In a second study, Robinson, DePascale, and Roberts (1989) randomly assigned 5th- and 6th-grade students to two groups. Both groups worked on identical sets of math problems in the same classroom, at the same time, with the same teacher. In the first session, neither group received feedback. In the second session, Group 1 received feedback while Group 2 did not. In the third session, both groups received feedback. In the fourth session, neither group received feedback. Whenever a group received feedback, students in that group completed more problems with greater accuracy than under the baseline condition; whenever feedback was withdrawn, completion and accuracy rates dropped. This design virtually rules out any explanation other than that feedback caused the improved engagement and achievement: individual differences in student characteristics, teacher characteristics, classrooms, and schools were controlled by the design itself.

Several meta-analyses and reviews have examined the effect of feedback on student achievement.

Figure 1.2  Language arts task output with performance feedback.

A recent review summarized the results of previous meta-analyses regarding feedback and found an average effect size of 0.79 SD (Hattie & Timperley, 2007). The meta-analyses included studies that experimentally compared the achievement of students who were frequently tested with similar students who received the same curriculum but were not frequently tested. The results suggest that feedback is most effective when it is nonjudgmental, involves frequent testing (2–5 times per week), and is presented immediately after each test. Under these conditions, the meta-analyses and reviews suggest that the effect size for testing feedback is no lower than 0.7 SD (Bangert-Drowns, Kulik, Kulik, & Morgan, 1991; Black & Wiliam, 1998; Fuchs & Fuchs, 1986; Kluger & DeNisi, 1996), equivalent to raising the achievement of an average nation such as the United States to the level of the top five nations (Black & Wiliam, 1998).
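Effect sizes here and throughout the book are reported in standard deviation (SD) units. As a reminder (the notation is mine, not the book's), the standardized mean difference is

$$
d \;=\; \frac{\bar{X}_{\text{treatment}} - \bar{X}_{\text{control}}}{s_{\text{pooled}}},
$$

so an effect of 0.7 SD moves the average treated student from the 50th to roughly the 76th percentile of the control-group distribution.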

When teachers were required to follow rules about using the assessment information to change instruction, the average effect size exceeded 0.9 SD; when students were additionally reinforced with material tokens, the average effect size exceeded 1.1 SD (Fuchs & Fuchs, 1986). Emotionally neutral (i.e., testing) feedback that is void of praise or criticism "is likely to yield impressive gains in performance, possibly exceeding 1 SD" (Kluger & DeNisi, 1996, p. 278). Thus, effective feedback "includes feedback about how well a task is being accomplished or performed, such as distinguishing correct from incorrect answers . . . This type of feedback is most common and is often called corrective feedback . . . By itself, corrective feedback can be powerful. From various meta-analyses, Lysakowski and Walberg (1982) reported an effect size of 1.13, Walberg (1982) reported 0.82, and Tenenbaum and Goldring (1989) reported 0.74, all of which are substantial effects" (Hattie & Timperley, 2007, p. 91).

One promising strategy, therefore, is to use technology that automates the task of providing corrective feedback so that teachers may focus on providing comments about strategies for improvement. Since the research literature indicates that tutoring is much more effective than recitation, placing teachers in the role of tutors makes them more effective teachers, with an average effect size of 0.40 SD (Cohen, Kulik, & Kulik, 1982). These results are moderated by a second body of literature regarding the influence of task autonomy and performance standards.

Autonomy and Standards

Autonomy in task performance (Patrick, Skinner, & Connell, 1993; Perlmutter & Monty, 1977; Ryan, Connell, & Deci, 1985; Zuckerman, Porac, Lathin, Smith, & Deci, 1978) and an accelerating standard of performance (McMullin & Steffen, 1982; Ryan et al., 1985) increase intrinsic motivation, suggesting that task difficulty must be continually adjusted upward to maintain an optimal level of challenge to each learner's competence. Significantly, computer video games typically provide individualization of task difficulty, performance feedback, autonomy in task execution, and an accelerating standard of performance, where students advance to higher levels of difficulty after successfully completing less-difficult tasks. This may explain why many students who are disengaged in school are highly engaged when playing video games.

Summary

There is ample reason to think that interventions targeting children's beliefs about their degree of control over academic outcomes may be especially effective in improving student achievement. The research reviewed above suggests that the most promising interventions will individualize assessment, task difficulty, and performance expectations for each student on a daily basis so that students achieve success on a regular basis, and will provide performance feedback, autonomy in task execution, and an accelerating standard of performance. From this research, a theory of learning may be deduced: individualization of assessment, task difficulty, and performance expectations for each student on a daily basis, in combination with performance feedback, autonomy in task execution, and an accelerating standard of performance, ensures that students achieve success and feel successful on a daily basis, fostering student engagement, increased effort, and further improvements in achievement, in a virtuous cycle.

Rapid Assessment

One approach for raising student achievement that incorporates the effective elements identified above is the implementation of a particular type of rapid assessment system. Rapid assessment may be defined as a system that provides autonomy in task execution, an accelerating standard of performance, and formative testing feedback to students and teachers regarding student performance in math and reading 2–5 times weekly, while individualizing task difficulty and performance expectations so that students achieve success on a daily basis. I use the term formative testing, consistent with Scriven's (1967) original usage, to denote evaluative assessments designed to improve performance.1

The concept of rapid assessment is embodied by two popular programs, Reading and Math Assessment.2 Rapid Assessment programs have been implemented in classrooms in over 65,000 schools (Northwest Regional Educational Laboratory, 2006), including statewide implementation of Reading Assessment in Idaho (Renaissance Learning, 2002).

Reading Assessment individualizes task difficulty by matching students with books at appropriate reading levels. First, books in the school's library are labeled and shelved according to Flesch-Kincaid reading levels, which are based on reading vocabularies. Second, students select books to read based on their interests and their reading levels, according to the results of the STAR Reading test, a norm-referenced, computer-adaptive test (Renaissance Learning, n.d.). After finishing a book, each student takes a computer-based quiz, unique to the book, that is intended to monitor basic reading comprehension (Rapid Assessment Corporation has created more than 120,000 quizzes).
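For reference, the widely used Flesch-Kincaid grade-level formula (not reproduced in the book itself) estimates a text's U.S. grade level from average sentence length and average syllables per word:

$$
\text{FK grade level} \;=\; 0.39\left(\frac{\text{total words}}{\text{total sentences}}\right) + 11.8\left(\frac{\text{total syllables}}{\text{total words}}\right) - 15.59
$$

Longer sentences and longer words thus push a book to a higher shelf level in the scheme described above.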

Multiple-choice quiz items test student knowledge and comprehension of each book's characters and plot (for fiction) or declarative knowledge (for nonfiction). Students advance to more difficult books, as measured by Flesch-Kincaid reading levels, if they consistently achieve criterion comprehension scores of 85 percent.

Math Assessment individualizes task difficulty through computer software that prints individualized math assignments based on each student's achievement level, determined initially by performance on the norm-referenced, computer-adaptive STAR Math test. Students complete the math problems, which range from simple arithmetic items suitable for 1st-grade students to calculus items for more advanced students, and use a classroom device to scan their bubble answer sheets, which are scored by the classroom computer. When math problems are answered incorrectly, subsequent problem sets include (but are not limited to) similar problems, ensuring that students eventually learn the skills required to solve those problems. Students advance to new problem sets after achieving criterion scores of 75 percent, and advance to new objectives after achieving criterion scores of 85 percent on end-of-objective tests.

Rapid assessment provides only corrective feedback—information about whether answers are correct or incorrect. It is not an instructional program; teachers must provide instruction as they normally would. At the early grade levels, the corrective feedback on math items assesses student ability to perform basic arithmetic; at advanced levels, the items assess student ability to understand and apply advanced concepts in algebra, geometry, precalculus, and calculus. The items assess both declarative and procedural knowledge, including recall as well as application items.

The feedback provided by rapid assessment is corrective feedback—"Level 1" feedback in Hattie and Timperley's (2007) scheme. Hattie and Timperley (2007) point out that corrective feedback by itself can be very powerful. Thus, software that automates the provision of corrective feedback can have powerful effects on learning and achievement. Rapid assessment is not intended to minimize the role of the teacher in providing comments about strategies for improvement. Instead, it uses the advantages of technology to relieve teachers of the burden of providing corrective feedback, freeing them to focus on Hattie and Timperley's (2007) Level 2 and Level 3 feedback—teacher comments regarding strategies for improvement, both specific process-oriented strategies and general self-regulatory strategies.

The rapid assessment programs are designed to provide individualized comparisons of each student's performance with his or her previous level of performance, similar to the Incentives for Improvement Program (Mac Iver & Reuman, 1993/1994).

The Achievement Gap    13

ability of students to compare their performances. This is accomplished in reading by providing reading comprehension scores specific for each book that is read. Since books are matched to students according to reading level, a low-achieving student has the same ability to achieve a high comprehension score as a high-achieving student (because they read different books). In math, students receive sets of math problems that are individualized according to each student’s ability level. Therefore, low-achieving students have the same ability to achieve high completion and accuracy scores as high-achieving students. In both programs, performance expectations are individualized by teachers who set performance goals. Expectations for each student are based on his or her past performance, rather than relative comparisons across students. Performance feedback is provided to students through computer-generated reports, which each student receives after completing a book or set of math problems. The reports provide basic feedback regarding each student’s comprehension or accuracy score, book level or number of objectives successfully completed, and progress toward goals. This information is available to each teacher electronically on his or her classroom computer, immediately upon student completion of each book or set of math problems. Students making slow progress or with low reading comprehension or math accuracy scores are electronically flagged for intervention by the teacher. Autonomy in task execution is achieved by permitting students to select the books they read from a wide range of fiction and nonfiction literature in the school’s library, and to complete math problem sets with minimal supervision. The majority of students work independently during each class period, with the teacher tutoring individual students or a small group of students on reading or math problem-solving strategies. In reading, students are encouraged to read more difficult books through points awarded according to each book’s length and Flesch-Kincaid reading level (as long as students meet the 85 percent comprehension criterion). Points are typically factored into student grades. In math, performance expectations are automatically raised as students automatically advance to new and more difficult objectives after successfully completing tests that signal mastery of previous objectives. These tests are scheduled by the computer after students consistently complete math problem sets while meeting the corresponding accuracy criterion. The percentage of objectives met each quarter is typically factored into student grades. Teachers using rapid assessment reported that individualization of task difficulty and individualized performance expectations hid achievement

14    The Cost-Effectiveness of 22 Approaches for Raising Student Achievement

differentials, allowed students to experience repeated incremental success, and fostered student feelings of success, confidence, and emotional commitment (Yeh, 2006a). The task structure allowed each student to control his or her rate of progress, and this control gave them enjoyment, while the accelerating standard of performance fostered pride and excitement in academic achievement as students monitored their progression from one level to the next through regular performance feedback suggesting that they were successful most of the time (Yeh, 2006a). Finally, there were very clear markers of achievement, on a daily basis, which were visible to each respective individual student, but not visible to their peers (Yeh, 2006a). Individualization of the curriculum in math and reading allowed students who were achieving at levels several grades apart to work side by side in the same classroom (Yeh, 2006a). Each student received separate reports of his or her individualized math and reading assessments on a daily basis (Yeh, 2006a). Those results provided objective confirmation to each student regarding his or her progress (Yeh, 2006a).

Rigorous Studies of Rapid Assessment

Two randomized experiments evaluated the effectiveness of the Reading Assessment program (Nunnery, Ross, & McDonald, 2006; Ross, Nunnery, & Goldfeder, 2004). The first experiment, involving 1,665 Memphis students (a district where 71 percent of all students are eligible for free/reduced-price lunch), found an average effect size of 0.270 SD per grade in grades K through 6, over a 9-month school year (Ross et al., 2004). Using hierarchical linear modeling, the second experiment, involving 978 students (89.9 percent African American and 83 percent eligible for free/reduced-price lunch), found an average effect size of 0.175 SD per grade in grades 3 through 6, over a 9-month school year (Nunnery et al., 2006). In both studies, the effect sizes were achieved with an unusually disadvantaged population of students.

The only randomized study of Math Assessment, involving 1,880 students in grades 2 through 8, in 80 classrooms and seven states, found an effect size of 0.324 SD over a 7-month period after controlling for treatment integrity (Ysseldyke & Bolt, 2007). The only national, peer-reviewed quasi-experimental evaluation of Math Assessment, involving 2,202 students in grades 3 through 10, in 125 classrooms in twenty-four states, found that students in the treatment group gained an average of 0.392 SD per grade over one semester (18 weeks), compared with students not receiving Math Assessment (at pretest, the scores of treatment and comparison students were not significantly different) (Ysseldyke & Tardrew, 2007).


The effect sizes in the two studies of Reading Assessment (Nunnery et al., 2006; Ross et al., 2004) were depressed because they were conducted with highly disadvantaged populations (this also explains why the Reading Assessment results were depressed compared with the Math Assessment results). Effect sizes in the studies of Reading and Math Assessment were depressed in comparison with effect sizes derived from reviews and meta-analytic studies (Bangert-Drowns et al., 1991; Black & Wiliam, 1998; Fuchs & Fuchs, 1986; Hattie & Timperley, 2007; Kluger & DeNisi, 1996), because the former studies were conducted under classroom conditions with no special motivational incentives, while the reviews and meta-analytic results were derived from studies conducted under more highly controlled research conditions.

The results for Reading Assessment suggest that younger students benefit more than older students (Nunnery et al., 2006; Ross et al., 2004). It appears that students in the upper grades might be less motivated by information regarding reading comprehension and book difficulty level, suggesting that, to be effective, the program has to be implemented early. The results for Math Assessment suggest that older students benefit as well as younger students.

In studies involving the effects of frequent testing, a question that arises is whether gains that are attributed to the treatment might be an artifact resulting from alignment of the formative tests with the criterion measures. With regard to Reading Assessment, the frequent book quizzes aim to assess each student's reading comprehension with regard to individual books that are individually selected by each student from a large library. Thus, the content of the quizzes is aligned with the content of individual books, rather than the criterion STAR Reading assessment. With regard to Math Assessment, students are assigned individualized math problem sets that are tailored and aligned with state standards for mathematics instruction in grades 1 through 7, as well as standards for basic math, pre-algebra, algebra 1, algebra 2, geometry, probability and statistics, precalculus, and calculus in the secondary grades. The content of the STAR Math criterion test is aligned with national standards but is also aligned with the content of the math problem sets (which are rapidly scored and provide rapid performance feedback) to the extent that the state and national standards overlap.

A limitation of the studies of Reading and Math Assessment is that they do not provide information about long-term effects and, thus, it is not clear whether effects achieved over 7- or 9-month periods can be sustained over longer periods of time. Results also depended on the level of program implementation; the positive effect of the program disappeared if teachers did not implement the program (Ysseldyke & Bolt, 2007).


Conclusion

Individualization of book selections and math assignments to each student's achievement level creates a task structure where task difficulty is individualized. In combination with individualization of performance expectations and rapid, test-based performance feedback, these characteristics ensure that students receive a steady dose of positive, objective feedback, reinforcing student beliefs that they are competent learners who are able to master new skills and knowledge every day. Autonomy in task execution and an accelerating standard of performance foster student feelings of pride and accomplishment. In this way, the experience of schooling can be restructured to meet students' psychological need to feel competent, promoting engagement, commitment to work effort, and a self-reinforcing cycle of increased achievement.

Teachers reported that the rapid assessment intervention allowed them to individualize and target instruction and provide more tutoring, reduced drill and practice, and improved student readiness for critical thinking activities, resulting in a more balanced curriculum (Yeh, 2006a). Teachers reported that the assessments provided a common point for discussion, increased collaboration among teachers to improve instruction and resolve instructional problems, and supported both new and experienced teachers in implementing sound teaching practices (Yeh, 2006a).

To the extent that rapid assessment enables teachers to provide individualized instruction, keeping each student in his or her own zone of proximal development, students are more likely to be successful and feel that they can control their performances. Feelings of control reinforce effort, which improves achievement, reinforcing feelings of control and engagement. In contrast, the current structure of schooling fails to provide a supportive technology that would allow teachers to routinely tailor instruction to each student and to systematically provide objective feedback to students, showing them that they can control their academic performance. The frequent result is the enervating classroom experience described by Goodlad (1984). Changing the psychological experience of school for both high- and low-achieving students likely requires a structure wherein instruction is individualized, students are challenged at their own levels, and each student receives objective information confirming that he or she is successfully advancing to higher levels.

2 The Cost-Effectiveness of Five Policies for Improving Student Achievement

In recent years, policymakers and educational researchers have focused on four policy alternatives for improving student achievement: (a) increased educational spending, (b) voucher programs, (c) charter schools, and (d) increased accountability, based on evidence that these approaches may be effective in improving student performance (Carnoy & Loeb, 2002; Greene, Peterson, & Du, 1999; Greenwald, Hedges, & Laine, 1996; Hanushek & Raymond, 2005; Howell, Wolf, Campbell, & Peterson, 2002; Hoxby, 2002; Hoxby & Rockoff, 2005; Rouse, 1998). However, the research suggests very modest effect sizes in comparison with a fifth approach, involving the implementation of rapid assessment systems that provide testing feedback to teachers and students regarding student performance in math and reading 2–5 times per week (Bangert-Drowns et al., 1991; Fuchs & Fuchs, 1986; Kluger & DeNisi, 1996). What is needed is a comparison of relative effect sizes and relative cost-effectiveness so that policymakers have the information needed to identify the most promising policy alternatives. This chapter compares the relative cost-effectiveness of the five policies, using best-evidence estimates drawn from available data regarding the effectiveness and costs of rapid assessment, increased spending, voucher programs, charter schools, and accountability, and applying a conservative methodology for calculating the relative effectiveness of rapid assessment. Two sensitivity analyses suggest that the basic conclusion—that rapid assessment is much more cost-effective than the alternatives—is robust to changes in the underlying assumptions.

Cost-effectiveness analysis systematically compares the costs and outcomes of alternative policies in order to aid decisions about the most efficient course of action (Levin, 1988; Levin & McEwan, 2001). In the area of education, cost-effectiveness analysis assesses outcomes in educational terms, such as student achievement, whereas cost-benefit analysis assesses outcomes in terms of monetary value. Importantly, cost-effectiveness analysis provides a standard framework and techniques that permit policymakers to compare disparate interventions (Levin & McEwan, 2001). Although systemic education reforms have multiple goals, they can be compared with classroom-level interventions since they aim to improve student achievement. Policymakers wish to know which approach is most cost-effective, regardless of whether it is a systemic or a classroom-level intervention.

The basic strategy in cost-effectiveness analysis is to derive results for educational effectiveness by using standard evaluation procedures or studies and to combine that information with cost data that are derived from the ingredients approach (Levin, 1988; Levin & McEwan, 2001). Importantly, the evaluation of effectiveness is separable from the evaluation of costs (Levin, 1988). Thus, estimates of effectiveness can be derived from published evaluations and then combined with estimates of costs derived through the ingredients approach: (a) identification of ingredients, (b) determination of the value or cost of the ingredients, and (c) analysis of the costs in an appropriate decision-oriented framework (Levin, 1988; Levin & McEwan, 2001). The costs of an intervention are defined as the value of the resources that are given up by society to implement the intervention, regardless of how those costs are reported by government entities (Levin, 1988; Levin & McEwan, 2001). These resources are the ingredients of the intervention, and the social value of those ingredients constitutes its overall cost (Levin, 1988; Levin & McEwan, 2001).

Effect Size Estimates

Effect-size estimates were drawn from the most rigorous evaluations of Rapid Assessment, increased educational expenditure, voucher programs, charter schools, and increased educational accountability.1


Rapid Assessment

Two randomized studies of Reading Assessment are available. The mean effect size (after averaging the grade-level effect sizes where they overlap, in grades 3 through 6) is 0.279 SD per grade in grades K through 6, with an unusually disadvantaged population of students (Nunnery et al., 2006; Ross et al., 2004). Two rigorous evaluations (one randomized and one large quasi-experimental study) of Math Assessment are available. The mean effect size is 0.358 SD (Ysseldyke & Bolt, 2007; Ysseldyke & Tardrew, 2007). Averaging the effect-size estimates for Reading and Math Assessment produces an overall effect size of 0.319 SD.

Increased Spending

Estimates of the effects of increased spending on student achievement are bounded by two meta-analyses (and associated follow-up studies). Hanushek's (1986, 1989, 1997) meta-analyses suggest that there is only a weak relationship between educational spending and student achievement, while Greenwald et al. (1996) drew a more optimistic conclusion based on their meta-analysis and estimated that a 10 percent increase in per-pupil expenditure would increase student achievement in math and reading by 0.083 SD per year. Thus, 0.083 SD represents an upper-bound estimate of the effect of a 10 percent increase in preexisting patterns of educational spending.

Voucher Programs

The best studies of voucher programs used random assignment or lottery-based assignment designs. At most, these studies suggest small positive effects on student achievement for students who received vouchers. Three randomized field trials in New York City; Washington, DC; and Dayton, Ohio, suggest that vouchers boosted achievement by a statistically insignificant 1.8 National Percentile Rank (NPR) points in "total achievement" (math and reading) over 2 years (0.9 NPR points or 0.572 normal curve equivalent [NCE] points per year, equal to an effect size of 0.027 SD)2 (Howell et al., 2002). In Milwaukee, where the use of a lottery system to allocate vouchers approximated a random-assignment design, Rouse (1998) estimated that students selected for the Parental Choice Program showed zero gains in reading, but gained 1.5–2.3 NCE points per year in math (an average of 1.9 NCE points, equal to an effect size of 0.090 SD) over comparison group students, while Greene et al. (1999) estimated that after 4 years, reading scores were 5.84 NCE points higher (an annual gain of 1.460 NCE points or an effect size of 0.069 SD), while math scores were 10.65 NCE points higher (an annual gain of 2.663 NCE points or an effect size of 0.126 SD).

The results suggest that the effects of voucher programs on voucher recipients are small when implemented in urban districts such as New York; Washington, DC; Dayton; and Milwaukee. There is no evidence to suggest that effect sizes are larger when voucher programs are implemented on a larger scale or if the focus is on the effects of increased competition on students who remain in traditional public schools, rather than the effects on voucher recipients. Hoxby (2002) analyzed the effects of the Milwaukee voucher program, which she selected, along with two charter school initiatives, "because they are the only ones in which the choice schools can, legally, garner a large enough share of enrollment to provide a nonnegligible amount of competition for the regular public schools" (p. 142). Hoxby reported gains of 2.1 NPR (1.176 NCE) points per year in reading (an effect size of 0.056 SD) and 2.8 NPR (1.568 NCE) points per year in math (an effect size of 0.074 SD) (see Wisconsin Department of Public Instruction, 2007a, Tables 1 and 3, 1996 norm conversions for 4th grade, for the 25th through the 50th national percentiles).

While Greene and Winters (2004) also found positive effects of competition from vouchers, their study confounded the effect of vouchers with the branding effect of Florida's school accountability system (Carnoy, 2001, Appendix A). In Florida, students in schools that receive a grade of F twice during any 4-year period are eligible to receive vouchers to attend another public school or a private school. Over a 1-year period, students eligible for vouchers gained 5.9 NPR (3.3 NCE) points in math on the Stanford Achievement Test, compared with students in all other Florida public schools (Greene & Winters, 2004). However, independent analyses strongly suggest that much of this effect may be attributable to the galvanizing effect on school staff of being branded an "F" school, rather than any effect due to vouchers (Carnoy, 2001). Subtracting the spurious branding effect results in a pure voucher effect size of only 0.02 SD (Carnoy, 2001).

This evidence suggests that it is appropriate to focus attention on the direct effect of voucher programs on voucher recipients, rather than any indirect effect on neighboring public schools. The average effect size across the three key studies of the direct effects of voucher programs is 0.032 SD in reading and 0.081 SD in math, or 0.057 SD overall (Table 2.1).

Table 2.1  Effect Sizes in Standard Deviation Units for Key Studies of Voucher Programs

Study                     Reading    Math     Average
Howell et al., 2002       0.027      0.027    0.027
Rouse, 1998               0.000      0.090    0.045
Greene et al., 1999       0.069      0.126    0.098
Average                   0.032      0.081    0.057
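As an arithmetic check on these figures, the following sketch (an illustration of the chapter's conversions, not code from any of the studies cited) converts annual NCE gains to effect sizes, using the fact that the NCE scale is defined with a standard deviation of 21.06, and reproduces the column averages in Table 2.1:

NCE_SD = 21.06  # the NCE scale is defined with mean 50 and SD 21.06

def nce_to_effect_size(nce_gain_per_year):
    """Convert an annual gain in NCE points to an effect size in SD units."""
    return nce_gain_per_year / NCE_SD

# Gains reported above: 0.572 NCE (Howell et al.) and 1.9 NCE (Rouse, math).
print(round(nce_to_effect_size(0.572), 3))  # 0.027
print(round(nce_to_effect_size(1.9), 3))    # 0.09 (reported as 0.090 SD)

# Column averages in Table 2.1: (reading, math) pairs for the three studies.
effects = [(0.027, 0.027), (0.000, 0.090), (0.069, 0.126)]
reading_avg = sum(r for r, m in effects) / len(effects)
math_avg = sum(m for r, m in effects) / len(effects)
print(round(reading_avg, 3), round(math_avg, 3))  # 0.032 0.081

The mean of the two column averages, 0.0565, is the 0.057 SD overall estimate reported in Table 2.1.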

Charter Schools

Since charter schools are extremely heterogeneous, it would be useful to differentiate the types of charter schools that are most effective. While it is possible that researchers will be able to provide this information in the future, the evidence that is presently available fails to demonstrate that any particular type of charter school is significantly more effective than the others. Thus, policymakers are forced to fall back on a more basic question: whether charter school programs are, in general, an efficient use of public resources.

Ideally, large-scale randomized experiments comparing the achievement of students who were randomly accepted into charter schools with the performance of students who were randomly rejected and attended traditional schools would be used to estimate the effect of charter schools on student achievement. However, no true random-assignment studies have been completed thus far, and the two lottery-based studies that approximated a random-assignment design involved only a total of 4 charter schools, which is inadequate for estimating the average impact of charter schools. Averaging the effect sizes from these two studies suggests an effect of approximately 0.018 SD, but no generalizations can be made to the larger population of charter schools.

As with voucher programs, the indirect effect of charter school competition on public school performance is uncertain. Holmes, Desimone, and Rupp (2003, 2006) estimated the galvanizing effect of the presence of North Carolina's charter schools on public school performance and concluded that charter school competition increased traditional school performance by about 1 percent. However, Bifulco and Ladd (2005, 2006), using a stronger methodological approach, concluded that North Carolina's charter schools exerted a negative effect on charter school student performance—an effect that was not offset by positive effects due to increased competition and was not explained by observable or unobservable differences in the students who attend charter schools (Bifulco & Ladd, 2005, p. 62).


A prime concern when random assignment is not used is controlling selection bias—the threat that differences in outcomes are attributable to observable or unobservable differences between students who attend charter schools and comparison students who attend traditional schools. However, this threat can be controlled through the use of panel data with multiple observations of performance for each student who switches between a traditional school and a charter school (or the reverse). Thus, students serve as their own controls, and the benefit of charter schools is estimated by comparing achievement gains while attending a charter school to achievement gains while attending a regular public school. This method, used by 3 large-scale studies, controls for student maturation and history effects, as well as selection bias and, therefore, has many of the key strengths of a random-assignment design.3 These studies are used to derive the estimate of charter school effects that is used in the present chapter's cost-effectiveness analysis.

Despite differences in the charter school programs, the studies consistently found negative effect sizes. Bifulco and Ladd (2006) evaluated the effect of North Carolina's charter schools on student achievement, finding a negative effect size of 0.095 SD on reading scores and a negative effect size of 0.160 SD on math scores. Sass (2006) evaluated the effect of Florida's charter schools on student achievement, finding zero effect on reading scores and a negative effect size of 0.019 SD on math scores. Hanushek, Kain, Rivkin, and Branch (2007) evaluated the effect of Texas' charter schools on student achievement, finding a negative effect size of 0.17 SD on combined reading and math scores. The effect of competition from charter schools on the performance of traditional public schools was inconsequential (Bifulco & Ladd, 2006; Sass, 2006).

The average effect size across these key studies of charter schools was a negative 0.088 SD in reading and a negative 0.116 SD in math, or a negative 0.102 SD overall. However, in all 3 studies, negative effects were largest in the first year of charter school operation and diminished over time. The average effect size for charter schools after 5 years in operation was 0.005 SD (Table 2.2).

The overall conclusion—that charter school effect sizes are small—does not change even if the larger group of studies using less-rigorous methods is considered. An analysis of every study published in the 2000–2005 period regarding the link between charter schools and academic achievement concluded that, "Regardless of the methods used, the results are mixed, some positive about charters and some negative, with null or mixed findings the most common" (Hill, Angel, & Christensen, 2006, p. 140).

Table 2.2  Effect Sizes in Standard Deviation Units for Key Studies of Charter Schools

                           All Years of Operation           5th Year of Operation
Study                      Reading   Math     Average       Reading   Math     Average
Bifulco & Ladd, 2006       –.095     –.160    –.128         –.064a    –.092a   –.078a
Sass, 2006                 .000b     –.019b   –.009b        .032b     .034b    .033b
Hanushek et al., 2007      –.170     –.170    –.170         .060      .060     .060
Average                    –.088     –.116    –.102         .009      .001     .005

a Bifulco and Ladd note that charter schools in operation for 5 years exhibited an anomalous dip in performance; therefore, estimated impacts for charter schools in operation for 4 years were substituted.
b Using coefficients from the restricted value-added model.

Increased Accountability

Increased educational accountability may be defined as the implementation of high-standards exit exams. A majority of states have now implemented high school exit exams that students must pass to graduate from high school, affecting 70 percent of students nationwide (Gayler, Chudowsky, Hamilton, Kober, & Yeager, 2004) and, with few exceptions, requiring mastery of material at the high school level (Center on Education Policy, 2005).

Studies by Amrein and Berliner (2002a, 2002b) concluded that there is no evidence that states that implemented high-stakes tests demonstrated improved student achievement on various measures such as the Scholastic Assessment Test (SAT), American College Test (ACT), Advanced Placement (AP) exams, or the National Assessment of Educational Progress (NAEP). However, both their methodology and their findings have been challenged (Braun, 2004; Rosenshine, 2003). In contrast, three studies have escaped methodological criticism (Carnoy & Loeb, 2002; Grodsky, Warren, & Kalogrides, 2009; Hanushek & Raymond, 2005). Carnoy and Loeb (2002) modeled gains in student achievement as a function of differences in degree of accountability pressure, including pressure exerted through the implementation of exit exams, and found that significantly stronger accountability was associated with modest increases in student achievement at the 8th-grade level, but effects of increased accountability on 4th-grade scores were not significant.

A more sophisticated study employed a longitudinal panel design with controls for fixed state effects as well as measures of parental education, school spending, and race (Hanushek & Raymond, 2005). Hanushek and Raymond (2005) characterized their study as an evaluation of the impact of "consequential accountability." Importantly, their research design essentially tested the impact of the entire package of state reforms which occurred around the dates they estimated that "consequential accountability" was implemented in each state that was included in their study. The design did not allow the researchers to distinguish the effect of one form of consequence from another. However, mapping the dates that Hanushek and Raymond (2005) used (to split baseline performance from post-intervention performance) to the dates when the corresponding states implemented exit examinations testing high school level material (dates from Dee & Jacob, 2007, Table A1, adjusted to reflect that implementation of the new accountability system typically occurs 4 years prior to the first date that diplomas are withheld under the system), it is clear that in most cases these dates coincide—meaning that Hanushek and Raymond's (2005) results may be interpreted as a test of the impact of implementing this type of exit exam. This study found that high-stakes testing, where consequences are linked to student achievement, improves achievement by 0.2 SD over 4 years, or 0.05 SD per year.4

Grodsky et al. (2009) conducted the only evaluation of the impact of exit exams on student achievement that used nationally representative NAEP data and controlled for state fixed effects as well as gender, race, parental education, grade level, a measure of "home resources," age, and difficulty of the exit exam. This study, which provides the best available evidence for the impact of exit exams, found no significant effect on student achievement. While the results are not statistically significant, they imply mixed effect sizes of 0.051 SD in reading and negative 0.062 SD in math, or a negative 0.011 SD averaged across reading and math.

Cost-Effectiveness Analysis

The next step is to integrate the preceding synthesis of the literature regarding effect sizes for each of the five alternative interventions with information about the cost of each intervention. The resulting cost-effectiveness analysis is intended to illustrate the relative cost-effectiveness of each intervention. The standard method of calculating a cost-effectiveness ratio for a given intervention is to divide the intervention's costs by the intervention's effect size (Equation 1) (Levin, 1988).

Cost–Effectiveness Ratio = Cost/ES        (1)


An alternative is to calculate the inverse, an effectiveness/cost ratio (Equation 2).

Effectiveness–Cost Ratio = ES/Cost        (2)

Finally, a relative effectiveness–cost ratio may be calculated to compare the effectiveness–cost ratio of rapid assessment with the effectiveness–cost ratio of an alternative intervention (Equation 3):

Relative Effectiveness–Cost Ratio (REC) = (ES_RA/Cost_RA) / (ES_Alt/Cost_Alt)        (3)

where ES_RA is the effect size (in standard deviation units) of rapid assessment; ES_Alt is the effect size in standard deviation units of a given alternative intervention; Cost_RA is the cost of rapid assessment; and Cost_Alt is the cost of the alternative intervention. Costs include opportunity costs, that is, the foregone benefits of the best alternative use of the resources needed to implement an intervention. Since ratios of effectiveness–cost ratios are sensitive to small changes in denominators, the purpose of this type of comparison is to provide policymakers with general guidance about the gross magnitude of differences in cost-effectiveness, rather than claims about the precise size of those differences.
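For readers who prefer code to notation, Equations 1 through 3 can be restated as simple functions. The sketch below is purely schematic (it is not software used in this analysis), and the two interventions in the example are hypothetical placeholders:

def cost_effectiveness_ratio(cost, es):
    """Equation 1: dollars expended per standard deviation of gain."""
    return cost / es

def effectiveness_cost_ratio(es, cost):
    """Equation 2: standard deviations of gain per dollar expended."""
    return es / cost

def relative_effectiveness_cost(es_ra, cost_ra, es_alt, cost_alt):
    """Equation 3: the effectiveness-cost ratio of rapid assessment
    divided by the effectiveness-cost ratio of an alternative."""
    return effectiveness_cost_ratio(es_ra, cost_ra) / effectiveness_cost_ratio(es_alt, cost_alt)

# Hypothetical placeholders: an intervention yielding 0.10 SD at $100 per
# pupil versus one yielding 0.05 SD at $500 per pupil yields a REC of 10.
print(round(relative_effectiveness_cost(0.10, 100.0, 0.05, 500.0), 2))  # 10.0

In other words, an intervention that produces twice the effect at one fifth the cost delivers ten times the achievement gain per dollar.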

Costs of Rapid Assessment

Tables 2.3 and 2.4 list the cash costs associated with the implementation of Reading Assessment and Math Assessment. In addition to the core software programs (either Reading Assessment or Math Assessment), it is assumed that a diagnostic assessment (either STAR Reading or STAR Math) is purchased and implemented for each student receiving the rapid assessment intervention. In addition, mark-scan devices are purchased for each math classroom. For the purpose of calculating the costs per student, it is assumed that initial one-time costs are averaged over an enrollment of 500 students in 25 classrooms per building. Initial fees, teacher and administrator training, and the cost of the scanners are amortized over 7 years, which is (arbitrarily) assumed to be the life of the program (schools that choose to continue using the programs for a longer period of time would effectively reduce the annual cost). For reading, the costs include access to 100,000 book quizzes for every student. For math, the costs include access to Math Assessment grade-level libraries tagged to state standards for grades 1 through 7, as well as multiple subject area libraries for the secondary grades (pre-algebra, algebra 1, algebra 2, geometry, probability and statistics, precalculus, calculus, basic math, chemistry, and physics).

Table 2.3  Cash Costs to Implement Reading Assessment

Item                            Fixed Cost      Annual Variable       Annual Variable        Total Annual Cost
                                (per school)    Cost (per student)    Cost (500 students)    (per student)a
Reading Assessment              $1,499.00       $4.00                 $2,000.00              $4.43
STAR Reading                    $1,499.00       $0.39                 $195.00                $0.82
Reading Assessment Trainingb    $5,736.50       —                     —                      $1.64
Total                           $8,734.50       $4.39                 $2,195.00              $6.89
After 10% Discount              $7,861.05       $3.95                 $1,975.50              $6.20

Source: Rapid Assessment Corporation, August 8, 2006
a Assuming fixed costs are spread over 500 students and averaged over a 7-year implementation period.
b $149/full day training × (37.5 teachers + 1 administrator).

Table 2.4  Cash Costs to Implement Math Assessment

Item                            Fixed Cost      Annual Variable       Annual Variable        Total Annual Cost
                                (per school)    Cost (per student)    Cost (500 students)    (per student)a
Math Assessment                 $1,499.00       $4.00                 $2,000.00              $4.43
STAR Math                       $1,499.00       $0.39                 $195.00                $0.82
AccelScan scanners and cards    $7,875.00c      $8.10d                $4,050.00              $10.35
Math Assessment trainingb       $5,736.50       —                     —                      $1.64
Total                           $16,609.50      $12.49                $6,245.00              $17.24
After 10% Discount              $14,948.55      $11.24                $5,620.50              $15.52

Source: Rapid Assessment Corporation, August 8, 2006
a Assuming fixed costs are spread over 500 students and averaged over a 7-year implementation period.
b $149/full day training × (37.5 teachers + 1 administrator).
c Assuming 25 classrooms per school × $315 per scanner.
d 180 instructional days per 9-month year × 1 mark card @ $.045.
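The per-student figures in Tables 2.3 and 2.4 follow directly from the amortization rule in the table notes. The sketch below reconstructs that arithmetic (it is a reconstruction of the tables' calculations, not vendor pricing software): fixed costs are spread across 500 students and a 7-year implementation period, then added to each annual variable cost per student.

# Fixed costs are spread over 500 students and a 7-year implementation
# period, then added to the annual variable cost per student.
STUDENTS, YEARS = 500, 7

def annual_cost_per_student(fixed_cost, variable_cost_per_student):
    return fixed_cost / (STUDENTS * YEARS) + variable_cost_per_student

# Table 2.3 items as (fixed cost per school, annual variable cost per student).
reading_items = [(1499.00, 4.00),   # Reading Assessment software
                 (1499.00, 0.39),   # STAR Reading
                 (5736.50, 0.00)]   # training: $149 x 38.5 trainees
reading_total = sum(annual_cost_per_student(f, v) for f, v in reading_items)
print(f"{reading_total:.2f}")  # 6.89

# Table 2.4 repeats the first three items (Math Assessment, STAR Math,
# training) at identical prices and adds AccelScan scanners and cards:
# $7,875 fixed (25 x $315) plus $8.10 variable (180 days x $0.045 per card).
math_total = reading_total + annual_cost_per_student(7875.00, 8.10)
print(f"{math_total:.2f}")     # 17.24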

The assessment programs are simple to implement; thus, an administrator could instruct each teacher regarding the use of the software. However, the Rapid Assessment Corporation offers full-day training sessions costing $149 per teacher, and the cost analysis assumes that every classroom teacher and one administrator (for every 500 students) undergoes a full-day training session for Math Assessment and a full-day training session for Reading Assessment. In addition, the cost analysis assumes a 50 percent teacher turnover rate during the 7-year implementation period and assumes that each new teacher receives a full-day training session for Math Assessment and a full-day training session for Reading Assessment.

Implementation requires that each classroom of students has access to one computer and one printer (math problems are printable so that students can work individually without using a computer). Based on a nationally representative survey, 93 percent of all instructional classrooms were online by the year 2003, implying that students in those classrooms had access to at least one classroom computer, and a linear extrapolation of recent trends in online access suggests that 100 percent of classrooms had access to a classroom computer by the year 2006 (Parsad & Jones, 2005). In addition, researchers note that available computer resources are frequently underutilized (Cuban, 2001). While most classrooms that have a computer also have a printer, the printer might be cheap and unreliable. However, since Internet-connected computers can be linked through a local area network (LAN) to print from any printer in the same building, it is feasible to utilize high-capacity printers in the school's media center (Reilly, no date). It is also possible to print to a Xerox machine in the same building if the Xerox is equipped with a LAN card (Reilly, no date). For these reasons, as well as the author's observations of actual program operations, the cost analysis assumes that it would be feasible to implement the rapid assessment programs without purchasing additional computers or printers.

Thus, the cash costs of rapid assessment are primarily the costs listed in Tables 2.3 and 2.4. The annual cost per student in 2006 dollars is $9.45 in reading and $18.89 in math, adjusted for the opportunity costs of teacher training time ($3.02 per student) and adjusted for the opportunity costs created by large upfront fixed costs.5

Since the use of rapid assessment technology does not supplant normal reading and math activities, the opportunity costs of using rapid assessment primarily involve the time required by teachers to monitor students. Through classroom observations and interviews with teachers and administrators, as well as review of program documents, the researcher verified that during designated periods of the day devoted to reading and math, the majority of students read books selected from the school library or work on printed sets of math problems (Yeh, 2006a). Students who complete a book sit at the classroom computer to take a brief comprehension quiz. Students who complete a set of math problems scan their bubble sheets. Teachers typically tutor individual students or small groups of students. No additional time is allocated to reading or math instruction beyond standard 60-minute daily periods of reading and math instruction, nor is that time used in a way that is much different from standard reading and math learning activities. The primary difference is that books are selected according to each student's reading level, math problems are assigned according to each student's math level, and students and teachers are able to quickly diagnose areas where students are having difficulty.

Thus, to the extent that Reading and Math Assessment activities do not displace the reading and math activities that may be expected in the absence of rapid assessment, and given that the assessments are self-administered by students and scoring and reporting is handled by computer software, the opportunity costs of implementing the program primarily involve the time required by teachers to ensure that students select and read appropriate books, take the comprehension quiz without assistance from other students, and complete and scan their answers to assigned math problems.6 According to teachers who were interviewed, the time saved due to the program's scoring and student progress monitoring features, which replace the time-consuming conventional tasks of grading math homework and assessing reading comprehension, more than offset the opportunity costs of helping students to select books and monitoring student use of the classroom computer (Yeh, 2006a). Thus, the annual cost of rapid assessment, including the opportunity costs of operating the program, remains a total of $9.45 in reading and $18.89 in math, or a total of $28.34. If the effect size for rapid assessment is 0.319 SD, the effectiveness-cost ratio is 0.01125618.

Cost to Increase Educational Expenditure by 10 Percent

Total expenditure per pupil in constant 2004–2005 dollars equals $10,464 (National Center for Education Statistics, 2005d). Ten percent equals $1,046.40, or $1,118.83 in 2006 dollars, meaning it would cost $1,118.83 to increase per pupil expenditure by 10 percent. If the corresponding effect size is 0.083 SD, the effectiveness-cost ratio is 0.00007418. Equation 3 indicates that the achievement gains per dollar from rapid assessment are a multiple of 152 times the gains that accrue through a 10 percent increase in preexisting patterns of educational expenditures.
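As a check on this multiple, the following sketch reproduces the chapter's arithmetic (an illustration only, using the cost and effect-size figures derived above):

# Rapid assessment: 0.319 SD at $28.34 per pupil per year; a 10 percent
# spending increase: 0.083 SD at $1,118.83 per pupil per year.
es_ra, cost_ra = 0.319, 28.34
es_spend, cost_spend = 0.083, 1118.83
print(f"{es_ra / cost_ra:.8f}")        # 0.01125618
print(f"{es_spend / cost_spend:.8f}")  # 0.00007418
print(round((es_ra / cost_ra) / (es_spend / cost_spend)))  # 152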

Costs of Voucher Programs

Significant costs to society would arise to the extent that voucher programs pull students at random from public school classrooms, making it difficult (if not impossible) for public schools and districts to reduce expenditures at the same rate that students use vouchers to switch to private schools. For example, in the study by Howell et al. (2002), very small numbers of students were offered vouchers in the baseline year: 1,300 in New York; 809 in the District of Columbia; and 515 in Dayton, Ohio. If these 2,624 students were randomly pulled from the 1,427 public schools in those three cities (1,213 in New York, 165 in the District of Columbia, and 49 in Dayton) (National Center for Education Statistics, 2000, 2001) and, within each school, were randomly pulled from an average of four grade levels, only 0.46 students per grade level, per school, used vouchers to transfer to private schools.

Similar calculations suggest that the same issue arises in each of the major voucher programs. As of January 2006, a total of 5,734 students participated in Cleveland's voucher program (C. R. Belfield, 2006). Based on the proportion (21 percent) of students who transferred from Cleveland's public schools in the 2000–2001 school year (Schiller, no date), I estimate that 1,204 of the 5,734 students transferred from Cleveland's 122 public schools (National Center for Education Statistics, 2005a): an average of 9.87 students per school, or 2.47 students per grade level per school (assuming an average of 4 grade levels per school). (The present cost-effectiveness analysis focuses on students who transferred from the public schools rather than recipients of vouchers who were previously home-schooled or never attended public schools, since the purpose of the analysis is to determine the effectiveness of alternative ways of improving the achievement of students currently enrolled in public schools.)

During the 2004–2005 school year, 3,074 students used vouchers to transfer from Milwaukee's public schools to private schools (Wisconsin Department of Public Instruction, 2006), suggesting that an average of 13.08 students were pulled from each of Milwaukee's 235 public schools (U.S. Department of Education, 2005a), or an average of 3.27 students per grade, per school (assuming an average of 4 grade levels per school).

Florida had enrolled 734 students into the Opportunity Scholarship Program by November 2005 (Florida Department of Education, 2006), or an average of 0.05 students per grade level per school, assuming an average of 4 grade levels per school in each of the 3,700 public schools in Florida (U.S. Department of Education, 2005f). However, on January 5, 2006, the Florida Supreme Court issued a ruling declaring the private school option of the Opportunity Scholarship Program unconstitutional (Florida Department of Education, 2006).


On average, therefore, very few students—between 0.05 and 3.27 students per grade level per school—used vouchers to transfer to private schools in the city of New York, the District of Columbia, Dayton, Cleveland, Milwaukee, and Florida. This would not have allowed any of the public school building principals to eliminate teaching positions in response to the transfers, and there would have been no decrease in overhead, facilities, or administrative costs. At the same time, there are far fewer private schools and, thus, voucher programs funnel public school students into a small number of classrooms which quickly fill up and require more buildings, teachers, and administrators, in proportion to the ratio of public to private school students (9 to 1; see National Center for Education Statistics, 2005b). If public school students enroll in private schools, the private school classrooms fill up nine times faster than public school classrooms are depleted. Thus, voucher programs typically create a need for private schools to add teachers, administrators, and facilities, without corresponding reductions in public school costs. These inefficiencies increase the total costs of teaching the same number of students (both public and private).

While the real cost of educating the remaining public school students barely declines, state funding declines by a fixed amount every time a student uses a voucher to transfer to a private school (Miles & Roza, 2004). By law, school districts must balance their budgets; when revenues are reduced, expenditures must also be reduced. When building principals are unable to save costs by consolidating classrooms and reducing teaching positions, the reduction in revenue associated with declining enrollment is typically offset by reducing other services—art, music, extracurricular programs, and other support services (see Horn and Miron, 2000, who identified the same issue when students transfer to charter schools). These reductions in expenditures can be mistakenly attributed to cost savings, for example, when expenditures are regressed on factors including student enrollment in order to calculate the reduction in expenditure associated with a decline in enrollment (see, for example, Gottlob, 2004). However, expenditures decline because of the forced reduction in services, not because of savings generated through reduced enrollments.

The most reliable way to determine the amount of any cost savings would be to analyze actual financial records from local school districts, rather than imputing these savings by using the per-pupil expenditure amounts used by state governments to allocate funding to local districts. This type of analysis of local district records has apparently not been performed with regard to voucher programs (see U.S. General Accounting Office, 2001, regarding the Cleveland and Milwaukee voucher programs). However, the cost savings that might accrue if the largest voucher program (in Milwaukee) is tripled in size may be estimated from the cost savings generated as a consequence of the 9,870 Philadelphia public school students who transferred to charter schools by the 2000–2001 school year (Pennsylvania Economy League Eastern Division, 2001). The school district was able to recover 17.38 percent of the costs to educate the transfer students ($10,100,000 of the $58,120,000 costs, including the $3,220,000 share of transportation and administrative costs attributable to the students who transferred), implying that 82.62 percent of the costs are additional costs to society. However, there is no evidence that take-up rates for large-scale voucher programs would approach the take-up rate of Philadelphia's charter school program and, thus, there is no evidence that savings would approach Philadelphia's 17.38 percent figure.

The real social cost of educating large numbers of students in private schools (who are currently educated in public schools) is difficult to estimate for several reasons: private school tuition figures exclude costs that are offset by corporate and noncorporate subsidies (U.S. General Accounting Office, 2001), as well as the cost of services that would be required by many students (and, by law, are currently provided by public schools, but not private schools), including transportation, free and reduced-price meals, special education, vocational education, and services for students with disabilities and limited English proficiency (C. R. Belfield, 2006; Levin, 1998; Levin & Driver, 1997). While comprehensive cost data are not available for private schools, such data are available for charter schools (Nelson, Muir, & Drown, 2003). Charter schools are subject to many of the same competitive cost pressures as private schools, resulting in teacher salaries that are substantially lower than traditional public school salaries (Nelson et al., 2003). For example, in Texas and Minnesota, charter school teacher salaries are more than $13,760 lower, and in Pennsylvania, more than $17,513 lower, in inflation-adjusted dollars, than comparable host-district teacher salaries (Nelson et al., 2003). Thus, charter school cost information was combined with estimates of cost savings resulting from students switching to private schools, and cost estimates for transportation, record-keeping, information systems to monitor and assess voucher eligibility of both students and schools, and adjudication services that would be required by a voucher system, to obtain an inflation-adjusted final estimate of the per-pupil cost to society when typical public school students use vouchers to transfer to private schools and receive comparable services: $9,646.01 (Table 2.5). If the effect size for voucher programs is 0.057 SD, the effectiveness-cost ratio is 0.00000591. Equation 3 indicates that the achievement gains per dollar from rapid assessment are 1,905 times the gains that accrue through voucher programs.

Table 2.5  Annual Per-Pupil Cost to Society of a Voucher Program

Cost Category                                           2006 Dollars
Instruction, Administration, Facilities & Operations    $7,098.88a
Food Service                                            $323.00a,b
Special Education                                       $1,082.55a,b
Transportation                                          $2,005.57c,d
Cost Savings (17.38%)                                   ($1,826.64)
Voucher Record-Keeping and Monitoring                   $68.19c
Information Services                                    $50.81c
Adjudication Services                                   $49.82c
Othere                                                  $793.83a
Total                                                   $9,646.01

a Nelson et al. (2003), adjusted to 2006 dollars.
b Adjusted for the additional costs of mandatory services provided by public schools but not charter schools.
c Levin & Driver (1997), adjusted to 2006 dollars.
d Students frequently travel further to reach private schools, on routes and schedules incompatible with traditional public school bus transportation. In Cleveland, for example, some voucher students' homes are too remote to be served efficiently by buses, forcing the Cleveland Municipal School District to pay an estimated $1,372–$2,058 per pupil per year to transport these students by taxi (an extra $800–$1,486 above the $572 cost of traditional school bus transportation) (Neas, 2001), adjusted to 2006 dollars.
e Including compensatory, vocational, bilingual, and career and technology education.
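The arithmetic of Table 2.5 can be verified directly. In the sketch below (a reconstruction of the table's calculations, not the author's code), the 17.38 percent offset is applied to the four service-cost categories:

# Table 2.5: per-pupil cost to society of a voucher program, 2006 dollars.
service_costs = [7098.88,  # instruction, administration, facilities & operations
                 323.00,   # food service
                 1082.55,  # special education
                 2005.57]  # transportation
voucher_overhead = [68.19, 50.81, 49.82]  # record-keeping, information, adjudication
other = 793.83
savings = 0.1738 * sum(service_costs)     # Philadelphia-based offset: $1,826.64
total = sum(service_costs) - savings + sum(voucher_overhead) + other
print(f"{total:.2f}")                     # 9646.01
print(f"{0.057 / total:.8f}")             # 0.00000591
print(round((0.319 / 28.34) / (0.057 / total)))  # 1905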

Costs of Charter Schools

When students attend public charter schools, many of the costs to educate the students are identical to the costs of educating the same students in traditional public schools. In the simplest case, if a regular public school is converted to a charter school, and if revenues and expenditures (which are measures of resources expended by society on charter schools) remain identical, as well as enrollment and educational services, there are no additional costs (or savings) to society as a result of implementing the charter school. However, to the extent that charter schools are not simply converted public schools, but instead are newly created schools that require new facilities, new administrators, and new teachers, and pull students from widely scattered classrooms in traditional public schools, society pays for the creation and maintenance of a larger number of school-level administrative units, facilities, and teachers (Nelson, 1997).

As illustrated by the calculations regarding voucher programs, only a fraction of these costs are offset through savings in the sending public schools and districts. The cost to hire charter school teachers will largely be a new cost with very little offset due to reduced need for teachers in the sending schools, which will rarely be able to reduce their own costs in proportion to the costs of establishing and maintaining the new charter schools. In essence, more facilities, administrators, and teachers are required to teach the same total number of students (Nelson, 1997). Thus, Horn and Miron (2000) found that when regular public schools lose scattered students to a charter school, there are no cost savings. However, each lost student took away $6,000 of state funding. The loss of this revenue was absorbed by the local district, typically through reduced funding of art, music, and extracurricular programs as well as support staff and services. Expenditures were reduced by cutting services rather than through savings attributable to student transfers to charter schools.

For the purpose of the charter school cost analysis, the public school cost savings attributable to students transferring to charter schools, based on Philadelphia's experience, were estimated to be 17.38 percent of the costs (including administrative and transportation costs) to educate transfer students (see above). The annual per-pupil, inflation-adjusted cost to society was estimated using a methodology similar to that used with regard to voucher programs, except that costs specific to voucher programs (voucher record-keeping, monitoring, information, and adjudication services) were dropped and the actual costs for student transportation incurred in Philadelphia were used, resulting in a figure of $8,086.30 per transfer student (Table 2.6). If the effect size for charter school programs is 0.005 SD, the effectiveness–cost ratio is 0.00000062. Equation 3 indicates that the achievement gains per dollar from rapid assessment are 18,155 times the gains that accrue through charter schools.

Table 2.6  Annual Per-Pupil Cost to Society of a Charter School Program

Cost Category              2006 Dollars
Instruction                $4,249.51a
Administration             $1,404.01a
Facilities & Operations    $1,445.36a
Food Service               $323.00a,b
Special Education          $1,082.55a,b
Transportation             $322.09c
Cost Savings (17.38%)      ($1,534.05)
Otherd                     $793.83a
Total                      $8,086.30

a Nelson et al. (2003), adjusted to 2006 dollars.
b Adjusted for the additional costs of mandatory services provided by public schools but not charter schools.
c Pennsylvania Economy League Eastern Division (2001), adjusted to 2006 dollars.
d Including compensatory, vocational, bilingual, and career and technology education.
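The same check applies to Table 2.6. In the sketch below (again a reconstruction of the table's arithmetic), the final multiple uses the rounded ratios, matching the 18,155 figure reported above:

# Table 2.6: per-pupil cost to society of a charter school program.
costs = [4249.51,  # instruction
         1404.01,  # administration
         1445.36,  # facilities & operations
         323.00,   # food service
         1082.55,  # special education
         322.09]   # transportation (Philadelphia actuals)
other = 793.83
savings = 0.1738 * sum(costs)   # 17.38% offset in sending schools: $1,534.05
total = sum(costs) - savings + other
print(f"{total:.2f}")           # 8086.30
print(f"{0.005 / total:.8f}")   # 0.00000062 (using the 5th-year effect size)
print(round(0.01125618 / 0.00000062))  # 18155, from the rounded ratios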

Costs of Accountability

Estimating the costs of implementing stronger educational accountability is difficult, because there is no standard set of activities associated with increased accountability. However, the core attribute is the implementation of standardized tests where results are linked with consequences for students or schools. A majority of states have implemented high school exit exams, which students must pass in order to graduate from high school. By the year 2009, these requirements will affect 70 percent of students nationwide (Gayler et al., 2004). With few exceptions, the exams require mastery of material at the high school level (Center on Education Policy, 2005).

Thus, the analysis of the costs of implementing increased educational accountability focuses primarily on two costs: (a) the costs of implementing the standardized achievement tests and (b) the incremental social costs of denying diplomas to students who would otherwise receive them (due to the incremental increase in the number of students denied diplomas solely because exit exams raise the graduation bar). Costs are calculated as deviations from the pre-existing status quo (before the implementation of exit exams and higher graduation standards). If no additional students are denied diplomas as a result of raising the graduation bar, the only costs that would be included in the cost-effectiveness analysis would be the costs of administering the exams. The calculation of costs does not assume there are no benefits of raising the graduation bar; those benefits are measured in terms of increased student achievement.

While some researchers assert that extraordinary increases in remediation services and teacher professional development will be necessary in order to meet the federal No Child Left Behind (NCLB) mandate that all students meet proficiency standards by the year 2014, the cost of these enhancements is not included in the present cost analysis because the enhanced level of services exceeds the level of educational services associated with the 0.05 SD estimate of student achievement gains, derived from achievement data from the 1992–2002 period, which is used in the present cost-effectiveness analysis (Hanushek & Raymond, 2005). Presumably, the implementation of these enhancements would be associated with an impact estimate for educational accountability that exceeds 0.05 SD.

In 2003, the General Accounting Office estimated that if every state used its existing mix of multiple-choice and open-response question types, the cost of the assessments would be $3.9 billion over 7 years (U.S. General Accounting Office, 2003), excluding the costs of alternative assessments for students with disabilities, expenditures for English-language proficiency testing, and expenditures incurred by school districts, which could increase the total to $7 billion (National School Boards Association, 2003). Thus, the lower figure is a conservative estimate of the costs of implementing the state-mandated tests, equaling $4.3 billion over 7 years (or $618,407,341 per year) in 2006 dollars after adjusting for inflation, or $12.73 for each of the 48,574,000 students that were projected to be enrolled in 2006 (U.S. Department of Education, 2005d).

Warren, Jenkins, and Kulick (2006) provide the current best estimate of the incremental increase in the fraction of students who potentially would be denied diplomas as a consequence of implementing stronger educational accountability nationwide, defined as the implementation of exit exams that require mastery of high school level material. The achievement of 28 graduating classes between 1975 and 2002 was modeled based on data from each of the 50 states and the District of Columbia, yielding 1,428 state-years as the units of analysis for models estimating the effect of the implementation of exit exams. After adjusting for an array of covariates, high school completion rates were 2.1 percentage points lower in states with this type of exit exam, controlling for state and year fixed effects, implying that an extra 89,908 students would be denied diplomas annually (0.021 × national public-school 9th-grade enrollment of 4,281,345 in academic year 2004–2005; U.S. Department of Education, 2005b).

To calculate the economic value of the lost diplomas, I drew upon Flores-Lagunes and Light's (2004, 2007) estimate of the "sheepskin" (signaling) effect, as well as Census Bureau estimates of lifetime earnings by level of education (Day & Newburger, 2002), to calculate the discounted present value of the lifetime difference in income (a measure of the economic output lost to society) between an individual who graduates from high school (but completes no further schooling) and an individual who does not graduate but is identical with regard to years of schooling, actual work experience, and age. The wage premium is 12.5 percent, equal to a
$95,869 increase over the $766,951 earned by a high school dropout over the course of a lifetime, in present value terms (Day & Newburger, 2002), assuming that real earnings grow at a rate equal to a discount rate of 3.5 percent (see Sullivan, 1997). The total cost to society for each cohort of 89,908 students who are denied diplomas each year is $8,619,390,052. Dividing the total cost of lost diplomas for each cohort by national public-student 9th grade enrollment of 4,281,345 in academic year 2004–2005 (U.S. Department of Education, 2005b), implies that it would cost $2,013.24 per pupil to raise the graduation bar. This figure is considerably higher than Yeh’s (2007) estimate. The total cost of increased educational accountability is the cost of the assessments ($12.73), plus the discounted present value of the cost to society of lost economic output as a result of the shift toward standards-based exit-exam requirements ($2,013.24), or $2,025.97. Combining this cost figure with Hanushek and Raymond’s (2005) 0.05 SD impact estimate suggests an upper-bound effectiveness-cost ratio equal to 0.00002468, implying that the achievement gains per dollar through rapid assessment are 456 times larger than the gains through accountability testing. Alternatively, this cost figure in combination with Grodsky et al.’s (2009) impact estimates suggests that the relative advantage for rapid assessment may be considerably underestimated.
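The arithmetic behind these accountability figures can be traced in a few lines, using only the values cited above:

```python
# Reproducing the accountability cost figures from the values given above.
denied    = round(0.021 * 4_281_345)   # extra diplomas denied annually: 89,908
premium   = 95_869                     # present value of the lost wage premium
cohort    = denied * premium           # ~$8,619,390,052 per cohort
per_pupil = cohort / 4_281_345         # ~$2,013.24 per 9th-grade student
total     = per_pupil + 12.73          # plus per-student testing cost: ~$2,025.97
print(0.05 / total)                    # effectiveness-cost ratio: ~0.00002468
print(0.01125618 / 0.00002468)         # rapid assessment advantage: ~456
```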

Summary

For each intervention, Table 2.7 summarizes the effect size, annual cost per student, total annual cost to implement the intervention for every student in U.S. public schools enrolled in prekindergarten through 12th grade, and the effectiveness–cost ratio. Rapid assessment is substantially more effective and less costly than the alternative interventions, resulting in an effectiveness–cost ratio that is between 152 and 18,155 times larger than the ratios for the alternatives. A dollar spent on rapid assessment raises achievement 18,155 times as much as a dollar spent on charter schools. While the magnitude of the relative effectiveness–cost ratios may appear improbable, they mainly reflect the very small effect sizes and large costs of increased spending, voucher programs, charter schools, and educational accountability. To draw an analogy, if "Investment A" returns $181.55 per year (equivalent to 18,155 cents), that investment is 18,155 times as profitable as "Investment B," which returns only 1 cent per year. The cost-effectiveness of rapid assessment seems improbable only because it is being compared with highly ineffective and costly alternatives.

Table 2.7  Effect Size, Annual Cost per Student, Total Annual Cost, and Effectiveness–Cost Ratio

Intervention          Effect Size (SD)   Cost per Student   Total Cost ($Billions)   Effectiveness–Cost Ratio a
Rapid Assessment      0.319              $28.34             $1.4                     0.01125618
Increased Spending    0.083              $1,118.83          $54.3                    0.00007418
Vouchers              0.057              $9,646.01          $468.5 b                 0.00000591
Charter Schools       0.005 c            $8,086.30          $392.8 b                 0.00000062 c
Accountability        0.050 d            $2,025.97          $98.4                    0.00002468

a Standard deviation / cost per student in 2006 dollars.
b If every student used a voucher or enrolled in a charter school, there might be additional cost savings after a period of adjustment involving the closure (or conversion) of public schools. However, there is no evidence that universal availability of voucher or charter school programs would substantially alter per-pupil costs, because existing take-up rates are low.
c Using the alternate estimate of the effect size (0.005 SD) after the initial shakedown period.
d Hanushek and Raymond's 0.050 SD effect-size estimate.

Sensitivity Analysis I

A key assumption is that schools do not need to purchase additional computers and printers in order to implement rapid assessment. This assumption is based on research implying that virtually all classrooms would have had access to at least one online computer by the year 2006 (Parsad & Jones, 2005), plus the ability of all online computers to print from a high-capacity printer in a school's media center (Reilly, no date), and was verified through interviews with teachers and observations of classrooms where rapid assessment was used. However, if each classroom requires new equipment, an entire system, including a computer, monitor, keyboard, mouse, software, service plan, and printer, might be purchased from Dell for $653. Assuming that a complete system is purchased for every classroom of 20 students and is amortized over a 7-year period, the annual cost per student is $4.66, raising the total cost of rapid assessment to $11.78 per student in reading and $21.22 in math, or a total of $33 per student. The effectiveness–cost ratio falls to 0.00966667, but rapid assessment remains 130 times as cost-effective as a 10 percent increase in per-pupil expenditure, 1,636 times as cost-effective as voucher programs, 15,591 times as cost-effective as charter schools, and 392 times as cost-effective as educational accountability.
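Under these assumptions, the revised ratio and the comparisons above can be reproduced as follows, using the effectiveness–cost ratios from Table 2.7:

```python
# Sensitivity I: add the amortized equipment cost and recompute the ratios.
equipment = 653 / 7 / 20                    # ~$4.66 per student per year
total = round(28.34 + equipment, 2)         # ~$33.00 per student
ratio = 0.319 / total                       # ~0.00966667
for name, alt in [("increased spending", 0.00007418),
                  ("vouchers",           0.00000591),
                  ("charter schools",    0.00000062),
                  ("accountability",     0.00002468)]:
    print(name, round(ratio / alt))         # ~130, ~1,636, ~15,591, ~392
```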


Sensitivity Analysis II

A second assumption, based on interviews with teachers and observations of classrooms where rapid assessment is used, is that the rapid assessment software saves more teacher time (primarily time that would otherwise be spent grading math homework and assessing reading comprehension) than is consumed in noninstructional tasks such as supervising student use of the computer, scanner, and printer. However, if teachers do not save time and, instead, lose 15 minutes per day (or 1.25 hours per week), the annual cost is $81.06 per student, assuming 20 students per teacher and an annual teacher salary of $51,880, adjusted for inflation (U.S. Department of Education, 2005c). The total cost for rapid assessment increases to $109.40 per student, reducing the effectiveness–cost ratio to 0.00291590, but rapid assessment remains 39 times as cost-effective as a 10 percent increase in per-pupil expenditure, 493 times as cost-effective as voucher programs, 4,703 times as cost-effective as charter schools, and 118 times as cost-effective as educational accountability.
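The $81.06 figure appears to assume a teacher work year of roughly 1,440 hours (36 weeks at 40 hours); this divisor is an inference made here rather than a figure stated in the text. Under that assumption, the arithmetic is:

```python
# Sensitivity II: charge 15 minutes per day of teacher time against rapid
# assessment. NOTE: the 1,440-hour work year (36 weeks x 40 hours) is an
# assumption chosen to reproduce the $81.06 figure; the text does not state it.
hourly = 51_880 / (36 * 40)             # ~$36.03 per teacher-hour
per_student = hourly * 1.25 * 36 / 20   # 1.25 hr/week, 36 weeks, 20 students: ~$81.06
ratio = 0.319 / (28.34 + per_student)   # ~0.00291590
print(round(ratio / 0.00000062))        # still ~4,703 times charter schools
```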

Conclusion

Comparisons of effect sizes suggest that rapid assessment is potentially a more promising approach for improving student achievement than increases in the preexisting pattern of spending, voucher programs, charter schools, or exit exams. The results presented above suggest that rapid assessment is 4 times as effective as a 10 percent increase in per-pupil expenditure, 6 times as effective as voucher programs, 64 times as effective as charter schools, and 6 times as effective as educational accountability. Rapid assessment is 152 times as cost-effective as increases in preexisting patterns of educational expenditure, 1,905 times as cost-effective as voucher programs, 18,155 times as cost-effective as charter schools, and 456 times as cost-effective as educational accountability.

In concrete terms, the research findings suggest that improving student achievement in math and reading by an average of 0.319 SD (or approximately 3 months of learning) for every student would take less than 1 year and cost $1.4 billion using rapid assessment. To achieve the same effect size through increased spending would take 4 years and $208.9 billion. Voucher programs would take 6 years and $2.6 trillion to achieve the same effect. Charter schools would take 64 years and $25.1 trillion, and increased accountability would take 6 years and $627.8 billion. These calculations suggest the magnitude of the differences in effectiveness and cost.
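These time and cost projections follow directly from the effect sizes and per-pupil costs in Table 2.7. A minimal sketch of the arithmetic, which reproduces the figures above up to small rounding differences:

```python
# Time and total cost to match a 0.319 SD gain, using Table 2.7 figures:
# years = 0.319 / annual effect size; cost = per-pupil cost x enrollment x years.
target, enrollment = 0.319, 48_574_000
for name, effect, per_pupil in [("increased spending", 0.083, 1118.83),
                                ("vouchers",           0.057, 9646.01),
                                ("charter schools",    0.005, 8086.30),
                                ("accountability",     0.050, 2025.97)]:
    years = target / effect
    total = per_pupil * enrollment * years
    print(f"{name}: {years:.1f} years, ${total / 1e9:,.1f} billion")
```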


If rapid assessment is more effective than the alternatives examined here, an obvious question is why it has not been more widely adopted. Although teachers in 65,000 schools have adopted one or more of the rapid assessment programs (Northwest Regional Educational Laboratory, 2006), only a minority of those schools have adopted the programs schoolwide.

Reading specialists are concerned that Reading Assessment does not include a measure of oral reading comprehension or teacher observations of reading behavior, that the Flesch-Kincaid reading index used to establish the readability level of trade books does not account for the reader's level of interest in particular topics, and that reading comprehension is assessed through a multiple-choice test that does not allow for written responses (Biggers, 2001). Teachers are also concerned that student progress is measured through a point system that rewards students for reading long books with high levels of difficulty (determined by the Flesch-Kincaid readability index) without accounting for student effort. Company literature refers to the use of pizza lunches, skating parties, and recognition buttons to reward students who make rapid progress, posing the threat of extinction once these rewards are withdrawn and the possibility that the internal motivation to read is reduced; teacher praise and constructive feedback may be more effective (Biggers, 2001). In addition, specialists express concern that Reading Assessment is not an instructional program, that it does not mention the teacher's role in providing direct instruction in reading strategies, that it is not a balanced reading program, that some schools rely too heavily on it, and that initial costs are substantial (Biggers, 2001). Specialists cite research suggesting that students achieve at higher rates when free-reading time is combined with direct instruction in reading activities and reading extension activities (Biggers, 2001). Specialists, as well as researchers, question the early research studies evaluating the effectiveness of Reading Assessment (Biggers, 2001; Krashen, 2003, 2005).

Similar concerns have been expressed by math teachers about the Math Assessment program: it does not replace good instruction, can be misused, and does not foster critical thinking (WikEd, 2006), although research suggests that the program can be effective with gifted students, with an effect size of 0.45 SD compared with gifted students who do not receive Math Assessment (Ysseldyke, Tardrew, Betts, Thill, & Hannigan, 2004). Perhaps most importantly, since neither program is a complete curriculum, the programs are generally not approved for purchase with state funds (see, for example, California State Board of Education, 1999).

Resolving these concerns probably depends on a better understanding of the strengths and limitations of rapid assessment. Rapid assessment is designed neither to supplant regular instruction nor to teach critical-thinking skills. Instead, it is better understood as a tool designed to support teachers, providing useful information about student progress so that teachers might be more effective teachers of basic skills. However, to the extent that rapid assessment reduces the amount of time required to teach those skills and improves student preparation for learning critical-thinking skills, it may also help teachers who aim to spend more time on critical-thinking activities.

Finally, it is likely that implementation matters. If teachers are inflexible about allowing students to read books outside their established reading levels, students may become frustrated. If teachers do not monitor students, some students may share their answers to the reading and math quizzes with others. Students may choose to read books full of illustrations in order to rack up points. However, if rapid assessment is understood as a tool with specific strengths and limitations, and if it is implemented with care by teachers and principals, the available evidence suggests that it can be more effective in raising student achievement in math and reading than a 10 percent increase in existing patterns of educational spending, voucher programs, charter schools, or increased educational accountability.

3  The Cost-Effectiveness of Class Size Reduction

Much attention has been devoted to the hypothesis that reducing class size may improve student achievement in math and reading, yet there is a dearth of studies comparing the cost-effectiveness of class size reduction with other strategies. In principle, comparative cost-effectiveness analyses would permit the identification of the most efficient approach for raising student achievement. The implementation of the most efficient (instead of a less efficient) approach would conserve social resources that would then become available to pursue additional educational goals beyond the important goals of raising student achievement in math and reading (such as proficiency in art and science).

One reason that comparative studies are difficult is that most educational approaches have multiple intended learning outcomes and, thus, it is not possible to ensure a complete match across all of the intended outcomes for each approach. Cost-effectiveness analysis is limited to a comparison along one primary dimension, such as math and reading achievement as measured by standardized tests. Researchers generally do not have access to the results of alternative measures, such as essays, that have been administered on a standardized basis and graded in a way that allows valid estimates of gains in achievement from pre- to post-test. However, the use of standardized tests to measure reading comprehension and math problem solving is standard practice in educational cost-effectiveness studies, because reading comprehension and math problem solving are key intended outcomes, and standardized tests are generally accepted as the best available measures, despite their flaws (Phelps, 2005). While the development and use of better measures of student achievement is an important topic, it is beyond the scope of current methods in cost-effectiveness analysis (see the standard text by Levin & McEwan, 2001), and the need for improved measures is not sufficient reason to discard cost-effectiveness techniques based on the use of standardized test results.

The use of cost-effectiveness techniques to identify an efficient means to raise student achievement does not assume that math and reading achievement are the most important educational outcomes. Nor does it assume that standardized tests of math and reading are the best methods of measuring math and reading achievement. It assumes only that raising math and reading achievement is one of many important goals, that standardized tests represent one important way to measure achievement, and that it is desirable to identify efficient ways of raising student achievement so that social resources are not wasted. The alternative is that society continues to rely upon costly, inefficient approaches that divert social resources from the pursuit of other worthy goals.

Empirical Studies

A decade of research suggests that reducing class sizes may improve student achievement (Finn, Gerber, Achilles, & Boyd-Zaharias, 2001; Greenwald et al., 1996; Krueger, 2002; Maier, Molnar, Percy, Smith, & Zahorik, 1997; Molnar, Smith, & Zahorik, 1999, 2000, 2001; Nye, Hedges, & Konstantopoulos, 2001). This research, punctuated by Tennessee's landmark Project STAR randomized field trial, led Wisconsin and California to launch class size reduction initiatives statewide. Evaluations of these initiatives have now been completed, along with RAND analyses of the costs of implementing class size reduction. Using these results, and using standard cost-effectiveness techniques (Levin & McEwan, 2001), the cost-effectiveness of class size reduction was compared with the cost-effectiveness of rapid formative assessment.


Meta-Analyses

Two major syntheses of the research literature regarding the effects of student–teacher ratio on student achievement have been completed. While both syntheses focused on studies of the effects of student–teacher ratio rather than class size reduction, they were included to demonstrate that the conclusion of the current analysis does not depend on whether the focus is on studies of student–teacher ratio or studies of actual class size reduction (see Table 3.2). Hanushek (1989, 1997, 1999) reviewed the evidence, including 277 separate estimates of the effect of student–teacher ratios on student achievement, and found little, if any, effect associated with class size reduction: "This distribution of results, symmetrically distributed about a zero effect, is what one would expect if there were no systematic relationship between class size and student performance" (Hanushek, 1999, p. 147). Greenwald et al. (1996) criticized Hanushek's vote-counting methodology, used a different meta-analytic approach, and drew a more optimistic conclusion, estimating an effect size of 0.0295 SD for every 3-student reduction in class size (no breakdown was given for math and reading separately). Thus, Greenwald et al.'s (1996) figure may be considered an upper-bound estimate of the average impact of student–teacher ratio drawn from published meta-analyses of the research literature.

In addition to these published meta-analyses, evaluations of three statewide class size reduction initiatives warrant special attention. Evaluations of Tennessee's initiative are noteworthy because students and teachers were randomly assigned, within buildings, to small and large classes, permitting impact estimates with relatively strong internal validity. The California and Wisconsin initiatives are noteworthy because they provide data regarding the effects of large-scale class size reduction and also specific lessons about the dangers of attempting to generalize the results of the Tennessee experiment beyond the research context.

Tennessee

In 1985, the Tennessee legislature funded a randomized experiment to evaluate class size reduction called "Project STAR," in which a total of 6,500 K–3 students and their teachers were randomly assigned to three conditions within 79 schools: (a) a class of 13–17 students with a regular teacher, (b) a regular-sized class of 22–26 students with a regular teacher, and (c) a class of 22–26 students with a regular teacher and a teacher's aide. Numerous analyses have been conducted of the STAR data. Perhaps the most positive assessment, by Finn et al. (2001), employed hierarchical linear modeling, controlled for an array of covariates, and isolated the impact of class size reduction according to duration of exposure. In grade 2, the achievement advantage for students who participated in small classes for 1, 2, and 3 years was 0.12 SD, 0.24 SD, and 0.36 SD, respectively, in reading, or an average of 0.12 SD per year, and 0.16 SD, 0.24 SD, and 0.32 SD, respectively, in math, or an average of 0.129 SD per year. The average effect size across reading and math was 0.125 SD per year.
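The annualized figures can be reproduced by dividing each cumulative advantage by the corresponding years of exposure and then averaging across the three exposure durations; a minimal sketch:

```python
# Annualizing the STAR advantage: cumulative effect divided by years of
# exposure, averaged across exposure durations (figures from Finn et al., 2001).
reading = [0.12, 0.24, 0.36]   # cumulative advantage after 1, 2, and 3 years
math    = [0.16, 0.24, 0.32]
def per_year(gains):
    return sum(g / yr for yr, g in enumerate(gains, start=1)) / len(gains)
print(per_year(reading))       # 0.12 SD per year
print(per_year(math))          # ~0.129 SD per year
print((0.12 + 0.129) / 2)      # ~0.125 SD per year, the figure used below
```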

California

California's class size reduction initiative was implemented beginning in 1996. Local school districts received $650 (later increased to $800) for every student in a class of 20 or fewer students in the targeted grades (K–3). As a result, average class size was reduced from 29 to a maximum of 20 in the targeted grades (Bohrnstedt & Stecher, 2002). The state of California contracted with a consortium led by RAND and the American Institutes for Research to evaluate the effectiveness of the initiative. Early estimates suggested small positive effects. After controlling for an array of student, teacher, and school factors and adjusting for differences in prior achievement between 3rd-grade students in reduced and non-reduced-size classes, the achievement of 3rd-grade students exposed to class size reduction for 2–3 years was slightly higher than the achievement of 3rd-grade students who were not exposed to class size reduction (Stecher & Bohrnstedt, 2000). After weighting and adjusting for the period of exposure to class size reduction, the annualized effect size was 0.0375 SD in math and 0.0188 SD in reading, or an average of 0.0282 SD per year. However, after additional analyses, the researchers concluded that "attribution of gains in scores to class size reduction is not warranted" (Bohrnstedt & Stecher, 2002, p. 6).

Wisconsin

Wisconsin's Student Achievement Guarantee in Education (SAGE) class size reduction program was implemented beginning in 1996. Initially, the program was limited to one high-poverty school in each eligible Wisconsin school district (except for Milwaukee, where ten schools were allowed). However, starting with the 2000–2001 school year, eligibility for the program was expanded to all schools serving grades K–3, with certain exceptions (Hruz, 2000). In exchange for $2,000 per student from the Wisconsin Department of Public Instruction, schools were required to reduce the student–teacher ratio within each classroom from the existing average of 22 students per teacher (Allen & Kickbusch, 1997) to a ceiling of 15 students per teacher, beginning with kindergarten and 1st grade in 1996–1997, followed by 2nd grade in 1997–1998 and 3rd grade in 1998–1999. Of the 356 SAGE classrooms in 1998–1999, a total of 264 (72%) were regular classrooms with one teacher, while 69 (19%) were conducted with a two-teacher team teaching up to 30 students, and the remaining 23 (6%) employed various other strategies for achieving the 15-to-1 ratio, including the use of a floating teacher during instruction of specific subjects (Hruz, 2000).

Two major studies have evaluated the effectiveness of Wisconsin's class size reduction initiative. The first evaluation, conducted by a team from the University of Wisconsin-Milwaukee, reported significant gains in student achievement for students exposed to class size reduction, compared with students who were not exposed (Maier et al., 1997; Molnar et al., 1999, 2000, 2001; Smith, Molnar, & Zahorik, 2003; Zahorik, Molnar, Ehrle, & Halbach, 2000). However, the results of the most recent evaluation of SAGE, conducted by a team from the University of Wisconsin-Madison, suggest that it may not be valid to link these gains in achievement with program participation (Webb, Meyer, Gamoran, & Fu, 2004). This multilevel, longitudinal, quasi-experimental evaluation controlled for ethnicity, socioeconomic status, enrollment, prior achievement, and school effects. Most importantly, the design of the study permitted the researchers to examine whether the previously reported gains may have been due to a Hawthorne effect.

Using the Comprehensive Test of Basic Skills (CTBS) TerraNova measure, administered only to the limited group of students participating in the research study, the annualized effect size for 3rd-grade students exposed to class size reduction for 3 years was 0.1206 SD in math and 0.0617 SD in reading, or an average of 0.0912 SD per year, compared with students in matched schools who were not exposed to class size reduction. This result is consistent with the earlier evaluation of SAGE, which relied on the same measure. However, there was no significant gain in achievement over the same time period on the Wisconsin Reading Comprehension Test (WRCT), administered to all Wisconsin students, suggesting that the gain on the 3rd-grade TerraNova might be due, for example, to the desire of class size reduction teachers to demonstrate the effectiveness of class size reduction, perhaps translated into greater encouragement of their students to do well on the 3rd-grade TerraNova post-test. This interpretation is supported by the results of the 4th-grade TerraNova, administered to all Wisconsin students. The achievement of 4th-grade students exposed to class size reduction for 3 years was not significantly different from the achievement of 4th-grade students who were not exposed to class size reduction. The annualized effect size was insignificant: 0.0121 SD in math and –0.0060 SD in reading, or an average of 0.0031 SD. The pattern of results is consistent with the interpretation that early reports of positive gains were due to a Hawthorne effect rather than to the class size reduction program. Alternatively, the results suggest that gains in early grades fade away by 4th grade.

RAND Cost Estimates

The best available estimates of the costs of class size reduction were derived from two major analyses conducted by RAND (Brewer, Krop, Gill, & Reichardt, 1999; Reichardt, 2000). The researchers used administrative data to estimate the number of additional classrooms that would be needed to reduce class sizes, multiplied this figure by the estimated cost per classroom, then derived an annual cost per student for various class size reduction policies. However, there are important differences in the assumptions underlying these two analyses, leading to large differences in estimated costs.

The first analysis, conducted by Brewer et al. (1999), assumed that class size reduction was implemented as a district-level target across grades 1–3 as a single group. For example, if the target was 20 students per class, class size was permitted to vary above this target for half of the 1st-, 2nd-, and 3rd-grade classrooms in each district, as long as class size was below the target for the remaining classrooms. This policy provides tremendous flexibility, reduces the need to hire teachers, and greatly reduces the estimated cost of class size reduction. In contrast, the second analysis, conducted by Reichardt (2000), analyzed the cost of class size reduction assuming that the average class size within each grade, within each building, was not permitted to exceed a specified ceiling, such as 20 students per classroom. This policy is much more restrictive than the policy assumed in the first RAND analysis, requires the hiring of more teachers, and leads to much higher estimated costs.

Thus, for the purpose of matching cost estimates to effect-size estimates, it is essential to determine which of these policy assumptions most accurately reflects real-world class size reduction policies—the policies under which the effect-size estimates were generated. In Tennessee, each individual classroom assigned to the class size reduction treatment was limited to a ceiling of 17 students. In California, funds had to be repaid if the average enrollment count in any class size reduction classroom exceeded 20.44 from the first instructional day through April 15 of each school year (California Department of Education, 2007). In Wisconsin, "any instance" in which a SAGE teacher became responsible for teaching more than 15 students had to be reported, and payments were reduced if the student–teacher ratio exceeded 15 (Wisconsin Department of Public Instruction, 2007b, p. 3).

None of these class size reduction policies permitted class size (or student–teacher ratio, in Wisconsin) in any individual classroom to exceed the specified ceilings. Each of these policies was much more restrictive than the policies investigated in either of the two RAND cost analyses. These restrictive policies required the hiring of more teachers than assumed by either RAND analysis because it was not permissible to balance large classes that exceeded the class-size target with smaller classes of students (for example, in special education). Every single class had to meet the class-size ceiling in order to qualify as "reduced." Therefore, the Reichardt (2000) cost estimates—although they are higher than the estimates by Brewer et al. (1999)—are probably too low. They do not accurately reflect the full cost of implementing class size reduction in Tennessee, California, or Wisconsin. In addition, these cost figures are underestimates to the degree that large-scale class size reduction would increase the demand for teachers, induce higher teacher salaries, and raise the costs of reducing class sizes. Thus, the cost-effectiveness analysis presented below, which uses Reichardt's (2000) lower-bound cost estimates, represents an upper-bound estimate of the cost-effectiveness of class size reduction.

Of the policies that Reichardt (2000) analyzed, the cost estimates for the "single grade" ceiling policy, where the average class size within each grade within each building must not exceed a specified numerical limit, are the best available lower-bound estimates of the full cost of implementing class size reduction. Table 3.1 lists the average annual cost per enrolled student of every one-student reduction in class size, using leased, relocatable classrooms, down to class size limits of 15, 17, and 20 students.

Table 3.1  Average Annual Cost per Enrolled Student for Every 1-Student Reduction in Class Size Down to Class Size Limits of 15, 17, and 20 Students

Class Size Ceiling      15           17           20
Cost                    $211.56      $197.04      $199.91

Source: Reichardt (2000), Table 3.4. Figures translated into the cost of reducing class size by 1-student increments based on Reichardt's initial class size of 24.1 students. Reichardt did not explain why the cost to reduce class size is lower for a reduction to a ceiling of 17 students than for a reduction to a ceiling of 20 students. Adjusted for inflation using the September 1997 price index (161.2) and the August 2006 price index (203.9).


Summary

To summarize, the average effect size of class size reduction across reading and math was 0.125 SD per year in Tennessee and negligible in California. The impact in Wisconsin might be attributable to Hawthorne effects. Of the two RAND cost estimates, the best available lower-bound estimate, by Reichardt (2000), suggests that the annual cost of class size reduction ranges between $197.04 and $211.56 for every one-student reduction in class size. Judging whether this is an efficient use of social resources depends on the available alternatives.

Table 3.2 lists four estimates of the effectiveness–cost ratio for class size reduction. The highest effectiveness–cost ratio is 0.00009063, based on the 0.125 SD effect-size estimate from Finn et al. (2001) and the reduction of class sizes from an average of 24 to a ceiling of 17 students. In contrast, rapid formative assessment is much more cost-effective. If the average effect size for rapid formative assessment is 0.319 SD, and the average annual cost per student is $28.34, the effectiveness–cost ratio is 0.01125618. The effectiveness–cost ratio for rapid formative assessment is 124 times as large as the effectiveness–cost ratio for class size reduction from an average of 24 to a maximum of 17 students. In other words, student achievement would increase 124 times faster for every dollar invested in rapid formative assessment rather than class size reduction. Note that the advantage for rapid formative assessment remains substantial whether the cost-effectiveness of class size reduction is based on actual policies in Tennessee, California, or Wisconsin, or on the estimated effect of changes in student–teacher ratio drawn from the Greenwald et al. (1996) meta-analysis. This result is consistent with Hattie and Timperley's (2007) finding that performance feedback is more effective than an array of alternative interventions.

Table 3.2  Effectiveness–Cost Ratios for Various Class Size Reduction Effect Size and Cost Estimates

Source                                        Effectiveness–Cost Ratio a
Meta-analysis (Greenwald et al., 1996)        0.00004648 b
Tennessee (Finn et al., 2001)                 0.00009063 c
California (Stecher & Bohrnstedt, 2000)       0.00001567 d
Wisconsin (Webb et al., 2004)                 0.00000209 e

a Student achievement in standard deviation units divided by annual cost per student in dollars.
b Based on an effect size of 0.0295 SD for every 3-student reduction in student–teacher ratio and a projected reduction of class size from 22 to a ceiling of 15 students per grade.
c Based on an effect size of 0.125 SD and a reduction of class size from 24 to a ceiling of 17 students per grade.
d Based on an effect size of 0.0282 SD and a reduction of class size from 29 to a ceiling of 20 students per grade.
e Based on an effect size of 0.0031 SD and a reduction of class size from 22 to a ceiling of 15 students per grade.
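The Tennessee entry in Table 3.2 can be reproduced from the Table 3.1 cost estimates; a minimal sketch:

```python
# Reproducing the Tennessee entry in Table 3.2: a 7-student reduction
# (average of 24 down to a ceiling of 17) at ~$197.04 per one-student
# reduction per year (Table 3.1).
annual_cost = 7 * 197.04          # ~$1,379.28 per student per year
ratio = 0.125 / annual_cost       # ~0.00009063
print(round(0.01125618 / ratio))  # rapid formative assessment advantage: ~124
```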

Discussion

In concrete terms, the research findings suggest that improving student achievement by 0.319 SD (or approximately 3 months of learning) for every American student would take less than 9 months and cost $1.4 billion using rapid formative assessment ($28.34 multiplied by projected 2006 PK–12 enrollment of 48,574,000 students). To achieve the same effect size through class size reduction would take 2.6 years and cost $174.2 billion (the cost to reduce class size by 7 students, or $1,379.28 per student, multiplied by 48,574,000 students and 2.6 years). These calculations suggest the magnitude of the differences in effectiveness and cost. However, the calculated difference is conservative, because the effect-size estimate for class size reduction is drawn from Project STAR. The special characteristics of STAR, together with the evidence from California and Wisconsin, suggest why the STAR results may not generalize to routine implementations of class size reduction.

First, Project STAR involved only a small fraction of classes in Tennessee (Brewer et al., 1999). Schools were permitted to participate only if they had sufficient classroom space (Brewer et al., 1999), served a minimum of 3 classes per grade level, and if principals, teachers, and students agreed to random assignment (Finn et al., 2001). Furthermore, of every 3 classes that participated in the experiment, only one was actually reduced in size to 13–17 students. The other two classes were "large" classes of 22–26 students. Thus, the Tennessee experiment did not require the hiring of large numbers of new teachers and did not strain the supply of experienced, qualified teachers in Tennessee (Brewer et al., 1999). As a result, teachers could be hired for the small number of additional classes required by the experiment without reducing teacher quality.

In California, however, class size reduction was implemented statewide in grades K–3, necessitating the hiring of 21,000 teachers, including many inexperienced individuals with emergency credentials and no formal qualifications, the great majority serving poor and minority students (Schrag, 2007). Prior to the implementation of class size reduction, barely 2 percent of primary teachers lacked full credentials; by 1999–2000, that figure had risen to nearly 14 percent statewide and to roughly 22 percent in schools in which 30 percent or more of the students came from low-income families (Schrag, 2007). Thus, the expansion of the teaching force required to staff the additional classrooms led to a deterioration in average teacher quality in schools serving a predominantly African American student body, offsetting the benefits of smaller classes (Jepsen & Rivkin, 2002). Furthermore, the cost of large-scale class size reduction did not permit reduction of class sizes to the range of 13–17 students—the range that was found to be effective in Tennessee. The data from California arguably provide a more realistic picture of the benefits that can be obtained through large-scale class size reduction under prevailing labor market and fiscal conditions. Since class size reduction was less than one fifth as cost-effective in California as in Tennessee, the advantage of rapid formative assessment over class size reduction may be five times greater than estimated above on the basis of the Tennessee results.

There is a second reason to doubt the generalizability of the STAR results beyond the research context. Recall that in Wisconsin, positive results on a measure administered specifically for the evaluation of SAGE were not confirmed by a contemporaneous measure administered to all Wisconsin students and were not confirmed by a measure administered 1 year later (again, to all Wisconsin students). These results suggest the possibility of a Hawthorne effect. Specifically, teachers in the class size reduction condition may have exerted extra effort to encourage their students to perform well on the special SAGE measure, knowing that if the results of the evaluation were positive, the program would likely be expanded, guaranteeing smaller class sizes across Wisconsin, reducing teacher workloads, and increasing the employment of teachers (which is exactly what happened in Wisconsin). The Wisconsin results suggest that research regarding class size reduction—including the STAR evaluation—may be vulnerable to a "John Henry" threat to internal validity.1

The evidence for a Hawthorne effect extends beyond the Wisconsin data (Buckingham, 2003). Hoxby (2000a) employed two analytical methods that are not vulnerable to Hawthorne effects to examine the relationship between class size and student achievement in Connecticut. Using the first strategy, Hoxby modeled the relationship between natural variations in class size and student achievement, using the fact that random variations in cohort size cause variations in class size. Thus, cohort size was used as an instrumental variable to sever the link between class size and other factors that influence both class size and student achievement.2 Using the second strategy, Hoxby compared the achievement of adjacent cohorts before and after abrupt changes in class size at a given grade level, using the fact that class size jumps abruptly when a class has to be added to, or subtracted from, a grade because enrollment has triggered a maximum class size rule. Using data from 649 elementary schools covering a period of 24 years, neither approach yielded a significant effect of class size on student achievement, even though power was sufficient to detect class size effects as small as 0.02 to 0.04 SD. Hoxby concluded that Hawthorne effects may explain the positive outcomes of experiments such as Project STAR and the experiments included in the Greenwald et al. (1996) meta-analysis:

    Since Connecticut school staff were unaware of the natural experiment, they could not have reacted to the evaluation. Explicit policy experiments may work differently because of Hawthorne effects or other reactive behavior on the part of participants. (p. 1281)

If all class-size experiments are vulnerable to Hawthorne effects, the effects reportedly achieved in Tennessee may be a mirage. Using the regression discontinuity approach based on maximum class size rules, Wößmann (2007) analyzed the effect of variations in class size across 10 countries and found that "In all ten countries, the possibility that the estimated class size effects are as large as Krueger's estimate for Project STAR can be rejected with more than 99 percent statistical confidence" (p. 264). Eight of the 10 countries yielded estimates of class size effects that are not statistically significantly different from zero. Wößmann concluded: "There is no single country for which any of the specifications detects a statistically significant and large class-size effect" (p. 265). Therefore, "In all countries where the identification strategies can be implemented with reasonable precision, substantial class-size effects can be ruled out" (p. 267). Levin (2001) also used a regression discontinuity, instrumental variable approach to control for the endogeneity of class size, based on a rule applied by the Dutch Ministry of Education that links the number of teachers funded at a school to the school's enrollment. Based on a nationally representative sample of 400 Dutch primary schools, Levin concluded that "no consistent evidence is found in support of the conventional wisdom that class size reductions are beneficial to scholastic achievement" (p. 235).

The point of this discussion is that it would be a mistake to place more reliance on the Tennessee STAR results, or even the Greenwald et al. (1996) meta-analysis, than on the results from California and Wisconsin. Research studies provide initial evidence about what may be expected when an intervention is scaled up. But once data are available from large-scale implementations of the intervention, under real-world conditions, without the idiosyncratic limitations imposed by researchers and the powerful incentives that are created when research participants have a vested interest in the outcome, the real-world data trump even the most finely tuned random-assignment study. The results from California and Wisconsin, in combination with the results regarding rapid formative assessment, suggest that further research regarding class size reduction is unlikely to be fruitful. Attention should be turned to rapid formative assessment and other more promising alternatives.

A fair question is whether evaluations of rapid formative assessment may also be vulnerable to Hawthorne effects. However, the incentives are likely to be different. Teachers know that experiments that successfully demonstrate the positive impact of class size reduction are likely to lead to expansion of the policy statewide (as in Wisconsin). Class size reduction offers the promise of fewer students per teacher, reduced workloads, and dramatically improved employment prospects. Thus, the incentives for teachers participating in class size reduction experiments are very strong. But once the policy has been adopted statewide, as in California, the special incentives disappear. Thus, the data from California are especially relevant in assessing the likely impact of implementing class size reduction under typical conditions, without the artificial carrots that are inevitable in a research experiment. In contrast, rapid formative assessment provides a technology that improves teacher effectiveness (as measured by improvement in student achievement) but neither reduces teacher workloads nor improves teachers' employment prospects.

It would be a mistake to blame teachers or policymakers for enthusiasm regarding class size reduction. What is needed are data regarding the relative cost-effectiveness of the alternatives. In the absence of such data, teachers as well as policymakers have, in the past, been forced to make their own assessments about what works.

4  The Cost-Effectiveness of Comprehensive School Reform

Comprehensive school reform (CSR) may be defined as externally developed school improvement programs known as "whole-school" or "comprehensive" reforms emphasizing a coherent vision of education, a challenging curriculum, and high expectations for academic achievement. CSR is often implemented at the elementary school level, although several models target middle or high schools. CSR typically involves intensive staff development, increased attention to instruction and the needs of individual students, and parent involvement.

These programs originated in 1991, when President George H. W. Bush announced the creation of a private-sector organization called the New American Schools Development Corporation (NAS), which was intended to support the creation of "break the mold" models of American schools that would enable all students to achieve world-class standards in core academic subjects (Kearns & Anderson, 1996). NAS solicited and received nearly 700 proposals in February 1992. Eleven were chosen for a 3-year program of development and testing. Subsequently, NAS dropped four models but provided more than $150 million over the past decade to develop and "scale up" implementation of the remaining seven models.

In 1997, the U.S. Congress spurred the development of CSR by passing legislation to fund the Comprehensive School Reform Demonstration (CSRD) Program, which provided $50,000 per year for 3 years to qualifying schools. In 2001, the reauthorization of Title I limited CSR funding to "scientifically based" whole-school reform models, increasing pressure on CSR developers to show that the models improved student achievement (U.S. Department of Education, 2002b). Congressional appropriations for CSR totaled $1.9 billion from 1998 to 2006 (U.S. Department of Education, 2004, 2006), in addition to over $150 million provided by NAS (Borman, Hewes, Overman, & Brown, 2004). Thus, funding for CSR has totaled well over $2 billion. CSR has expanded to include over 800 different reform models and has been implemented in 5,160 schools nationwide (Rowan, Barnes, & Camburn, 2004). It is estimated that somewhere between 10 percent and 20 percent of all elementary schools in the United States have adopted an external model of CSR or are working with their own locally developed model (Rowan et al., 2004). In all, 45 percent of all Title I schools, and 80 percent of the highest-poverty schools, operate a schoolwide program (Heid & Webber, 1999).

The theory of action underlying CSR is that comprehensive changes are needed in multiple areas, including staff attitudes and school organization as well as curriculum and instruction, in order for these changes to be effective in improving student achievement (Borman et al., 2004). It is believed that the impact of isolated changes in any individual area is likely to be undermined by dysfunction in other areas. For example, changes in attitudes and organizational behaviors may be ineffective without improvements in curriculum and instruction, and improvements in curriculum and instruction can easily be undermined by dysfunctional attitudes and organizational behaviors. Thus, the theory of action underlying CSR is that all of these areas must be addressed simultaneously in order to improve student achievement.

The attraction of CSR may be explained by the plausibility of this grand model and the tendency to emphasize dramatic reforms that are highly visible symbols of a commitment to change (Tyack & Cuban, 1995). However, while the notion that public schools need to be overhauled from top to bottom may be attractive, the history of school reform suggests that incremental reforms are more likely to persist than dramatic reforms, because incremental reforms are more easily integrated into existing structures and routines (Tyack & Cuban, 1995). Enormous resources have been devoted to CSR, yet the results have been meager. There is a need to re-examine the cost and effectiveness of CSR and to compare its cost-effectiveness with promising alternatives such as rapid assessment.


Evidence Regarding CSR

Evidence regarding the effects of CSR has emerged very slowly. Early schoolwide reforms failed to produce compelling evidence of improved student achievement (Wong, 2001; Wong & Meyer, 1998). As a result, reviews of the research literature were limited to practitioner-oriented summaries of the general attributes of the CSR models, the level of support provided by developers, the costs associated with implementing the models, and narrative appraisals of the research supporting each CSR design (see Herman, 1999; Northwest Regional Educational Laboratory, 1998, 2005; Slavin & Fashola, 1998; Traub, 1999; Wang, Haertel, & Walberg, 1997). These reviews did not provide quantitative meta-analyses of the overall effects of CSR or of the effects of the various CSR models.

More recently, however, Borman et al. (2003) synthesized "all known research on the achievement effects of the most widely implemented, externally developed school improvement programs known as 'whole-school' or 'comprehensive' reforms" and conducted the first meta-analysis of CSR effects (p. 126). This exhaustive review of 232 studies of achievement, regarding 29 of the most widely implemented CSR models, found that several models produced large effect sizes. However, the number and methodological quality of the research studies regarding those models were not sufficient to draw firm conclusions (Borman et al., 2003). Three models met the highest standard of evidence and "are the only CSR models to have clearly established, across varying contexts and varying study designs, that their effects are relatively robust and that the models, in general, can be expected to improve test scores" (Borman et al., 2003, p. 168). The best estimate of effect size is derived by examining studies involving comparison groups.1 These effect sizes were small: Direct Instruction (effect size = 0.15 SD), the School Development Program (effect size = 0.05 SD), and Success for All (effect size = 0.18 SD). Acknowledging disappointing results, researchers conceded that "CSR . . . is not the panacea for closing the achievement gap and decreasing high school drop out rates for poor and historically underserved students of color" (Ross & Gil, 2004, p. 170).

Unexplained Variation in Results

Borman et al. (2003) noted wide variation in achievement that was not explained by the CSR models. The results suggest that CSR models fail to account for important factors that influence student achievement:

    The heterogeneity of the CSR effect and the fact that few of the general reform components helped explain that variability suggest that the differences in the effectiveness of CSR are largely due to unmeasured program-specific and school-specific differences in implementation. (Borman et al., 2003, p. 166)

Furthermore, several factors previously presumed to influence effect sizes were not significant:

    Our regression analysis suggested that whether a CSR model, in general, requires the following components explains very little in terms of the achievement outcomes the school can expect: (a) ongoing staff professional development; (b) measurable goals and benchmarks for student learning; (c) a faculty vote to increase the likelihood of model acceptance and buy-in; and (d) the use of specific and innovative curricular materials and instructional practices designed to improve teaching and student learning. (Borman et al., 2003, p. 166)

The insignificant impact produced by explicitly requiring ongoing staff development, educational standards, faculty buy-in, and innovative curricula and instruction suggests that incorporating these four components may not significantly improve outcomes. What explains the lack of effects? One possibility is that among schools where the requirements are in place implementation of these components may be poor. While this may be the case, over $2 billion have been invested in implementation of CSR since 1991 (Borman et al., 2004; U.S. Department of Education, 2004, 2006; Vernez, Karam, Mariano, & DeMartini, 2006). If this massive investment has failed to ensure strong implementation, it seems unlikely that further improvements will be easily achieved. A second possibility is that, among schools that are not subject to the requirements, teachers are already highly committed, are participating in professional development activities, and are implementing educational standards and innovative curricula and instruction. This would explain why there is little difference in outcomes between schools that do and those that do not require these components. Essentially, these components may already be in place in most schools. This interpretation is supported by a RAND study of 250 schools that implemented either the Accelerated Schools Model, Core Knowledge, Direct Instruction or Success for All, and found that the schools tended to engage in the same types of activities regarding curriculum, methods of instruction, student groupings, governance, assessment of students, and parent involvement compared with schools that had not implemented a comprehensive school reform model (Vernez et al., 2006).

The Cost-Effectiveness of Comprehensive School Reform    57

On average, all schools engaged in these activities at the same frequency or level of intensity (Vernez et al., 2006). In general, teachers and principals feel tremendous pressure to raise student achievement (Pedulla et al., 2003) and, arguably, must be highly committed to pursue teaching as a career, given the low pay and stressful working conditions. Thus, levels of commitment may not vary significantly between schools that have and those that have not implemented a CSR model. Furthermore, teachers routinely participate in professional development activities. For example, 98.3 percent of all teachers in the nationally representative Schools and Staffing Survey indicated that they had participated in some form of teacher professional development within the 12 months prior to the survey, and 72.6 percent participated in “regularly scheduled collaboration with other teachers” (Choy, Chen, Bugarin, & Broughman, 2006, Table 16, p. 49). Thus, participation in professional development activities may not vary significantly between the two groups of schools. Similarly, the percentage of schools that implement educational standards is unlikely to differ between the two groups, because all schools came under strong pressure to implement standards starting with the passage of Goals 2000 in 1994 (U.S. Department of Education, 1998) and subsequently reinforced by the requirements of the federal No Child Left Behind Act of 2001 (U.S. Department of Education, 2002a). A study involving a nationally representative sample of public elementary and secondary schools found that 72 percent of all principals reported using content standards to a great extent to guide curriculum and instruction in reading and mathematics (Heid & Webber, 1999). Furthermore, “there were no significant differences in the use of content standards between Title I and non-Title I schools or between schoolwide programs and targeted assistance schools or between highest-poverty and low-poverty schools” (Heid & Webber, 1999). A nationally representative survey of teachers in Title I schools found that 79 percent reported teaching to standards in reading and 66 percent reported teaching to standards in math (U.S. Department of Education, 2002c). Finally, the rate at which schools implement innovative curricula may not differ across the two groups of schools because the pressure to raise test scores as a result of the No Child Left Behind Act exerts tremendous pressure on teachers and principals to seek out innovative curricula. Many schools are rapidly adopting innovative curricula, regardless of any requirement to do so. In addition to the 1,800 schools across the United States that have adopted Success for All (Borman et al., 2004), numerous schools have adopted other innovative curricula. By 2003, over 2.8 million elementary school students used the National Science Foundation (NSF)-funded Everyday Mathematics Program (University of Chicago School Mathematics
Project, 2003). Students in 5,000 middle schools used NSF-funded math programs (Clayton, 2000), including students in over 2,200 school districts that have adopted the Connected Math Program (National Science Foundation, undated), and at least 500 high schools used the NSF-funded Core-Plus Program (Clayton, 2000).

A third possibility is that staff development, educational standards, faculty buy-in, and innovative curricula and instruction—as currently designed and implemented—are simply inadequate for the purpose of improving student achievement. This is perhaps the most straightforward interpretation of Borman et al.'s (2003) results. Variation in these components does not explain variation in outcomes, and the addition of these components is unlikely to improve outcomes. The addition of components designed to involve parents may even have a negative impact: "The one reform attribute that was a statistically significant predictor of effect size suggested that CSR models that require the active involvement of parents and the local community in school governance and improvement activities tend to achieve worse outcomes than models that do not require these activities" (p. 95). This finding is consistent with a previous review of 41 evaluation studies of parental involvement, which found "little empirical support for the widespread claim that parent involvement programs are an effective means of improving student achievement" (Mattingly, Prislin, McKenzie, Rodriguez, & Kayzar, 2002, p. 549).

Another puzzling result is inconsistent with the basic CSR assumption that broad, whole-school changes are necessary. The effect sizes for Success for All (d = 0.18) and Direct Instruction (d = 0.15), both of which focus primarily on changing a school's instructional practices, exceed the average effect size for all CSR models (d = 0.12), including models where changes in instruction are only one component of much broader school reforms. This result undermines the argument that broad reforms are more effective than reforms that focus narrowly on changes in instruction, and suggests instead that narrowly focused models may be somewhat more effective. At the same time, the small effect sizes for the narrowest models (Success for All and Direct Instruction) suggest that narrowing the CSR approach to focus on instruction is unlikely to result in sharp improvement in student achievement.

Summary

In sum, Borman et al.'s (2003) results cast doubt on the thesis that the meager gains from CSR can be significantly improved by fine-tuning the models. Neither adding nor subtracting components promises to substantially change CSR outcomes. The unavoidable implication is that CSR models, as currently implemented, are a flawed strategy for improving student achievement. The small, uneven effects suggest that we cannot reliably expect large improvements from implementing any of the CSR models. The meager returns to the $2 billion already invested in CSR implementation efforts suggest that any further gains in student achievement will not be easy and will rely on future breakthroughs in overcoming the barriers inherent in implementing large, complex whole-school reforms. For these reasons, it would be unrealistic to expect that improved implementation of existing CSR models will significantly improve student achievement. Instead, these results suggest a need to examine the cost-effectiveness of CSR compared with promising alternatives such as rapid assessment.

Cost-Effectiveness of CSR

For the purpose of the cost-effectiveness analysis, I drew upon Borman et al.'s (2003) impact estimates for each of the 29 CSR models included in the meta-analysis. Importantly, these estimates were obtained over varying periods of program implementation and, thus, are not directly comparable across the 29 CSR models. Borman et al. reported information regarding the average duration of implementation for each model, but this information does not correspond to the duration of the constituent research studies in the meta-analysis and, thus, cannot be used to annualize the reported effect sizes. Since some of the constituent research studies were conducted over multi-year periods, the reported effect sizes are only upper bound estimates of the annualized effect sizes. Special attention is given to the three CSR models for which there is reliable impact evidence: Success for All, the School Development Model, and Direct Instruction. Cost information was drawn from three sources (Herman et al., 1999; King, 1994; Odden, 2000). King (1994) conducted a detailed, refereed cost analysis of three CSR models including Success for All and the School Development Model. A second cost analysis of the same three models lacks detail, was not refereed, and was excluded (Barnett, 1996). Cost information for Direct Instruction as well as several other CSR models was adapted from Herman et al. (1999). For comparison purposes, Odden's (2000) lower-bound estimate of the costs of the major CSR models was also provided. All costs were adjusted for inflation to the same period (August 2006) using the consumer price index (CPI). Ideally, the Elementary/Secondary Price Index would be used to adjust the costs of educational inputs for inflation. However, this index is not available after 1994. The use of the CPI significantly underestimates inflation in the costs of educational inputs, but the distortion is less than the distortion when using traditional measures, such as the gross domestic product (GDP) price index. For example, the costs of elementary and secondary educational inputs increased by 224 percent between 1974 and 1994, while the CPI and the GDP price index increased 190 and 173 percent, respectively (U.S. Department of Education, 1997, Table 38). Thus, the CPI is the best available measure of inflation in the costs of educational inputs.

CSR involves significant changes in the way schools are organized and operated, typically including wholesale changes in staffing patterns, curriculum, and instruction. Thus, CSR typically requires large increases in staffing and training costs. These changes are reflected in King's (1994) estimates of the annual per-pupil costs of the additional staffing and training required for implementing Success for All and the School Development Model during the initial year of model implementation. Differences in costs are related to differences in stated objectives: Success for All emphasizes changes in instructional practices, including substantial amounts of individual tutoring, while the School Development Model aims to foster children's social and emotional development and improvements in school climate. King's figures were adjusted downward to estimate average annual ongoing costs by amortizing initial fixed fees and high initial training costs over the expected life of the programs. In general, unless the developer indicated otherwise, annual ongoing training costs were assumed to be half the year 1 costs. This assumption reflects the need to provide ongoing training of existing staff and to train new staff who are hired during the life of the program, but may underestimate the true ongoing costs. The expected 6-year life of the CSR programs was based on data suggesting that nearly one third of a sample of 395 urban, disadvantaged, low-achieving elementary and middle schools dropped or switched CSR models within a 2-year period (Taylor, 2006). Thus, initial fixed fees and high initial training costs were amortized over a 6-year period. King's (1994) data were adjusted by monetizing the value of the additional teacher and principal time that King estimated would be necessary to implement the CSR models, based on average teacher and principal salaries. At an average teacher salary of $51,880 per year and an average of 184 work days per year, teacher time is valued at $1.76 per hour per student, assuming a class size of 20 students. Similarly, principal time was valued at $0.09 per hour per student, assuming an average principal salary of $75,857, a contract lasting 42 weeks per year, and 500 students per building.
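These per-student valuations follow from simple arithmetic. The short Python sketch below reproduces the published figures; the 8-hour workday (and 5-day week for the principal) are my assumptions, since the text specifies only salaries, days or weeks per year, and students served.

```python
# Illustrative check of the time valuations above. The 8-hour workday
# and 5-day principal week are assumptions not stated in the text;
# with them, the published per-student rates are recovered.

TEACHER_SALARY = 51_880    # average annual teacher salary, 2006 dollars
TEACHER_DAYS = 184         # teacher work days per year
PRINCIPAL_SALARY = 75_857  # average annual principal salary
PRINCIPAL_WEEKS = 42       # principal contract weeks per year
HOURS_PER_DAY = 8          # assumed workday length
CLASS_SIZE = 20            # students per classroom
SCHOOL_SIZE = 500          # students per building

teacher = TEACHER_SALARY / (TEACHER_DAYS * HOURS_PER_DAY * CLASS_SIZE)
principal = PRINCIPAL_SALARY / (PRINCIPAL_WEEKS * 5 * HOURS_PER_DAY * SCHOOL_SIZE)

print(f"Teacher time:   ${teacher:.2f} per student per hour")    # $1.76
print(f"Principal time: ${principal:.2f} per student per hour")  # $0.09
```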


Following King, the opportunity costs of increased parent and student time that would be required to implement the CSR models were not monetized, because these costs are difficult to value. Therefore, the cost estimates presented here are underestimated by the value of that time. Herman et al. (1999) provide the best available cost estimates for Direct Instruction, based on cost data from a sample of four sites using the model as well as cost data from the developer (cost data are also available from a more recent source but are based solely on information from each developer’s Web site and do not include release time for peer coaches and staff training, conferences, curricular materials for students, or travel to model schools [American Institutes for Research, 2006]).2 The developer estimated that all 25 teachers (in a school of 500 students) would require release time of 9.5 days in the first year and 4.5 days in each following year. I amortized the extra 5 days of release time in the first year over the expected 6-year life of the CSR program. Direct Instruction emphasizes scripted preplanned curricula and, thus, the costs for curriculum materials are proportionately larger compared with Success for All or the School Development Model. The costs for all three models are summarized in Table 4.1. King’s (1994) cost analysis and Borman et al.’s (2003) meta-analysis of effect sizes permit calculations of lower- and upper-bound effectiveness–cost ratios (effect size divided by the annual cost per student) for Success for All (effect size = 0.18 SD) and the School Development Model (effect size = 0.05 SD). Effectiveness–cost ratios for Direct Instruction (effect size = 0.15 SD) may be calculated using Herman et al.’s (1999) cost figures and Borman et al.’s meta-analysis. The upper-bound effectiveness–cost ratios for these three models are, respectively, 0.000224, 0.000129, and 0.000290 (the policy implications of this chapter are based on the large differences in effectiveness–cost ratios between CSR and rapid assessment, not the relatively small differences in effectiveness–cost ratios across individual CSR models). In addition to the results above regarding the CSR models that Borman et al. (2003) categorized as having the Strongest Evidence of Effectiveness, it is useful to establish a tentative upper bound on the cost-effectiveness of the CSR models that Borman et al. categorized as having Highly Promising Evidence of Effectiveness, based on studies with comparison groups.3 The results of this analysis are tentative for three reasons: (a) Borman et al. (2003) indicated that the number and methodological quality of the research studies regarding those models was not sufficient to draw firm conclusions, (b) the cost data may not be reliable, and (c) Borman et al.’s effect-size estimates were not annualized and therefore are not directly comparable. As the number and methodological quality of CSR research studies increases and annualized effect-size estimates become available, it is probable that

Table 4.1  Annual Costs per Student to Implement Success for All, the School Development Model, and Direct Instruction

                    Success for Alla        School Developmenta     Direct Instructionb
                    Low        High         Low        High         Low        High
Extra Staff         $540.22    $1,468.30    $138.52    $415.56      $124.10    $372.31
Initial Fee         $3.46c     $3.46c       N/A        N/A          N/A        N/A
Training            $94.64d    $176.15d     $146.28d   $355.02d     $161.33    $161.33
Materials           N/Ae       N/Ae         N/Ae       N/Ae         $155.13    $155.13
Teacher Timef       $162.80g   $472.12g     $97.68g    $586.08g     $75.09h    $75.09h
Principal Timei     $3.78      $7.56        $3.78      $11.34       $0.72      $0.72
Total               $804.90    $2,127.59    $386.26    $1,368.00    $516.37    $764.58

Note: All per-pupil estimates based on a school of 500 students, using August 2006 dollars. N/A = not applicable or not available. a Based on King (1994), Tables 2 and 4. Adjusted for inflation using the March 1994 price index (147.2) and the August 2006 price index (203.9). b Based on Herman et al. (1999). Adjusted for inflation using the January 1999 price index (164.3) and the August 2006 price index (203.9). c Amortized over the expected 6-year life of the CSR program. d Assumes that annual training costs in years 2 through 6 are half the costs in year 1, and extra initial costs are amortized over the 6-year life of the CSR program. e King (1994) focused solely on personnel costs, asserting that nonpersonnel costs are small. Thus, the total costs of Success for All and the School Development Model are underestimated by the amount of nonpersonnel costs. f Assuming annual teacher salary of $51,880 (U.S. Department of Education, 2005c) and an average of 184 work days (37 weeks) per year, the cost of teacher time is $1.76 per student per hour, adjusted for inflation. g King's (1994) "partial" staffing was assumed to apply to half of the teaching staff. h Direct Instruction requires 5 days of teacher release time in year 1 in addition to the annual requirement of 4.5 days; the 5 days were amortized over the expected 6-year life of the CSR program. i Assuming annual principal salary of $75,857 (National Center for Education Statistics, 2007a) and a contract lasting 42 weeks per year, the cost of principal time is $0.09 per student per hour, adjusted for inflation.

the effect-size estimates for many of these models will decrease and the estimated costs will increase. Thus, the purpose of this analysis is limited to establishing an upper-bound estimate of the cost effectiveness of CSR in relation to the cost-effectiveness of rapid assessment, rather than establishing the relative cost-effectiveness among the various CSR models. The CSR model with the highest effect size in the group categorized as having Highly Promising Evidence of Effectiveness is Expeditionary Learning Outward Bound (effect size = 0.51 SD) (Borman et al., 2003). Based on a sample of three sites and information from the developer, the annual inflation-adjusted cost in years 1 and 2 is $100,522.82, including professional development, teacher release time, and materials, but excluding travel, stipends, and expedition costs (Herman, 1999). Annual costs decline 20 percent in year 3, 36 percent in year 4, and 48.8 percent in year 5 (Herman, 1999), and year 6 costs are assumed to equal year 5 costs. Travel, stipends, and expedition costs total $1,365.12 per teacher per year, or $34,128 for 25 teachers (Herman, 1999). Amortizing high initial costs over the expected 6-year life of a CSR program, the annual cost per student in a school with 500 students is $217.83, adjusted for inflation. The effectiveness–cost ratio is 0.002341. The CSR model with the second-highest effect size in this group is Roots and Wings (effect size = 0.35 SD) (Borman et al., 2003). Roots and Wings incorporates the elements of Success for All but includes additional components as well (MathWings and WorldLab). While a detailed cost analysis is not available for Roots and Wings, the estimate derived above from King’s (1994) data regarding Success for All suggests that a lower-bound estimate for the cost of Roots and Wings is a minimum of $804.90 per student per year. Thus, the effectiveness–cost ratio for Roots and Wings is a maximum of 0.000435. The effect size for the remaining CSR model in this group, Modern Red Schoolhouse, is 0.17 SD (Borman et al., 2003). Information from the developer suggests annual costs for materials ranging from $10,341 to $124,102, costs for teacher release time (13 days for each member of a teaching staff of 25 teachers) equal to $4,581.79, plus other costs averaging $86,872, adjusted for inflation (Herman, 1999). While the effect size is one third the size of the effect for Expeditionary Learning Outward Bound, the costs are not substantially different, implying a maximum effectiveness–cost ratio considerably smaller than 0.002341, the ratio for Expeditionary Learning Outward Bound. Table 4.2 summarizes the effectiveness–cost ratios for the six models described above. The highest effectiveness–cost ratio is 0.002341, for Expeditionary Learning Outward Bound. Therefore, the upper-bound effectiveness–cost ratio for the six CSR models with the strongest or highly promising

Table 4.2  Effectiveness–Cost Ratios for CSR Models with the Strongest or Highly Promising Evidence of Effectiveness

                                       Effect Size (SD)a   Costb         Effectiveness–Cost Ratioc
Success for All                        0.18                $804.90d      0.000224
                                       0.18                $2,127.59d    0.000085
School Development Model               0.05                $386.26d      0.000129
                                       0.05                $1,368.00d    0.000036
Direct Instruction                     0.15                $516.37d      0.000290
                                       0.15                $764.58d      0.000196
Expeditionary Learning Outward Bound   0.51                $217.83       0.002341
Roots and Wings                        0.35                $804.90       0.000435
Modern Red Schoolhouse                 0.17                see text      less than 0.002341

Note: All cost figures in August 2006 dollars. a From Borman et al. (2003). Note that Borman did not annualize his effect size estimates, i.e., the effect sizes in individual studies of CSR may have been achieved over multi-year periods. b Annual cost per student in dollars. c Effect size in SD units divided by annual cost per student in dollars. d From Table 4.1.

evidence of effectiveness, according to Borman et al. (2003), is 0.002341. The lowest effectiveness–cost ratio is 0.000036, for King's (1994) upper-bound cost estimate and Borman et al.'s 0.05 SD effect-size estimate for the School Development Model.

Perhaps the most widely recognized cost analysis of CSR was conducted by Odden (2000). Based on interviews with CSR developers, Odden estimated the additional ongoing staffing and training costs that would be needed to implement the major CSR models, beyond "core" costs equal to $1.36 million (adjusted for inflation) that each school would incur if it provided one teacher for every 25 students and one principal for every 500 students. Odden (2000) estimated that the additional annual cost per student for the major CSR models ranged between $290.01 and $900.57, adjusted for inflation.4 These models apparently included Roots and Wings, which is a more elaborate version of Success for All, as well as the Comer School Development Model, ATLAS, Expeditionary Learning Outward Bound, Modern Red Schoolhouse, Co-NECT, and others. While Odden did not provide precise estimates linked to each model, his lower-bound cost figure provides a useful benchmark for comparing the cost data from Herman et al. (1999), which in many cases appear to underestimate the true costs of individual CSR models.
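Returning to the largest ratio in Table 4.2: the $217.83 figure for Expeditionary Learning Outward Bound can be recovered from the school-level costs reported earlier (Herman, 1999). The following sketch restates that amortization in Python.

```python
# Recovering the Expeditionary Learning Outward Bound cost per student
# from the school-level figures reported in the text (Herman, 1999).

BASE = 100_522.82                # annual cost in years 1 and 2
years = [BASE, BASE,
         BASE * (1 - 0.20),      # year 3: costs decline 20 percent
         BASE * (1 - 0.36),      # year 4: costs decline 36 percent
         BASE * (1 - 0.488),     # year 5: costs decline 48.8 percent
         BASE * (1 - 0.488)]     # year 6 assumed equal to year 5
TRAVEL = 1_365.12 * 25           # travel, stipends, expeditions (25 teachers)
STUDENTS = 500

per_student = (sum(years) / 6 + TRAVEL) / STUDENTS
print(f"Annual cost per student: ${per_student:.2f}")         # $217.83
print(f"Effectiveness-cost ratio: {0.51 / per_student:.6f}")  # 0.002341
```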


Comparisons of Cost-Effectiveness

While CSR models have multiple goals, a primary goal is to improve student achievement in math and reading. Thus, CSR may be compared with rapid assessment along this dimension in order to identify the most efficient approach for improving math and reading achievement. If rapid assessment is more efficient for this purpose, a separate evaluation may be conducted to determine the efficiency of CSR with regard to other outcomes that are not well-addressed through rapid assessment.

A conservative approach is to examine the lowest effectiveness–cost ratios for Reading and Math Assessment in relation to the highest effectiveness–cost ratios for CSR, as well as two popular alternatives for raising student achievement: class size reduction and high-quality preschool (Table 4.3). The lowest effectiveness–cost ratio for Reading Assessment is 8 times larger than the highest effectiveness–cost ratio for the most cost-effective CSR model in Borman et al.'s (2003) grouping of models demonstrating Strongest Evidence or Highly Promising Evidence of effectiveness (Tables 4.2 and 4.3). The lowest effectiveness–cost ratio for Math Assessment is 7 times larger than the highest effectiveness–cost ratio for the most cost-effective CSR model in this group. The results suggest that Reading and Math Assessment are roughly an order of magnitude more effective per dollar compared with the most cost-effective CSR model in this group.

For comparison purposes, Table 4.3 lists annualized effect sizes, the annual cost per pupil, and effectiveness–cost ratios for class size reduction, Perry preschool, and Abecedarian preschool. Perhaps the most positive assessment of class size reduction, by Finn et al. (2001), employed hierarchical linear modeling, controlled for an array of covariates, and isolated the impact of class size reduction according to duration of exposure. After 1, 2, and 3 years of exposure to class size reduction, the achievement of students randomly assigned to classrooms of 13–17 students was higher than the achievement of students randomly assigned to classrooms with 22–26 students, with annualized effect sizes equal to 0.120 SD in reading and 0.129 SD in math per year at the end of 2nd grade. The annual inflation-adjusted cost per student to reduce class size from 24 to a ceiling of 17 students per class is $1,379.28 (Reichardt, 2000), resulting in effectiveness–cost ratios of 0.000087 in reading and 0.000094 in math. The lowest effectiveness–cost ratio for Reading Assessment is 213 times the highest effectiveness–cost ratio for class size reduction. The lowest effectiveness–cost ratio for Math Assessment is 182 times the highest effectiveness–cost ratio for class size reduction. These results suggest that Reading and Math Assessment are approximately two orders of magnitude more cost-effective than class size reduction.
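The multiples cited in this section are simple quotients of the effectiveness–cost ratios in Tables 4.2 and 4.3; the following sketch makes the arithmetic explicit.

```python
# Ratio-of-ratios comparisons cited above, from Tables 4.2 and 4.3.

reading_ra, math_ra = 0.018519, 0.017152  # lowest rapid assessment ratios
csr_best = 0.002341                       # Expeditionary Learning Outward Bound

print(f"Reading vs. CSR: {reading_ra / csr_best:.0f} times")         # 8
print(f"Math vs. CSR:    {math_ra / csr_best:.0f} times")            # 7
print(f"Reading vs. class size: {reading_ra / 0.000087:.0f} times")  # 213
print(f"Math vs. class size:    {math_ra / 0.000094:.0f} times")     # 182
```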

Table 4.3  Comparison of Effect Size, Cost, and Effectiveness–Cost Ratios

                                     Effect Size (SD)a            Effectiveness–Cost Ratioc
                                     Reading    Math      Costb               Reading     Math
Rapid Assessment (high estimates)    0.270d     0.392e    $9.45f / $18.89g    0.028571    0.020752
Rapid Assessment (low estimates)     0.175h     0.324i    $9.45f / $18.89g    0.018519    0.017152
Class Size Reduction
  Nye et al. (2001)                  0.104j     0.090j    $1,379.28k          0.000075    0.000065
  Finn et al. (2001)                 0.120l     0.129l    $1,379.28k          0.000087    0.000094
Perry Preschool                      0.150m     0.155m    $12,147.03n         0.000012    0.000013
Abecedarian Preschool                0.150o     0.054o    $10,188.09p         0.000015    0.000005

a Annualized effect size. b Annual cost per student in August 2006 dollars. c Effect size in SD units divided by annual cost per student. d From Ross et al. (2004). e From Ysseldyke and Tardrew (2007). f Reading. g Math. h From Nunnery et al. (2006). i From Ysseldyke & Bolt (2007). j Nye et al. (2001). Average of effect sizes in grades 1, 2, and 3. k From Reichardt (2000). Annual cost per student of reducing class size from 24 to a ceiling of 17 students per class, adjusted for inflation using the September 1997 price index (161.2) and the August 2006 price index (203.9). l Finn et al. (2001). In grade 2, the achievement advantage for students who participated in small classes for 1, 2, and 3 years was 0.12 SD, 0.24 SD, and 0.36 SD, respectively, in reading, or an average of 0.12 SD per year, and 0.16 SD, 0.24 SD, and 0.32 SD, respectively, in math, or an average of 0.129 SD per year. m From Schweinhart et al. (1993), Table 13, annualized over 2-year period. n From Barnett (1992), adjusted for inflation using the January 1985 price index (105.5) and the August 2006 price index (203.9). o From Ramey et al. (2000), Figure 3, annualized over 5-year period. p From Barnett & Masse (2007). Cost in a public school setting, minus the value of formal and informal childcare services provided to the control group, and adjusted for inflation using the January 2002 price index (177.1) and the August 2006 price index (203.9). Note that since preschool costs are incurred in years prior to the K–6 years when rapid assessment is typically implemented, preschool costs are underestimated relative to the cost estimate for rapid assessment (costs for rapid assessment should be discounted to the time period when preschool costs are incurred).

The advantage of rapid assessment compared with preschool is even stronger than the advantage compared with class size reduction. The reported effect sizes for children participating in Perry preschool were 0.150 SD in reading and 0.155 SD in math (annualized over a 2-year implementation period) at the end of 2nd grade (Schweinhart, Barnes, & Weikart,
1993), but the annualized cost was $12,147.03, resulting in small effectiveness–cost ratios of 0.000012 in reading and 0.000013 in math. Furthermore, it is not clear that the children participating in the treatment outperformed children in the control group with regard to student achievement. After correcting for family-wise error, only 2 of a total of 24 tests reached statistical significance, suggesting that the overwhelming majority of statistical tests indicated that there was no significant difference in the achievement of children who participated in Perry preschool compared with children who did not participate.

Results for participants in Abecedarian preschool were roughly comparable to the results for participants in Perry preschool. Annualized effect sizes for 3rd-grade children were 0.150 in reading and 0.054 in math over a 5-year implementation period, at an annual cost of $10,188.09, adjusted for inflation and for the value of formal and informal daycare services provided to the control group, resulting in effectiveness–cost ratios of 0.000015 in reading and 0.000005 in math. The lowest effectiveness–cost ratio for Reading Assessment is 1,235 times the highest effectiveness–cost ratio for high-quality preschool. The lowest effectiveness–cost ratio for Math Assessment is 1,319 times the highest effectiveness–cost ratio for high-quality preschool. These results suggest that Reading and Math Assessment are approximately three orders of magnitude more cost-effective than high-quality preschool.

While ratios of effectiveness–cost ratios are sensitive to small changes in denominators, the analysis suggests that Reading and Math Assessment are significantly more cost-effective than CSR, class size reduction, or high-quality preschool. The difference in cost-effectiveness is approximately one order of magnitude compared with CSR, two orders of magnitude compared with class size reduction, and three orders of magnitude compared with high-quality preschool.
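The preschool ratios, and the comparisons with rapid assessment, can be verified the same way:

```python
# Preschool effectiveness-cost ratios and comparisons, from Table 4.3.

perry = {"reading": 0.150, "math": 0.155, "cost": 12_147.03}
abecedarian = {"reading": 0.150, "math": 0.054, "cost": 10_188.09}

for name, d in (("Perry", perry), ("Abecedarian", abecedarian)):
    print(name, f"reading {d['reading'] / d['cost']:.6f}",
          f"math {d['math'] / d['cost']:.6f}")

# Lowest rapid assessment ratios over the highest preschool ratios:
print(f"Reading: {0.018519 / 0.000015:.0f} times")  # 1,235
print(f"Math:    {0.017152 / 0.000013:.0f} times")  # 1,319
```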

Sensitivity Analysis I

As noted in chapter 2, a key assumption is that schools do not need to purchase additional computers and printers in order to implement rapid assessment. This assumption is based on research implying that virtually all classrooms will have access to at least one online computer by the year 2006 (Parsad & Jones, 2005), plus the ability of all online computers to print from a high capacity printer in a school's media center (Reilly, no date), and was verified through interviews with teachers and observations of classrooms where rapid assessment was used. However, if each classroom
requires new equipment, an entire system, including a computer, monitor, keyboard, mouse, software, service plan, and a high capacity laser printer, may be purchased from Dell for $653. Assuming that a complete system is purchased for every classroom of 20 students and is amortized over a 7-year period, the annual cost per student is $4.66. Splitting this cost between reading and math raises the total cost of Reading Assessment from $9.45 to $11.78 per student. Using the lower-bound effect-size estimate of 0.175 SD, the effectiveness–cost ratio falls to 0.014856, but Reading Assessment remains a minimum of 6 times as cost-effective as CSR, 171 times as cost-effective as class size reduction, and 990 times as cost-effective as high-quality preschool. The total cost of Math Assessment rises from $18.89 to $21.22. Using the lower-bound 0.324 SD effect-size estimate, the effectiveness–cost ratio falls to 0.015269, but Math Assessment remains a minimum of 5 times as cost-effective as CSR, 162 times as cost-effective as class size reduction, and 1,174 times as cost-effective as high-quality preschool.
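The arithmetic of this sensitivity analysis can be restated compactly; rounding the hardware cost to the cent, as the text does, reproduces the figures above.

```python
# Sensitivity Analysis I: adding classroom hardware to the cost base.

hardware = round(653 / 7 / 20, 2)  # $653 system, 7-year life, 20 students: $4.66
half = hardware / 2                # split evenly between reading and math

reading_cost = 9.45 + half         # $11.78
math_cost = 18.89 + half           # $21.22
print(f"Reading: ${reading_cost:.2f}, ratio {0.175 / reading_cost:.6f}")  # 0.014856
print(f"Math:    ${math_cost:.2f}, ratio {0.324 / math_cost:.6f}")        # 0.015269
```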

Sensitivity Analysis II

A second assumption noted in chapter 2 is that the rapid assessment software saves more teacher time (primarily time that would otherwise be spent grading math homework and assessing reading comprehension) than is consumed in noninstructional tasks such as supervising student use of the computer, scanner, and printer. However, if teachers do not save time and, instead, lose 15 minutes per day (or 1.25 hours per week), the annual cost is $81.06 per student, assuming 20 students per teacher and an annual inflation-adjusted teacher salary of $51,880 (U.S. Department of Education, 2005c). Splitting this cost between reading and math raises the total cost of Reading Assessment from $9.45 to $49.98 per student. Using the lower-bound 0.175 SD effect-size estimate, the effectiveness–cost ratio falls to 0.003501, but Reading Assessment remains a minimum of 1.5 times as cost-effective as CSR, 40 times as cost-effective as class size reduction, and 233 times as cost-effective as high-quality preschool. The total cost of Math Assessment rises from $18.89 to $59.42. Using the lower-bound 0.324 SD effect-size estimate, the effectiveness–cost ratio falls to 0.005453, but Math Assessment remains a minimum of 2.3 times as cost-effective as CSR, 58 times as cost-effective as class size reduction, and 419 times as cost-effective as high-quality preschool.
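Sensitivity Analysis II follows the same pattern; the $81.06 figure follows from the teacher-time valuation used earlier (again assuming an 8-hour workday, which the text does not state):

```python
# Sensitivity Analysis II: charging 15 minutes per day of teacher time.
# The 8-hour workday is an assumption; it recovers the $81.06 figure.

hourly = 51_880 / (184 * 8)                # ~$35.24 per teacher hour
lost = round(hourly * 0.25 * 184 / 20, 2)  # 15 min/day, 184 days, 20 students

half = lost / 2                            # split between reading and math
print(f"Lost time: ${lost:.2f} per student per year")                       # $81.06
print(f"Reading: ${9.45 + half:.2f}, ratio {0.175 / (9.45 + half):.6f}")    # $49.98, 0.003501
print(f"Math:    ${18.89 + half:.2f}, ratio {0.324 / (18.89 + half):.6f}")  # $59.42, 0.005453
```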

Discussion

The results indicate that rapid assessment represents a much more cost-effective approach than CSR, class size reduction, or high-quality preschool.
The true advantage for rapid assessment is likely to be substantially larger than indicated by the point-estimate ratios in Tables 4.2 and 4.3. Given the unreliability of CSR effect-size and cost estimates for models other than Success for All, the School Development Model, and Direct Instruction, the true upper bound for the CSR effectiveness–cost ratios is likely to be closer to the maximum ratio calculated for the three models for which there is reliable evidence. Rapid assessment is a minimum of 59 times as cost-effective as Direct Instruction, the most cost-effective of the three models, suggesting that the true advantage of rapid assessment over CSR may be a factor of 59. Regardless, the cost-effectiveness analysis suggests that CSR is not an efficient approach for improving student achievement. This lack of efficiency may explain previous research findings suggesting that enthusiasm for CSR wanes over time and leads to teacher burnout, conflict, disengagement, and exhaustion (Little & Bartlett, 2002).

This inefficiency may also indicate fundamental problems with CSR. A basic assumption underlying CSR is that comprehensive, whole-school reforms that "break-the-mold" to create radically different schools are necessary to achieve significant gains in student achievement. However, of the models for which there is strong evidence of effectiveness, the most cost-effective (Direct Instruction) is the most traditional, prescribing conventional teaching methods and reinforcing what Tyack and Cuban (1995) call the fundamental grammar of schooling. This suggests that the basic assumption underlying CSR may not be correct.

A second assumption underlying CSR is that improvements in school culture and improved teaching are adequate to improve student achievement. CSR is not designed to directly influence student engagement, although several models stress the importance of establishing respect and trust, and a positive school culture may be expected to foster positive attitudes. However, without deeper insight into factors that build intrinsic interest in academic achievement, it may be unrealistic to expect that CSR will significantly improve student engagement in academic work. Instead, CSR is typically designed to transform the school structure that supports teachers, providing a positive, collegial environment where teachers work together to solve instructional problems. In some cases, such as with Direct Instruction and Success for All, explicit guidance is provided regarding instructional strategies. In other cases, explicit guidance is provided regarding a focus on academics, such as with the Modern Red Schoolhouse. While a positive environment and good teaching may indirectly serve to engage students, none of the CSR models offer deep insight into factors that build intrinsic interest in academic achievement, and none of the models specify how schools may be structured to foster student engagement.


CSR is not designed to address low student engagement, yet lack of engagement is epidemic in the public schools. Data from the Education Longitudinal Survey (Ingels et al., 2005), which is a nationally representative survey of 10th graders, indicated that only 24 percent liked school a great deal—65 percent reported that they liked school "somewhat," and 12 percent said they "did not like it at all"—suggesting that by their 10th-grade year, the vast majority of students were, at best, lukewarm about school. An even larger majority—81 percent—indicated that "the teaching is good"—suggesting that bad teaching is not an adequate explanation for low achievement and improved teaching is unlikely to produce dramatic improvements in student achievement. To the extent that the basic reason for low student achievement is low engagement, current CSR models are not adequate to address this challenge.

What explains the enormous differences in the cost-effectiveness of rapid assessment in comparison with CSR and class size reduction? If the basic reason for low student achievement is low engagement in academic work, and if performance feedback serves to engage students, then an intervention such as rapid assessment may be more precise and effective than broad interventions such as CSR or class size reduction. The research reviewed in chapter 1 suggests that performance feedback engages students by reinforcing student self-efficacy. A student's perceived control over his or her academic performance is strongly predictive of academic achievement (Brookover, Beady, Flood, Schweitzer, & Wisenbaker, 1979; Brookover et al., 1978; Coleman et al., 1966; Crandall, Katkovsky, & Crandall, 1965; Kalechstein & Nowicki, 1997; Keith, Pottebaum, & Eberhart, 1986; Skinner et al., 1990; Teddlie & Stringfield, 1993). There is a feedback loop between performance and control beliefs, with high performance leading to subsequent perceptions of control, so that early achievement strongly influences later achievement and does so primarily by increasing students' sense of personal control (Musher-Eizenman, Nesselroade, & Schmitz, 2002; Ross & Broh, 2000; Skinner et al., 1990). Thus, when children believe that they can exert control over success in school, they perform better on cognitive tasks. And, when children succeed in school, they are more likely to view school performance as a controllable outcome (Skinner et al., 1990, p. 22). To the extent that rapid assessment enables teachers to provide individualized instruction, keeping each student in his or her own zone of proximal development, students are more likely to be successful and feel that they can control their performance. Feelings of control reinforce effort, which improves achievement, which reinforces feelings of control and engagement.


CSR focuses on creating a school environment that supports teachers and effective teaching, yet neglects to provide a system of rapid assessment and individualized curricula where every book that is read and every set of math problems that is accurately completed is quickly acknowledged through objective feedback to students. Thus, students in CSR schools typically do not have the type of feedback system that research suggests is effective in promoting engagement (Robinson et al., 1989). This lack of feedback may explain why CSR is far less effective than rapid assessment. A similar analysis suggests why class size reduction is poorly suited to the task of engaging students in academic work: reducing class size is, at best, a weak strategy to improve performance feedback. Rapid assessment could easily be integrated into CSR. The research findings presented here suggest that this may be an effective way of improving CSR outcomes. On the other hand, rapid assessment could also be implemented as a stand-alone intervention and might improve outcomes a minimum of 7 times as efficiently as CSR. It seems likely that stand-alone implementation could be accomplished nationwide much more quickly than complex whole-school reforms such as CSR, and at a fraction of the cost. Similarly, high-quality preschool is an enormously complex, expensive intervention, requiring highly skilled early childhood educators. In contrast, rapid assessment is relatively simple to implement and could be implemented in existing classrooms, with existing teachers. The results of the studies in Memphis suggest that it can be effective with economically disadvantaged, minority students, while the results of the cost-effectiveness analysis comparing rapid assessment and high-quality preschool suggest that rapid assessment is dramatically more cost-effective. Funding for rapid assessment is likely to be a more productive use of scarce resources, compared with CSR, class size reduction, or high-quality preschool.


5 The Cost-Effectiveness of Value-Added Teacher Assessment

There is a growing consensus among educational researchers that student achievement may be improved by using value-added assessments of teacher performance to identify, select, and retain high-performing teachers. However, this chapter highlights key evidence suggesting that value-added teacher assessment is unlikely to be a panacea, focusing on a cost-effectiveness analysis of Gordon, Kane, and Staiger's (2006) proposal to raise student achievement by identifying and replacing the bottom quartile of non-tenured teachers, using value-added assessment of teacher performance. This proposal is currently the only proposal that adequately addresses the fact that most teachers are tenured and, therefore, cannot be dismissed without adequate cause (Gordon et al., 2006).

Section I reviews relevant literature, noting studies suggesting that value-added assessment would not survive legal challenges to its reliability and validity for high-stakes teacher evaluations and would fail the criterion of "adequate cause" for dismissal. Section II provides a cost-analysis of Gordon et al.'s (2006) proposal. Section III compares the cost-effectiveness of the proposal with the cost-effectiveness of alternative approaches for raising student achievement. Section IV summarizes the findings and discusses the implications of the cost-effectiveness analysis.

Background

Econometric methods have been used to statistically isolate the "value-added" contribution of each teacher to student achievement (Hanushek, Kain, O'Brien, & Rivkin, 2005; Rivkin, Hanushek, & Kain, 2005; Rockoff, 2004). These studies suggest that student achievement may be improved by an entire grade-level if high-performing teachers are substituted for low-performing teachers (Hanushek, 1992). In contrast, teacher characteristics such as educational attainment or years of experience have weak relationships with student achievement (Goldhaber, 2002; Greenwald et al., 1996; Grissmer, Flanagan, Kawata, & Williamson, 2000; Hanushek, 1989, 1996; Hanushek et al., 2005; Hedges, Laine, & Greenwald, 1994; Nye, Konstantopoulos, & Hedges, 2004, p. 249; Wenglinsky, 2001; Wilson & Floden, 2003). Rivkin et al. (2005) advocate the use of performance information in "hiring, firing, mentoring, and promotion practices" to "improve the stock of teachers" (pp. 450–451). Value-added assessment has been implemented in Tennessee since 1992 and has now been mandated for use by all school districts in Pennsylvania and Ohio and several hundred school districts in 21 other states, including the Dallas and Seattle school districts (Center for Greater Philadelphia, 2004). However, the use of performance data for high-stakes purposes requires high reliability and validity at the individual teacher level. Recent studies using value-added statistical models suggest that the false classification rate may be significant (Aaronson, Barrow, & Sander, 2007; Ballou, 2005; Koedel & Betts, 2007; McCaffrey, Sass, & Lockwood, 2008), suggesting that value-added assessment would not survive legal challenges to its reliability and validity for high-stakes teacher evaluations.1

In addition to the reliability and validity problems associated with value-added performance assessment, it may not be cost-effective to replace low-performing teachers with high-performing counterparts. This approach is necessarily limited to probationary teachers, because tenured teachers are not subject to contract renewal. New teachers are not tenured and typically undergo a 3-year probationary period during which building principals may decide not to renew the teachers' contracts. Thus, Robert Gordon, Senior Vice-President at the Center for American Progress, and economists Thomas Kane at Harvard and Douglas Staiger at Dartmouth (2006) propose to improve the quality of the teaching force by identifying and replacing
the bottom quartile of novice teachers at the end of their second year of teaching, equal to 68,299 (about 2%) of the 3,214,900 elementary and secondary public school teachers in 2004 (Marvel et al., 2007), using value-added assessments of teacher performance. Their proposal is one of the featured proposals by the Hamilton Project at the Brookings Institution and, thus, has attracted attention as perhaps the most promising strategy for implementing a value-added teacher assessment system.

The significance of Gordon et al.'s (2006) proposal is that it converts the general strategy of value-added teacher assessment into a concrete proposal that can be evaluated. If value-added teacher assessment is a promising strategy for raising student achievement, and if Gordon et al.'s (2006) proposal is currently the only proposal that adequately addresses the fact that most teachers are tenured and, therefore, cannot be replaced without adequate cause, then an evaluation of this proposal represents a critical experiment. If it can be shown that Gordon et al.'s (2006) proposal is not cost-effective, relative to alternative policies, then it is likely that the value-added teacher assessment strategy is not cost-effective.

The proposal comes at a time when teacher demand is projected to exceed supply by 35 percent over the next two decades (Gordon et al., 2006). Gordon et al. (2006) point out that the current output of teaching colleges will be inadequate to satisfy this demand and advocate that their proposal be implemented through recruitment of individuals currently employed elsewhere. This proposal would create substantial costs to society of transitioning individuals who are currently working in other occupations into the teaching profession.

Costs of Teacher Replacement

Individuals who have accumulated substantial work experience in occupations other than teaching may enter teaching through alternative teacher certification programs. Alternative certification typically allows these individuals to enter teaching relatively quickly, under the supervision of mentors and while electing courses in curriculum and instructional methods. Expansion of alternative certification programs to accommodate the additional demand for new teachers created by Gordon et al.'s (2006) proposal would generate substantial costs. Since 83 percent of individuals enrolled in alternative certification programs were previously employed in the year prior to certification (Feistritzer, 2005), the costs to society of Gordon et al.'s (2006) proposal would include the costs created for an employer when an individual leaves a job in order to enter teaching and the employer must hire a replacement. If that replacement worker is, in turn, drawn from the
ranks of currently employed workers, a second employer must hire a replacement. If the second employer hires from the ranks of the currently employed, yet a third employer must hire a replacement. This chain of events continues until the last employer hires from the ranks of the not-employed (persons who are unemployed or not-in-the-labor force).2

Calculation of the chain of replacement costs requires modeling of the chain of job-hopping. The chain ends when a job-hopper's employer fills the position with an individual who is not employed. The probability that an individual who enters teaching through alternative certification was working in the year before he or she entered teaching is .83, based on data from a national survey (Feistritzer, 2005). The rest (17%) were retired, unemployed, homemakers, students, or other individuals not in the labor force. Thus, the probability that a new teacher entering through alternative certification will create replacement costs for a previous employer is .83. The probability that the replacement (a college-educated individual who would ultimately be eligible for a teaching position) is drawn from currently employed workers is .50 (Fallick & Fleischman, 2004). Therefore, the cost analysis assumes a .50 probability that each subsequent worker replacement is drawn from currently employed workers. The total cost to society of the chain of job-hopping triggered by hiring one teacher through alternative certification is the sum of the expected (probability adjusted) costs of all worker replacements. The turnover-related replacement cost multiplier is 1.567, suggesting that the expected (probability adjusted) cost of replacing all of the workers who job-hop as a consequence of hiring one teacher through alternative certification is a multiple of 1.567 of the present cost to one employer of replacing one worker.3 The turnover-related replacement cost multiplier is used to derive the costs in Table 5.1, Section I (costs to sending employers) and Section II (costs to replacement employees). Section I costs total $33,016.14 ($21,069.65 before the multiplier is applied). Section II costs total $12,434.34 ($7,935.12 before the multiplier).

Table 5.1 also includes costs to the hiring school district (Section III), the new teacher (Section IV), the terminated teacher (Section V), and other costs to society (Section VI), as well as the gain in output (a benefit) of the terminated teacher once he or she has been retrained and has started a new job (Section VII), psychic gains to the new teacher (Section VIII.A), psychic gains to replacement employees (Section VIII.B), and the cost of alternative certification (Section IX).4 All costs are adjusted to August 2006 dollars using the consumer price index.

The average cash cost of alternative certification is $3,959.46 per teacher (Feistritzer, 2006), or an annual cost of $434.63 per year, assuming an

Table 5.1  Cost to Society of Replacing One Teacher

                                                        2006 Dollars
I. Replacement Costs to Sending Employers
  A. Lost Output                                        $18,307.41
  B. Hiring                                             $1,187.28
  C. Orientation and training                           $2,347.90
  D. Reduced output                                     $10,455.27
  E. Administrative costs for quitting workers          $718.28
II. Costs to Replacement Employees
  A. Job search (cash costs)                            $1,105.83
  B. Time devoted to job search                         $7,122.48
  C. Relocation                                         $4,206.03
III. Costs to Hiring School District
  A. Administrative costs of terminated teacher         $458.38
  B. Hiring of new teacher                              $757.68
  C. Orientation and training costs of new teacher      $1,498.34
IV. Costs to New Teacher
  A. Job search (cash costs)                            $705.70
  B. Time devoted to job search                         $4,545.30
  C. Relocation                                         $2,684.13
V. Costs to Terminated Teacher
  A. Job search (cash costs)                            $705.70
  B. Time devoted to job search                         $5,189.24
  C. Retraining                                         $2,890.23
  D. Relocation to new job                              $2,684.13
  E. Lost output (income)                               $41,586.89
VI. Additional Costs to Society
  A. Reduced Output of Terminated Teacher               $6,672.16
  B. Foregone Output of Replacement Teacher             $589,046.96
VII. Gain of Terminated Teacher's New Output            ($479,923.19)
VIII. Psychic Gains
  A. To New Teacher                                     ($59,381.48)
  B. To Replacement Employees                           ($12,434.34)
IX. Costs of Alternative Teacher Certification          $434.63
Total                                                   $153,570.94

Note: See Appendix A for notes to Table 5.1.

average teaching career of 9.11 years (Yeh, 2010a). These costs—primarily tuition and expenses in states that require candidates to complete additional coursework—are underestimated because they do not include the opportunity costs to the candidates (the value of the time required to complete the coursework).
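A quick check shows how the 1.567 multiplier (derived in the text's Endnote 3) generates the Section I and Section II totals in Table 5.1 from their pre-multiplier bases:

```python
# Applying the turnover-related replacement cost multiplier to the
# pre-multiplier bases reported in the text.

MULTIPLIER = 1.567
section_1 = 21_069.65 * MULTIPLIER   # costs to sending employers
section_2 = 7_935.12 * MULTIPLIER    # costs to replacement employees

print(f"Section I:  ${section_1:,.2f}")   # $33,016.14
print(f"Section II: ${section_2:,.2f}")   # $12,434.33 (Table 5.1: $12,434.34)
```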


Total Cost to Society of Gordon et al.'s (2006) Proposal

The total cost to society of Gordon et al.'s (2006) proposal is composed of two distinct sets of costs. The first set of costs is limited to the subset of teachers who are replaced with teachers hired through alternative certification. Based on the number of teachers who left the profession the prior year, approximately 269,600 new public school teachers were hired into the profession in 2005 (Marvel et al., 2007). Data regarding survival rates suggest that 76 percent (204,896) survived through their second year (Quartz et al., 2004). Gordon et al. (2006) propose to boost hiring by 33.33 percent, expanding the number of 2nd-year survivors to 273,195, permitting the bottom-quartile (68,299) to be fired. At a cost of $153,570.94 per teacher (Table 5.1), this amounts to a total of $10,488,741,630. The annual cost is $213.56 per student, based on 2005 public school PK–12 enrollment of 49,113,000 students (National Center for Education Statistics, 2007). Note that the per-student cost is an average over all students, not the smaller group of students associated with teachers who are replaced.

The second set of costs is associated with all students and, therefore, is calculated separately from the figures in Table 5.1: the cost to raise salaries for all teachers that is required to replace the bottom quartile of novice teachers, plus the additional cost of implementing a value-added assessment system, which may be estimated from the costs of implementing Tennessee's Value-Added Assessment System (TVAAS). The cost of administering and scoring the assessments was $4.80 per student, and the cost of the TVAAS reports added $0.80 per student, for a total annual cost of $5.60 (Bratton, Horn, & Wright, n.d.).

The total annual cost of implementing Gordon et al.'s (2006) proposal to use value-added assessment systems to identify and replace the bottom quartile of novice teachers each year is the cost to society of transitioning individuals who are currently working in other occupations into the teaching profession ($213.56), plus the cost to raise salaries for all teachers that is necessary to replace the bottom quartile of novice teachers ($405.56), plus the cost of the assessments ($5.60), or a total of $624.72 per student.
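The $624.72 total can be reproduced directly from the components above:

```python
# Per-student cost of the full proposal, from the components above.

transition = 68_299 * 153_570.94 / 49_113_000  # replacement costs over all students
salary_increase = 405.56                       # per-student cost of higher salaries
assessments = 5.60                             # per-student cost of value-added testing

print(f"Transition: ${transition:.2f} per student")                  # $213.56
print(f"Total: ${transition + salary_increase + assessments:.2f}")   # $624.72
```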

A Less Expensive Version

A less expensive version of Gordon et al.'s (2006) proposal is to replace bottom quartile novice teachers by increasing the production of individuals who become certified to teach after enrolling for 1 year of college beyond a bachelor's degree, rather than by drawing upon currently employed individuals. Replacing poorly performing novice teachers with 5th-year college
graduates avoids the costs associated with transitioning currently employed workers into teaching jobs, but substitutes the costs of an additional year of college and lost wages. These are new costs because it would be necessary to greatly expand current production of teacher college graduates in order to satisfy the demand created by Gordon et al.'s (2006) proposal. These costs are lower than the costs of drawing upon the ranks of currently employed workers but remain substantial.5

The net cost of replacing a bottom quartile teacher with a teacher hired through a 5th-year teacher education program is the cost to educate the new teacher ($78,952.94), plus the opportunity cost of employing the new teacher ($403,299.15), minus the gain of the terminated teacher's output in a new occupation ($479,923.19), plus the costs to the hiring school district ($2,714.40), job search and relocation costs for the new teacher ($7,935.13), the reduced output of the terminated teacher while learning a new job ($6,672.16), and the costs to the terminated teacher ($53,056.19), or $72,706.78 (see Table 5.1, sections III, IV, V, VIA, and VII). The total cost of hiring 68,299 new teachers each year is $4,965,800,367, or an annual cost of $101.11 per student, based on 2005 public school PK–12 enrollment of 49,113,000 students (National Center for Education Statistics, 2007i).

The total annual cost of implementing this modified version of Gordon et al.'s (2006) proposal is the cost of the assessments ($5.60) plus the costs to society of hiring a teacher through a 5th-year teacher education program ($101.11), plus the cost to raise salaries sufficiently to replace the bottom quartile of novice teachers ($405.56), or a total of $512.27 per student. This figure is underestimated to the extent that fired teachers incur psychic losses, and to the extent that reduced supply of teachers as a consequence of the increased risk to novice teachers that is implied by Gordon et al.'s (2006) proposal would drive teacher salaries upward, raising the cost of hiring new teachers as well as the cost of employing existing teachers.
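The netting of gains against costs in this calculation is easy to misread in prose; the sketch below lists the components with their signs:

```python
# Net cost per replaced teacher under the 5th-year-graduate variant,
# with gains entered as negative values (see Table 5.1, sections III,
# IV, V, VI.A, and VII).

components = [
    78_952.94,    # education of the new teacher
    403_299.15,   # opportunity cost of employing the new teacher
    -479_923.19,  # gain: terminated teacher's output in a new occupation
    2_714.40,     # costs to the hiring school district
    7_935.13,     # new teacher's job search and relocation
    6_672.16,     # reduced output of terminated teacher while learning new job
    53_056.19,    # costs to the terminated teacher
]
net = sum(components)
per_student = net * 68_299 / 49_113_000

print(f"Net cost per teacher: ${net:,.2f}")          # $72,706.78
print(f"Per student: ${per_student:.2f}")            # $101.11
print(f"Total: ${per_student + 5.60 + 405.56:.2f}")  # $512.27
```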

Cost-Effectiveness of Gordon et al.'s (2006) Proposal

Gordon et al. (2006) controlled for student characteristics, compared classrooms within schools, and estimated that student achievement would improve by 1.2 percentile points per year, or 0.057 SD, as a consequence of their proposal, based on the performance of roughly 150,000 3rd-, 4th-, and 5th-grade Los Angeles students in 9,400 classrooms each year from 2000 through 2003 (correcting for errors in their calculations boosts the impact estimate to 0.059 SD). The policy would increase the average impact of retained teachers by about 1.5 percentile points but would also require an increase in the hiring of novice teachers in order to maintain class size.
The evidence suggests that the average “value-added” of novices is about 4 percentile points lower than teachers with 2 years of experience. After subtracting the loss in performance as a consequence of hiring additional novice teachers, the net increase in student achievement would be 1.2 percentile points per year. Gordon et al. (2006) assume that bottom quartile teachers are reliably weeded out during the typical 3-year probationary period when novice teachers must reapply for their jobs every year. Furthermore, Gordon et al. (2006) assume that performance is stable and teachers in the 2nd, 3rd, and 4th quartiles maintain their advantage over bottom quartile teachers after this weeding process is completed. Research (see Endnote 1) suggests that these assumptions are unwarranted. Gordon et al. (2006) propose to address the poor reliability of value-added measures by averaging each teacher’s performance over his or her first 2 years. However, existing studies suggest that value-added models of teacher performance are incorrectly specified, a problem that cannot be addressed through averaging. Thus, the Gordon et al. (2006) estimate is probably an upper-bound estimate of the effectiveness of their proposal. For the purpose of the cost-effectiveness analysis, it is assumed that Gordon et al.’s (2006) proposal is implemented on an ongoing basis and, thus, benefits and costs accrue annually. The standard measure, for comparison purposes, is the effect size per year divided by the annual cost per student per year. If the effect size of Gordon et al.’s (2006) proposal is 0.057 SD per year and the annual cost is $624.72 per student, the effectiveness-cost ratio is 0.000091 standard deviations per dollar. The ratio is quite low, reflecting the low effectiveness and high cost of this approach for raising student achievement. If the cost using the less expensive version of Gordon et al.’s approach is $512.27, the effectiveness-cost ratio rises to 0.000111. For comparison, Table 5.2 lists the annualized effect size, annual cost per student, and effectiveness-cost ratio for voucher programs, charter schools, a 10 percent increase in per-pupil expenditure, increased educational accountability (defined as the implementation of high school exit exams), comprehensive school reform, class size reduction, high-quality preschool, and rapid assessment, where student performance in math and reading is rapidly assessed 2–5 times per week.
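Dividing the annual effect by the annual per-student cost gives the ratios reported here and in Table 5.2:

```python
# Effectiveness-cost ratios for the two versions of the proposal.

EFFECT = 0.057  # SD per year

for label, cost in (("Full proposal", 624.72),
                    ("5th-year graduate version", 512.27)):
    print(f"{label}: {EFFECT / cost:.6f} SD per dollar")
# 0.000091 and 0.000111, as in Table 5.2
```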

Table 5.2  Comparison of Effect Sizes, Costs, and Effectiveness–Cost Ratios for Various Interventions to Raise Student Achievement

                                      Effect Size (SD)a               Effectiveness–Cost Ratioc
                                      Reading    Math     Costb          Reading     Math
Gordon et al.'s (2006) proposal        0.057     0.057    $624.72        0.000091    0.000091
Cheaper version (5th-year grads)       0.057     0.057    $512.27        0.000111    0.000111
Voucher programs                       0.032     0.080    $9,646.00      0.000003    0.000008
Charter schools                        0.009     0.001    $8,086.30      0.000001    0.000000
10 percent increase in spending        0.083     0.083    $1,118.83      0.000075    0.000075
Increased accountability               0.050     0.050    $2,025.97      0.000025    0.000025
Comprehensive school reform            0.510     0.510    $217.83        0.002341    0.002341
Class size reduction
  Nye et al. (2001)                    0.104     0.090    $1,379.28      0.000075    0.000065
  Finn et al. (2001)                   0.120     0.129    $1,379.28      0.000087    0.000094
Perry Preschool                        0.150     0.155    $12,147.03     0.000012    0.000013
Abecedarian Preschool                  0.150     0.054    $10,188.09     0.000015    0.000005
Rapid Assessment (high estimates)      0.270     0.392    $9.45/$18.89   0.028571    0.020752
Rapid Assessment (low estimates)       0.175     0.324    $9.45/$18.89   0.018519    0.017152

Notes: a Annualized effect size. b Annual cost per student, adjusted for inflation to August 2006 dollars; for rapid assessment, the annual cost is $9.45 in reading and $18.89 in math. c Effect size in SD units divided by annual cost per student.

Conclusion

The purpose of cost-effectiveness analysis is to compare alternative uses of social resources in order to determine which approach is most cost-effective. The most cost-effective approach for raising student achievement is rapid assessment, where students are tested in math and reading between 2 and 5 times weekly.

Rapid assessment is of particular interest based on prior cost-effectiveness analyses suggesting that it is much more cost-effective than voucher programs, charter schools, a 10 percent increase in per pupil expenditure, increased educational accountability, comprehensive school reform, class size reduction, or high-quality preschool (Yeh, 2007, 2008). Rapid assessment is approximately one magnitude (10 times) more cost-effective than comprehensive school reform, two magnitudes more cost-effective than class size reduction or a 10 percent increase in per pupil expenditure or Gordon et al.'s (2006) proposal or increased educational accountability, three magnitudes more cost-effective than voucher programs or high-quality preschool, and four magnitudes more cost-effective than charter schools (a magnitude of gain implies that student achievement would increase 10 times faster for every dollar invested in rapid assessment rather than the alternative).


Comparisons of effectiveness-cost ratios suggest that rapid assessment is potentially a more promising approach for improving student achievement than replacing the bottom quartile of novice teachers. This is consistent with Hattie and Timperley's (2007) review of research suggesting that performance feedback is more effective than an array of alternative interventions. In concrete terms, the research findings suggest that improving student achievement by an average of 0.319 SD in math and reading (or approximately 3 months of learning) for every American student would take less than 9 months and cost $1.4 billion using rapid assessment ($28.34 multiplied by 2005 public school PK–12 enrollment of 49,113,000 students [National Center for Education Statistics, 2007]). To achieve the same effect size through Gordon et al.'s (2006) proposal would take 6 years and cost $184 billion ($624.72 multiplied by 49,113,000 students and 6 years). These calculations suggest the magnitude of the differences in effectiveness and cost.

The significance of these results is the corollary: it is unlikely that the general value-added teacher assessment strategy is cost-effective relative to rapid assessment. To the extent that Gordon et al.'s (2006) proposal is the only specific value-added teacher assessment proposal that adequately addresses the fact that most teachers are tenured and, therefore, cannot be replaced without adequate cause, the evaluation of this proposal represents a critical test of the hypothesis that the general value-added assessment strategy is a promising strategy for raising student achievement. The results show that Gordon et al.'s (2006) proposal is not cost-effective, relative to rapid assessment. Thus, it seems unlikely that the general value-added teacher-assessment strategy will ever prove to be cost-effective.

Attempts to improve student achievement by replacing low-performing teachers with high-performing counterparts are hobbled in two ways. First, only probationary teachers, a small fraction of the workforce, may be replaced. Second, the strategy of replacing low-performing teachers ignores the lesson of history, which is that significant improvements in productivity in any field almost always result from fundamental advances in organization and technological processes that permit all workers to be more productive (van Ark, Kuipers, & Kuper, 2000). Rapid assessment is an example of a technological advance that permits both high- and low-performing teachers to be more effective. Technological advances offer the possibility that the entire distribution of teacher performance may be shifted. In contrast, attempts to replace "bad apples" with "good apples" presume that major improvements in productivity may be obtained with existing process technology. The finding that rapid assessment is much more cost-effective than Gordon et al.'s (2006) proposal suggests a need to revisit that assumption and to consider advances that shift the entire distribution of teacher performance.

A number of objections might be raised regarding our analysis. We focused narrowly on the cost implications of replacing the bottom part of the distribution of teachers by moving workers out of non-teaching jobs and into teaching jobs. One objection is that this is an unorthodox approach that could be applied equally to class size reduction, which also requires the hiring of more teachers. This argument suggests that previous calculations of the costs of class size reduction have been underestimated, which would reduce the estimate of the cost-effectiveness of class size reduction in Table 5.2. However, this does not change the conclusion that neither approach is cost-effective relative to alternatives, e.g., rapid assessment.

A second objection is that the multiplier effects from hiring individuals who were previously employed in another job are not relevant from the viewpoint of the hiring district, since the cost to the district is simply the wage paid to recruit a new teacher, and the district does not pay any of the multiplicative costs in secondary labor markets. From the individual district's viewpoint, while more workers would "churn" through teaching, at any given point the payroll is the same and, by that measure, the education sector is not using any more human resources. However, the purpose of cost-effectiveness analysis is to identify the approach that offers the most productive use of all social resources, regardless of how and where those costs are incurred. In the same way, society is concerned with the environmental costs imposed by automobiles, regardless of the viewpoint of individual automakers who are mainly concerned with their own costs and revenues and do not care about the costs of environmental pollution.

A third objection is that we have neglected other important costs of Gordon et al.'s (2006) proposal: the extra pay that teachers will require if they are now being hired into a higher-risk profession, the monitoring and human resource costs of making sure that teachers are fired if they are low quality, the legal costs from challenges to unwarranted firings, and the costs to individual schools in lost school-specific human capital. We have not discussed these costs in any detail. Nor have we discussed the possibility that poorly performing teachers will presumably exert less effort once they know that they are going to be fired. All of these are legitimate concerns, but the costs are difficult to quantify, and including these costs would simply make the comparison with the alternative approaches even more unfavorable. Throughout our analysis, we have attempted to give Gordon et al. (2006) the benefit of the doubt whenever cost information was lacking.


In conclusion, the available evidence suggests that value-added teacher assessment is an unreliable, invalid measure of teacher quality, and the use of this approach is unlikely to ever be cost-effective as a means of raising student achievement. The current move to adopt value-added teacher evaluation approaches for high-stakes decisions regarding teacher hiring, promotion, and retention is premature and should be reconsidered.

6 The Cost-Effectiveness of NBPTS Teacher Certification

This chapter analyzes the cost-effectiveness of certification by the National Board for Professional Teaching Standards (NBPTS). NBPTS certification is believed by many educators to be a promising approach for identifying high-quality teachers. It is different from Gordon et al.'s (2006) proposal to use value-added assessment to identify high-quality teachers, but rests on the same premise that teacher quality is the most important factor predicting student achievement (Ferguson, 1998b; Goldhaber, 2002; Goldhaber, Brewer, & Anderson, 1999; Hanushek, Kain, & Rivkin, 1999; Wright, Horn, & Sanders, 1997). Hanushek's (1992) study is often cited to support this premise. He found that a high-quality teacher can increase learning by an entire grade-level equivalent above the amount contributed by a low-quality teacher. This conclusion is currently unchallenged by other educational researchers. Thus, improvements in teacher quality are believed to be a promising approach for raising student achievement.



However, unless there is a way for building principals to identify high-quality teachers, it is difficult to select those teachers in order to improve student achievement. Research regarding identifying factors is hampered by the difficulty of disentangling those factors from the contribution of other factors influencing student achievement. This difficulty is best addressed through random assignment of teachers and students to classrooms. Thus, it is significant that the only study of teacher quality using random assignment concluded that the main observable characteristics of teachers—years of experience and level of education—are essentially uncorrelated with gains in student achievement: "Neither teacher experience, nor teacher education explained much variance in teacher effects (never more than 5%)" (Nye et al., 2004, p. 249). A previous review of the research concluded:

    While few would disagree with the claim that better teachers would improve the quality of education and student achievement, the task of identifying and hiring effective teachers is not simple. For example, one study found that only 3% of the contribution teachers made to student learning was associated with teacher experience, degrees attained, and other readily observable characteristics (Goldhaber, 2002). In other words, years of experience and graduate degrees do not have a strong impact on student achievement. Students whose teachers have master's degrees do not outperform students whose teachers have only a bachelor's degree (Grissmer et al., 2000; Hanushek et al., 2005; Wenglinsky, 2001). Meta-analyses and reviews of research suggest that there is no clear relationship (Greenwald et al., 1996; Hanushek, 1989; Wilson & Floden, 2003) or close to zero relationship (Hanushek, 1996; Hanushek et al., 2005; Hedges et al., 1994) between teachers' level of educational attainment and student achievement. The relationship between teacher experience and student achievement is also inconsistent (Wilson & Floden, 2003). First- and second-year teachers are less effective, on average, than more experienced teachers, but after this period, experience has little effect (Grissmer et al., 2000; Hanushek et al., 2005; Wenglinsky, 2001). Therefore, if readily observable characteristics such as graduate credentials and years of experience are weakly related to teacher quality, it may be extremely difficult for a school principal to identify and hire good teachers. (Yeh, 2006b, p. 147)

If readily observable characteristics are inadequate for the purpose of identifying strong teachers, perhaps a sophisticated certification exam, in combination with portfolio evidence including video recordings of teacher-student interactions, can discriminate effectively. This approach may be more sensitive to the attributes that actually make teachers successful in the classroom.


NBPTS is an independent organization established in 1987 with the goal of advancing the quality of teaching and learning (National Board for Professional Teaching Standards, 2006a). NBPTS developed professional standards for teaching, then contracted with the Educational Testing Service (and, later, Pearson Educational Measurement) to create a voluntary system to certify teachers who meet those standards (Educational Testing Service, 2004; National Board for Professional Teaching Standards, 2006a). Perhaps the clearest indication that a cost-effectiveness analysis of the NBPTS program is needed is that in January 2004 the U.S. Congress directed the National Research Council (NRC), which is the research arm of the National Academy of Sciences, to conduct an evaluation, including a cost-effectiveness analysis. Subsequently, the NRC wrote an evaluation report (Hakel, Koenig, & Elliott, 2008), but found that insufficient information was available to conduct the cost-effectiveness analysis. This chapter addresses the request by the U.S. Congress for a cost-effectiveness analysis of the NBPTS program using cost information that was not available to the NRC.

The chair of the NRC committee that wrote the report concluded that "Earning NBPTS certification is a useful 'signal' that a teacher is effective in the classroom," where classroom effectiveness is defined in terms of the level of student achievement that is associated with specific teachers (National Academies, 2008). While NBPTS seeks to achieve outcomes in addition to improvements in student achievement, as measured through standardized tests, the report noted that little research is available regarding the effects of NBPTS on student motivation, breadth of achievement, attendance and rates of promotion, or possible spillover effects on the teaching practices of NBPTS teacher colleagues, school and district educational standards, and the quality of school- and district-level teacher professional development (Hakel et al., 2008). Findings regarding the effects of NBPTS as a form of professional development are mixed (Hakel et al., 2008). Finally, research regarding the effects of NBPTS certification on teacher mobility and career paths is not rigorous enough to permit causal inferences (Hakel et al., 2008). Thus, there is insufficient evidence to evaluate the effect of NBPTS on the field of teaching and the education system (Hakel et al., 2008). While the available evidence suggests that NBPTS certification may provide a means of identifying highly skilled teachers, this evidence does not provide sufficient information about the effect of the NBPTS certification process as a form of professional development, or the effect of the program on teacher recruitment and retention (Hakel et al., 2008). At present, there is insufficient evidence to suggest that NBPTS certification has broad effects beyond serving as a signal of teacher effectiveness according to standardized measures of student achievement.


Thus, it is appropriate to focus on the cost-effectiveness of the NBPTS certification system with regard to the impact on student achievement if uncertified teachers are replaced with certified teachers. I applied standard cost-effectiveness techniques to evaluate the relative cost-effectiveness of NBPTS certification in comparison with the cost-effectiveness of a range of alternative approaches for raising student achievement, including two alternative methods of identifying effective teachers (Yeh, 2009b; Yeh & Ritter, 2009).

The ultimate rationale for NBPTS certification is that teachers who are certified produce higher levels of student achievement; therefore, studies that judge NBPTS certification based on that metric are appropriate. The theory of action underlying NBPTS certification is that it is possible to improve achievement by replacing weak teachers with strong teachers. The quality of teaching is an intermediate, rather than a final, goal. Society cares about the quality of teaching to the extent that it improves student outcomes. While a critic might argue that standardized measures of student achievement are imperfect and only capture one of many outcomes that result from NBPTS certification programs, standardized measures of achievement are the only widely used measures of student achievement that permit comparisons across a wide range of approaches for raising student achievement. In practice, comparative cost-effectiveness analyses can only be conducted using this type of measure. Thus, to reject standardized measures of student achievement is to reject the application of cost-effectiveness analysis to the field of education.

It is important to complete a rigorous cost-effectiveness analysis because the results may contradict popular beliefs—among researchers as well as policymakers. As Levin, Glass, and Meister (1987) demonstrated, interventions that are resource-intensive, such as tutoring, may in fact be more cost-effective than popular alternatives such as class size reduction. Thus, it is essential to conduct and publish rigorous cost-effectiveness studies, even when "back of the envelope" calculations suggest that a particular intervention may not be cost-effective. While a quick glance at recent evaluations of the NBPTS certification program may suggest that it is unlikely to be cost-effective, it is vital to conduct a thorough cost-effectiveness study, because many educational researchers and policymakers may assume, in the absence of a rigorous study, that teacher certification is a promising, effective, and cost-effective approach for raising student achievement. In the absence of a study that addresses this question directly, scarce public resources may be misallocated, the achievement of disadvantaged students may remain depressed, and schools that are not making "adequate yearly progress" might continue to be sanctioned under the No Child Left Behind Act of 2001.


Costs of NBPTS Certification

NBPTS certification is a lengthy, highly demanding process. Applicants for certification are required to submit a portfolio to NBPTS involving four entries (National Board for Professional Teaching Standards, 2006b). Three are classroom based, where video recordings of teacher-student interaction and examples of student work serve as supporting documentation. A fourth entry relates to the candidate's accomplishments outside of the classroom—with families, the community, or colleagues—and how they impact student learning. Each entry requires some direct evidence of teaching or school counseling as well as a commentary describing, analyzing, and reflecting on this evidence. Following submission of the portfolio, candidates are tested on their content knowledge through six 30-minute exercises, specific to the candidate's chosen certificate area, at one of 300 NBPTS computer-based testing centers across the United States. Applicants are scored on a scale of 75 to 425, incorporating both the portfolio and the assessment center exercises, and they must earn a score of at least 275 to achieve certification (Goldhaber, Perry, & Anthony, 2003).

Rice and Hall (2008) provide the best available estimate of the full social cost of the NBPTS certification program, with detailed estimates of costs for all resources devoted to program administration and infrastructure, information and recruitment, group meetings, portfolio development, the NBC application fee, mentor training, research, development, and dissemination, averaging $25,665.37 (adjusted to 2006 dollars) per participant per year. Importantly, Rice and Hall (2008) estimated the large opportunity costs of the time teachers devote to the development of their portfolios, as well as the uncompensated time of staff, including teacher mentors, librarians, childcare providers, and administrators who provide support. These opportunity costs constitute the bulk of the social cost of the NBPTS certification program. However, while Rice and Hall (2008) assumed that the certification process averages 1 year, it actually takes 2 years (NBPTS, 2010, pp. 10–11). Since Rice and Hall's (2008) estimate of the program's cost per graduate divides total annual costs by the total number of (first- and second-year) participants in any given year, their headcount shrinks the cost per graduate by half (see p. 348, Footnote 12). Correcting for this underestimate suggests a cost per graduate of $51,330.74. Amortized over an average teaching career of 7.86 years as an NBPTS-certified teacher, and averaged over 20 students per classroom, the annual cost of NBPTS certification per student is $326.53.1
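This amortization can be reproduced in a few lines. The Python sketch below is illustrative only (the figures are those quoted above): it doubles Rice and Hall's (2008) per-participant cost to reflect the 2-year process, then spreads the cost per graduate over the assumed career length and class size:

```python
# Correct the 1-year cost estimate for the 2-year certification process,
# then amortize per student (figures from the text).
annual_cost_per_participant = 25_665.37   # Rice & Hall (2008), 2006 dollars
cost_per_graduate = annual_cost_per_participant * 2
print(f"{cost_per_graduate:,.2f}")        # 51,330.74

career_years = 7.86        # average career as an NBPTS-certified teacher
class_size = 20
annual_cost_per_student = cost_per_graduate / career_years / class_size
print(f"{annual_cost_per_student:.2f}")   # 326.53
```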


The cost is underestimated to the extent that stricter certification requirements deter high-quality applicants from teaching in the public schools, raising barriers to entry that increase labor costs (Angrist & Guryan, 2004).

A limitation of the NBPTS certification program is that a substantial percentage of all teachers have taught for less than 5 years and, thus, would not have reached a point where they could have accumulated the minimum 3 years of teaching experience required for NBPTS certification (National Board for Professional Teaching Standards, 2006c), nor finished a certification process that takes an average of 2 years beyond the 3-year probationary period (NBPTS, 2010, pp. 10–11). Thus, any conceivable benefits of NBPTS certification can never be realized for the large proportion of teachers in the workforce with 5 years or less of teaching experience. Nationally, 20 percent of 4th-grade teachers, 23 percent of 8th-grade math teachers, and 22 percent of 8th-grade reading teachers have less than 5 years of teaching experience (Stancavage et al., 2006). Averaging the 8th-grade figures and weighting the 4th- and 8th-grade percentages by total 4th-grade (3,611,638) and 8th-grade (3,824,670) enrollment (U.S. Department of Education, 2005b) suggests that, overall, about 21.3 percent of all teachers have less than 5 years of teaching experience. Thus, (hypothetical) federal requirements that all eligible teachers apply for NBPTS certification, and that all teachers who fail the NBPTS exam be replaced with NBPTS-certified teachers, would only benefit the 78.7 percent of students who are taught by teachers with 5 or more years of teaching experience. Furthermore, if all eligible teachers apply for NBPTS certification and teachers who fail the exam are replaced with teachers who are certified, the process of teacher replacement would take a minimum of 8 years, including 6 years to replace teachers (who fail the NBPTS exam) through normal teacher attrition, in addition to the 2 years required for teachers to complete the application and examination process.2
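The 21.3 percent figure is an enrollment-weighted average, computed as sketched below (illustrative Python; percentages and enrollment counts are those quoted in the text):

```python
# Weight the 4th-grade percentage and the averaged 8th-grade percentage
# by the respective grade enrollments (figures from the text).
pct_4th = 20.0
pct_8th = (23.0 + 22.0) / 2            # average of 8th-grade math and reading: 22.5

enroll_4th, enroll_8th = 3_611_638, 3_824_670
weighted = (pct_4th * enroll_4th + pct_8th * enroll_8th) / (enroll_4th + enroll_8th)
print(f"{weighted:.1f}")               # 21.3 -> teachers with < 5 years of experience
print(f"{100 - weighted:.1f}")         # 78.7 -> share of students who could benefit
```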

Effects of NBPTS Certification

Several studies have found that teachers who eventually receive NBPTS certification were more effective than other teachers before they began the certification process, but something about the certification process reduces their productivity, and their performance never returns to pre-application levels (Clotfelter, Ladd, & Vigdor, 2007b; Goldhaber & Anthony, 2007; Harris & Sass, 2007). NBPTS certification might identify teachers who are initially more effective than uncertified teachers, but that information might be gained at the cost of reduced teacher performance.


Thus, it is important to distinguish the signaling effect, measured by the deviation of the performance of certified teachers compared with the performance of never-certified teachers at a point after certification, in contrast to the effect of the certification process on the human capital of certified teachers, measured from pre- to post-certification.

Seven large-scale studies offer the necessary power to detect effects, if they exist, and either controlled for student or school fixed effects or used hierarchical linear modeling (HLM) (Cavalluzzo, 2004; Clotfelter et al., 2006, 2007b; Goldhaber & Anthony, 2007; Harris & Sass, 2007; Ladd, Sass, & Harris, 2007; Sanders, Ashton, & Wright, 2005). These studies provide the best available estimates of the signaling and human-capital effects of NBPTS certification.

The largest study involved NBPTS teachers in Florida, 1,517 certified in reading and 1,256 in math, and controlled for student and school-fixed effects (Harris & Sass, 2007). While teachers certified before 2001 were more productive after certification than teachers who were never certified, the most recent data (for teachers certified in 2002 and 2003) indicate that the certification process actually reduced teacher performance to the level of teachers who were never certified, with effect sizes (from precertification to postcertification) averaging –0.069 SD in math and –0.115 SD in reading. The researchers concluded that NBPTS-certified teachers in Florida were more effective than other teachers before they started the certification process, but their relative productivity fell during the certification process and never returned to pre-application levels. After certification, the performance of teachers certified in 2002 and 2003 was nearly identical to the performance of never-certified teachers, with average signaling-effect sizes of 0.000 SD in math and –0.002 SD in reading.

A second study involved an unspecified number of NBPTS-certified teachers but included all teachers in grades 3, 4, and 5 in North Carolina for the years 1995 to 2004 (Clotfelter et al., 2007b). After controlling for student-fixed effects, the certification process reduced teacher performance, with effect sizes (from 2 years prior to certification to postcertification) averaging –0.008 SD in math and –0.013 SD in reading, based on the researchers' preferred models. Although the reductions in performance were not statistically significant at conventional levels, they fit the pattern reported by Harris and Sass (2007) as well as Goldhaber and Anthony (2007). After certification, NBPTS teachers remained somewhat more effective than never-certified teachers, with average signaling-effect sizes of 0.032 SD in math and 0.020 SD in reading.


A third study analyzed data from Florida and North Carolina using common specifications and time periods and controlling for both student and school fixed effects (Ladd et al., 2007). The certification process had mixed effects on teacher performance (from 2 years prior to certification to postcertification), with average effect sizes of 0.016 SD in math and –0.022 SD in reading. After certification, NBPTS teachers remained somewhat more effective than never-certified teachers, with average signaling effect sizes of 0.050 SD in math and 0.023 SD in reading. McCaffrey and Rivkin (2007) drew upon the same data and analyses and reported identical results (see the coefficients for their gain model with student and school-fixed effects, in Table 4).

A fourth study, involving 42 NBPTS-certified teachers, controlled for school fixed effects and identified a sample of students who were apparently randomly assigned to classrooms within their schools (Clotfelter et al., 2006). This identification was based on six chi-square tests to determine if the distribution of students within each classroom, based on the students' observed characteristics, fit the random pattern that would be expected if students were indeed randomly assigned. After controlling for lagged student achievement, NBPTS-certified teachers were less effective in math than never-certified teachers, with a negative signaling effect size of –0.035 SD, but marginally more effective in reading, with a positive effect size of 0.005 SD.

A fifth study, sponsored by NBPTS, involved an unspecified number of teachers from two large North Carolina school districts, applied HLM, controlled for various teacher characteristics, and found signaling effect sizes of 0.018 SD in math and –0.038 SD in reading, compared with teachers who never applied for certification (Sanders et al., 2005).

A sixth study involved 303 of North Carolina's NBPTS-certified teachers and found, prior to certification, that teachers who eventually became certified were significantly more effective than teachers who never applied for certification or applied and failed (Goldhaber & Anthony, 2007). However, the relevant comparison is effectiveness after the certification process is completed. In 5 of 6 specifications, the signaling effect for NBPTS-certified teachers after their first year of certification was statistically no different (at an alpha level of .05) from the effect for teachers who never applied for certification, and it is significantly negative in the sixth specification. Thus, it appears that NBPTS-certified teachers were no more effective than never-certified teachers after the first year of certification. The most rigorous specification controlled for student fixed effects and found a small positive signaling effect size of 0.004 SD in reading and a negative effect size of –0.098 SD in math.


The seventh study, regarding math achievement in the Miami-Dade County Public Schools, compared the performance of 61 NBPTS-certified teachers with teachers who never applied for certification and found a signaling effect size of 0.063 SD, controlling for student fixed effects and a range of teacher and school characteristics (Cavalluzzo, 2004; Cunningham & Stone, 2005).

To summarize, there is a small signaling effect of NBPTS certification, and effects on human capital are either mixed or negative. The average signaling effect size across the seven key studies is 0.002 SD in reading and 0.004 SD in math (Table 6.1). This represents the average gain in student achievement from replacing an existing teacher with an NBPTS-certified teacher (the effect is diluted because some teachers in the general population are already teaching at the NBPTS level and would pass the NBPTS exam if they applied, while others would fall below the NBPTS standard and would fail the NBPTS exam).

While other studies have investigated the relationship between NBPTS certification and student achievement (Bond, Smith, Baker, & Hattie, 2000; McColskey et al., 2005; Stone, 2002; Vandevoort, Amrein-Beardsley, & Berliner, 2004), none involved a sample with more than 35 NBPTS-certified teachers, and the only study that used HLM involved a small sample of 25 NBPTS-certified teachers and failed to find any impact on student achievement (McColskey et al., 2005). These studies are limited by key methodological weaknesses (Cunningham & Stone, 2005; Education Commission of the States, 2006).3

Table 6.1  Signaling Effect Sizes for NBPTS Certified Teachers in Comparison with Teachers Who Were Never Certified by NBPTS

Study                          Reading      Math       Average
Harris & Sass, 2007            –0.002a       0.000a    –0.001
Clotfelter et al., 2007b        0.020b       0.032b     0.026
Ladd et al., 2007               0.023c       0.050c     0.037
Clotfelter et al., 2006         0.005d      –0.035d    –0.015
Sanders et al., 2005           –0.038e       0.018e    –0.010
Goldhaber & Anthony, 2007       0.004f      –0.098f    –0.047
Cavalluzzo, 2004                —            0.063g     0.063
Average                         0.002        0.004      0.008

a Coefficients from Table 4, converted to effect sizes using the means of the standard deviations reported on page 18. b Average coefficients from Table 4. c Coefficients from Table 5 (specifications control for both student and school-fixed effects). d Coefficients from the model controlling for lagged achievement, see Table 7. e Average of effect sizes ("Cert vs Never") across grades 4, 5, 6, 7, and 8 (see Table 3d). f Coefficients for "Past NBCT," from Table 2, Models 6 and 12 (which control for student-fixed effects), converted to effect sizes using the reported standard deviations of 9.94 in reading and 12.34 in math. g Coefficients from Model 15, which controls for student-fixed effects.
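The column averages in Table 6.1 follow directly from the study-level estimates. A short Python check (illustrative only; values copied from the table):

```python
# Signaling effect sizes from Table 6.1 (Cavalluzzo, 2004 reports math only).
reading = [-0.002, 0.020, 0.023, 0.005, -0.038, 0.004]
math = [0.000, 0.032, 0.050, -0.035, 0.018, -0.098, 0.063]

print(f"{sum(reading) / len(reading):.3f}")   # 0.002 (average across 6 studies)
print(f"{sum(math) / len(math):.3f}")         # 0.004 (average across 7 studies)
```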

Relative Cost-Effectiveness

If the average effect size across all students for NBPTS certification is 0.002 SD in reading and 0.004 SD in math, and the average cost per student is $326.53, the effectiveness–cost ratio (effect size in standard deviation units divided by annual cost per student in dollars) for NBPTS certification is 0.000006 in reading and 0.000012 in math. For comparison, Table 6.2 lists the annualized effect sizes, annual costs per student, and effectiveness–cost ratios for a range of alternative approaches for raising student achievement, including Gordon et al.'s (2006) proposal to use value-added statistical methods to identify and replace ineffective probationary teachers with new teachers, a cheaper version of Gordon et al.'s (2006) proposal using 5th-year college graduates, establishing a minimum SAT score of 1000 for new teacher applicants, voucher programs, charter schools, a 10 percent increase in per-pupil expenditure, increased educational accountability (defined as the implementation of high school-level exit exams), comprehensive school reform, class size reduction, high-quality preschool, and rapid assessment, where student performance in math and reading is rapidly assessed 2–5 times per week. The effectiveness–cost ratios for NBPTS teacher certification are at the low end of the ratios in Table 6.2, suggesting that it is not a cost-effective approach for raising student achievement.

An alternative approach for improving the quality of the teaching force is Gordon et al.'s (2006) proposal to identify and replace the bottom quartile of novice teachers at the end of their second year of teaching using value-added assessments of teacher performance. However, the student-achievement effect size of 0.057 SD, calculated from Gordon et al.'s (2006) data, is small and the annual cost per student of $624.72 is high, implying a low cost-effectiveness ratio of 0.000091 (Yeh & Ritter, 2009). A version of Gordon et al.'s (2006) proposal using 5th-year college graduates is somewhat cheaper ($512.27 per student) but the cost-effectiveness ratio remains low (0.000111). Since Gordon et al.'s (2006) proposal is not cost-effective, relative to the most cost-effective approaches listed in Table 6.2, it is likely that the value-added teacher-assessment strategy is not cost-effective.
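The NBPTS ratios are obtained the same way as in chapter 5. A one-line check for each subject (illustrative Python; figures from the text):

```python
# Effectiveness-cost ratios for NBPTS certification (figures from the text).
cost_per_student = 326.53
print(f"reading: {0.002 / cost_per_student:.6f}")  # 0.000006
print(f"math:    {0.004 / cost_per_student:.6f}")  # 0.000012
```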

Table 6.2  Comparison of Effect Sizes, Costs, and Effectiveness–Cost Ratios for Various Interventions to Raise Student Achievement

                                        Effect Size (SD)a               Effectiveness–Cost Ratioc
                                        Reading    Math     Costb          Reading     Math
NBPTS teacher certification              0.002     0.004    $326.53        0.000006    0.000012
Gordon et al.'s (2006) proposal          0.057     0.057    $624.72        0.000091    0.000091
Gordon et al. (using 5th-year grads)     0.057     0.057    $512.27        0.000111    0.000111
Minimum SAT score of 1000                0.004     0.015    $894.10        0.000004    0.000017
Voucher programs                         0.032     0.080    $9,646.00      0.000003    0.000008
Charter schools                          0.009     0.001    $8,086.30      0.000001    0.000000
10 percent increase in spending          0.083     0.083    $1,118.83      0.000075    0.000075
Increased accountability                 0.051    –0.062    $2,025.97      0.000253    0.000253
Comprehensive school reform              0.510     0.510    $217.83        0.002341    0.002341
Class size reduction
  Nye et al. (2001)                      0.104     0.090    $1,379.28      0.000075    0.000065
  Finn et al. (2001)                     0.120     0.129    $1,379.28      0.000087    0.000094
Perry Preschool                          0.150     0.155    $12,147.03     0.000012    0.000013
Abecedarian Preschool                    0.150     0.054    $10,188.09     0.000015    0.000005
Rapid Assessment (high estimates)        0.270     0.392    $9.45/$18.89   0.028571    0.020752
Rapid Assessment (low estimates)         0.175     0.324    $9.45/$18.89   0.018519    0.017152

Notes: a Annualized effect size. b Annual cost per student, adjusted for inflation to August 2006 dollars; for rapid assessment, the annual cost is $9.45 in reading and $18.89 in math. c Effect size in SD units divided by annual cost per student.


Yet another approach for raising the quality of the teaching force is to impose a requirement that all new teacher applicants achieve a minimum score of 1000 on the SAT exam. However, low student-achievement effect sizes of 0.004 SD in reading and 0.015 SD in math, coupled with high costs of raising teacher salaries to recruit the necessary pool of teachers, imply low effectiveness–cost ratios of 0.000004 in reading and 0.000017 in math (Yeh, 2009b). Table 6.2 lists additional ratios drawn from published cost-effectiveness studies: voucher programs, charter schools, a 10 percent increase in educational expenditure, and increased educational accountability (Yeh, 2007); an upper-bound estimate for comprehensive school reform (Yeh, 2008); an upper-bound estimate for class size reduction (Yeh, 2008, 2009a); the Perry and Abecedarian preschool programs (Yeh, 2008); and rapid assessment (Yeh, 2008).

Discussion

The ratios in Table 6.2 suggest that the most cost-effective approach for raising student achievement is rapid assessment. Rapid assessment is approximately one magnitude (10 times) more cost-effective than comprehensive school reform; two magnitudes more cost-effective than class size reduction, a 10 percent increase in per-pupil expenditure, Gordon et al.'s (2006) proposal, or increased educational accountability; three magnitudes more cost-effective than NBPTS teacher certification, imposing a minimum SAT score of 1000 on new teacher applicants, voucher programs, or high-quality preschool; and four magnitudes more cost-effective than charter schools (a magnitude of gain implies that student achievement would increase 10 times faster for every dollar invested in rapid assessment rather than the alternative).

The magnitude of the differences in cost-effectiveness suggests that improving student achievement by 0.175 SD in reading for every American student would take 9 months and cost $459 million using rapid assessment ($9.45 multiplied by projected 2006 PK–12 enrollment of 48,574,000 students). To achieve the same effect size through NBPTS certification would take 88 years and cost $1.4 trillion ($326.53 multiplied by 48,574,000 students and 88 years).
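The 88-year figure follows from the annual signaling effect, and the dollar totals follow from enrollment and per-student costs. A short script can verify both (illustrative Python; all inputs are the figures quoted above):

```python
import math

enrollment = 48_574_000        # projected 2006 PK-12 enrollment
target = 0.175                 # target reading gain, in SD units

# Rapid assessment: 0.175 SD per year at $9.45 per student per year (reading).
years_rapid = target / 0.175
print(years_rapid, f"${9.45 * enrollment * years_rapid:,.0f}")   # 1.0 $459,024,300

# NBPTS certification: 0.002 SD per year at $326.53 per student per year.
years_nbpts = math.ceil(target / 0.002)
print(years_nbpts, f"${326.53 * enrollment * years_nbpts:,.0f}") # 88 $1,395,756,403,360
```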


A question that arises is whether it is appropriate to compare an intervention such as NBPTS certification that is designed to improve the quality of the teaching force with an intervention such as rapid assessment, which aims to provide teachers with information to improve student performance. As noted earlier, however, the ultimate goal of improving the quality of the teaching force is to improve student achievement (if achievement does not improve, then what does it mean to assert that teacher quality has improved?). Furthermore, the review of research literature conducted by the National Academy of Sciences concluded that there was insufficient information about the effect of the NBPTS certification process on broad outcomes, including teacher recruitment and retention, the quality of school- and district-level teacher professional development, possible spillover effects on the teaching practices of NBPTS teacher colleagues or school and district educational standards, or effects of the program as a form of professional development (see Hakel et al., 2008). The only rigorous studies of the effects of NBPTS certification involve the usefulness of certification as a signal of student performance in math and reading. Since it is well-established that basic skills in math and reading, as measured by standardized tests, predict educational attainment and earnings (Currie & Thomas, 2001; Murnane, Willet, & Levy, 1995; Neal & Johnson, 1996; O'Neill, 1990; Winship & Korenman, 1999), it is appropriate to conduct a cost-effectiveness study that focuses on the signaling effect of NBPTS certification with regard to student achievement in math and reading.

If the results of standardized tests of math and reading achievement matter, what do the results of the cost-effectiveness analysis tell us? Perhaps the most striking implication is that additional expenditure on systems such as rapid assessment that provide teachers with information to improve student performance is more cost-effective than expenditure on efforts to improve the quality of the teaching force through NBPTS certification, value-added assessment of teacher performance, or imposing a requirement that new teachers must achieve a minimum score of 1000 on the SAT exam. These analyses suggest a need to reexamine the basic premise that higher standards for teacher selection are the key to improved student achievement. Instead, it may be more productive to implement classroom assessment systems designed to help teachers rapidly diagnose and address student weaknesses.

While diagnosing and addressing student weaknesses is important, a second—and perhaps more important—reason that rapid assessment may be effective is that it may directly improve student engagement. Research suggests that an effective way of engaging students and building intrinsic interest in academic work is to provide performance feedback through rapid assessment of math and reading performance (see Yeh, 2010b, for a review).

A final concern is that a focus on diagnostic assessment systems might appear to imply a conception of teaching that is driven by narrow measures of math and reading achievement.


Clearly, teachers must exercise professional judgment in using the results of diagnostic assessments in math and reading, just as physicians must exercise professional judgment in using the results of diagnostic tests that provide narrow measurements of blood pressure and heart rate. The full benefits of diagnostic assessments are likely to depend on mentoring from teachers who have successfully used those assessments, just as novice physicians learn to use diagnostic results under the guidance of more experienced physicians. However, when the results of diagnostic assessments are used properly, the results of the cost-effectiveness analysis suggest that these assessments are a more cost-effective use of society's resources than NBPTS certification for the purpose of improving student achievement in math and reading.

7 The Cost-Effectiveness of Raising Teacher Quality

The analyses in chapters 5 and 6 suggest that the strategy of raising teacher quality is not a cost-effective approach for raising student achievement, whether quality is measured through value-added teacher assessments or NBPTS certification status. However, it is possible that this strategy might prove to be cost-effective if an extremely low-cost measure of teacher quality is utilized. Chapter 7 examines this possibility.

It is useful to begin by reviewing an important distinction between the concept of teacher quality and measures of teacher quality. Any analysis of the correlation between teacher quality and student achievement must rely on a quantitative measure of teacher quality that ignores important nuances of quality. Strong teachers display a range of skills, dispositions, and attitudes that are not easily captured in a single quantitative index (Hopkins & Stern, 1996; OECD, 1994; Smith, 2008; Smith & Gorard, 2007; Wang & Fwu, 2007).



Teacher quality involves a commitment to keep searching for more effective instructional methods, communication of warmth, use of humor, patience, perseverance, and efforts to develop pupils' self-esteem, even when confronted by negative student attitudes and behavior (Hopkins & Stern, 1996). Teacher quality also involves command of tactics for teaching specific concepts and skills, and a theoretical and practical understanding of different pedagogical models that may be employed at appropriate times, with particular students, and for enhancing specific outcomes (Hopkins & Stern, 1996). Finally, teacher quality involves the habit of collaborating with other teachers to improve practice; the ability to reflect, consider alternative interpretations, form and test hypotheses during instructional activities, and improvise; and the willingness to accept and embrace long-term change (Hopkins & Stern, 1996).

While teacher quality is a multi-dimensional concept, researchers who seek to correlate measures of quality with student achievement must necessarily reduce those dimensions to an index that is suitable for quantitative analyses. The process of reduction may or may not produce an index that is valid. We may say that an index is valid when teachers who are identified as "strong" according to such an index display the skills, dispositions, and attitudes that have previously been identified as characteristic of strong teachers. However, the index should also correctly predict important outcomes. In particular, we expect that teachers who are identified as strong teachers will produce superior student achievement, compared with teachers identified as "weak." Conclusions about validity are strongest when an index correctly identifies teachers who display the skills, dispositions, and attitudes that are characteristic of strong teachers, and is predictive of the gains in achievement that such teachers produce. A presumption is that an index of quality that samples the skills, dispositions, and attitudes that characterize strong teachers is more likely to identify teachers who produce high student achievement. An index that does not sample those skills, dispositions, and attitudes is likely to be poorly correlated with student achievement, unless the index samples fundamental abilities that underlie the skills, dispositions, and attitudes that are observed in strong teachers.

NBPTS certification is an example of a measure that samples many of the skills and dispositions that are believed to characterize strong teachers. On the other hand, value-added assessment samples only the achievement that strong teachers are believed to produce. Yet a third measure is an index of cognitive ability that presumably underlies strong teaching, as well as strong performance in many other occupations. While none of these measures sample the complete range of skills, dispositions, and attitudes that characterize strong teachers, the question of whether they are valid is an empirical question.


Critics who argue that other measures have greater validity would need to demonstrate, empirically, that those other measures are stronger predictors of student achievement than the measures that are examined in chapters 5, 6, and 7.

Need for Cost-Effectiveness Studies

Research is needed to determine the extent to which existing measures of teacher quality provide useful, cost-effective signals of quality because they are already being incorporated into teacher selection criteria. By the year 2007, for example, all but four American states required potential teachers to pass tests that cover basic skills, content knowledge, and/or professional knowledge (Goldhaber, 2007). The majority of these states use one or more of the Educational Testing Service's (ETS) Praxis series of tests (U.S. Department of Education, 2005e).

As noted in chapter 5, econometric studies suggest that differences among teachers contribute to significant differences in student achievement (Hanushek et al., 2005; Jürges & Schneider, 2007; Rivkin et al., 2005; Rockoff, 2004). This research has attracted considerable attention because it suggests that student achievement may be improved if high-performing teachers are substituted for low-performing teachers. These studies use econometric methods in an attempt to statistically isolate the "value-added" contribution of each teacher to student achievement (Sanders & Horn, 1994). However, as noted in chapter 5, research indicates that value-added teacher assessment lacks adequate reliability and validity for high-stakes decisions regarding teacher selection, promotion, and retention.

High-stakes performance-based assessments might also undermine the incentive for teachers to mentor their colleagues (Lavy, 2007). If mentoring boosts the performance of a colleague, the colleague may be more likely to be re-hired, rewarded, and promoted, in comparison with the mentor. Thus, performance-based assessments could discourage mentoring and could undermine organizational effectiveness and performance, reducing student achievement. While this problem could be reduced through group incentives that would reward cooperation within each group, disincentives to cooperate between groups would remain. A study conducted in Israel suggests that group incentives may be effective (Lavy, 2002). However, there is little evidence that group incentives are effective in the United States and other contexts (Lavy, 2007).


In addition to the reliability, validity, and disincentive problems associated with high-stakes performance assessments, the analysis in chapter 5 of Gordon et al.'s (2006) proposal to replace the bottom quartile of novice teachers using value-added assessments of teacher performance suggests that it is unlikely that the general value-added teacher assessment strategy will ever prove to be cost-effective, relative to rapid assessment. The cost-effectiveness analysis suggests that the gains from this approach would be meager, the costs would be high, and the proposal would be far less cost-effective than the implementation of rapid formative assessment systems that assess student progress in math and reading 2 to 5 times per week and provide rapid feedback of the results to students and teachers (Yeh & Ritter, 2009).

An alternative to Gordon et al.'s (2006) proposal is to screen teacher applicants before the hiring decision. Studies using national or state data sets found significant relationships between teacher certification measures and student achievement (Betts, Rueben, & Danenberg, 2000; Darling-Hammond, 2000; Darling-Hammond, Holtzman, Gatlin, & Heilig, 2005; Ferguson, 1991; Fetler, 1999; Goe, 2002; Goldhaber & Brewer, 2000; Hawk, Coble, & Swanson, 1985; Strauss & Sawyer, 1986). As noted in chapter 6, perhaps the best test of the efficacy of teacher certification involves the professional standards for teaching developed in the United States by the National Board for Professional Teaching Standards (NBPTS). However, the analysis in chapter 6 suggests that the impact of NBPTS-certified teachers on student achievement is relatively small, while the costs of certification are substantial. Thus, NBPTS certification is far less cost-effective than alternatives such as the implementation of rapid formative assessment systems (Yeh, 2010a).

Setting a Minimum SAT Test Score

While NBPTS certification is not a strong predictor of teaching performance, it is possible that a general test of math and reading ability is more predictive of teaching performance than the specialized NBPTS exam and may be a more cost-effective screening device (Eide, Goldhaber, & Brewer, 2004; Ferguson, 1998a). A few early studies (Ehrenberg & Brewer, 1995; Ferguson, 1991; Ferguson & Ladd, 1996; Strauss & Sawyer, 1986) suggested that teacher performance on general tests of math or reading might be a useful indicator of teacher effectiveness. Until recently, however, no study of teacher performance on tests of math and reading controlled for nonrandom teacher sorting or student-fixed effects. These controls are important because recent studies demonstrate that positive correlations between the strength of teacher qualifications and student achievement observed in cross-sectional data are driven largely by sorting of teachers and students across schools and, to a lesser extent, within schools (Clotfelter et al., 2006, 2007b; Goldhaber, 2007).


Once these controls are implemented, the observed relationship between teacher performance on licensure tests and student achievement is greatly attenuated.

The SAT (formerly known as the Scholastic Aptitude Test but now known simply as the SAT) is a college-entrance examination focusing on math and reading ability, and is typically administered in the United States to 11th- and 12th-grade high school students. Manski (1987) suggested that the most efficient way to boost the cognitive level of the teaching force would be to require that new teachers meet a minimum SAT test score requirement and to pair this requirement with a substantial increase in teacher salaries to attract high-ability teacher candidates.

A salary increase alone is not effective in raising the quality of the teaching force. Drawing upon data for the United States, Ballou and Podgursky (1995) estimated the effect of a 20 percent salary increase on teacher SAT scores and found that "the results challenge the notion that the work force can be improved merely by relying on high wages to improve the applicant pool. The problem is that without effective screening, the applicant pool improves very little" (p. 334). Despite the salary increase, average cognitive ability among teachers would remain below the mean for the college-educated population (Ballou & Podgursky, 1995).

Based on econometric analyses, Manski (1987) estimated that setting a minimum SAT score of 1000 would result in an average SAT score of 1130, which is one standard deviation (SD) above the average score of 905 achieved by a nationally representative sample of teachers (Angrist & Guryan, 2008; College Board, 2006). Manski (1987) estimated that teacher salaries would need to increase by $90 per week, or 40.56 percent above the average weekly teacher salary of $221.89 (in 1979 dollars), in order to attract the required number of individuals meeting the minimum 1000 SAT score requirement to the teaching profession. This salary premium is equivalent to a $16,244 per-year premium over the average 2006 starting salary of $40,049, adjusted for inflation. Thus, Manski's (1987) results suggest that a goal of raising the cognitive level of the teaching force by one SD might be achieved by setting a minimum combined (reading and math) SAT score of 1000, while raising teacher salaries by $16,244 per year.

These results might not be generalizable to countries other than the United States. For example, teaching is a more prestigious occupation in Taiwan, with salaries and benefits exceeding many civil service and entry-level industry positions (Wang & Fwu, 2007). Thus, it is likely that teaching already attracts many of the best and brightest individuals in Taiwan.
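Manski's premium translates into 2006 dollars as a simple proportion. An illustrative Python check (figures from the text; not part of Manski's analysis):

```python
# Manski's (1987) wage premium: $90 per week over a $221.89 weekly salary.
premium_share = 90.00 / 221.89
print(f"{premium_share:.2%}")                          # 40.56%

# Applied to the average 2006 starting salary:
starting_salary_2006 = 40_049
print(f"{premium_share * starting_salary_2006:,.0f}")  # 16,244 per year
```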


However, in many developed countries and the cities of Hong Kong and Singapore, the prestige and compensation of teachers is lower compared with civil servants and positions in industry, business, and the service sector, much as it is in the United States (Wang & Fwu, 2007). Thus, the data presented here may provide general guidance about the potential impact on student achievement of implementing a test-based selection criterion for teacher applicants.

In the United States, a large salary increase is needed because setting a minimum SAT score of 1000 drastically reduces the eligible pool of individuals. Manski's (1987) data indicated that only 54 percent of the working college-educated members of a nationally representative sample would have been eligible to become teachers if they were required to have minimum SAT scores of 1000. The most recent data from the National Education Longitudinal Study of 1988 (NELS88), involving a nationally representative sample of American students, indicates that as of the year 2000 an even smaller percentage—only 45.9 percent of working, college-educated individuals in the NELS88 cohort—scored above 1000 on the SAT, suggesting that the proportion of each cohort that meets Manski's criterion has declined significantly. The decline in the percentage scoring above 1000 reflects growth in the fraction of the college population that is drawn from less-advantaged segments of the overall population.

Manski's (1987) estimate of the wage premium required to attract high-ability individuals to the teaching profession is consistent with more recent data. Of the 111,700 members of the NELS88 cohort who were college educated and working in the year 2000, 0.8 percent (894 individuals) chose the teaching profession. Assuming that the wage elasticity of the supply of teachers is 3 (at the upper end of estimates by Manski, 1987 and Ballou & Podgursky, 1995), and assuming a very high probability (.88) that a job applicant receives an offer, wages must increase by 44.65 percent in order to increase the annual supply of high-ability teacher applicants by an amount sufficient to replace the 54.1 percent of each cohort of new teachers who may be expected to score below 1000 on the SAT (.541 × 894 = .88 × .459 × (.4465 × 3) × 894). The 44.65 percent figure is somewhat higher than Manski's estimate of 40.56 percent. It is underestimated to the extent that the true wage elasticity is lower than 3, or the probability that a job applicant receives an offer is less than 88 percent, suggesting that a 44.65 percent increase in salary, equal to $17,882 per year, is a lower-bound estimate of the wage premium needed to attract sufficient teacher applicants scoring a minimum of 1000 on the SAT. The annual cost to society is equivalent to $894.10 per student, assuming 20 students per classroom.
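The supply condition can be solved for the wage increase directly. A Python sketch (illustrative only; the equation and parameters are those quoted above):

```python
# Solve 0.541 * 894 = 0.88 * 0.459 * (w * 3) * 894 for the wage increase w.
replace_share = 0.541    # share of each new-teacher cohort scoring below 1000
offer_prob = 0.88        # assumed probability an applicant receives an offer
eligible_share = 0.459   # share of college-educated workers scoring above 1000
elasticity = 3           # assumed wage elasticity of teacher supply

w = replace_share / (offer_prob * eligible_share * elasticity)
print(f"{w:.2%}")        # 44.65%

salary_2006 = 40_049
print(f"{0.4465 * salary_2006:,.0f}")       # 17,882 per year (w rounded to 44.65%)
print(f"{0.4465 * salary_2006 / 20:,.2f}")  # 894.09, i.e., about $894.10 per student
```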

The Cost-Effectiveness of Raising Teacher Quality    105

SAT score and increasing teacher salaries, given the existing distribution of cognitive ability and the existing propensity of high-ability individuals to choose teaching as a profession.1
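As a rough check on the supply arithmetic above, the relationship can be written out explicitly. The following sketch is illustrative only: it takes the assumptions stated in the text (a wage elasticity of 3, an offer probability of .88, a cohort of 894 teachers, and 45.9 percent of college-educated workers scoring above 1000) and solves for the implied wage increase.

```python
# Illustrative check of the wage-premium calculation described above.
# Assumed, per the text: wage elasticity of supply = 3, offer probability = .88,
# 894 teachers per cohort, 45.9% of college-educated workers above 1000 SAT.
elasticity = 3.0          # percent change in supply per percent change in wages
p_offer = 0.88            # probability a job applicant receives an offer
share_above_1000 = 0.459  # fraction of college-educated workers above 1000 SAT
cohort = 894              # teachers entering per cohort

teachers_to_replace = (1 - share_above_1000) * cohort  # 54.1% of the cohort

# Solve: replace = p_offer * share_above_1000 * (raise_pct * elasticity/100) * cohort
raise_pct = teachers_to_replace / (p_offer * share_above_1000 * (elasticity / 100) * cohort)
print(round(raise_pct, 2))  # ~44.65 (percent)

# Translate to dollars against the 2006 average starting salary of $40,049.
salary_2006 = 40_049
premium = salary_2006 * raise_pct / 100
print(round(premium))          # ~$17,880 ($17,882 in the text, from rounding to 44.65%)
print(round(premium / 20, 2))  # ~$894 per student at 20 students per classroom
```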

The Impact of Employing High-Ability Teachers

Clotfelter et al. (2007) provide the best available econometric estimate of the effect of high-ability individuals on student achievement, where ability is measured through licensure test scores. Clotfelter et al. (2007) measured teacher ability using scores on widely used teacher licensure exams, which are administered throughout the United States by the Educational Testing Service: the Praxis I (reading, writing, and math) and the Praxis II licensure exams (personal communication, H. Ladd, August 19, 2007). Scores from these exams are highly correlated with ACT test scores (Podgursky, 2004) and, thus, SAT test scores (Dorans, 1999) (the ACT, produced by ACT, Inc., formerly known as the American College Testing Program, is similar to the SAT and is a college-entrance examination that tests reading and math skills).

Clotfelter et al. (2007) employed statewide longitudinal data covering 75 percent of all North Carolina students in grades 3, 4, and 5 in years 1995–2004; matched student test scores to specific teachers; and, by controlling for student fixed effects, controlled for observable as well as unobservable characteristics of students that might influence achievement. Teachers who scored 1 SD above the average on teacher licensure exams boosted math achievement by 0.015 SD and reading achievement by 0.004 SD, relative to the average teacher (an overall average of 0.010 SD) (Clotfelter et al., 2007b, Table 5).

The results are consistent with (but not directly comparable to) Goldhaber's (2007) results. Goldhaber (2007) employed statewide longitudinal data; matched 24,237 North Carolina teachers, their licensure test scores, and student test scores in grades 4, 5, and 6 in years 1995–2003; controlled for student fixed effects; and found an average effect size of 0.008 SD in reading and 0.022 SD in math, or an average gain of 0.015 SD, for the upper 80 percent of teachers compared with the bottom 20 percent. However, Goldhaber (2007) did not provide information about the effect of increasing average teacher licensure scores by 1 SD (which is the information needed to conduct a cost-effectiveness analysis). In an earlier study, Clotfelter et al. (2006) employed statewide longitudinal data; matched 3,223 North Carolina teachers, their licensure test scores, and 5th-grade student test scores from the 2000–2001 school year; and found that teachers who scored 1 SD above the average on teacher licensure exams boosted achievement by 0.005 SD in reading and 0.012 SD in math, or an overall average of 0.009 SD, controlling for lagged achievement and school—but not student—fixed effects.

Teacher candidates who pass the licensure exams average slightly higher SAT scores than all college-bound seniors. Gitomer, Latham, and Ziomek (1999) analyzed data for a nationwide sample of over 300,000 prospective teachers and found that individuals who passed the Praxis I teacher-licensure exam averaged a total (math and verbal) SAT score of 1039. Individuals who passed the Praxis II teacher-licensure exam averaged a total SAT score of 1029. Since 88 percent of teachers in North Carolina have passed these exams to become licensed, Gitomer et al.'s (1999) data suggest that 88 percent of the teachers in North Carolina would have scored an average of approximately 1029 on the SAT, 13 points higher than the average of 1016 for all college-bound seniors, assuming that the average North Carolina teacher is similar to the average teacher nationwide.

The Cost-Effectiveness of Employing High-Ability Teachers

If the average impact of employing high-ability teachers is 0.01 SD, and the cost per student is $894.10, the effectiveness–cost ratio is 0.00001118. By comparison, the effectiveness–cost ratio calculated in chapter 2 for rapid assessment is 0.01125618, implying that rapid assessment is 1,007 times as cost-effective as raising teacher salaries by 44.65 percent and instituting a minimum 1000 SAT score requirement. In other words, student achievement would increase 1,007 times faster for every dollar invested in rapid formative assessment rather than in raising teacher quality by setting a minimum SAT score of 1000.
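The comparison can be reproduced directly from the figures reported above and in chapter 2. A minimal sketch:

```python
# Effectiveness–cost ratio: annualized effect size (SD) per dollar per student.
ecr_teacher_quality = 0.01 / 894.10   # ~0.00001118 (Clotfelter et al., 2007)
ecr_rapid = 0.319 / 28.34             # ~0.01125618 (chapter 2: $9.45 + $18.89)

# Roughly 1,006–1,007 depending on rounding; the text uses the rounded ratios.
print(round(ecr_rapid / ecr_teacher_quality))
```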

Sensitivity Analysis I

The effect-size estimate for raising teacher quality is based on a 1 SD increase in scores on the North Carolina teacher licensure examinations, while the cost estimate is based on a 1 SD increase in SAT scores. While both exams measure math and reading ability, it is unlikely that 1 SD on the licensure exams is exactly equal to 1 SD on the SAT. Any difference would influence the calculation of the effectiveness–cost ratio associated with raising teacher quality. The sensitivity of the calculation to this difference may be tested by examining the effect of two hypothetical extremes.

If 1 SD on the teacher licensure exams is equivalent to 2 SD on the SAT, average SAT scores would need to increase 2 SD, to 1361, in order to achieve the 0.01 SD effect on student achievement reported by Clotfelter et al. (2007). This is 20.4 percent above the average SAT score of 1130 that is the basis for the cost figure used to derive the 0.00001118 effectiveness–cost estimate and would, therefore, greatly increase the wage premium needed to attract high-ability individuals (scoring an average of 1361 on the SAT). The effectiveness–cost ratio of setting a minimum SAT score would shrink proportionately, increasing the comparative advantage of rapid formative assessment even further.

On the other hand, if 1 SD on the SAT is equivalent to 10 SD on the licensure exams, extrapolation of Clotfelter et al.'s (2007) results suggests that student achievement would increase by 0.1 SD instead of 0.01 SD, under the optimistic assumption that there is no decline in the returns to math and reading ability. The effectiveness–cost ratio of setting a 1000 minimum SAT score would increase tenfold, to 0.00011180, but the effectiveness–cost ratio for rapid formative assessment remains 101 times as large as the effectiveness–cost ratio for raising teacher quality. Therefore, rapid formative assessment remains much more cost-effective, even under improbably adverse assumptions about the relationship between scores on the SAT and scores on the North Carolina licensure exams.

Sensitivity Analysis II

Clotfelter et al. (2007) provide the best available estimate of the effect of raising teacher ability by 1 SD, measured by scores on a test of math and reading ability. If future research demonstrates that the estimate is too low by a factor of 10, student achievement would increase by 0.1 SD instead of 0.01 SD as a consequence of setting a 1000 SAT score minimum. However, rapid formative assessment would remain 101 times as cost-effective as the policy of setting a 1000 SAT score minimum and raising teacher salaries. Thus, the advantage of rapid formative assessment is robust to a tenfold increase in the effectiveness of raising the ability of the teaching force by 1 SD, measured by scores on a test of math and reading ability.

Sensitivity Analysis III

Combining the hypothetical change described in Sensitivity Analysis I (assuming that 1 SD on the SAT is equivalent to 10 SD on the licensure exams) with the hypothetical change described in Sensitivity Analysis II (future research indicating that student achievement would increase by 0.1 SD instead of 0.01 SD as a consequence of setting a 1000 SAT score minimum) increases the effectiveness–cost ratio for the 1000 SAT score requirement by a factor of 100, to 0.00111800. However, rapid formative assessment would remain 10 times as cost-effective as the policy of setting a 1000 SAT score minimum and raising teacher salaries by 44.65 percent. Thus, the advantage of rapid formative assessment is robust to the combination of both adverse scenarios.
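The three sensitivity analyses amount to rescaling the teacher-quality ratio and checking whether the ordering survives. A compact sketch of that logic, using the hypothetical factors stated above:

```python
# Baseline ratios from the analyses above.
ecr_rapid = 0.01125618
ecr_sat_minimum = 0.00001118

# Hypothetical rescalings of the teacher-quality ratio (factors from the text).
scenarios = {
    "I: 1 SD SAT = 10 SD licensure (effect x10)": 10,
    "II: true effect x10 larger": 10,
    "III: both combined (x100)": 100,
}
for label, factor in scenarios.items():
    adjusted = ecr_sat_minimum * factor
    print(label, "-> rapid assessment advantage:", round(ecr_rapid / adjusted))
# Prints roughly 101, 101, and 10: the ordering is unchanged in every scenario.
```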

Discussion

Comparisons of cost-effectiveness suggest that rapid formative assessment is a more promising approach for improving student achievement than raising teacher quality by instituting a minimum 1000 SAT score requirement and raising teacher salaries by 44.65 percent. The cost-effectiveness calculations suggest that achievement gains per dollar from rapid formative assessment are 1,007 times the gains that accrue from the combination of a 1000 SAT score requirement and higher teacher salaries. In concrete terms, the research findings suggest that improving student achievement by 0.319 SD (or approximately 3 months of learning) for every American student would take less than 9 months and cost $1.4 billion using rapid formative assessment ($28.34 multiplied by projected 2006 PK–12 enrollment of 48,574,000 students). To achieve the same effect size by setting a minimum SAT score of 1000 would take 31.9 years and cost approximately $1.39 trillion ($894.10 per student, multiplied by 48,574,000 students and 31.9 years). These calculations suggest the magnitude of the differences in effectiveness and cost.

The preceding analysis suggests the futility of attempting to raise student achievement by recruiting significantly more able individuals into the teaching profession, if teacher ability is measured by math and reading test scores. Two parallel analyses (see chapters 5 and 6) suggest the futility of attempting to raise student achievement by recruiting significantly more able individuals into the teaching profession, if teacher ability is measured by value-added measures of student achievement or NBPTS certification status (Yeh, 2010a; Yeh & Ritter, 2009). Thus, rapid formative assessment offers a means to raise the quality of teaching and learning, and to do so in a far more effective and cost-effective way than raising the bar for new teacher applicants or replacing low-performing teachers.

The effect size of raising teacher quality is extremely small—between 0.008 SD and 0.057 SD—whether teacher quality is measured by math and reading test scores, NBPTS certification, or Gordon et al.'s (2006) value-added teacher assessment approach. These results suggest a need to reconsider the core assumption: that low student achievement can be traced to low teacher quality.

The results pose an apparent contradiction with studies demonstrating that there are consistent differences in student achievement that are associated with specific teachers (Hanushek et al., 2005; Rivkin et al., 2005; Rockoff, 2004). However, it is useful to recall that there is a wide range of performance in most endeavors. In fact, the distribution of performance in most endeavors is positively skewed, meaning that low performance is most frequent and high performance is rare (Walberg, Strykowski, Rovai, & Hung, 1984). The distribution of teacher performance is not likely to be much different from the distribution of performance in other professional occupations. Yet average performance in many professional fields has improved dramatically, and it may be instructive to review examples of how this was accomplished.

In the field of law, the introduction of computerized legal databases now allows the average attorney to access an enormous amount of case law, state and federal statutes, administrative codes, and public records, and has dramatically raised the standard of performance in the legal profession (McChrystal, Gleisner III, & Kuborn, 1999). The technology improves the ability of attorneys to quickly search through thousands of judicial decisions, identify strong and weak precedents, and effectively understand and apply the law. In the fields of architecture, engineering, and construction, the use of computer-aided design programs has revolutionized practice and raised the standard of performance of the average architect and engineer by making it possible to quickly model designs in three dimensions (Aouad, Lee, & Wu, 2010; Chaszar, 2006). In medicine, enormous advances in medical knowledge have eliminated ineffective techniques such as bloodletting and now permit average doctors to outperform the best doctors of the 19th century in diagnosing and treating disease and illness (Straus & Straus, 2006). In none of these cases, however, were the advances a consequence of improved screening and reward systems. In each field, improvement occurred because of advances in technology and fundamental knowledge about what works.

The lesson for the field of education is that it may be more effective to shift the entire distribution of teacher performance through a fundamental advance in technology and knowledge, such as the advances characterizing performance improvements in law, architecture, engineering, and medicine, instead of attempting to substitute teachers in the upper tail of the performance distribution for teachers in the lower tail. The difficulty is specifying the nature of the advance that is needed. It seems likely, however, that no shift in the distribution of teaching performance is possible without a better understanding of how to engage students in academic work. Public schools are notorious for their inability to engage students (Goodlad, 1984), and researchers have been unable to provide a satisfactory explanation of the underlying cause.


Lev Vygotsky's (1978) concept of the "zone of proximal development" (ZPD) suggests why it can be difficult for a classroom teacher to engage students. The ZPD is defined as "the distance between the actual developmental level as determined by independent problem solving and the level of potential development as determined through problem solving under guidance, or in collaboration with more capable peers" (Vygotsky, 1978, p. 86). Activities that fall between these two levels of difficulty promote learning. However, the ZPD is different for each student and constantly shifts upward as increasingly difficult tasks are mastered. Thus, an optimal level of challenge must be maintained in order to sustain a student's interest and promote learning. Tasks that are too easy are boring, while tasks that are too difficult are frustrating. The tipping point between boredom and frustration for each student constantly shifts upward, creating an enormous challenge for teachers attempting to gear tasks to the ZPD of each student in classrooms of 20 or more students.

Research regarding rapid formative assessment suggests that it provides a supportive structure that helps teachers to meet this challenge (Yeh, 2006a). In the schools studied, each school's library books were organized according to reading level, and computer software automatically printed math assignments based on each student's achievement level. Increases in difficulty level were tailored on a daily basis to each student's performance. The structure allowed each student to coordinate these tasks with minimal assistance from the teacher, except when reading or math scores fell below a set criterion. Thus, the rapid formative assessment system helped to create a structure in which each student was able to operate within his or her ZPD during reading and math activities.

The structure allowed each student to control his or her rate of progress. Students who read faster or completed math problems faster were able to take on new challenges faster. There appeared to be little or no downtime waiting for other students or waiting for the teacher. To this observer, the boredom and frustration noted by Goodlad (1984) in the public schools he studied were largely missing. Students were able to work quietly on their own without interruptions that would affect their engagement and experience of flow. Students who needed help raised a hand, or were automatically flagged by the rapid formative assessment software for tutoring or small-group instruction. Thus, the classroom routine was structured so that interruptions were minimized and help was provided only when needed.

Finally, there were very clear markers of achievement, on a daily basis, that were visible to each individual student but not visible to his or her peers. Individualization of the curriculum in math and reading allowed students who were achieving at levels several grades apart to work side by side in the same classroom. However, each student received separate reports of his or her individualized math and reading assessments on a daily basis. Those results provided objective confirmation to each student regarding his or her progress. Since book selections and math assignments were individualized for each student's achievement level, the assessments were invariably positive. Thus, students received a steady dose of positive, objective feedback, reinforcing the students' beliefs that they were competent learners who were able to master new skills and knowledge every day. In this way, the experience of schooling was restructured to meet the psychological need to feel competent.

As discussed in chapter 1, a student's perceived control over his or her academic performance is strongly predictive of academic achievement (Brookover et al., 1979; Brookover et al., 1978; Coleman et al., 1966; Crandall et al., 1965; Kalechstein & Nowicki, 1997; Keith et al., 1986; Skinner et al., 1990; Teddlie & Stringfield, 1993). To the extent that rapid formative assessment enables teachers to provide individualized instruction, keeping each student in his or her own zone of proximal development, students are more likely to be successful and to feel that they can control their performance. Feelings of control reinforce effort, which improves achievement, which in turn reinforces feelings of control and engagement. This research suggests that rapid formative assessment targets a fundamental human need for performance feedback, and might explain why students who do not receive performance feedback are disengaged in school, yet are highly engaged when playing electronic games that provide rapid feedback regarding performance.

The current structure of schooling fails to provide the supportive technology that would allow teachers to routinely tailor instruction to each student and to systematically provide objective feedback to students so that they can control their academic performance. While some critics might argue that this goal is best achieved through a combination of approaches, including improved teacher screening and reward systems as well as rapid formative assessment systems, the reality is that social resources are scarce. It is not feasible to implement all possible options simultaneously. The cost-effectiveness analyses in chapters 2, 3, 4, 5, 6, and 7 suggest that, in regard to student achievement in math and reading, rapid formative assessment provides greater impact per dollar. As a practical matter, implementing rapid formative assessment instead of a less cost-effective approach would help to achieve the important goal of raising math and reading achievement while using fewer resources. The savings in resources may then be used to achieve other important educational goals—those that are not well addressed through rapid formative assessment.
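The structure described above can be summarized as a simple control loop: assess frequently, adjust task difficulty to the student's current level, and flag the student for tutoring when scores fall below a criterion. The sketch below is a schematic rendering of that loop, not the actual software described in this chapter; all names, step sizes, and thresholds are hypothetical.

```python
# Schematic sketch of a rapid formative assessment loop.
# All names, step sizes, and thresholds are hypothetical illustrations.
TUTORING_CUTOFF = 0.70   # flag for tutoring below 70% comprehension (assumed)

def update_level(level: float, score: float) -> float:
    """Nudge task difficulty up after success, down after struggle (ZPD targeting)."""
    return level + 0.1 if score >= 0.85 else max(level - 0.1, 1.0)

def process_assessment(student: dict, score: float) -> dict:
    """Record one quiz result, adjust the student's level, and flag if needed."""
    student["level"] = update_level(student["level"], score)
    student["needs_tutoring"] = score < TUTORING_CUTOFF
    return student

student = {"name": "example", "level": 3.0, "needs_tutoring": False}
for quiz_score in (0.90, 0.95, 0.60):       # three daily quiz results (illustrative)
    student = process_assessment(student, quiz_score)
print(student)  # level rose twice, fell after the low score; tutoring flag is set
```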


8 The Cost-Effectiveness of 22 Approaches for Raising Student Achievement

This chapter synthesizes the results of the previous 7 chapters and integrates original analyses and existing literature regarding the cost-effectiveness of 22 strategies for raising student achievement. The context for this chapter is the pressure on educators to improve student achievement. Under the federal No Child Left Behind Act, schools that are not making adequate yearly progress in raising student achievement in math and reading are subject to a progressive series of sanctions. As a consequence, policymakers have a pressing need for information about the most cost-effective approaches for raising math and reading achievement. The alternatives being considered are diverse, ranging from narrow achievement-focused interventions to complex reforms that aim to achieve multiple outcomes in addition to increases in math and reading achievement.

Cost-effectiveness analysis is appropriate when policymakers are primarily concerned with the goal of reducing the number of students who perform poorly on measures of math and reading achievement. However, a concern that arises is that outcomes that are not measured are not valued. There are two main strategies for addressing this issue. The first strategy is to substitute an approach such as cost-benefit analysis, which is well-suited to handle comparisons involving multiple outcomes. However, comparisons across each of the major approaches for raising student achievement would only be feasible if cost-benefit analyses had been conducted for each approach. This information is not currently available. Therefore, I pursued an alternative strategy. This strategy aims to identify the most cost-effective approach for raising math and reading achievement, with the expectation that this will provide the information necessary to conserve and maximize social resources—including instructional time—which would become available for the pursuit of important goals (such as proficiency in art, music, and science) that are not measured by standardized tests of math and reading. In contrast, the adoption of an approach that is inefficient would reduce the resources available to address these other important objectives.

While previous reviews found a very small corpus of cost-effectiveness studies in education (Hummel-Rossi & Ashdown, 2002; Levin, 1991; Monk & King, 1993), cost-effectiveness techniques have now been applied to evaluate the relative cost-effectiveness of 11 promising approaches for raising student achievement: voucher programs, charter schools, a 10 percent increase in per-pupil expenditure, increased educational accountability, comprehensive school reform, class size reduction, high-quality preschool, teacher certification by the National Board for Professional Teaching Standards, a proposal to use value-added teacher assessment to identify and replace the bottom quartile of novice teachers with new teachers, raising teacher quality by requiring new candidates to meet a minimum SAT score of 1000, and rapid formative assessment, involving systems in which student performance in math and reading is assessed 2–5 times per week. This review compares the cost-effectiveness of these approaches for raising student achievement, along with brief comparisons of 11 additional approaches for which cost-effectiveness information is available.

While the corpus of cost-effectiveness studies has expanded in recent years, it remains limited, and this review reflects those limitations. First, the dearth of studies prior to the author's recent publication of multiple cost-effectiveness studies means that the available corpus of studies is weighted toward comparisons conducted by the author.

A second limitation is more fundamental. The comparison of 22 approaches for raising student achievement is necessarily limited to standardized measures that are available in published reports. Almost all studies focus on math and reading achievement. While approaches such as class size reduction and comprehensive school reform aim to improve achievement in subjects beyond math and reading, it is not feasible to compare the 22 approaches with regard to other subject areas because standardized measures of performance in those areas were not administered across the 22 approaches.

It is not accidental that most studies focus on the results of standardized tests in math and reading. It is well established that basic skills in math and reading predict educational attainment and earnings (Currie & Thomas, 2001; Murnane et al., 1995; Neal & Johnson, 1996; O'Neill, 1990; Winship & Korenman, 1999). Perhaps the most sophisticated study, by Winship and Korenman (1999), used a nationally representative longitudinal sample, accounted for the reciprocal effects of cognitive skills and schooling, and found that a 1 standard deviation difference in basic math and reading skills is ultimately associated with a 35.8 percent difference in annual earnings, controlling for a host of covariates including ability. Given the importance of math and reading achievement, it is useful to conduct a cost-effectiveness analysis that focuses on these outcomes so that resources devoted to math and reading achievement may be allocated efficiently, maximizing the availability of resources to address subjects beyond math and reading.

A third limitation of the present review is that published studies may differ with regard to the age, ethnicity, or level of poverty of the populations that were studied, or other factors influencing achievement. These differences make comparisons problematic. The results of the present review are necessarily limited to statements regarding the particular variations of the interventions that were studied, with the specific samples and under the specific conditions that were studied. Thus, the interpretation of the results of the present review should be tempered by an understanding of the inherent limitations and complexity of cost-effectiveness analysis. Ultimately, further research is needed to explore whether the conclusions presented in this review are stable across age, ethnicity, level of poverty, and variations in the implementation of each intervention.

The purpose of the present review is to examine the existing corpus of educational cost-effectiveness studies, adjust all costs to the same base period (August 2006), critically analyze deficiencies in existing studies, synthesize what is currently known about the cost-effectiveness of alternative approaches for raising student achievement, and permit readers to compare the cost-effectiveness of every approach for which information is available, within the limits specified in the review. The Discussion section reviews important limitations of this study, then proposes an explanation of the results in terms of existing literature regarding the psychology of learning and achievement.


Search Protocol

Multiple searches were conducted using the Education Resources Information Clearinghouse (ERIC), Web of Science, Academic Search Premier, Education Fulltext, PolicyFile, and Econlit databases, as well as Google Scholar, limited to the years 1999 through 2008. First, searches were conducted using the term cost-effectiveness in combination with student achievement. Second, successive searches were conducted using the term cost-effectiveness in combination with each individual term from the following list: voucher program, charter school, economics of education, educational production, educational expenditure, educational accountability, comprehensive school reform, class size reduction, preschool, Perry, Abecedarian, formative assessment, progress monitoring, curriculum-based assessment, curriculum-based measurement, National Board, teacher evaluation, and teacher assessment. Third, these searches were repeated using the term cost per student in lieu of cost-effectiveness. A small number of older reports were located through citations in the set of more recent studies. A few benefit–cost studies provided the information necessary to calculate effectiveness–cost ratios and were included.

Reports were excluded if (a) they omitted the information necessary to calculate effectiveness–cost ratios (annualized student achievement effect size divided by annual cost per student), (b) effect-size information was calculated from uncontrolled or weakly controlled research designs, (c) cost information lacked detail or was collected for a single school, (d) they did not involve a measure related to math or reading achievement with regard to American students in grades PK–12, (e) they were limited to disabled or special-education populations, or (f) they merely summarized results from a previously published cost-effectiveness study. Reports were limited to studies involving American populations because comparisons of effectiveness–cost ratios assume comparability of underlying population variances as well as comparability of cost estimates. The characteristics of American students and the American educational system are sufficiently different from their European and Asian counterparts to suggest that the population variances underlying the effect-size estimates may not be comparable across cultures. In addition, differences in purchasing power create difficulties in comparing costs internationally.

Only a limited number of reports included original cost-effectiveness analyses meeting the search criteria, reflecting the general lack of attention to cost-effectiveness methods in education and the need for additional studies. The current review focuses on the most recent cost-effectiveness studies, with particular attention to key methodological and analytical issues that have been overlooked by previous researchers. Some interventions and the costs to implement them can be described in simple terms without confusion, while other interventions raise thorny issues that require explication. Thus, less space is devoted to older studies and studies that require less explanation.

Method

All cost-effectiveness studies meeting the search protocol were included in the analysis. Each study was reviewed, with special attention to methodological and analytical weaknesses and errors in accounting for the full costs of each intervention, including social opportunity costs. All effect sizes were annualized. All costs were annualized and adjusted to the same base period (August 2006) using the consumer price index. Effectiveness–cost ratios were calculated (annualized effect size divided by annual cost per student). Interventions were ranked according to effectiveness–cost ratios, using upper-bound estimates for all interventions other than rapid assessment (to provide conservative estimates of the difference in efficiency between rapid assessment and the alternatives).
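In computational terms, the method reduces to a small pipeline: annualize each effect size, express each cost in August 2006 dollars, divide, and sort. The sketch below illustrates that pipeline with two hypothetical entries; the intervention names, figures, and CPI factors are placeholders for illustration, not the values used in the actual analysis.

```python
# Illustrative sketch of the ranking method; entries and CPI factors are hypothetical.
def ecr(effect_size, years, cost, cpi_adjust=1.0):
    """Annualized effect size divided by annual per-student cost in 2006 dollars."""
    annual_effect = effect_size / years
    annual_cost_2006 = (cost / years) * cpi_adjust
    return annual_effect / annual_cost_2006

interventions = {
    # name: (total effect SD, years of intervention, total $ per student, CPI factor)
    "intervention A": (0.29, 3.84, 3689.04, 1.0),   # e.g., a multi-year reform
    "intervention B": (0.27, 1.0, 9.45, 1.0),       # e.g., a one-year program
}
ranked = sorted(interventions.items(), key=lambda kv: ecr(*kv[1]), reverse=True)
for name, args in ranked:
    print(f"{name}: {ecr(*args):.6f}")  # prints B (0.028571) ahead of A (0.000079)
```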

Results

The results for each intervention are presented in order of cost-effectiveness. Table 8.1 lists the annualized effect size, annual cost per student, and effectiveness–cost ratio (effect size divided by cost per student) for each intervention. All cost figures have been adjusted for inflation to the same base period (August 2006), using the consumer price index. This adjustment is necessary to minimize the effect of inflation on the effectiveness–cost ratios in Table 8.1.

Rapid Assessment

The most cost-effective approach was rapid assessment, with four effectiveness–cost ratios ranging from 0.017152 to 0.028571, based on two randomized evaluations of the impact of the Reading Assessment program on reading achievement (Nunnery et al., 2006; Ross et al., 2004), one randomized and one national quasi-experimental evaluation of the impact of the Math Assessment program on math achievement (Ysseldyke & Bolt, 2007; Ysseldyke & Tardrew, 2007), and a cost-effectiveness analysis that integrated the impact results with cost data (Yeh, 2008).

Table 8.1  Comparison of Effect Sizes, Costs, and Effectiveness–Cost Ratios for Various Interventions to Raise Student Achievement

Intervention | Effect Size Reading (SD)a | Effect Size Math (SD)a | Costb | E–C Ratio Readingc | E–C Ratio Mathc
Rapid Assessment (high estimates) | 0.270d | 0.392e | $9.45f / $18.89g | 0.028571 | 0.020752
Rapid Assessment (low estimates) | 0.175h | 0.324i | $9.45f / $18.89g | 0.018519 | 0.017152
Comprehensive school reform | 0.510j | 0.510j | $217.83 | 0.002341k | 0.002341k
Cross-age tutoring | 0.480 | 0.970 | $555.61 | 0.000864 | 0.001746
Computer-assisted instruction | 0.230 | 0.120 | $238.12 | 0.000966 | 0.000504
Longer school day | 0.070 | 0.030 | $159.87 | 0.000438 | 0.000188
Teacher education | 0.220 | 0.220 | $702.62 | 0.000313 | 0.000313
Teacher experience | 0.180 | 0.180 | $702.62 | 0.000256 | 0.000256
Teacher salary | 0.160 | 0.160 | $702.62 | 0.000228 | 0.000228
Summer school | 0.190 | 0.190 | $1,515.00 | 0.000125 | 0.000125
Rigorous math classes | –– | 0.200 | $1,911.17 | –– | 0.000105
Value-added teacher assessment | 0.057 | 0.057 | $624.72 | 0.000091 | 0.000091
Class size reduction: Nye et al. (2001) | 0.104l | 0.090l | $1,379.28m | 0.000075 | 0.000065
Class size reduction: Finn et al. (2001) | 0.120n | 0.129n | $1,379.28m | 0.000087 | 0.000094
10 percent increase in spending | 0.083o | 0.083o | $1,118.83p | 0.000075 | 0.000075
Full-day kindergarten | 0.181 | 0.181 | $2,611.00 | 0.000069 | 0.000069
Head Start | 0.324 | 0.165 | $9,000.00 | 0.000036 | 0.000018
High-standards exit exam | 0.051q | –0.062q | $2,025.97r | 0.000025 | –0.000031
NBPTS teacher certification | 0.002 | 0.004 | $326.53 | 0.000006 | 0.000012
Higher licensure test scores | 0.004 | 0.015 | $894.10 | 0.000004 | 0.000017
Perry Preschool | 0.150s | 0.155s | $12,147.03t | 0.000012 | 0.000013
Abecedarian Preschool | 0.150u | 0.054u | $10,188.09v | 0.000015 | 0.000005
Additional school year | 0.150 | 0.150 | $14,271.76 | 0.000011 | 0.000011
Voucher programs | 0.032w,x | 0.081w,x | $9,646.01w,y | 0.000003 | 0.000008
Charter schools | 0.009w,z | 0.001w,z | $8,086.30w,* | 0.000001 | 0.000000

Notes: See Appendix C. Where two costs are shown, the first applies to reading and the second to math.


It is useful to review the salient characteristics of this approach:

Reading Assessment [is] a popular program designed to encourage students to read books at appropriate levels of difficulty while alerting teachers to learning difficulties and encouraging teachers to provide individualized tutoring or small group instruction. This is achieved through a system of frequently assessing each student's reading comprehension and monitoring each student's reading level. First, books in the school's library are labeled and shelved according to reading level. Second, students select books to read based on their interests and their reading levels, according to the results of the STAR Reading test, a norm-referenced computer-adaptive test (Renaissance Learning, no date). This helps students to avoid the frustrating experience of choosing a book that is too difficult. After finishing a book, the student takes a computer-based quiz, unique to the book, that is intended to monitor basic reading comprehension (Rapid Assessment Corporation has created more than 100,000 quizzes). Similarly, Math Assessment is a popular program that provides individualized, printed sets of math problems, a system of assessing student performance on those problems, and a scoring system where students and teachers receive rapid, frequent feedback on student performance upon completion of every set of problems. (Yeh, 2007, p. 417)

In reading, multiple-choice quiz items test student knowledge and comprehension of each book's characters and plot (for fiction books) or declarative knowledge (for nonfiction books). Math problems range from simple arithmetic items suitable for 1st-grade students to calculus items for more advanced students. Students use a classroom device to scan bubble answer sheets, which are scored by the classroom computer, an arrangement that minimizes the burden on teachers. Rapid assessment is not intended to minimize the role of the teacher in providing comments about strategies for improvement. Instead, rapid assessment utilizes the advantages of technology to relieve teachers of the burden of providing low-level corrective feedback, facilitating the ability of teachers to focus on individual tutoring and to provide what Hattie and Timperley (2007) refer to as Level 2 and Level 3 feedback: comments by teachers regarding strategies for improvement, both specific process-oriented strategies and general self-regulatory strategies, and assessment of higher-order thinking skills. Since the research literature indicates that tutoring is more effective than recitation, with an average effect size of 0.40 SD (Cohen et al., 1982), placing teachers in the role of tutors might make them more effective teachers.


Program activities do not displace standard reading and math activities. During designated periods of the day devoted to reading and math, the majority of students read books selected from the school library or work on printed sets of math problems (Yeh, 2006a). Students who complete a book sit at the classroom computer to take a brief comprehension quiz. Students who complete a set of math problems scan their bubble sheets. Teachers typically tutor individual students or small groups of students. No additional time is allocated to reading or math instruction beyond the standard 60-minute daily periods of reading and math instruction, nor is that time used in a way that is much different from standard reading and math learning activities. The primary difference is that books are selected according to each student's reading level, math problems are assigned according to each student's math level, and students and teachers are able to quickly diagnose areas where students are having difficulty (Yeh, 2008).

A detailed analysis of the operation of the program, based on interviews with teachers, administrators, and students, suggests that individualization of book selections and math assignments for each student's achievement level creates a task structure where task difficulty is individualized (Yeh, 2006a). In combination with individualization of performance expectations and rapid, test-based performance feedback, it appears that these characteristics promote a steady dose of positive, objective feedback to students, reinforcing student beliefs that they are competent learners who are able to master new skills and knowledge every day, and restructuring the experience of schooling to meet students' psychological need to feel competent, promoting engagement and commitment to work effort (Yeh, 2006a).

The cost analysis assumed the purchase of either Reading or Math Assessment software and a diagnostic assessment for each student receiving the rapid assessment intervention, plus mark-scan devices for each math classroom. Initial costs were averaged over an enrollment of 500 students in 25 classrooms per building. Initial fees, teacher and administrator training, and the cost of the scanners were amortized over 7 years, which was (arbitrarily) assumed to be the life of the program. For reading, the costs included access to 100,000 book quizzes for every student. For math, the costs included access to Math Assessment grade-level libraries tagged to state standards for grades 1 through 7, as well as multiple-subject-area libraries for the secondary grades (pre-algebra, algebra 1, algebra 2, geometry, probability and statistics, precalculus, calculus, basic math, chemistry, physics). The cost analysis assumed that every classroom teacher and one administrator (for every 500 students) underwent a full-day training session for Math Assessment and a full-day training session for Reading Assessment. In addition, the cost analysis assumed a 50 percent teacher turnover rate during the 7-year implementation period and assumed that each new teacher received a full-day training session for Math Assessment and a full-day training session for Reading Assessment.

Sensitivity analyses suggested that purchasing a computer and printer for every classroom would not change the relative cost-effectiveness ranking of rapid assessment with regard to voucher programs, charter schools, a 10 percent increase in per-pupil expenditure, increased educational accountability, comprehensive school reform, class size reduction, or high-quality preschool (Yeh, 2007, 2008). Teacher interviews and classroom observations suggested that the use of rapid assessment technology does not require skilled, highly trained teachers, nor does it supplant normal reading and math activities (Yeh, 2006a). The burden on teachers is minimal because assessments are self-administered by students and scoring and reporting are handled by computer software. The burden is offset by the time saved due to the program's scoring and student-progress monitoring features, which replace the time-consuming conventional tasks of grading math homework and assessing reading comprehension (Yeh, 2006a). The annual cost per student in 2006 dollars is $9.45 in reading and $18.89 in math, adjusted for the opportunity costs of teacher training time ($3.02 per student) and adjusted for the opportunity costs created by large up-front fixed costs (Yeh, 2008).
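The per-student figures combine recurring fees with up-front costs amortized over the assumed 7-year program life. A simplified sketch of that accounting follows; the component dollar amounts are placeholders for illustration, since only the bottom-line figures ($9.45 and $18.89) are reported above.

```python
# Simplified amortized-cost sketch; component amounts are hypothetical placeholders.
STUDENTS = 500   # enrollment per building, as assumed above
YEARS = 7        # assumed program life

def annual_cost_per_student(upfront, annual, students=STUDENTS, years=YEARS):
    """Amortize one-time costs over the program life, then add recurring costs."""
    return (upfront / years + annual) / students

# Hypothetical components: initial fees + training + scanners (one-time),
# plus annual subscription and quiz-library access (recurring).
upfront_total = 10_000.0   # placeholder
annual_total = 6_000.0     # placeholder
print(round(annual_cost_per_student(upfront_total, annual_total), 2))
```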

Comprehensive School Reform

The second most cost-effective approach was comprehensive school reform (CSR), based on a meta-analysis of effect sizes for 29 school reform models by Borman, Hewes, Overman, and Brown (2003), cost data from a range of sources (Herman et al., 1999; King, 1994; Odden, 2000), and a cost-effectiveness analysis that integrated the impact and cost data (Yeh, 2008). Of the group of school reform models categorized by Borman et al. (2003) as having the Strongest or Highly Promising Evidence of Effectiveness, the model with the highest effect size, and the highest effectiveness–cost ratio, was Expeditionary Learning Outward Bound (effect size = 0.51 SD, effectiveness–cost ratio = 0.002341; see Yeh, 2008). None of the other 28 reform models had an effectiveness–cost ratio larger than the lowest ratio for rapid assessment (0.017152; see Yeh, 2008). The results suggest that rapid assessment is roughly an order of magnitude more effective per dollar than the most cost-effective CSR model.

However, the true advantage of rapid assessment is likely to be substantially larger than indicated by the ratios in Table 8.1. Given Borman et al.'s (2003) conclusion that effect size estimates for reform models other than Success for All, the School Development Model, and Direct Instruction are unreliable, the true upper bound for the school reform effectiveness–cost ratios is likely to be closer to the maximum ratio calculated for the three models for which there is reliable evidence (Yeh, 2008, p. 21):

Rapid assessment is a minimum of 59 times as cost effective as Direct Instruction, the most cost-effective of the three models, suggesting that the true advantage of rapid assessment may be 59 times as large as the cost-effectiveness of CSR. Regardless, the cost-effectiveness analysis suggests that CSR is not an efficient approach for improving student achievement. This lack of efficiency may explain previous research findings suggesting that enthusiasm for CSR wanes over time and leads to teacher burnout, conflict, disengagement and exhaustion (Little & Bartlett, 2002).

The advantage of rapid assessment is even greater if the cost-effectiveness of comprehensive school reform is based on Borman and Hewes's (2002) alternative estimate that Success for All produces "sustained" effect sizes of 0.29 SD in reading and 0.11 SD in math (over a 3.84-year period of intervention) at a total cost per student of $3,689.04, suggesting small effectiveness–cost ratios of 0.000079 in reading and 0.000030 in math, after adjusting for inflation.
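Borman and Hewes's (2002) figures also illustrate the annualization step described in the Method section. Because the effect and the cost are both divided by the same 3.84-year duration, the duration cancels and the annualized ratio equals the total effect divided by the total cost:

```python
effect_reading, effect_math = 0.29, 0.11   # SD over 3.84 years (Success for All)
total_cost = 3689.04                       # $ per student over 3.84 years
years = 3.84

# Annualized ratio equals the total ratio, since the years cancel:
print(round((effect_reading / years) / (total_cost / years), 6))  # ~0.000079
print(round(effect_math / total_cost, 6))                         # ~0.000030
```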

Cross-Age Tutoring

Levin et al. (1987) estimated that cross-age tutoring (tutoring of younger students by older peers) would improve student achievement by 0.97 SD in math and 0.48 SD in reading, at an annual cost of $555.61 per student in 2006 dollars, suggesting effectiveness–cost ratios of 0.001746 in math and 0.000864 in reading.

Computer-Assisted Instruction

Computer-assisted instruction is estimated to improve student achievement by 0.12 SD in math and 0.23 SD in reading, at an annual cost of $238.12 per student (in 2006 dollars), suggesting effectiveness–cost ratios of 0.000504 in math and 0.000966 in reading (Levin et al., 1987).

Longer School Day

Lengthening the school day by 60 minutes is estimated to improve student achievement by 0.03 SD in math and 0.07 SD in reading, at an annual cost of $159.87 per student (in 2006 dollars), suggesting effectiveness–cost ratios of 0.000188 in math and 0.000438 in reading (Levin et al., 1987).


Teacher Education

Greenwald et al. (1996), based on meta-analytic results, estimated that substitution of teachers with a master's degree for teachers with a bachelor's degree is associated with an increase in student achievement of 0.22 SD, at a cost per pupil of $702.62 in 2006 dollars, resulting in an effectiveness–cost ratio of 0.000313.

Teacher Experience

Greenwald et al. (1996), based on meta-analytic results, estimated that teachers with more experience are associated with an increase in student achievement of 0.18 SD, at a cost per pupil of $702.62 in 2006 dollars, resulting in an effectiveness–cost ratio of 0.000256.

Teacher Salary

Greenwald et al. (1996), based on meta-analytic results, estimated that increases in teacher salaries are associated with an increase in student achievement of 0.16 SD, at a cost per pupil of $702.62 in 2006 dollars, resulting in an effectiveness–cost ratio of 0.000228.

Summer School

Belfield (2006) drew upon meta-analytic results of summer school programs estimating a median effect size of 0.19 SD (Cooper, Charlton, Valentine, & Muhlenbruck, 2000) and per-pupil costs estimated at a total of $1,515 (Borman & Dowling, 2006), suggesting an effectiveness–cost ratio of 0.000125.

More-Rigorous Math Courses

Students who complete additional advanced math courses score approximately 0.20 SD higher on math tests (Mayer & Peterson, 1999). Assuming that student achievement can be boosted simply by allowing more students to enter and complete advanced math courses, and assuming that it is possible to elicit the required increase in the supply of advanced math teachers through a targeted increase of 33 percent in the salaries of these teachers, the annual cost would be $1,911.17 per student in 2006 dollars (Mayer & Peterson, 1999). Both of these assumptions are probably unrealistic, suggesting that the effectiveness–cost ratio of 0.000105 is an upper-bound estimate.


Value-Added Teacher Assessment

Yeh and Ritter (2009) examined the effects of Gordon et al.'s (2006) proposal to improve the quality of the teaching force by identifying and replacing the bottom quartile of novice teachers, equal to 68,299 of the estimated 3,161,000 elementary and secondary public school teachers in 2006 (see National Center for Education Statistics, 2005e), using value-added assessments of teacher performance. The proposal is one of the featured proposals of the Hamilton Project at the Brookings Institution and, thus, has attracted attention as perhaps the most promising strategy for implementing a value-added teacher-assessment system. The significance of Gordon et al.'s (2006) proposal is that it converts the general strategy of value-added teacher assessment into a concrete proposal that can be evaluated. It is currently the only proposal of this type that adequately addresses the fact that most teachers are tenured and, therefore, cannot be replaced without adequate cause. However, Gordon et al.'s (2006) data imply a small effect size of only 0.057 SD.

Gordon et al. (2006) propose to replace rejected bottom-quartile teachers with individuals who would be hired through alternative teacher-certification programs. These programs primarily serve individuals who have accumulated substantial work experience in occupations other than teaching but wish to change occupations and enter teaching. Alternative certification typically allows these individuals to enter teaching relatively quickly, under the supervision of mentors and while electing courses in curriculum and instructional methods. Expansion of alternative certification programs to accommodate the additional demand for new teachers created by Gordon et al.'s (2006) proposal would generate substantial costs.

The total annual cost of implementing the proposal includes costs to the chain of sending employers and replacement employees triggered by the proposal; costs to the hiring school district, the new teacher, the terminated teacher, and other costs to society; adjustments for the gain in output (a benefit) of the terminated teacher once he or she has been retrained and has started a new job; psychic gains to the new teacher and psychic gains to the individual who voluntarily replaces the replacement teacher; the cost of alternative certification; the cost to raise salaries for all teachers that is necessary to recruit replacements for the bottom quartile of novice teachers; and the cost of the assessments. The total annual cost per student is $624.72 (Yeh & Ritter, 2009). If the effect size of Gordon et al.'s (2006) proposal is 0.057 SD per year, the effectiveness–cost ratio is 0.000091.


Class Size Reduction

Yeh (2009a) integrated Reichardt's (2000) cost analysis with estimates of student-achievement effect sizes from the Tennessee STAR experiment by Finn et al. (2001) and Nye et al. (2001) to estimate the cost-effectiveness of class size reduction. Perhaps the most optimistic estimate, by Finn et al. (2001), employed hierarchical linear modeling, controlled for an array of covariates, and isolated the impact of class size reduction according to duration of exposure. By the end of 2nd grade, the achievement of students randomly assigned to classrooms of 13–17 students increased relative to students assigned to classrooms with 22–26 students, with annualized effect sizes equal to 0.120 SD in reading and 0.129 SD in math. The annual inflation-adjusted cost per student to reduce class size from 24 to a ceiling of 17 students per class is $1,379.28 (Reichardt, 2000), resulting in effectiveness–cost ratios ranging from 0.000065 to 0.000094 (Yeh, 2009a).

Other estimates of the cost-effectiveness of class size reduction are limited by relatively unsophisticated cost calculations. Most alternative estimates are smaller than Yeh's (2009a) estimates. Aos, Miller, and Mayfield (2007) analyzed 38 recent, methodologically rigorous evaluations of class size reduction and estimated that each 1-student reduction in class size increases student achievement by 0.019 SD in grades K–2 and 0.007 SD in grades 3–6, or an average of 0.013 SD. This study found that each 1-student reduction in class size would cost $217 per student per year, suggesting an effectiveness–cost ratio of 0.000060. Krueger (2003) drew upon existing estimates that class size reduction improved student achievement by 0.20 SD in Tennessee, at a cost of $4,428.37 in 2006 dollars, suggesting an effectiveness–cost ratio of 0.000045. Borman and Hewes (2002) estimated effectiveness–cost ratios equivalent to 0.000038 in reading and 0.000052 in math after adjusting for inflation, based on effect-size estimates from Krueger and Whitmore's (2002) analysis of the Tennessee STAR data and "total" cost information from Levin et al. (1987). In addition, Borman and Hewes (2002) reported a second set of effectiveness–cost ratios, accepting Levin et al.'s (1987) assumption that class size reduction improves student achievement in subjects other than math and reading, as well as the assumption that the cost of class size reduction is divisible across those subjects. Mayer and Peterson (1999) estimated that class size reduction of one third (from 24 to 16 students) would improve student achievement by 0.20 SD over 2 years, or 0.10 SD per year, at an annual cost of $1,551.28 per student in 2006 dollars, suggesting an effectiveness–cost ratio of 0.000064. Greenwald et al. (1996), based on meta-analytic results, estimated that decreases in pupil/teacher ratios are associated with an increase in student achievement of 0.04 SD, at a cost per pupil of $702.62 in 2006 dollars, resulting in an effectiveness–cost ratio of 0.000057.

Three estimates of the cost-effectiveness of class size reduction are somewhat larger than Yeh's (2009a) estimates, primarily due to cost estimates that are far below the estimates by Aos et al. (2007), Krueger (2003), Reichardt (2000), and Mayer and Peterson (1999). Grissmer (2002), based on the Tennessee STAR data, estimated that reducing class size from 24 to 16 students would improve student achievement for each cohort of students by 0.30 SD over 4 years, or 0.075 SD per year, at an annual cost of $531.49 per student in 2006 dollars, suggesting an effectiveness–cost ratio of 0.000141. Grissmer (2002) elected to shrink the $531.49 per-pupil cost estimate by a factor of 4/13, arguing that because the intervention is implemented in grades K–3, the annual per-pupil cost should be deflated to reflect implementation in 4 of 13 grade levels. However, the relevant metric is the annual cost per pupil in the grades where the intervention is delivered, not the cost spread over every pupil from kindergarten through grade 12. Thus, Grissmer's (2002) shrunken cost estimate is analytically inconsistent with his impact estimate, understating the annual per-pupil cost by a factor of 4/13. Grissmer (2002) also estimated the cost-effectiveness of adding teacher aides to regular-sized classrooms, which was roughly one third of his (overstated) figure for the cost-effectiveness of class size reduction. Grissmer et al. (2000), based on the Tennessee STAR data, estimated that decreases in pupil-teacher ratios are associated with an increase in student achievement of 0.10 SD, at a cost per pupil of $281.05 in 2006 dollars, resulting in an effectiveness–cost ratio of 0.000356. Levin et al. (1987) estimated that class size reduction from 25 to 20 students would improve student achievement by 0.09 SD in math and 0.05 SD in reading, at an annual cost of $736.45 per student in 2006 dollars, suggesting effectiveness–cost ratios of 0.000122 in math and 0.000068 in reading. Levin et al. (1987) elected to reduce their cost estimate by an additional two thirds under the assumption that class size reduction improves student achievement in subjects other than math and reading.
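The Grissmer (2002) adjustment criticized above can be made concrete. A brief sketch, using only the figures reported in the text:

```python
# Grissmer (2002): 0.30 SD over 4 years at $531.49 per pupil per year (2006 $).
annual_effect = 0.30 / 4   # 0.075 SD per year
annual_cost = 531.49       # per pupil in the grades actually served

print(round(annual_effect / annual_cost, 6))   # ~0.000141, the consistent ratio

# Spreading the cost over all 13 grades (the 4/13 deflation) inflates the ratio
# without any corresponding deflation of the effect:
deflated_cost = annual_cost * 4 / 13
print(round(annual_effect / deflated_cost, 6)) # ~0.000459, analytically inconsistent
```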

Ten Percent Increase in Spending

Yeh (2007) estimated a maximum effectiveness–cost ratio of 0.000075 for a 10 percent increase in existing patterns of expenditure per pupil (equal to $1,118.83), based on Greenwald et al.'s (1996) relatively optimistic meta-analytic estimate that this would increase student achievement in math and reading by 0.083 SD per year. In contrast, Hanushek (1986, 1989, 1997) estimated that the relationship between educational spending and student achievement is much weaker. Yeh's (2007) results suggest that the achievement gains per dollar from rapid assessment are more than two orders of magnitude larger than the gains that accrue through a 10 percent increase in existing patterns of educational expenditures.

The results are consistent with alternative estimates. Grissmer (2002) estimated that a comparable increase in per-pupil expenditure of $1,207.94 (in 2006 dollars) would increase student achievement by 0.042 to 0.098 SD, or an average of 0.07 SD, suggesting an effectiveness–cost ratio of 0.000058. Wenglinsky (1997) argued that the results of previous meta-analyses (Greenwald et al., 1996; Hanushek, 1989; Hedges et al., 1994) were inconclusive because they relied on studies that did not involve nationally representative populations and failed to properly model the paths by which particular types of expenditure may influence student achievement. Using path modeling with nationally representative NAEP data, Wenglinsky (1997) estimated that a $1,476.47 increase in instructional expenditures (in 2006 dollars) would reduce pupil-teacher ratios, increasing math achievement by 0.04 SD, suggesting an effectiveness–cost ratio of 0.000027.

Full-Day Kindergarten

Aos et al. (2007) analyzed 23 methodologically rigorous evaluations of full-day kindergarten compared with half-day kindergarten and estimated that student achievement would increase by an average of 0.181 SD. However, the gains disappeared by the end of 1st grade. Aos et al. (2007) calculated that the incremental cost of full-day kindergarten compared with half-day kindergarten is $2,611 per student per year. If achievement increases by 0.181 SD, the effectiveness–cost ratio is 0.000069.

Head Start

Ludwig and Phillips (2008) drew upon results from the first randomized study of the Head Start preschool program, finding an average effect size of 0.324 SD for letter-naming and 0.165 SD for math. Ludwig and Phillips (2008) estimated a cost of $9,000 per child, suggesting corresponding effectiveness–cost ratios of 0.000036 and 0.000018.

High-Standards Exit Exam

Increased educational accountability may be defined as the implementation of high-standards exit exams. A majority of states have now implemented high school exit exams that students must pass to graduate from high school, affecting 70% of students nationwide (Gayler et al., 2004) and, with few exceptions, requiring mastery of material at the high school level (Center on Education Policy, 2005).

To calculate the cost-effectiveness of increased educational accountability, Yeh (2007) drew upon Hanushek and Raymond's (2005) longitudinal panel study with controls for state fixed effects as well as measures of parental education, school spending, and race. This study found that increased accountability is associated with an impact of 0.2 SD over 4 years, or 0.05 SD per year. Based on a study by the U.S. General Accounting Office (2003), the annual cost of administering accountability-related assessments was conservatively estimated to be $12.73 per student. However, unlike previous researchers (see Hoxby, 2002) who focused on the nominal cost of the assessments, Yeh (2007) included the cost to society of the incremental increase in the number of students who are denied diplomas solely because exit exams raise the graduation bar.

To calculate these costs, Yeh (2007) drew upon Warren et al.'s (2006) estimate of the incremental increase in the fraction of students who were denied diplomas as a consequence of the implementation of exit exams that require mastery of high school level material. The achievement of 28 graduating classes between 1975 and 2002 was modeled based on data from each of the 50 states and the District of Columbia, yielding 1,428 state-years as the units of analysis for models estimating the effect of the implementation of exit exams. After adjusting for an array of covariates and controlling for state and year fixed effects, high school completion rates were 2.1 percentage points lower in states with this type of exit exam, implying that an extra 89,908 students would be denied diplomas annually (0.021 × national public student 9th-grade enrollment of 4,281,345 in academic year 2004–2005 [U.S. Department of Education, 2005b]).

To calculate the economic value of the lost diplomas, Yeh (2007) drew upon Flores-Lagunes and Light's (2004) estimates of the "sheepskin" (signaling) effect, as well as Census Bureau estimates of lifetime earnings by level of education (Day & Newburger, 2002), to calculate the discounted present value of the lifetime difference in income (a measure of the economic output lost to society) between an individual who graduates from high school (but completes no further schooling) and an individual who does not graduate but is identical with regard to years of schooling, actual work experience, and age. Yeh (2007) estimated that the cost per pupil served is $185.01. This figure is underestimated. After adjustments to reflect Flores-Lagunes and Light's (2007) latest estimates, and basing the per-pupil cost on national public student 9th-grade enrollment of 4,281,345 in academic year 2004–2005 (U.S. Department of Education, 2005b), the cost per pupil served to raise the graduation bar is $2,013.24. This figure is considerably higher than Yeh's (2007) original estimate, which was based on total PK–12 enrollment of 48,359,697.

The total cost of increased educational accountability is the cost of the assessments ($12.73), plus the discounted present value of the cost to society of lost economic output as a result of the shift toward standards-based exit exam requirements ($2,013.24), or $2,025.97. Using the adjusted 0.05 SD effect-size estimate derived from Hanushek and Raymond (2005), the effectiveness–cost ratio is 0.000025, implying that the achievement gains per dollar through rapid assessment are 686 times larger than the gains through accountability testing.

A more recent analysis suggests that this estimate of the cost-effectiveness of educational accountability is overstated. Grodsky et al. (2009) analyzed longitudinal NAEP data with controls for state fixed effects as well as gender, race, parental education, grade level, a measure of "home resources," age, and difficulty of each state's exit exam. This study, which provides the best available evidence for the impact of accountability policies, found no significant effect of exit exams on student achievement. While the results are not statistically significant, they imply mixed effect sizes of 0.051 SD in reading and negative 0.062 SD in math, in comparison with Hanushek and Raymond's (2005) positive 0.05 SD effect-size estimate. Using the 0.051 SD reading effect-size estimate from Grodsky et al. (2009), the effectiveness–cost ratio is 0.000025 (and since Grodsky et al.'s (2009) impact estimate for math is negative, the corresponding effectiveness–cost ratio is a negative 0.000031).

Alternative estimates of the cost and cost-effectiveness of increased educational accountability are flawed. Mayer and Peterson (1999) estimated that external examinations during the senior year of high school increase 8th-grade math scores by 0.2 SD, at an annual cost of $248.20 per student (in 2006 dollars), suggesting an effectiveness–cost ratio of 0.000806. However, the effect-size estimate was based on studies that failed to control for fixed effects, while the cost estimate ignored the cost to society of the lost economic output due to exit examinations that deny diplomas to more students. Similarly, Hoxby (2002) ignored the costs of denying diplomas to students. She estimated that the annual cost per student of test-based accountability systems ranges from a low of $2.05 to a high of $38.90 across 25 states, in 2006 dollars, and concluded that test-based accountability is highly cost-effective. However, testing without consequences has little effect on student achievement (Hanushek & Raymond, 2005). Thus, the most defensible approach is to combine effect-size estimates from studies of exit exams and other consequential accountability systems with cost estimates that include the full costs of consequential accountability, including the economic costs of raising the high school exit bar.
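The chain of figures above can be reproduced with a short calculation. The sketch below (Python; the implied present value per lost diploma is an inference from the reported figures, not a number taken directly from the source) follows the steps described:

```python
# A back-of-the-envelope sketch of the exit-exam cost calculation.
ninth_grade_enrollment = 4_281_345      # public 9th-grade enrollment, 2004-2005
completion_rate_drop = 0.021            # Warren et al. (2006)

denied_diplomas = completion_rate_drop * ninth_grade_enrollment
print(round(denied_diplomas))           # ~89,908 students per year

per_pupil_diploma_cost = 2_013.24       # cost per pupil served (2006 dollars)
assessment_cost = 12.73                 # annual assessment cost per student
total_cost = assessment_cost + per_pupil_diploma_cost
print(total_cost)                       # 2,025.97

# Effectiveness-cost ratio under the 0.05 SD per year impact estimate
print(0.05 / total_cost)                # ~0.000025

# Implied present value of lost output per denied diploma (an inference
# from the figures above, not a published number):
print(per_pupil_diploma_cost / completion_rate_drop)   # ~$95,900
```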

NBPTS Teacher Certification

Hopes that teacher certification would reliably distinguish effective and ineffective teachers have foundered as evidence has accumulated that certification status is a weak predictor of teacher performance (Hanushek et al., 2005; Hanushek et al., 1999). Perhaps the most sophisticated and rigorous certification system is the process developed by the National Board for Professional Teaching Standards (NBPTS) in collaboration with the Educational Testing Service (ETS). This system is widely considered to be the standard for teacher certification systems, involving an examination consisting of six 30-minute exercises, specific to the candidate's chosen certificate area, administered at one of 300 NBPTS computer-based testing centers across the United States, and evaluation of portfolio evidence, including video recordings of teacher-student interactions. The NBPTS certification system therefore offers perhaps the strongest test of the hypothesis that certification can distinguish between more and less effective teachers.

Seven large-scale studies of NBPTS certification have now been conducted (Cavalluzzo, 2004; Clotfelter et al., 2006; Clotfelter et al., 2007b; Goldhaber & Anthony, 2007; Harris & Sass, 2007; Ladd et al., 2007; Sanders et al., 2005). These studies offer the statistical power necessary to detect effects, if they exist, and controlled for student or school fixed effects or used hierarchical linear modeling (HLM). However, the average estimate of the signaling effect of NBPTS certification, in comparison with teachers who were never certified, was only 0.002 SD in reading and 0.004 SD in math (Yeh, 2010a).

Rice and Hall (2008) estimated that the full societal cost of NBPTS certification, including opportunity costs, averaged $25,665.37 (adjusted to 2006 dollars) per participant per year. However, while Rice and Hall (2008) assumed that the certification process averages 1 year, it actually takes 2 years (NBPTS, 2010). After adjusting for this difference, the total cost per participant is $51,330.74. Amortized over an average teaching career of 7.86 years as an NBPTS-certified teacher (after a minimum of 3 years before becoming eligible to apply for NBPTS certification and a certification process averaging 2 years), and averaged over 20 students per classroom, the annual cost of NBPTS certification per student is $326.53, resulting in effectiveness–cost ratios of 0.000006 in reading and 0.000012 in math.
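The amortization just described reduces to a few lines of arithmetic. A minimal sketch, using only the figures quoted above:

```python
# A minimal sketch of the NBPTS amortization described above.
annual_societal_cost = 25_665.37   # Rice & Hall (2008), 2006 dollars, per year
process_years = 2                  # NBPTS (2010): certification takes 2 years
total_cost = annual_societal_cost * process_years
print(total_cost)                  # 51,330.74

career_years_certified = 7.86      # average years teaching as NBPTS-certified
students_per_class = 20
cost_per_student_year = total_cost / career_years_certified / students_per_class
print(round(cost_per_student_year, 2))   # ~326.53

# Effectiveness-cost ratios given the tiny average signaling effects
print(0.002 / cost_per_student_year)     # reading: ~0.000006
print(0.004 / cost_per_student_year)     # math:    ~0.000012
```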


Licensure Test Scores

The use of licensure test scores to distinguish between strong and weak teachers is a less expensive approach to raising teacher quality than NBPTS certification. A few early studies (Ehrenberg & Brewer, 1995; Ferguson, 1991; Ferguson & Ladd, 1996; Strauss & Sawyer, 1986) suggested that teacher performance on general tests of math or reading might be a useful indicator of teacher effectiveness. Until recently, however, no study of teacher performance on tests of math and reading controlled for nonrandom teacher sorting or student fixed effects. These controls are important because recent studies demonstrate that positive correlations between the strength of teacher qualifications and student achievement observed in cross-sectional data are driven largely by sorting of teachers and students across schools and, to a lesser extent, within schools (Clotfelter et al., 2006, 2007b; Goldhaber, 2007). Once these controls are implemented, the observed relationship between teacher performance on licensure tests and student achievement is greatly attenuated.

Clotfelter et al. (2007b) provide the best available econometric estimate of the effect of high-ability individuals on student achievement, where ability is measured through reading, writing, and math test scores. Clotfelter et al. (2007b) employed statewide longitudinal data covering all North Carolina students in grades 3, 4, and 5 in years 1995–2004, matched student test scores to specific teachers and, by controlling for student fixed effects, controlled for observable as well as unobservable characteristics of students that might influence achievement. However, the effect of teacher ability, measured by licensure exams, was very small. Teachers who scored 1 SD above the average on teacher licensure exams boosted math achievement by 0.015 SD and reading achievement by 0.004 SD, relative to the average teacher (Clotfelter et al., 2007b, Table 7).

The cost-effectiveness of this approach may be estimated by calculating the cost of setting a minimum SAT score of 1000, augmented with analyses to determine the sensitivity of the calculations to the range of plausible relationships between licensure and SAT test scores (Yeh, 2009b). Based on econometric analyses, Manski (1987) estimated that setting a minimum SAT score of 1000 would result in an average SAT score of 1130, which is one standard deviation (SD) above the average score of 905 achieved by a nationally representative sample of teachers (Angrist & Guryan, 2008; College Board, 2006). Of the 111,700 members of the NELS88 cohort who were college educated and working in the year 2000, 0.8 percent (894 individuals) chose the teaching profession. Assuming that the wage elasticity of the supply of teachers is 3 (at the upper end of estimates by Manski, 1987, and Ballou & Podgursky, 1995), and assuming a very high probability (.88) that a job applicant receives an offer, wages must increase by 44.65 percent in order to increase the annual supply of high-ability teacher applicants by an amount sufficient to replace the 54.1 percent of each cohort of new teachers who may be expected to score below 1000 on the SAT (.541 × 894 = .88 × .459 × 44.65 × .03 × 894). This figure is underestimated to the extent that the true wage elasticity is lower than 3, or the probability that a job applicant receives an offer is less than 88 percent, suggesting that a 44.65 percent increase in salary, equal to $17,882 per year, is a lower-bound estimate of the wage premium needed to attract sufficient teacher applicants scoring a minimum of 1000 on the SAT. The annual cost to society is equivalent to $894.10 per student, assuming 20 students per classroom, suggesting an effectiveness–cost ratio of 0.000017 in math and 0.000004 in reading.
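The supply equation above balances the teachers to be replaced against the applicants induced by a wage increase. A sketch of that arithmetic (the variable names are mine; the figures are those in the text):

```python
# The equation .541 * 894 = .88 * .459 * 44.65 * .03 * 894 balances the
# teachers to be replaced (left side) against the new high-scoring
# applicants induced by a wage increase (right side).
cohort_teachers = 894          # NELS88 members who chose teaching
share_below_1000 = 0.541       # expected to score below 1000 on the SAT
offer_probability = 0.88       # probability an applicant receives an offer
share_above_1000 = 0.459       # share of applicants scoring 1000 or above
elasticity_per_pct = 0.03      # wage elasticity of 3, per percentage point

teachers_to_replace = share_below_1000 * cohort_teachers
wage_increase_pct = teachers_to_replace / (
    offer_probability * share_above_1000 * elasticity_per_pct * cohort_teachers
)
print(round(wage_increase_pct, 2))   # ~44.65 percent

# Spreading the $17,882 annual premium over a 20-student classroom
print(round(17_882 / 20, 2))         # $894.10 per student per year
print(0.015 / 894.10)                # math ratio:    ~0.000017
print(0.004 / 894.10)                # reading ratio: ~0.000004
```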

High-Quality Preschool

Roughly similar in cost-effectiveness was high-quality preschool, based on impact estimates from Schweinhart et al. (1993) and Ramey et al. (2000), cost estimates from Barnett (1992) and Barnett and Masse (2007), and a cost-effectiveness analysis that integrated the impact and cost data, resulting in four effectiveness–cost ratios ranging from 0.000005 to 0.000015 (Yeh, 2008). The reported effect sizes for children participating in Perry preschool were 0.150 SD in reading and 0.155 SD in math (annualized over a 2-year implementation period) at the end of 2nd grade (Schweinhart et al., 1993), but the annualized cost was $12,147.03 (Barnett, 1992), resulting in small effectiveness–cost ratios of 0.000012 in reading and 0.000013 in math (Yeh, 2008).

Results for participants in Abecedarian preschool were roughly comparable with the results for participants in Perry preschool. Annualized effect sizes for 3rd-grade children were 0.150 in reading and 0.054 in math over a 5-year implementation period (Ramey et al., 2000), at an annual cost of $10,188.09 (Barnett & Masse, 2007) in 2006 dollars, adjusted for the value of formal and informal daycare services provided to the control group, resulting in effectiveness–cost ratios of 0.000015 in reading and 0.000005 in math (Yeh, 2008).

These results suggest that rapid assessment is approximately three orders of magnitude more cost-effective than high-quality preschool with regard to student achievement.1 However, some studies of high-quality preschool detect reductions in crime (although studies of the Abecedarian program did not), and cost-effectiveness analyses that focus only on gains in student achievement ignore the benefits of reductions in crime (to the extent that those benefits are not the result of gains in student achievement).2


After adjusting for inflation, Borman and Hewes' (2002) effectiveness–cost ratios for the Perry and Abecedarian programs are comparable, ranging from 0.000008 to 0.000027, based on effect-size estimates from Schweinhart et al. (1993) and Ramey et al. (2000), and cost information from Barnett (1992) and Developmental Center for Handicapped Persons (1987). Borman and Hewes (2002) also reported a second set of effectiveness–cost ratios, under the assumption that the cost of preschool is divisible into the portions responsible for increased student achievement, versus reductions in crime and other social benefits. As a practical matter, however, society incurs the full cost of preschool and cannot choose to incur the partial cost that is assumed in the second set of ratios. The full cost is relevant if policymakers are strictly concerned with selection of the most cost-effective approach for raising student achievement with a limited amount of dollars.

Additional School Year

An additional year of schooling is estimated to improve student achievement by 0.10 to 0.20 SD, or an average of 0.15 SD, at an annual cost ranging from $6,205.11 to $22,338.41 (an average of $14,271.76) per student, in 2006 dollars, suggesting an effectiveness–cost ratio of 0.000011 (Mayer & Peterson, 1999).

Voucher Programs

The cost-effectiveness of voucher programs was estimated based on effect sizes averaged across three random assignment or lottery-based evaluations (Greene et al., 1999; Howell et al., 2002; Rouse, 1998) and a cost-effectiveness analysis that integrated the impact results with cost estimates (Yeh, 2007). While previous cost estimates focused on the nominal costs of the vouchers themselves, or the costs of voucher record-keeping, monitoring, information, and adjudication services, Yeh (2007) pointed out that significant costs to society would arise to the extent that voucher programs pull students at random from public school classrooms, making it difficult (if not impossible) for public schools and districts to reduce expenditures at the same rate that students use vouchers to switch to private schools:

For example, in the study by Howell et al. (2002), very small numbers of students were offered vouchers in the baseline year: 1300 in New York, 809 in the District of Columbia, and 515 in Dayton, Ohio. If these 2624 students were randomly pulled from the 1427 public schools in those three cities (1213 in New York, 165 in District of Columbia, and 49 in Dayton) (National Center for Education Statistics, 2000, 2001) and, within each school, were randomly pulled from an average of four grade levels, only 0.46 students per grade level, per school, used vouchers to transfer to private schools. (p. 425)

Yeh (2007) performed similar calculations suggesting that the same issue arises in each of the major voucher programs, and pointed out that this would not have allowed any of the public school building principals to eliminate teaching positions in response to the transfers, and there would have been no decrease in overhead, facilities, or administrative costs. At the same time, there are far fewer private schools and, thus, voucher programs funnel public school students into a small number of classrooms, which quickly fill up and require more buildings, teachers, and administrators, in proportion to the ratio of public to private school students (9 to 1; see National Center for Education Statistics, 2005b). If public school students enroll in private schools, the private school classrooms fill up nine times faster than public school classrooms are depleted. Thus, voucher programs typically create a need for private schools to add teachers, administrators, and facilities, without corresponding reductions in public school costs. These inefficiencies increase the total costs of teaching the same number of students (both public and private).

The issue remains even if voucher programs are adopted nationwide and all schools are privatized. The assumption underlying voucher programs is that competition improves student achievement. However, competitive pressure only exists if a significant proportion of students transfer from low-performing to higher-performing schools every year. The logic underlying voucher programs requires that they continue to draw students at random from the vast majority of lower-performing schools and funnel them into the top 10 percent of all schools, creating a need to build new schools and hire new teachers and administrators without compensating cost reductions in the sending schools. If at some point in the future this competitive pressure has raised student achievement to a satisfactory level, it may be reasonable to eliminate voucher programs and their associated costs, but the situation is then identical to the current situation where there is little or no competitive pressure (in areas where voucher programs are lacking), with presumably the same adverse effects on student achievement. The only way voucher programs can remain effective is if they continue to divert students into the top 10 percent of all schools.

Yeh (2007) and others have suggested that the real social cost of educating large numbers of students in private schools (who are currently educated in public schools) is difficult to estimate for several reasons: private school tuition figures exclude costs that are offset by corporate and noncorporate subsidies (U.S. General Accounting Office, 2001), as well as the cost of services that would be required by many students (and, by law, are currently provided by public schools, but not private schools), including transportation, free and reduced-price meals, special education, vocational education, and services for students with disabilities and limited English proficiency (C. R. Belfield, 2006; Levin, 1998; Levin & Driver, 1997).

While comprehensive cost data are not available for private schools, such data are available for charter schools (Nelson et al., 2003). Charter schools are subject to much the same competitive cost pressures as private schools, resulting in teacher salaries that are substantially lower than traditional public school salaries (Nelson et al., 2003). For example, in Texas and Minnesota, charter school teacher salaries are more than $13,760 lower, and in Pennsylvania, more than $17,513 lower, in inflation-adjusted dollars, than comparable host-district teacher salaries (Nelson et al., 2003).

Yeh (2007) combined charter school cost information with estimates of cost savings resulting from students switching to private schools, and cost estimates for transportation, record-keeping, information systems to monitor and assess voucher eligibility of both students and schools, and adjudication services that would be required by a voucher system, to obtain an inflation-adjusted final estimate of the per-pupil cost to society when typical public school students use vouchers to transfer to private schools and receive comparable services: $9,646.01.

This figure may be checked against the per-pupil private school expenditure figure calculated from the Digest of Education Statistics. Expenditures by private elementary and secondary schools totaled $39,300,000,000 in 2003–2004 current dollars (U.S. Department of Education, 2007, Table 26). Dividing by 2003 total private elementary and secondary enrollment of 6,099,221 (U.S. Department of Education, 2007, Table 55) suggests per-pupil expenditure of $6,443.45, or $7,094.05 in 2006 dollars. This figure ($7,094.05) is nearly identical to the figure Yeh (2007) estimated based on charter school expenditures ($7,098.88), before adding services that are provided by public schools but not private schools: food service ($323.00); special education ($1,082.55); transportation ($2,005.57); and compensatory, vocational, bilingual, and career and technology education ($793.83); plus services that would be required by a voucher program: voucher record-keeping and monitoring ($68.19), information services ($50.81), and adjudication services ($49.82).
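The cross-check described above amounts to a division and a sum. In the sketch below, the gross sum of the charter-based estimate plus added services exceeds the $9,646.01 net figure; the difference presumably reflects the offsetting public school cost savings incorporated in Yeh's (2007) analysis, which are detailed in the source rather than here:

```python
# A sketch of the cross-check: Digest-based per-pupil private school
# spending vs. the charter-based estimate plus added services.
total_private_expenditures = 39_300_000_000   # 2003-2004 current dollars
private_enrollment = 6_099_221
per_pupil = total_private_expenditures / private_enrollment
print(round(per_pupil, 2))   # ~$6,443.45 (about $7,094.05 in 2006 dollars)

base = 7_098.88              # charter-based per-pupil estimate (2006 dollars)
services = {
    "food service": 323.00,
    "special education": 1_082.55,
    "transportation": 2_005.57,
    "compensatory/vocational/bilingual/CTE": 793.83,
    "voucher record-keeping and monitoring": 68.19,
    "information services": 50.81,
    "adjudication services": 49.82,
}
# Gross cost before offsetting savings in the sending public schools:
print(round(base + sum(services.values()), 2))   # ~11,472.65
```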

Although Yeh (2007) assumed that private school expenditures are lower than expenditures by their public school counterparts, Levin (1998) obtained site-based expenditure and revenue data for Milwaukee public schools and their voucher-school counterparts and concluded that voucher schools in Milwaukee are receiving at least as much revenue per student as the public schools. This suggests that Yeh's (2007) cost estimates for voucher programs are, if anything, underestimated.

Levin (1998) estimated annual core per-pupil costs of $4,581.15, plus $1,980.89 to provide the full range of services provided by the public schools, as well as the record-keeping, monitoring, transportation, information, and adjudication services that would be required by a voucher program, for a total of $6,562.04, in 2006 dollars. Levin (1998) finds Rouse's (1998) estimates of the effectiveness of the Milwaukee voucher program to be most valid. Rouse (1998) found an effect size of 0.09 SD in math and zero in reading. Dividing 0.09 SD by Levin's cost figure of $6,562.04 results in an effectiveness–cost ratio of 0.000014, somewhat higher than Yeh's (2007) maximum ratio of 0.000008.

Mayer and Peterson (1999) estimated that voucher programs would improve student achievement by 0.20 SD at no cost. However, the assertion that voucher programs have no cost is untenable.

Charter Schools

Yeh (2007) integrated cost data with impact results from existing evaluations of charter schools to estimate cost-effectiveness. While charter schools are heterogeneous, the available evidence fails to demonstrate that any particular type of charter school is especially effective. Thus, it is appropriate to ask whether charter school programs are, in general, an efficient use of public resources. Since no large-scale random-assignment studies have been completed, Yeh (2007) averaged effect sizes drawn from three large-scale impact evaluation studies that used panel data from Florida, North Carolina, and Texas and applied regression discontinuity methods to control for selection bias, maturation, and history effects (Bifulco & Ladd, 2006; Hanushek et al., 2007; Sass, 2006). The average effect size across these key studies of charter schools was a negative 0.088 SD in reading and a negative 0.116 SD in math. The effect of competition from charter schools on the performance of traditional public schools was inconsequential (Bifulco & Ladd, 2006; Sass, 2006). However, in all three studies, negative effects were largest in the first year of charter school operation and diminished over time. The average effect size for charter schools after 5 years in operation was 0.009 SD in reading and 0.001 SD in math.


Yeh’s (2007) cost-effectiveness analysis drew upon much of the same cost data and logic used in the analysis of voucher programs, except that costs specific to voucher programs (voucher record-keeping, monitoring, information and adjudication services) were dropped and the actual costs for student transportation incurred in Philadelphia were used, resulting in a figure of $8,086.30 per transfer student, including the public school cost savings attributable to students transferring to charter schools. As illustrated by the calculations regarding voucher programs, only a fraction of the costs incurred by charter schools when students transfer from regular schools to charter schools are offset through savings in the sending public schools and districts. The cost to hire charter school teachers will largely be a new cost with very little offset due to reduced need for teachers in the sending schools, which will rarely be able to reduce their own costs in proportion to the costs of establishing and maintaining the new charter schools. In essence, more facilities, administrators, and teachers are required to teach the same total number of students (Nelson, 1997). The inelastic response of cost savings with regard to enrollment declines is empirically well-established. Heinesen (2004) employed panel data to estimate a dynamic school expenditure demand model using the generalized method of moments (GMM) estimator, accounting for unobserved heterogeneity between municipalities and endogeneity of the lagged dependent variable as well as possible endogeneity of the explanatory variables. The use of panel data with a flexible dynamic model accounted for slow resource adjustments in response to declines in enrollment. Heinesen (2004) found that a 1 percent decline in enrollment was associated with a 0.8 percent increase in per pupil expenditures and a decrease in total school expenditures of only 0.2 percent. This translates to a 20 percent cost savings for each pupil who transfers out of a school district—comparable to the 17.38 percent cost savings assumed in the preceding analyses of charter school and voucher program costs (see chapter 2). Similarly, Cibulka (1983) examined enrollment and expenditures in 10 large urban school districts—Atlanta, Baltimore, Boston, Chicago, Dallas, Los Angeles, Milwaukee, New Orleans, New York, and Seattle—and concluded that enrollment declines were not consistently associated with cost reductions: “there is no clear link between enrollment loss and degree of expenditure decline” (Cibulka, 1983, pp. 70–71). These results suggest that it may be difficult for public schools to reduce expenditures when enrollment declines as a result of students transferring to charter schools or private schools.


Discussion

The results suggest that rapid assessment is significantly more cost-effective than any of the 21 alternative approaches for raising student achievement listed in Table 8.1. However, this conclusion must be tempered by limitations in the available data.

First, the comparison was limited to measures of math and reading achievement. While these measures are strongly related to future earnings (Currie & Thomas, 2001; Murnane et al., 1995; Neal & Johnson, 1996; O'Neill, 1990; Winship & Korenman, 1999), suggesting that achievement in these areas is important, the results should not be interpreted to mean that all spending on education should be directed toward rapid assessment. Instead, the results suggest that in schools where student achievement in math and reading is low, it may be cost-effective to reallocate a portion of funding to rapid assessment. For example, the results suggest that a school system that is currently using voucher programs that cost an average of $9,646.01 per student annually could reallocate $28.34 of this to rapid assessment programs, improve achievement in math and reading, and generate savings of $9,617.67 per student that could be directed toward other important educational objectives. Thus, the application of the results of the current study would permit the vast majority of each school's budget, including the portion of funding that is saved as a consequence of rapid assessment's efficiency, to be directed toward important educational objectives that are not measured by standardized assessments in math and reading. It may be the case that funding should be directed toward some of the 21 alternatives analyzed here, if future research indicates that those alternatives are cost-effective approaches for addressing objectives other than math and reading achievement. The limitation of the current study is that it does not provide guidance about where to allocate funding with regard to those other objectives.

Second, the results of the present analysis are conditional on future studies that are needed to determine if the present results generalize across the full range of student and school characteristics. It is important to note that, while the rapid assessment approach appears to work at all grade levels in math, effects tend to fade out in the upper grades in reading. More research is needed to explore why this happens. It may be the case that future studies indicate that some of the 21 alternatives that I analyzed are more cost-effective than rapid assessment in raising student achievement in reading, especially for older children. Interestingly, however, the only study of rapid assessment that analyzed the influence of ability level found that rapid assessment was equally effective for gifted students (effect size = 0.45 SD) and nongifted students (effect size = 0.47 SD) with regard to math achievement (Ysseldyke et al., 2004).


Third, the results of the present analysis are conditional on the specific variations of each treatment that were analyzed and the conditions under which each treatment was studied. It is possible that other variations of rapid assessment will prove to be less cost-effective, either because they are less effective or more costly, or that the variation of rapid assessment that is the focus of the present analysis will prove to be less cost-effective under conditions that differ from the conditions of the studies reported here.

With these caveats, the results suggest that the variation of rapid assessment that is the focus of the present analysis, used with the population of students and under the conditions characterized by the studies reported here, is approximately one magnitude (10 times) more cost-effective than comprehensive school reform, cross-age tutoring, or computer-assisted instruction; two magnitudes more cost-effective than a longer school day, increases in teacher education, teacher experience or teacher salaries, summer school, more rigorous math classes, value-added teacher assessment, class size reduction, a 10 percent increase in per pupil expenditure, or full-day kindergarten; three magnitudes more cost-effective than Head Start, high-standards exit exams, NBPTS teacher certification, higher teacher licensure test scores, high-quality preschool, an additional school year, or voucher programs; and four magnitudes more cost-effective than charter schools (Figure 8.1). A magnitude of gain implies that, for every dollar invested in rapid assessment rather than the alternative, a given increase in student achievement could be purchased for 10 times as many students. Two magnitudes imply a hundred-fold gain; three magnitudes imply a thousand-fold gain; and four magnitudes imply a ten-thousand-fold gain. The size of the differences in cost-effectiveness suggests that the results are robust even if future research indicates that the effect sizes or costs are incorrectly estimated by factors of 10 or more.

Figure 8.1  Relative magnitude of 22 effectiveness–cost ratios.

While the results of the review of cost-effectiveness literature suggest the advantages of rapid assessment, the use of cost-effectiveness analysis to inform policy decisions is hindered by the persistent criticism that this type of analysis assumes that math and reading are the only important educational objectives and, furthermore, that standardized tests are the best way of measuring progress toward those objectives. However, it may be more accurate to say that cost-effectiveness analysis assumes that improvements in math and reading skills are two important educational objectives, that standardized tests are one important way to measure progress toward those objectives, and that the implementation of the most efficient approaches for raising math and reading achievement would conserve social resources that would then become available to pursue additional educational goals beyond the important goals of raising student achievement in math and reading.

Interviews with teachers and classroom observations in a district that implemented rapid assessment systems district-wide suggest that these systems permitted teachers to efficiently teach math and reading skills, freeing up time that would otherwise have to be devoted to test preparation for the state-mandated test, thereby avoiding the basic-skills-dominated curriculum that critics fear (Yeh, 2006a). In contrast, the persistent reliance on highly inefficient approaches for raising math and reading achievement in districts that do not use rapid assessment systems forces teachers and students to devote inordinate amounts of time and energy to basic skills, narrowing the curriculum and subverting the broader educational goals that are favored by critics of standardized testing (Yeh, 2006a).

9
Shifting the Bell Curve
The Benefits and Costs of Raising Student Achievement

Benefit–cost analysis provides a conceptual framework and techniques for estimating the ratio of benefits to costs of a given intervention (Gramlich, 1997; Levin & McEwan, 2001). Standard techniques are used to quantify all benefits and costs, including those that are difficult to quantify because they are not traded in markets, and to convert streams of benefits and costs into present value terms. A benefit–cost analysis typically includes sensitivity analyses that assess the degree to which findings are sensitive to changes in the underlying parameters.

Benefit–cost analysis has been used to estimate the benefits and costs of raising student achievement for all K–12 students in California (Brady et al., 2005). The analysis projected the benefits and costs of improved college completion rates, including projected gains in earnings, the value of reductions in crime, and California's fiscal savings due to lower expenditures on welfare programs. Benefit–cost analysis has also been used to project the increase in earnings; increased tax revenues; and savings in public health, welfare, and criminal justice costs attributable to various interventions aimed at reducing high school dropout rates for all American students (Levin, Belfield, Muennig, & Rouse, 2007a) and specifically for African American males (Levin, Belfield, Muennig, & Rouse, 2007b).

In this chapter, I estimate the benefits and costs of nationwide implementation of rapid assessment, a promising intervention for raising student achievement. I use standard benefit–cost techniques to estimate the increase in earnings, increased tax revenues, value of less crime, and reductions in welfare costs that would be attributable to the intervention. The intervention that is proposed has largely been overlooked, and the results reported here are significantly more positive than the results reported by previous investigators regarding alternative interventions (Brady et al., 2005; Levin et al., 2007a, 2007b). The sensitivity analyses demonstrate that the findings are robust to a five-fold change in the underlying parameters, including the arbitrary assumption that the effect of the intervention weakens after grade 5 and primarily maintains the gains achieved in grades 1 through 5.

Effects on Educational Attainment and Earnings

It is well established that basic skills in math and reading predict educational attainment and earnings (Currie & Thomas, 2001; Murnane et al., 1995; Neal & Johnson, 1996; O'Neill, 1990; Winship & Korenman, 1999). However, only one analysis has accounted for the reciprocal effects of cognitive skills and schooling (Winship & Korenman, 1999). An intervention that boosts cognitive skills will boost years of schooling, which will feed back to boost cognitive skills, which will lead to further gains in schooling, in a cycle that eventually ends with cumulative effects on educational attainment and earnings that are multiples of the estimated gains when the reciprocal effect is neglected. Winship and Korenman (1999) investigated this reciprocal effect using data from the National Longitudinal Survey of Youth (NLSY), involving a national sample of 12,686 individuals. After measuring the direct effect of cognitive skills on annual earnings and the indirect effect of cognitive skills operating through increased schooling, and after accounting for the reciprocal effect of cognitive skills and schooling, Winship and Korenman's results suggest that a 1.5 SD difference in basic math and reading skills is ultimately associated with a 1.945-year difference in educational attainment and a 53.7% difference in annual earnings, controlling for a host of covariates including ability.

Thus, a growing body of research indicates that strong preparation in math and reading, captured by math and reading test scores, can overcome socioeconomic disadvantages and "is the major determinant of differences in educational attainment" between advantaged and disadvantaged young people (Bowen, Kurzweil, & Tobin, 2005, p. 224). This conclusion is not altered even after considering the affordability of college. There is no evidence that students are being forced to enroll in inexpensive colleges that are inappropriate for their level of preparedness (Hoxby, 2000b). Instead, students from high- and medium-high income families who have low SAT scores and high school grades are being replaced by highly prepared students from low-income families (Hoxby, 2000b). Less than 8 percent of all students are prevented from enrolling by their inability to pay (Avery & Hoxby, 2004; Carneiro & Heckman, 2003), and federal Pell grants have no significant impact on enrollment (Kane, 1999), leading Bowen et al. (2005) to conclude that "family finances have a fairly minor direct impact on a student's ability to attend a college" (p. 91). Math and verbal SAT scores "are much more important factors in the college [application] process than financial variables such as family income" (Spies, 2001, p. 17).

Furthermore, once students from disadvantaged socioeconomic backgrounds enroll, they do not underperform their more advantaged counterparts (controlling for SAT scores), suggesting that expansion of the college population to include a greater proportion of disadvantaged students would not depress the earnings of college graduates, as long as this expansion is the result of improved academic preparation, as envisioned by the proposed intervention: "When we [regress rank-in-class on SAT scores], we find that (perhaps as expected) SAT scores explain much of the variation in rank-in-class" (Bowen et al., 2005, p. 118). The significance of academic preparation, as measured by SAT scores, extends to graduation rates: When SAT scores are controlled, there is only a 4.7 percentage point difference in graduation rates between students in the top and bottom income quartiles (Bowen et al., 2005). Furthermore, after controlling for SAT scores, there is no difference between advantaged and disadvantaged students in their rates of attainment of lucrative law and business degrees (Bowen et al., 2005).

In summary, these findings suggest not only that student achievement in reading and math is highly predictive of future educational attainment and earnings, but also that interventions targeting reading and math achievement are promising ways to improve educational attainment and earnings.

Rapid Assessment

The difficulty is identifying an effective way to raise student achievement. However, the research findings reviewed in the previous chapters suggest that rapid assessment is a promising approach. The average effect size for rapid assessment (0.319 SD per year) suggests that improving student achievement by 1.5 SD for every American student could be achieved by implementing rapid assessment in grades 1 through 5. However, it is likely that the effects of a 5-year intervention would fade over time without booster shots every year. Furthermore, student mobility, the influx of English-language learners, and the need to close achievement gaps by the No Child Left Behind deadline of 2014 create a need for ongoing intervention to bring all students up to grade level. Implementing the intervention in grades 1 through 12 would serve to maintain gains by students in grades 1 through 5, accommodate the needs of immigrants and English-language learners, and boost the achievement of students who have already advanced beyond grade 5. Thus, it would be desirable to implement the intervention throughout grades 1–8 for all students, and through high school for the bottom 40 percent of all students.

Costs of Rapid Assessment

The costs to implement the intervention include the costs to purchase the software and hardware, ongoing fees for each student, and teacher and administrator training costs, including the opportunity cost of the time devoted to training. Adjusted for the proportion of students receiving the intervention, the expected annual cost per student is $28.31 incurred from ages 6 through 13 and $11.32 incurred from ages 14 through 17, or a total of $226.19 after discounting at a rate of 3.5 percent to age 6, when the intervention begins.

In addition, however, the intervention would increase the average length of schooling, primarily in the college years, creating extra costs associated with college attendance. For the majority of students who receive the proposed intervention, a 1.945-year increase in years of schooling associated with a 1.5 SD increase in AFQT test scores creates costs to society of 1.945 years of college plus foregone wages. Weighted by the proportion of students enrolled in degree-granting public, private nonprofit, and private for-profit institutions (National Center for Education Statistics, 2007b), the annual expenditure is $27,036 per full-time-equivalent student (National Center for Education Statistics, 2007c, 2007g, 2007h), adjusted for inflation. Lost wages for individuals whose attainment is "some college" equal $28,383 per year, adjusted for inflation (Day & Newburger, 2002). The total cost is $107,790 for 1.945 years, incurred at ages 19 and 20.

However, in the absence of the intervention, 9.3 percent of all students would have dropped out and would not have obtained a high school diploma or a general educational development (GED) certificate within 8 years of their on-time high school graduation date (Mishel & Roy, 2006). Current Population Survey data suggest that the employment-to-population ratio among 16- and 17-year-old high school dropouts ranges from 30.3 to 35.2 percent (Herman, 2000), suggesting that the average dropout endures a lengthy period during which he or she is not working before finding a job (which in many cases is only part-time). Thus, a conservative estimate is that dropouts average a 1-year period when they are not working before obtaining full-time employment. An intervention resulting in a 1.945-year increase in schooling might therefore incur costs of 1 year of college plus 1 year of lost wages ($19,045) for individuals whose attainment is "not high school graduate" (Day & Newburger, 2002), equal to a total of $46,081, incurred at age 18. The probability-weighted average across all individuals receiving the intervention is an annual cost of $4,286 at age 18, a cost of $48,883 at age 19, and $48,883 at age 20, or a total of $62,117 after discounting at a rate of 3.5 percent to age 6, when the intervention begins.

The costs are shared among private individuals, the federal government, and state governments. Public degree-granting institutions derive 13.7% of revenues from federal funds and 27.3% from state funds (National Center for Education Statistics, 2007d). The comparable figures for private not-for-profit institutions are 13.7% federal and 1.1% state funds, and for private for-profit institutions, 4.4% federal and 0.7% state funds (National Center for Education Statistics, 2007e, 2007f). Thus, at age 18, the expected value of the federal share of college costs is $337.83 and the state share is $470.71, weighted by the .093 probability of incurring the cost (that is, the probability that in the absence of the intervention an individual would have dropped out of high school instead of attending college) and weighted by the proportions of students enrolled in degree-granting public, private nonprofit, and private for-profit institutions (National Center for Education Statistics, 2007b, 2007d, 2007e, 2007f). At ages 19 and 20, the expected value of the federal share of college costs is $3,204.17 and the state share is $4,464.45, weighted by the .907 probability of incurring the cost (that is, the probability that in the absence of the intervention, an individual would have graduated from high school and begun working full-time, either immediately or after completing some college or a college degree, instead of completing an additional 1.945 years of college or postgraduate schooling), and weighted by the proportions of students enrolled in degree-granting public, private nonprofit, and private for-profit institutions (National Center for Education Statistics, 2007b, 2007d, 2007e, 2007f).
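The discounting just described can be sketched as follows. The exact timing convention used in the source is not stated here, so the sketch approximates rather than reproduces the published totals:

```python
# A sketch of the present-value arithmetic for the intervention costs.
DISCOUNT_RATE = 0.035

def present_value_at_age_6(cost, age):
    """Discount a cost incurred at the given age back to age 6."""
    return cost / (1 + DISCOUNT_RATE) ** (age - 6)

k12_costs = ([(age, 28.31) for age in range(6, 14)]      # ages 6-13
             + [(age, 11.32) for age in range(14, 18)])  # ages 14-17
print(round(sum(present_value_at_age_6(c, a) for a, c in k12_costs), 2))
# ~234 under start-of-year discounting; the source reports $226.19

college_costs = [(18, 4_286), (19, 48_883), (20, 48_883)]
print(round(sum(present_value_at_age_6(c, a) for a, c in college_costs), 2))
# ~64,291 under this convention; the source reports $62,117
```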


Benefits of the Proposed Intervention

The average American student completes 13.3 years of schooling (Organisation for Economic Cooperation and Development, 2006), equal to 1.3 years of college. The proposed intervention would increase the average number of years of schooling by 1.945 and increase annual earnings by 53.7 percent. The total increase in earnings over a 44-year working career from ages 21 through 64 was calculated based on Census Bureau estimates of average annual earnings for individuals at each age whose educational attainment is "some college" (Day & Newburger, 2002). After accounting for historical gains in annual real wages of 2.5 percent (Sullivan, 1997) and discounting to age 6 using a 3.5 percent discount rate, the total equals $769,297 per individual receiving the proposed intervention. Income figures were adjusted using state-specific Keynesian multipliers to account for consumption-induced increases in total income (Baumol & Blinder, 2003). The multiplier takes the form:

ΔY = Δy / [1 − c(1 − t) + m]

where ΔY is the cumulative change in permanent income due to the Keynesian multiplier effect, Δy is the initial change in permanent income due to the intervention, c is the propensity to consume out of permanent income, t is the effective state and federal personal income tax rate, and m is the propensity to import out of permanent income. The average propensity to consume out of permanent income over the most recent 10-year period for which data are available (1997–2006) is 0.941 (Bureau of Economic Analysis, 2007b). Over the 10-year period from 1997 through 2006, the propensity to import out of permanent income is 0.144 (Bureau of Economic Analysis, 2007a). Combined effective tax rates were calculated for each state based on an average effective federal personal income tax rate of .228 (Congressional Budget Office, 2001) and effective state personal income tax rates from the Urban Institute-Brookings Institution Tax Policy Center (2004), resulting in state-specific multipliers. Weighted by each state’s population of 1st-grade students, total income would increase by an annual amount equal to $6.5 trillion for the 3,663,005 students in each 1st-grade cohort that would benefit from the intervention.
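A worked example of the multiplier may be useful. In the sketch below, the 3 percent state income-tax rate is purely illustrative; the analysis itself uses state-specific effective rates from the Tax Policy Center (2004):

```python
# A sketch of the Keynesian multiplier described above. The state income-tax
# rate here is a hypothetical stand-in for the state-specific effective rates.
c = 0.941       # propensity to consume out of permanent income (1997-2006)
m = 0.144       # propensity to import out of permanent income (1997-2006)
t_federal = 0.228
t_state = 0.03  # hypothetical state effective income-tax rate
t = t_federal + t_state

multiplier = 1 / (1 - c * (1 - t) + m)
print(round(multiplier, 2))   # ~2.24: each $1 of added permanent income
                              # yields ~$2.24 of total income here
```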


Value of Less Crime

The value of reduced crime may be estimated from existing data regarding victim, property, and incarceration costs per crime, and econometric analyses relating educational attainment and the incidence of crime (Lochner & Moretti, 2004). Annual arrests by age group for eight major categories of crime were drawn from the Uniform Crime Reports (U.S. Department of Justice, 2006), adjusted for undersurvey (the Department of Justice survey covers only 72 percent of the population of the United States), then converted to per capita rates. The reduction in arrests per capita at age 19 associated with a 1.945-year increase in educational attainment was calculated based on Lochner and Moretti's (2004) econometric analysis, then multiplied by the number of crimes per arrest (from victim surveys) and the cost per crime, and totaled across the categories of crime to derive the social benefit per capita of the reduction in crime at age 19 associated with the proposed intervention. To estimate the value of less crime at each age throughout a lifetime, the age-19 figure was adjusted to reflect the relative proportion of crimes committed at each age over the course of a lifetime. The age-specific figures were then discounted at 3.5 percent to age 6, when the intervention begins, to derive an estimate of the total social value of less crime attributable to the proposed intervention, equal to $9,794 per individual receiving the intervention.

For the purpose of calculating the purely fiscal savings to state and federal treasuries, a separate calculation isolated the portion of the total social benefit of reduced crime attributable to savings in incarceration costs, following the methodology outlined above. The annual incarceration cost per inmate is $25,935 (Stephan, 2004), adjusted for inflation. This cost was multiplied by the incarceration rate and average time served per offender for eight major categories of crime (Farrington, Langan, & Tonry, 2004; Lochner & Moretti, 2004). Age-specific savings in incarceration costs were calculated based on arrest data from the Uniform Crime Reports, adjusted for undersurvey and crimes per arrest, and multiplied by Lochner and Moretti's estimated reduction in arrest rates associated with a 1.945-year increase in schooling, then discounted to age 6, equal to a total of $253.90 per individual receiving the intervention, 10 percent saved by the federal treasury and 90 percent by state treasuries (Stephan & Karberg, 2003).
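The pipeline described above (arrests, adjusted for undersurvey and crimes per arrest, valued per crime, profiled by age, and discounted to age 6) can be sketched schematically. All numbers below are placeholders rather than the actual Uniform Crime Reports or Lochner and Moretti (2004) figures; only the structure of the calculation is illustrated:

```python
# A schematic sketch of the crime-valuation pipeline; all figures are
# hypothetical placeholders.
DISCOUNT_RATE = 0.035

crime_categories = {
    # category: (reduction in crimes per capita at age 19, cost per crime)
    "burglary": (0.0010, 5_000.0),
    "assault":  (0.0004, 20_000.0),
}
benefit_at_19 = sum(dr * cost for dr, cost in crime_categories.values())

# Hypothetical age profile of offending, relative to age 19 (= 1.0):
age_profile = {17: 0.9, 18: 1.0, 19: 1.0, 20: 0.9, 21: 0.8, 25: 0.5, 30: 0.3}

total = sum(
    benefit_at_19 * share / (1 + DISCOUNT_RATE) ** (age - 6)
    for age, share in age_profile.items()
)
print(round(total, 2))   # total discounted social value of less crime
```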

Welfare Savings

The value of welfare savings may be estimated from Census Bureau data relating the incidence of poverty and educational attainment (U.S. Census Bureau, 2006). Age-specific reductions in poverty rates were estimated based on the projection that a 1.945-year increase in schooling shifts each individual in the categories of "less than high school," "high school," "some college," and "bachelor's or higher" into the next category of educational attainment. The corresponding reductions in poverty rates were used to calculate age-specific savings out of current per capita federal and state welfare costs of $702.25 (Lewin Group and the Nelson A. Rockefeller Institute of Government, 2004). The savings were apportioned by the average federal and state shares.

Since the $702.25 figure excludes food stamps, projected age-specific food stamp savings were calculated separately, based on the average monthly benefit of $94.32 (U.S. Department of Agriculture, 2007), multiplied by 7.73 average months per spell, calculated from participation spells over the 1990–1999 period (Cody, Gleason, Schechter, Satake, & Sykes, 2005), multiplied by the age-specific probability of food stamp receipt, and multiplied by the age-specific percentage reduction in poverty resulting from the proposed intervention.

All figures were adjusted to reflect that each woman will bear an average of 2.05 children during her child-bearing years (Martin et al., 2006). Adjusted for the ratio of females to males (50.8 to 49.2), each adult is associated with an average of 1.04 children. Thus, each individual in the intervention group would be associated during adulthood with an average of 1.04 children and, thus, the per capita welfare savings must be multiplied by 2.04 during adult child-bearing years (ages 22–40) to account for the link between adult and child poverty and welfare consumption. The age-specific welfare savings were discounted at 3.5 percent to age 6, when the intervention begins, and apportioned by state and federal shares for the purpose of estimating the savings by state and federal treasuries.
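The food stamp component reduces to a product of four factors. In the sketch below, the probability of receipt and the poverty reduction are hypothetical stand-ins for the age-specific values used in the analysis:

```python
# A sketch of the food stamp savings calculation described above.
monthly_benefit = 94.32      # average monthly food stamp benefit
months_per_spell = 7.73      # average months per participation spell
cost_per_spell = monthly_benefit * months_per_spell
print(round(cost_per_spell, 2))   # ~$729.09 per spell

p_receipt = 0.10             # hypothetical probability of food stamp receipt
poverty_reduction = 0.15     # hypothetical % reduction in poverty at this age
age_specific_saving = cost_per_spell * p_receipt * poverty_reduction
print(round(age_specific_saving, 2))   # ~$10.94 for this illustrative age

# During adult child-bearing years (ages 22-40), multiply by 2.04 to count
# each adult plus an average of 1.04 associated children:
print(round(age_specific_saving * 2.04, 2))
```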

Tax Savings

Gains in federal tax revenue were estimated based on an average effective federal personal income tax rate of .228 (Congressional Budget Office, 2001) multiplied by the increase in income for the 3,663,005 students in each 1st-grade cohort who would benefit from the intervention. Gains in state tax revenues were estimated based on the gains in state personal income tax and state sales tax revenues attributable to the intervention, using state-specific effective tax rates from the Urban Institute–Brookings Institution Tax Policy Center (2004). State sales tax revenues were estimated by multiplying the gain in disposable income generated by each state's cohort of 1st-grade students by .941, the average propensity to consume out of disposable income (Bureau of Economic Analysis, 2007b), and each state's average effective sales tax rate. Actual revenues would exceed this amount if the additional income is taxed at upper income tax brackets.
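
A short sketch of these projections, assuming a hypothetical $10,000 earnings gain and a hypothetical 5 percent effective state sales tax rate (the .228 federal rate and the .941 propensity to consume are from the text):

```python
# Sketch of the tax-revenue projections described above.

FEDERAL_RATE = 0.228            # average effective federal income tax rate
PROPENSITY_TO_CONSUME = 0.941   # out of disposable income (BEA, 2007b)

def federal_tax_gain(income_gain):
    return FEDERAL_RATE * income_gain

def state_sales_tax_gain(disposable_income_gain, sales_tax_rate):
    return disposable_income_gain * PROPENSITY_TO_CONSUME * sales_tax_rate

# Hypothetical example: a $10,000 earnings gain, 5 percent sales tax.
print(round(federal_tax_gain(10_000), 2))            # 2280.0
print(round(state_sales_tax_gain(10_000, 0.05), 2))  # 470.5
```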

Benefit–Cost Analysis

A benefit–cost ratio may be calculated for society as a whole, disregarding the distribution of the benefits and costs across particular individuals, the federal government, and state or local governments. This calculation includes the increased earnings and value to society of reductions in crime, as well as all costs (cash, opportunity, and college costs) that are attributable to the intervention.

A second calculation may be performed to estimate the fiscal benefits and costs to the federal treasury, under the assumption that the federal government pays for the intervention. This provides useful information to federal policymakers in judging whether governmental outlays for the intervention would be recouped through increased federal taxes and the federal share of reduced welfare and incarceration costs resulting from the projected increases in educational attainment and earnings. This tells federal policymakers whether the intervention "pays for itself" if the federal government foots the bill for the K–12 costs plus the federal share of college costs.

A third set of calculations may be performed to estimate the social benefits (increased earnings and value of less crime) that would be purchased if each state government paid the K–12 costs plus the state share of college costs out of the state treasury.

A fourth set of calculations may be performed to estimate the fiscal benefits and costs to each state treasury, under the assumption that each state government paid the K–12 costs plus the state share of college costs out of the state treasury. This provides useful information to state policymakers in judging whether governmental outlays for the intervention would be recouped through increased state taxes and state shares of reduced welfare and incarceration costs resulting from the projected increases in educational attainment and earnings. This calculation tells state policymakers whether the intervention "pays for itself" if each state foots the bill for the intervention.

National Social Benefit–Cost Ratio

For the nation as a whole, the net social benefits (increased earnings and value of less crime, minus all social costs) total $6.3 trillion annually in present value terms, if the intervention is applied to each successive cohort of 1st-grade students. The social benefit–cost ratio (social benefits divided by social costs) is 28.47, meaning that society gains $28.47 in social benefits for every dollar of social resources consumed. These resources include the K–12 costs of the proposed intervention to raise student achievement as well as the value of foregone earnings and the cost of college for the extra 1.945 years of schooling that each individual would attain as a result of the intervention.

Federal Treasury Benefit–Cost Ratio

A more limited calculation can assist federal policymakers to determine whether the fiscal benefits would exceed the fiscal costs that would be incurred if the federal government paid for the proposed intervention. The federal government would gain $1.5 trillion annually in increased federal personal income tax revenues, saved welfare costs, and reduced costs of incarceration. The benefit–cost ratio for the federal government is 93, meaning that the federal treasury would gain $93 in personal tax revenues, savings in the federal portion of welfare costs, and savings in the federal portion of incarceration costs, for every dollar spent on the proposed intervention, including increased federal expenditures if each student attends college for 1.945 additional years.

State Social Benefit–Cost Ratios

The lowest state-specific social benefit–cost ratio is 286, in Oregon, meaning that the residents in each of the 50 states would gain a minimum of $286 in increased earnings and the value of less crime for every dollar invested by state treasuries in the proposed intervention. This figure exceeds the national social benefit–cost ratio (28.47), because the cost to each state treasury excludes the extra costs of college borne by college students and their parents as a result of the intervention.

State Fiscal Benefit–Cost Ratios

A more limited calculation for each state can assist state policymakers to determine whether the fiscal benefits would exceed the fiscal costs that would be incurred if state governments (instead of the federal government) paid for the proposed intervention. For example, Minnesota's treasury would net $4.3 billion annually, over and above the costs of the intervention to the state treasury, due to increased sales and personal income tax revenues, reduced welfare costs, and reduced costs of incarceration. The fiscal benefit–cost ratio for the State of Minnesota is 13.5, meaning that Minnesota's treasury would gain $13.50 for every dollar spent on the proposed intervention. Excluding Alaska and New Hampshire, which do not levy sales taxes or taxes on earned income, the state benefit–cost ratios are no lower than 5, in Texas, meaning that all but two state treasuries would reap a minimum of $5.00 for every dollar spent on the proposed intervention (Table 9.1).

Sensitivity Analysis I

A sensitivity analysis provides information about the degree to which the results are sensitive to hypothetical changes in the underlying parameters. This is especially important in benefit–cost analysis, which is heavily based on estimations and projections of future outcomes. Note that the preceding analyses arbitrarily assumed that in grades 6 through 12 the effect size for testing feedback drops to zero. During these grades, the analyses assumed that the intervention serves primarily to hold the ground that was gained during grades 1 through 5 (and, secondarily, to boost the achievement of immigrant students who arrive after grade 5). The results suggest that the benefit–cost ratios are robust to this conservative assumption.

Sensitivity Analysis II

A key assumption is that schools do not need to purchase additional computers and printers in order to implement rapid assessment. This assumption is based on research implying that virtually all classrooms had access to at least one online computer by the year 2006 (Parsad & Jones, 2005), plus the ability of all online computers to print from a high-capacity printer in a school's media center (Reilly, no date), and was verified through interviews with teachers and observations of classrooms where rapid assessment was used. However, if each classroom requires new equipment, an entire system, including a computer, monitor, keyboard, mouse, software, service plan and printer, may be purchased from Dell for $1,015. Assuming that a complete system is purchased for every classroom of 20 students and is amortized over a 7-year period, the annual cost per student is $7.25, raising the total cost of rapid assessment to $35.56 per student. This would reduce each state's fiscal benefit–cost ratio, but each state treasury (except for Alaska and New Hampshire) would reap a minimum of $4.90 for every dollar spent on the proposed intervention.
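
The $7.25 figure is straight-line amortization arithmetic, sketched below (financing costs are ignored here, a simplification relative to the treatment of fixed costs elsewhere in the book):

```python
# Arithmetic behind the $7.25 annual equipment cost per student.

SYSTEM_COST = 1015.0   # computer, monitor, software, printer, service plan
STUDENTS = 20
YEARS = 7

print(round(SYSTEM_COST / STUDENTS / YEARS, 2))   # 7.25
```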

Table 9.1  Benefits and Costs of Raising Student Achievement, by State
(all dollar figures in millions)

State | Cost to State Treasury | Increased Personal Income | Value of Less Crime | Social Benefit–Cost Ratio | State Income Tax Gain | State Sales Tax Gain | State Welfare Savings | State Prison Savings | Net Gain to State Treasury | State Treasury Benefit–Cost Ratio
AK | 57 | 17,521 | 93 | 311 | 0 | 0 | 0.2 | 2.2 | –54 | 0.0
AL | 348 | 103,721 | 574 | 299 | 1,857 | 1,111 | 1.3 | 13.4 | 2,634 | 8.6
AR | 216 | 63,464 | 355 | 296 | 1,504 | 1,551 | 0.8 | 8.3 | 2,848 | 14.2
AZ | 475 | 142,449 | 781 | 302 | 2,009 | 2,956 | 1.8 | 18.2 | 4,510 | 10.5
CA | 2,841 | 826,227 | 4,677 | 292 | 23,795 | 12,943 | 10.9 | 109.1 | 34,017 | 13.0
CO | 350 | 103,547 | 576 | 298 | 2,123 | 842 | 1.3 | 13.4 | 2,630 | 8.5
CT | 259 | 75,487 | 426 | 293 | 2,053 | 1,042 | 1.0 | 9.9 | 2,848 | 12.0
DC | 33 | 9,588 | 55 | 289 | 326 | 0 | 0.1 | 1.3 | 294 | 9.8
DE | 54 | 15,847 | 89 | 294 | 417 | 0 | 0.2 | 2.1 | 365 | 7.7
FL | 1,199 | 371,313 | 1,974 | 311 | 0 | 8,497 | 4.6 | 46.1 | 7,348 | 7.1
GA | 719 | 210,518 | 1,184 | 294 | 5,410 | 2,779 | 2.8 | 27.6 | 7,501 | 11.4
HI | 85 | 24,601 | 139 | 293 | 699 | 838 | 0.3 | 3.2 | 1,456 | 18.2
IA | 202 | 59,256 | 332 | 295 | 1,434 | 763 | 0.8 | 7.8 | 2,004 | 10.9
ID | 116 | 34,574 | 191 | 300 | 567 | 681 | 0.4 | 4.4 | 1,137 | 10.8
IL | 921 | 272,895 | 1,517 | 298 | 5,512 | 2,992 | 3.6 | 35.4 | 7,622 | 9.3
IN | 477 | 141,046 | 786 | 297 | 3,018 | 2,550 | 1.8 | 18.3 | 5,111 | 11.7
KS | 205 | 60,555 | 338 | 296 | 1,362 | 940 | 0.8 | 7.9 | 2,105 | 11.2
KY | 312 | 91,575 | 514 | 295 | 2,289 | 1,416 | 1.2 | 12.0 | 3,406 | 11.9
LA | 348 | 103,492 | 572 | 299 | 1,842 | 1,888 | 1.3 | 13.4 | 3,397 | 10.8
MA | 433 | 124,991 | 713 | 290 | 4,087 | 1,209 | 1.7 | 16.6 | 4,881 | 12.3
MD | 362 | 106,388 | 596 | 295 | 2,543 | 921 | 1.4 | 13.9 | 3,117 | 9.6
ME | 84 | 24,509 | 139 | 292 | 721 | 394 | 0.3 | 3.2 | 1,034 | 13.2
MI | 742 | 220,680 | 1,221 | 299 | 3,994 | 3,820 | 2.9 | 28.5 | 7,104 | 10.6
MN | 348 | 100,783 | 573 | 291 | 3,124 | 1,546 | 1.3 | 13.4 | 4,337 | 13.5
MO | 400 | 118,121 | 658 | 297 | 2,504 | 1,402 | 1.5 | 15.4 | 3,523 | 9.8
MS | 238 | 71,244 | 391 | 301 | 1,062 | 1,802 | 0.9 | 9.1 | 2,636 | 12.1
MT | 62 | 18,335 | 103 | 296 | 433 | 0 | 0.2 | 2.4 | 373 | 7.0
NC | 663 | 192,401 | 1,092 | 292 | 5,772 | 2,297 | 2.6 | 25.5 | 7,434 | 12.2
ND | 42 | 12,782 | 70 | 304 | 146 | 188 | 0.2 | 1.6 | 294 | 7.9
NE | 122 | 36,112 | 201 | 297 | 794 | 668 | 0.5 | 4.7 | 1,345 | 12.0
NH | 91 | 28,056 | 150 | 310 | 34 | 0 | 0.4 | 3.5 | –53 | 0.4
NJ | 602 | 178,104 | 990 | 298 | 3,651 | 2,166 | 2.3 | 23.1 | 5,241 | 9.7
NM | 147 | 43,473 | 242 | 298 | 878 | 892 | 0.6 | 5.6 | 1,629 | 12.1
NV | 192 | 59,433 | 316 | 311 | 0 | 1,459 | 0.7 | 7.4 | 1,275 | 7.6
NY | 1,209 | 348,101 | 1,990 | 290 | 11,627 | 3,460 | 4.7 | 46.4 | 13,929 | 12.5
OH | 811 | 238,146 | 1,336 | 295 | 5,811 | 3,753 | 3.1 | 31.2 | 8,786 | 11.8
OK | 302 | 88,667 | 496 | 296 | 2,093 | 980 | 1.2 | 11.6 | 2,784 | 10.2
OR | 246 | 69,991 | 405 | 286 | 2,723 | 0 | 0.9 | 9.4 | 2,487 | 11.1
PA | 769 | 228,968 | 1,266 | 299 | 4,053 | 3,023 | 3.0 | 29.5 | 6,339 | 9.2
RI | 69 | 20,365 | 114 | 295 | 497 | 319 | 0.3 | 2.7 | 750 | 11.8
SC | 320 | 94,507 | 527 | 297 | 2,022 | 1,615 | 1.2 | 12.3 | 3,331 | 11.4
SD | 53 | 16,280 | 87 | 311 | 0 | 292 | 0.2 | 2.0 | 242 | 5.6
TN | 441 | 136,454 | 727 | 311 | 109 | 3,288 | 1.7 | 17.0 | 2,974 | 7.7
TX | 2,057 | 636,876 | 3,385 | 311 | 0 | 10,179 | 7.9 | 79.0 | 8,209 | 5.0
UT | 243 | 71,133 | 401 | 294 | 1,871 | 1,253 | 0.9 | 9.3 | 2,891 | 12.9
VA | 537 | 156,547 | 884 | 293 | 4,289 | 1,196 | 2.1 | 20.6 | 4,971 | 10.3
VT | 38 | 11,331 | 63 | 297 | 247 | 122 | 0.1 | 1.5 | 333 | 9.7
WA | 444 | 137,358 | 730 | 311 | 0 | 4,091 | 1.7 | 17.0 | 3,666 | 9.3
WI | 348 | 101,056 | 573 | 292 | 3,001 | 1,553 | 1.3 | 13.4 | 4,221 | 13.1
WV | 124 | 36,469 | 204 | 296 | 831 | 589 | 0.5 | 4.8 | 1,302 | 11.5
WY | 37 | 11,440 | 61 | 311 | 0 | 229 | 0.1 | 1.4 | 193 | 6.2
Total | 21,795 | 6,466,774 | 35,874 | 298 | 125,135 | 98,496 | 84.0 | 837.0 | 202,757 | 10.3


Sensitivity Analysis III

A second assumption, based on interviews with teachers and observations of classrooms where rapid assessment is used, is that the rapid assessment software saves more teacher time (primarily time that would otherwise be spent grading math homework and assessing reading comprehension) than is consumed in noninstructional tasks such as supervising student use of the computer, scanner, and printer. However, if teachers do not save time and, instead, lose 15 minutes per day (or 1.25 hours per week), the annual cost is $81.06 per student, assuming 20 students per teacher and an annual teacher salary of $51,880 (U.S. Department of Education, 2005c). The total cost for rapid assessment increases to $109.37 per student. This would reduce each state's fiscal benefit–cost ratio, but each state treasury (except for Alaska and New Hampshire) would reap a minimum of $4.50 for every dollar spent on the proposed intervention.
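
One arithmetic path that reproduces the $81.06 figure is sketched below. The salary, the 15-minute daily loss, and the 20-student class come from the text; the 180-day school year and the 1,440-hour work year are assumptions introduced here to back out an hourly rate, and are not stated in the text.

```python
# A sketch that reproduces the $81.06 figure under stated assumptions.

SALARY = 51_880.0
ANNUAL_HOURS = 1_440.0    # assumption: 36 weeks x 40 hours
LOST_HOURS = 0.25 * 180   # 15 minutes/day over an assumed 180-day year
STUDENTS = 20

hourly_rate = SALARY / ANNUAL_HOURS
print(round(hourly_rate * LOST_HOURS / STUDENTS, 2))   # 81.06
```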

Sensitivity Analysis IV

A simple way to examine the robustness of the results calculated above is to examine how the benefit–cost ratios were calculated. The ratio for each state was calculated according to the following equation:

BC_i = (IncTax_i + SalesTax_i + WelfSav_i + IncarcSav_i) / Cost_i

where BC_i is the benefit–cost ratio for the ith state, IncTax_i is the gain in personal income tax revenue in the ith state, SalesTax_i is the gain in sales tax revenue in the ith state, WelfSav_i is the savings in welfare costs in the ith state, IncarcSav_i is the saved incarceration costs in the ith state, and Cost_i is the cost to the state treasury of the proposed intervention in the ith state.

Inspection of this equation indicates that the fiscal benefits of the proposed intervention exceed the costs to each state treasury unless all of the benefits of the proposed intervention are overestimated by a factor of 5, or the costs of the proposed intervention are underestimated by a factor of 5, or some combination thereof (excepting Alaska and New Hampshire).
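
The equation can be checked directly against Table 9.1. A minimal implementation, verified here against the Minnesota row (all figures in millions):

```python
# Direct implementation of the state benefit-cost equation above.

def state_bc_ratio(inc_tax, sales_tax, welf_sav, incarc_sav, cost):
    """Fiscal benefit-cost ratio for a state treasury."""
    return (inc_tax + sales_tax + welf_sav + incarc_sav) / cost

# Minnesota (Table 9.1): income tax gain 3,124; sales tax gain 1,546;
# welfare savings 1.3; prison savings 13.4; cost to treasury 348.
print(round(state_bc_ratio(3_124, 1_546, 1.3, 13.4, 348), 1))   # 13.5
```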

Sensitivity Analysis V

Sensitivity Analysis V examines the projection based on Winship and Korenman's (1999) regression analysis that annual earnings would increase by 53.7 percent and schooling would increase by 1.945 years in response to a 1.5 SD increase in AFQT scores. This projection underlies the projected increase in personal income and sales tax revenues, which account (on average) for 99.6 percent of the total projected fiscal gain to each state treasury. Therefore, the projected increase in earnings is the single most critical assumption, other than the assumption regarding the effect size of the proposed intervention.

If instead earnings increase by only half of the projected amount, the average person in the treatment group would still gain a present value sum of $384,649 in lifetime earnings. After recalculating the benefit–cost ratios, the smallest ratio drops to 2.5, in Texas, meaning that the fiscal benefits of the proposed intervention exceed the costs to each state treasury, unless all of the benefits of the proposed intervention are overestimated by a factor of 2.5, or the costs of the proposed intervention are underestimated by a factor of 2.5, or some combination thereof (excepting Alaska and New Hampshire). The outcome of this hypothetical change is that the benefit–cost ratio for society would fall to 14.31, but society would still gain $14.31 of benefits for every dollar spent on the proposed intervention. The benefit–cost ratio for the federal government would fall to 46.46, but the federal government would still gain $46.46 for every dollar spent on the proposed intervention. Thus, the basic conclusion—that the proposed intervention provides benefits that exceed the costs—is relatively insensitive to a drastic change in projected gains in earnings.
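
Because personal income and sales tax gains scale with earnings and account for nearly all of each state's fiscal benefit, the effect of halving the earnings projection can be approximated directly from the Texas row of Table 9.1:

```python
# Approximating the halving exercise with the Texas row of Table 9.1
# (millions): income tax gain 0; sales tax gain 10,179; welfare savings
# 7.9; prison savings 79.0; cost to treasury 2,057.

full = (0 + 10_179 + 7.9 + 79.0) / 2_057
halved = (0.5 * (0 + 10_179) + 7.9 + 79.0) / 2_057
print(round(full, 1), round(halved, 1))   # 5.0 2.5
```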

Discussion

The results suggest that the benefits of the proposed intervention would greatly exceed the costs, and that it is feasible and cost-effective for either the federal government or individual state governments to raise student achievement and earnings—perhaps dramatically. The cost-effectiveness analyses in chapter 2 through chapter 8 suggest that the proposed intervention is substantially more cost-effective than alternative approaches for raising student achievement, including voucher programs, charter schools, a 10 percent increase in existing patterns of educational expenditures, increased educational accountability, comprehensive school reform, teacher certification by the National Board for Professional Teaching Standards, a proposal to replace the bottom quartile of novice teachers with new teachers, raising teacher quality by requiring new candidates to meet a minimum SAT score of 1000, class size reduction, and high-quality preschool (Yeh, 2007, 2008, 2009a, 2009b, 2010a; Yeh & Ritter, 2009).

It is useful to compare the results reported here with Levin et al.'s (2007a) estimate of the fiscal benefits and costs of five leading interventions for raising high school graduation rates. Levin et al. drew upon raw Current Population Survey data to estimate differences in annual earnings by level of education, gender, and ethnicity. Their figures imply that white males who are high school graduates earned, on average, 48.7 percent more than high school dropouts. The comparable figure for white females is 111.5 percent; for black males, 61.5 percent; and for black females, 42 percent. These figures are not directly comparable with Winship and Korenman's (1999) estimate that a 1.5 SD gain in AFQT scores is associated with a 53.7 percent increase in annual earnings, averaged across gender and ethnicity, because Winship and Korenman controlled for covariates including ability, and accounted for the reciprocal effect of cognitive skills and schooling. However, the figures reinforce the conclusion that increases in educational attainment have powerful effects on earnings.

In any case, Levin et al. (2007a) estimated that the fiscal benefits of the five interventions to improve high school graduation rates exceed the fiscal costs by a ratio of 2.5 to 1. This ratio is half of the lowest fiscal benefit–cost ratio that I estimated, above (for Texas), and is equal to the lowest fiscal benefit–cost ratio that was calculated in Sensitivity Analysis V, which investigated the outcome of shrinking the Winship and Korenman (1999) estimate by 50 percent. The differences in the estimated benefit–cost ratios for the two sets of interventions are attributable to a combination of differences in assumptions regarding effects on earnings, differences in the estimated cost-effectiveness of the interventions, and differences in analytical techniques.

Perhaps the largest difference is that the intervention that is proposed and examined in this book would be applied to every student in grades 1 through 8, and through grade 12 for the lowest performing 40 percent of all students. As a result, the intervention would benefit considerably more students than interventions targeted only to potential high school dropouts. Since 9.3 percent of all 8th-grade students drop out and fail to graduate within 8 years of their on-time graduation date (Mishel & Roy, 2006), interventions designed to reduce the dropout rate would benefit a maximum of 9.3 percent of all 8th-grade students, whereas the proposed intervention potentially could benefit every student, including gifted and talented students (Ysseldyke et al., 2004).

The research reviewed in chapter 1 suggests hypotheses about the causal mechanisms underlying the proposed intervention and why the intervention appears to be effective with both high- and low-achieving students. Research suggests that this type of intervention allowed teachers to individualize and target instruction, provide more tutoring, reduce drill and practice, and improve student readiness for—and spend more time on—critical-thinking activities, resulting in a more balanced curriculum (Yeh, 2006a). Teachers reported that the assessments provided a common point for discussion, increased collaboration among teachers to improve instruction and resolve instructional problems, and supported both new and experienced teachers in implementing sound teaching practices (Yeh, 2006a). The individualized curriculum and rapid feedback on progress reportedly gave students the feeling that they were successful and in control of their own learning, engaging students who previously disliked reading and math, including dyslexic children and children in special education, reducing stress and improving student achievement (Yeh, 2006a). It appears that when students see that they are making academic progress, they become excited about learning (Yeh, 2006a). This enthusiasm translates into greater effort and further improvements in achievement, which further strengthens student engagement (Yeh, 2006a).

Arguably, the vast majority of alternative interventions and policies for improving student achievement—including voucher programs, charter schools, higher standards, increased accountability, increased expenditure per pupil, comprehensive school reform, reduced class size and improvements in teacher quality—fail to directly address the need to improve student engagement in academic work. Instead, these interventions increase pressure on teachers and students, increase resources, change the organization of schools, or improve the quality of instruction. Direct comparisons show that these alternatives are far less cost-effective than the proposed rapid assessment intervention (Yeh, 2007, 2008, 2009a, 2009b, 2010a; Yeh & Ritter, 2009). If performance feedback builds students' self-esteem, confidence, and desire to perform at a high level, then lack of adequate feedback is perhaps the most important cause of low student achievement and might explain why an intervention that provides rapid performance feedback can be an extremely cost-effective way to achieve the goal of improving student engagement and achievement.


Conclusion

The review of cost-effectiveness studies presented in this book suggests that rapid assessment is one order of magnitude (10 times) more cost-effective with regard to student achievement than comprehensive school reform, cross-age tutoring, or computer-assisted instruction; two orders of magnitude more cost-effective than a longer school day, increases in teacher education, teacher experience or teacher salaries, summer school, more rigorous math classes, value-added teacher assessment, class size reduction, a 10 percent increase in per pupil expenditure, or full-day kindergarten; three orders of magnitude more cost-effective than Head Start, high-standards exit exams, NBPTS teacher certification, higher teacher-licensure test scores, high-quality preschool, an additional school year, or voucher programs; and four orders of magnitude more cost-effective than charter schools. Overall, these results are consistent with Hattie and Timperley's (2007) finding that performance feedback is more powerful than an array of alternative interventions.

If these results are surprising, the reason may be that this is the first time that the cost-effectiveness of these alternatives has been systematically compared using a standard metric. In medicine, systematic comparisons using standard metrics serve to quickly rule out incorrect hypotheses regarding the sources of illness and mortality. Hypotheses that high rates of mortality are due to inadequate resources, accountability, or competition can be quickly tested. Whatever the role of these factors, it quickly becomes apparent that the primary causes of illness and mortality are traceable to specific diseases and, thus, it makes little sense to think that major reductions in adverse outcomes may be achieved simply through increases in existing patterns of funding, increased accountability for doctors and patients, increased competition, hospital reform, stiffened credentialing of doctors, replacing the bottom quartile of doctors, and reduced overcrowding in hospitals. It clearly makes more sense to identify particular diseases and then target interventions based on knowledge of those diseases.

The evidence that lack of performance feedback is the primary factor limiting improvements in student achievement is straightforward: when feedback is applied, the effect size per dollar of intervention is larger than the effect per dollar for any of the alternatives that have been considered. The cost-effectiveness data suggest that feedback is important, that it is largely missing, and that supplying the missing feedback improves performance more efficiently than the alternatives. The cost-effectiveness of this approach is evidence that it directly addresses a root problem. The low effectiveness and high cost of achieving any effect through alternative approaches is evidence that those alternatives do not address limiting factors.

A major source of confusion is Hanushek's (1992) finding that a high-quality teacher can increase learning by an entire grade level above the amount contributed by a low-quality teacher, suggesting the possibility that aggregate performance may be improved by substituting high-performing teachers for low-performing counterparts. However, this is not much different than the statement that the top 16 percent of all doctors, architects, and engineers perform at a level that is 2 standard deviations above the performance of the bottom 16 percent, which must be true if performance is distributed normally, in the shape of a bell curve. The fallacy is the belief that it is practical and cost-effective to chop off the lower tail of the performance distribution in order to substitute high-performing individuals. That hypothesis was tested—and rejected—in chapter 5 through the cost-effectiveness analysis of Gordon et al.'s (2006) proposal to eliminate the bottom quartile of novice teachers. One reason that the approach is not practical is that measurements of teacher performance are not stable from year to year. More importantly, improved performance in any field usually depends on shifting the entire distribution of performance through improvements in knowledge that boost the capacity of every professional to perform at high levels, rather than weeding out "bad apples."

Thus, a more promising way to raise student achievement is to identify and address a root cause of poor performance: lack of knowledge about areas that need improvement. Lack of knowledge can be addressed through technology that assists all teachers and students to identify and correct areas of weakness, thereby boosting their capacity to perform at high levels and shifting the entire distribution of performance. The quantitative
evidence that rapid assessment is highly cost-effective suggests that a root cause of poor performance is a lack of knowledge about areas that need improvement—in other words, a lack of performance feedback. What is not well-recognized is the research reviewed in chapter 1, which links the provision of performance feedback to improved feelings of control and improved engagement and, thus, provides knowledge of a basic process through which feedback influences performance by changing the psychology of learning as well as cognition.

Control, Engagement, and Achievement

Perhaps the most obvious difference between strong and weak students is the level of engagement in academic work. Strong students like to read books and to do math problems. They become engrossed in their work. They lose track of time and achieve a psychological state that Csikszentmihalyi (1990) describes as "flow," where an individual is completely involved in an activity for its own sake.

Flow can be achieved in almost any activity, including activities that are widely considered to be boring and uninteresting, such as call-center employment, where workers answer telephones 8 hours a day, 5 days a week. What Csikszentmihalyi found, however, is that workers who were able to achieve flow did so by inventing steadily more challenging goals for themselves—a structure where routine work gained meaning because the workers saw themselves advancing through increasingly higher levels of achievement.

Some of the most jaded students routinely achieve flow when they play video games. It is easy to think that students are merely attracted by vivid, often violent graphics. However, the original Pac-Man game offered little more than a yellow disc eating white dots, chased by four ghosts, yet was wildly popular. What all video games have in common is rapid feedback regarding progress. Players instantly know when they are performing well, and there is a clear link between effort, tactics, and performance. This is also true of performing arts and athletics, where feedback is auditory, visual, or kinesthetic, and might explain why students who are disengaged academically may be superb musicians or athletes.

A ubiquitous feature of the most popular video games is that players who finish the challenges presented to novice players are permitted to advance to the next level, where the difficulty level is increased. World of Warcraft presents players with 70 challenge levels, plus the "end game," a series of epic challenges that only begin after a player has accumulated the maximum number of points. These final challenges require tightly coordinated play by groups of 40 or more players. World of Warcraft is an extremely
popular game. Professional players "work" 80-hour weeks, and then play the same game after the regular 12-hour work day is over. Players are easily engrossed, describing the feeling as "playful" and "exhilarating" (Dibbell, 2007, pp. 39, 41). In essence, video games represent a technology that can turn academically unmotivated students into highly dedicated, extremely skillful workaholics.

The underlying principle is that humans are biologically adapted through natural selection to be highly responsive to performance feedback. This principle may be applied to structure academic work in a way that fosters the same intrinsic enjoyment and deep engagement that is achieved through video games. The key characteristics appear to be (a) a structure where any student can enter at his or her own ability level and steadily progress to higher levels, and (b) a structure where students receive rapid feedback regarding progress.

Note that video games are engaging despite the lack of formal instruction in game strategies and tactics. Not only is formal instruction not necessary in order to engage students—it can easily break the flow and concentration that are characteristic of students who are highly engaged. Similarly, while instruction is a necessary part of the training of musicians and athletes, interruptions by teachers and coaches tend to disrupt the natural flow that is experienced during a performance.

With this in mind, it is instructive to review the characteristics of the rapid assessment programs. First, the materials are individualized so that any student can enter at a level tailored to his or her achievement level. The school library is re-organized so that books are shelved according to reading level, and any student can go directly to books that are at his or her individual reading level. Math problems are individualized so that a student only receives problems that are at his or her individual achievement level.

Second, students receive rapid feedback regarding their progress. After completing a book, the student takes a brief comprehension quiz, tailored to the book, and receives a report regarding his or her comprehension score and his or her book reading level (as measured through the Flesch-Kincaid reading index). After completing a set of math problems, the student scans his or her bubble sheet and receives a report regarding the number correct, as well as feedback regarding the topics or strands that have been completed.

Third, students manage this entire process without interruption from the teacher, except for individualized tutoring or small-group instruction for students who have been flagged by the software program as having difficulty reading or accurately completing math problems.


Clearly, there are differences between rapid assessment and video games. But teachers who use rapid assessment trace its impact to the effect of feedback in fostering emotional commitment to work effort (Yeh, 2006a). Students enjoyed working at their own levels and the feeling that they were in control (Yeh, 2006a). Teachers traced the motivational effect of the rapid assessment programs to the fact that students could start at a point where they immediately experienced success and then experienced repeated, incremental success as they progressed from one level to the next (Yeh, 2006a). A recurring theme was the positive impact of the individualized curricula and rapid assessment on children's self-esteem, confidence, independence, and intrinsic motivation to read (Yeh, 2006a). Teachers felt that a big part of the Rapid Assessment program's success was that differentials in achievement among students were hidden in a way that allowed each child to feel successful (Yeh, 2006a).

To the extent that low-achieving students lack self-efficacy and engagement, rapid assessment and individualized instruction may be a more precise intervention than increased educational expenditures, competition through voucher programs and charter schools, increased accountability, comprehensive school reform, teacher certification, replacing the bottom quartile of novice teachers, and class size reduction. Unlike the alternatives, which simply provide more resources, more pressure, reorganized schools, more qualified teachers, better teachers, or smaller classes, it appears that rapid assessment leads to student feelings of control and emotional involvement, and this reinforces student engagement and achievement. This may explain why rapid assessment has a larger effect, per dollar, than the alternatives. Increased spending may increase the resources that are available to teachers; choice and competition might increase pressure on teachers; accountability might increase pressure on students and teachers; school reform might provide a supportive environment; teacher certification and replacement of the bottom quartile of novice teachers might improve the quality of the teaching force; and class size reduction might allow teachers to focus on fewer students, but the available evidence suggests that rapid assessment is more likely to increase student self-efficacy, engagement, and achievement.

The cost-effectiveness analyses presented in this book are significant because they demonstrate that rapid formative assessment is substantially more efficient than 21 other approaches for raising student achievement, contradicting popular views among teachers, administrators, policymakers, and researchers about "what works." The analyses suggest that the most popular approaches for raising student achievement—voucher programs, charter schools, class size reduction, comprehensive school reform,
increased educational accountability, and increased teacher quality—are far less efficient than rapid formative assessment. This suggests that enormous resources, time, and energy are currently being devoted to inefficient approaches.

The failure to use resources efficiently dooms large numbers of students to failure and dooms their schools to federal No Child Left Behind (NCLB) sanctions that are imposed on public schools that fail to demonstrate "adequate yearly progress" in improving student achievement. The threat of these sanctions creates severe pressure to raise test scores. Existing research shows that this type of pressure can lead to unintended negative consequences: teaching to the test, stress for students and teachers, and increased dropout rates. Even schools where most students demonstrate high achievement risk sanctions under the federal law if disadvantaged students do not perform well. Ultimately, entire school teaching staffs can be fired and the schools can be taken over by the state.

Rapid formative assessment offers a potential solution to the dilemma of raising student achievement while avoiding teaching to the test, excessive stress, and increased dropout rates. According to teachers who have implemented rapid formative assessment, the use of this approach reduces the number of students who fall behind, increases their preparation for critical thinking activities, and allows teachers to spend more time on critical thinking because fewer students need remediation on basic skills (Yeh, 2006a). As a result, teachers can prepare students for the state test without having to give up critical thinking activities (Yeh, 2006a).

Cost-effectiveness analysis is not an infallible approach for identifying the most efficient strategy for raising student achievement. However, it is striking that the results of the analysis are consistent with the theory of learning presented in chapter 1 that explains the gap in student achievement between disadvantaged students and their more advantaged peers. The cost-effectiveness results lend support to the theory. Conversely, the basic research findings reviewed in chapter 1 that underlie the theory provide insight into the cost-effectiveness results and explain why rapid formative assessment is a substantially more efficient approach for raising student achievement than any of the 21 competing approaches that were analyzed.

Four pieces of evidence fit together: (a) basic research findings about the conditions under which humans become passive, disengaged, and apathetic; (b) findings about the rapid loss of self-confidence experienced by many students, apparently in the years after entering school and after encountering grading practices that constantly remind students that half of them are below average; (c) findings about the conditions under which learned helplessness can be reversed; and (d) findings about the effect of individualized task difficulty and rapid assessment feedback regarding progress, measured
individually for each student against his or her previous level of achievement. Together, these four bodies of research suggest a coherent explanation of the efficiency of rapid assessment. This explanation points to the importance of individualizing task difficulty and combining this individualization with rapid formative assessment to produce an effective treatment for learned helplessness. Thus, the cost-effectiveness results are consistent with diverse bodies of research literature spanning much of what is currently known about the conditions that promote or hinder engagement and learning. More than any single piece of evidence, this consistency lends support to the conclusion that rapid formative assessment is perhaps the most efficient strategy for raising student achievement.


Appendix A

Notes to Table 5.1

All figures adjusted for inflation to August 2006 dollars using the consumer price index.

I.A, II.B, IV.B, V.B, V.E, VI.B, VII, VIII.A, VIII.B: Figures converted to total compensation, based on U.S. Department of Labor data indicating that the ratio of total compensation to wages for civilian workers is 1.43 (U.S. Department of Labor, 2008a).

I.A: Lost output until replacement is hired. Assumes 53 business days (.204 years) to fill vacated position (Texas Center for Educational Research, 2000) at salary equivalent to average beginning teacher salary of $40,049 (U.S. Department of Education, 2005c). While the information regarding the time to fill a vacated position is somewhat dated, it is consistent with 2005 data showing that the average American corporation took 48 days to fill a middle management vacancy (Green & Pope, 2006).

I.B, I.C, I.E, III.A, III.B, III.C: Texas Center for Educational Research (2000), Table 9.

I.A, I.B, I.C, I.D, I.E, II.A, II.B, II.C: Assumes the turnover-related replacement-cost multiplier derived in Model 2 (1.567).


I.B, III.B: Including advertising, recruiting, background check, interviews, and administrative costs (but not a signing bonus or subject-matter stipend, which are transfers that offset the cost to the worker of relocation).

I.D, VI.A: Based on Sorensen's (1995) formula for calculating the reduced output while an employee learns a new job, based on monthly salary (MS):

Cost = (0.8 × MS) + (0.6 × MS) + (0.4 × MS) + (0.2 × MS) = 2.0 × MS = .1666 × (annual salary) = $6,672.16

assuming normal productivity equal to beginning teacher salary of $40,049. Sorensen's formula assumes that new employees reach full productivity after 4 months. Other estimates suggest that full productivity is not reached until 6 months (Fitz-enz, 1997) or even 9 months (Texas Center for Educational Research, 2000) after starting a new job, or that the cost is nearly twice the cost given by Sorensen's formula (Bliss, 2000), suggesting that Sorensen's formula is a conservative estimate of the reduction in output while an employee learns a new job.

II.A, IV.A, V.A: It might be argued that job search expenses are costs that would be voluntarily incurred by job applicants even in the absence of Gordon et al.'s (2006) proposal. However, employer costs to hire replacements to fill the positions vacated by employees in response to the turnover of a teacher would be substantially higher if potential workers were not actively seeking those positions. Thus, it is appropriate to include the costs incurred by applicants in the social costs of teacher turnover. Job search expenses include travel, postage, telephone, lodging, fax services, meals, photocopies, newspapers, professional attire, books, memberships, stationery, subscriptions, transportation, job fair fees, and dues. The U.S. Department of Labor (2005) provides an allowance of $1,311.09 for job search expenses (including travel and lodging) incurred outside of a job seeker's normal commuting distance. However, only a small fraction of job hunters seek jobs outside of that distance. Using annual national survey data from the Panel Study of Income Dynamics, the probability of a residential move in response to a change in employment, controlling for an array of covariates, is 7.65% (Kan, 2002), suggesting that 7.65% of job hunters seek employment outside of normal commuting distance and may incur job search expenses of approximately $1,311.09. However, most job hunters confine their job searches to the local commuting area and incur costs that are perhaps 50 percent lower. Thus, the expected cost of a job search is (.0765)($1,311.09) + (.9235)($655.55) = $705.70.
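
Two of the calculations above are easy to verify in a few lines. The sketch below reproduces the expected job-search cost and Sorensen's ramp-up cost; note that the text's $6,672.16 reflects the rounded .1666 multiplier, so the unrounded figure differs by a few dollars.

```python
# Sketch of two calculations from the notes above.

# Expected job-search cost: 7.65 percent of searches occur outside the
# normal commuting area ($1,311.09); the rest are assumed to cost half.
P_MOVE = 0.0765
OUT_OF_AREA = 1_311.09
expected_search = P_MOVE * OUT_OF_AREA + (1 - P_MOVE) * (OUT_OF_AREA / 2)
print(round(expected_search, 2))   # ~705.70

# Sorensen (1995): reduced output while learning a new job, expressed in
# monthly-salary (MS) units: (0.8 + 0.6 + 0.4 + 0.2) x MS = 2 x MS.
annual_salary = 40_049.0
ramp_up = sum([0.8, 0.6, 0.4, 0.2]) * (annual_salary / 12)
print(round(ramp_up, 2))   # 6674.83 unrounded; $6,672.16 via .1666
```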


II.B, IV.B: Based on weighted average of 11.87 hours per week devoted to job search (Keeley & Robins, 1985) × 11.5 average weeks of job search while employed (Blau, 1992) = 136.51 hours (0.079366 years) devoted to job search while employed, at annual beginning teacher salary of $40,049. See note (e). Costs incurred by job applicants to actively seek out and match themselves to the positions that become vacant as a result of teacher turnover reduce employer hiring costs and, thus, should be included in the social costs of teacher turnover. Includes personal time and lost leave time from work while preparing résumés, job applications, preparing for interviews, traveling to interviews, and interviewing.

II.C, IV.C, V.D: In 2003, the average cost of a residential move was $52,109 for homeowners and $14,008 for renters (Employee Relocation Council, 2006), or an average of $31,610.66, weighted by the fraction of home ownership (46.2 percent) for the lowest income quintile, where most teachers are likely to fall (Orzechowski & Sepielli, 2003). Using annual national survey data from the Panel Study of Income Dynamics, the probability of a residential move in response to a change in employment, controlling for an array of covariates, is 7.65% (Kan, 2002). Thus, the expected cost of a residential move is $31,610.66 × .0765 = $2,418.22, or $2,684.13 after adjusting for inflation.

V.B: Based on weighted average of 11.87 hours per week devoted to job search (Keeley & Robins, 1985) × weighted average of 13.13 weeks of job search while unemployed (Blau, 1992) = 155.85 hours (0.090610 years) devoted to job search while unemployed, at annual beginning teacher salary of $40,049.

V.C: Retraining may include, for example, the cost of community college coursework to prepare for an alternative occupation. Estimates of the average cost to retrain workers range between $2,374 and $3,406. It cost the Commonwealth of Massachusetts $3,406 to retrain a displaced worker in 2001, while the State of North Dakota spent $2,374 retraining a dislocated worker in 2000–2001 (Dunbar, 2003). The average cost is $2,890.

V.E: Lost income (lost output to society). Based on an average of 27.36 weeks of retraining (Congressional Budget Office, 1994) and an average of 10.4 weeks to find a new position (Gottschalck, 2006), equivalent to a total of .726 years, at a salary equivalent to the average beginning teacher salary of $40,049 (U.S. Department of Education, 2005c).

VI.B: The decision to terminate and replace a bottom-quartile teacher creates an opportunity cost to society of using the replacement worker's labor, which is calculated as a weighted average of the cost of each source of
replacement labor, based on the probabilities of drawing the replacement from each source of labor.

VII: Society gains the value of the output of the terminated teacher once that individual is retrained and finds and begins a new job, which is estimated to start at the salary of a new beginning teacher ($40,049) and grow at a real (inflation-adjusted) rate of 3.5 percent per year but is discounted at the same rate to calculate the present value. The present value of this income stream (the gain in output), $335,610.62, is calculated over 8.38 years, equal to the average duration of the new teacher's expected teaching career (9.11 years) minus an average of 27.36 weeks of retraining (Congressional Budget Office, 1994) and an average of 10.4 weeks to find a new position (Gottschalck, 2006).

VIII.A: A voluntary job-hopper gives up a salary averaging $45,216.35 (total compensation of $64,659.38) to accept an annual beginning teacher salary of $40,049 (total compensation of $57,270.07). Presumably, the psychic gain must be equal to the difference in salary minus relocation costs (see Section IV), otherwise job-hoppers would not voluntarily switch jobs. The present value of this stream of psychic gains over the expected career duration of a new teacher, assumed to grow at 3.5 percent per year (see Sullivan, 1997, regarding long-term trends in real income growth) but discounted at 3.5 percent per year for the present value calculation, equals $59,381.48.

VIII.B: A voluntary job-hopper accrues psychic gains equal to the costs of job search and relocation (see Section II). Presumably, the psychic gains must equal these costs, otherwise job-hoppers would not voluntarily switch jobs.

IX: The total cash cost of $3,959.46 (Feistritzer, 2006), adjusted for inflation and amortized over an expected 9.11-year teaching career.
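
The present-value figure in note VII has a simple structure worth making explicit: because the income stream grows at the same 3.5 percent rate used for discounting, each year's discounted value equals the starting salary, and the present value reduces to salary times years.

```python
# Sketch of the present-value step in note VII above.

salary = 40_049.0
years = round(9.11 - (27.36 + 10.4) / 52, 2)   # career minus retraining
print(years)                     # 8.38
print(round(salary * years, 2))  # 335610.62, matching note VII
```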

Appendix B

Opportunity Cost to Society of Employing the Labor of a Worker Who Replaces a Bottom-Quartile Teacher

Activity Before Certification | Time Proportion | Imputed Annual Value | Expected Value (2006 Dollars)
Professional Outside Education | .40 | 51,338.51a | 20,535.40
Nonprofessional Outside Education | .07 | 33,981.22a | 2,378.69
Undergraduate Student | .09 | 30,958.00b | 2,786.22
Graduate Student | .03 | 53,755.45c | 1,612.66
Military Service | .09 | 67,707.68d | 6,093.69
Caring for Family | .02 | 21,260.11a,e | 425.20
Substitute Teacher or Paraprofessional | .10 | 25,071.73f | 2,507.17
Teacher | .06 | 40,049.00g | 2,402.94
Postsecondary Teacher | .01 | 58,438.05a | 584.38
Education, Not a Teacher | .05 | 52,809.88a,h | 2,640.49
Retired | .01 | 40,049.00i | 400.49
Unemployed | .02 | 42,328.00j | 846.56
Other | .05 | 40,049.00k | 2,002.45
Total | 1.00 | — | 45,216.35


Notes: All figures adjusted for inflation to August 2006 dollars using the consumer price index.
a. (U.S. Department of Labor, 2008b).

b. The average starting salary of a new liberal arts college graduate ($30,958; National Association of Colleges and Employers, 2006), assuming that individuals in this category are most likely to fill the position left vacant by those who are pursuing a career path leading to alternative certification as a teacher. This underestimates the time value of new college graduates to the extent that some of the graduates who enter teaching earn degrees in engineering and other fields that pay salaries that are higher than salaries commanded by liberal arts graduates. Note that the costs of educating the college graduate are sunk costs (they are incurred by society whether or not teacher turnover occurs) and, therefore, are not included as costs of teacher turnover.

c. Based on average earnings of individuals with master's degrees, at the start of their careers (ages 25–29) (Day & Newburger, 2002). This underestimates earnings of individuals with graduate degrees to the extent that some individuals earn doctoral degrees and may be expected to earn more than graduates with master's degrees.

d. The vast majority of military personnel who pursue teaching through alternative certification are retired enlisted personnel who retire at rank E-7 after 20 years of service with annual compensation equal to $67,707.68 (Congressional Budget Office, 2004).

e. Based on average earnings of personal and homecare aides. This understates the time value of caring for family to the extent that there are special nonpecuniary benefits that are not reflected in market-determined wages for personal and homecare aides (for example, when mothers care for their children).

f. Substitute teachers averaged $13.54 per hour in July, 2004 (U.S. Department of Labor, 2004, July), or $25,071.73 per year, assuming 40 hours per week, 43 weeks per year.

g. The average salary of a beginning teacher (U.S. Department of Education, 2005c). This underestimates the time value of teachers to the extent that some of them (perhaps working as private tutors) were performing at a level that exceeds the productivity of beginning teachers.

h. This category does not include paraprofessionals and is unlikely to include principals, but is likely to include vocational and educational counselors, who earned, on average, $28.52 per hour in July, 2004 (U.S. Department of Labor, 2004, July), or $52,809.88 per year, assuming 40 hours per week, 43 weeks per year.

i. The opportunity cost for a retired individual is the value of leisure. To the extent that retirement is voluntary, leisure is valued more than the wage in the retiree's last job (since the retiree chose leisure over work), which (for college graduates who are qualified to work as teachers) is likely to be higher than the wage of a new teacher ($40,049), because the wage for the former reflects mature career productivity, whereas wages in teaching are relatively low and, among all teachers, new teachers are at the bottom of the salary ladder. Paradoxically, it may be argued that leisure is valued less than the starting wage of a new teacher, since the teaching position was chosen in preference to retirement (and leisure). The paradox is resolved if teaching generates nonpecuniary benefits that are greater than the difference between the value of leisure to the individual and the starting wage of a new teacher. Thus, the starting wage of a new teacher is a lower bound on the opportunity cost of employing a retiree as a teacher. See also Isley and Rosenman (1998), who suggest that the market wage at which work is accepted (i.e., $40,049) is a lower bound of the value of leisure foregone when a retiree accepts a job.

j. The value of leisure time for unemployed individuals with 15–16 years of education (the minimum level of education required for beginning teachers) was estimated to be $20.35 per hour or $42,328 per year, based on the average reservation wage of unemployed individuals from a 1993 follow-up observation of the 1980 National Longitudinal Survey of Youth cohort (Mohanty, 2005). Note that unemployment compensation is a transfer from state government to unemployed individuals, which does not affect the overall social cost of hiring an unemployed individual.

k. It is unclear from the survey of alternative certification teachers what this category includes, except that it does not include individuals in the previously listed categories. However, given the value of leisure calculated for unemployed individuals ($42,328) and the lower bound on the value of leisure for retired individuals ($40,049), a reasonable estimate is that the reservation wage for any individual who falls into the "Other" category is probably no lower than $40,049. Note that Feistritzer's (2005) survey accounts for 99 percent of alternative-certification teachers. For the purpose of the current analysis, the 1 percent of teachers who are unaccounted in Feistritzer's data were grouped in the "Other" category.



der. Paradoxically, it may be argued that leisure is valued less than the starting wage of a new teacher, since the teaching position was chosen in preference to retirement (and leisure). The paradox is resolved if teaching generates nonpecuniary benefits that are greater than the difference between the value of leisure to the individual and the starting wage of a new teacher. Thus, the starting wage of a new teacher is a lower bound on the opportunity cost of employing a retiree as a teacher. See also Isley and Rosenman (1998) who suggest that the market wage at which work is accepted (i.e., $40,049) is a lower bound of the value of leisure foregone when a retiree accepts a job. j. The value of leisure time for unemployed individuals with 15–16 years of education (the minimum level of education required for beginning teachers) was estimated to be $20.35 per hour or $42,328 per year, based on the average reservation wage of unemployed individuals from a 1993 followup observation of the 1980 National Longitudinal Survey of Youth cohort (Mohanty, 2005). Note that unemployment compensation is a transfer from state government to unemployed individuals, which does not affect the overall social cost of hiring an unemployed individual. k. It is unclear from the survey of alternative certification teachers what this category includes, except that it does not include individuals in the previously listed categories. However, given the value of leisure calculated for unemployed individuals ($42,328) and the lower bound on the value of leisure for retired individuals ($40,049), a reasonable estimate is that the reservation wage for any individual who falls into the “Other” category is probably no lower than $40,049. Note that Feistritzer’s (2005) survey accounts for 99 percent of alternative-certification teachers. For the purpose of the current analysis, the 1 percent of teachers who are unaccounted in Feistritzer’s data were grouped in the “Other” category.


Appendix C

Notes to Table 8.1



a. Annualized effect size.

b. Annual cost per student, adjusted for inflation to 2006 dollars.

c. Effect size in SD units divided by annual cost per student.

d. Ross et al. (2004).

e. Ysseldyke & Tardrew (2007).

f. Reading.

g. Math.

h. Nunnery et al. (2006).

i. Ysseldyke and Bolt (2007).

j. (Yeh, 2008). Of the subgroup of comprehensive school reform models categorized by Borman et al. (2003) as having the Strongest or Highly Promising Evidence of Effectiveness, the model with the highest effect size was Expeditionary Learning Outward Bound (effect size = 0.51 SD).

k. (Yeh, 2008). Of the subgroup of comprehensive school-reform models categorized by Borman et al. (2003) as having the Strongest or Highly Promising Evidence of Effectiveness, the model with the highest effectiveness–cost ratio was Expeditionary Learning Outward Bound (effectiveness–cost ratio = 0.002341).

l. Nye et al. (2001). Average of effect sizes in grades 1, 2, and 3.

m. Reichardt (2000). Annual cost per student of reducing class size from 24 to a ceiling of 17 students per class, adjusted for inflation using the September 1997 price index (161.2) and the August 2006 price index (203.9).

n. Finn et al. (2001). In grade 2, the achievement advantage for students who participated in small classes for 1, 2, and 3 years was 0.12 SD, 0.24 SD, and 0.36 SD, respectively, in reading, or an average of 0.12 SD per year, and 0.16 SD, 0.24 SD, and 0.32 SD, respectively, in math, or an average of 0.129 SD per year.

o. Greenwald et al.'s (1996) meta-analytic estimate that a 10% increase in per-pupil expenditure would increase student achievement in math and reading by 0.083 SD annually.

p. Total expenditure per pupil in constant 2004–2005 dollars equals $10,464 (National Center for Education Statistics, 2005d). A 10% increase equals $1,046.40, or $1,118.83 in 2006 dollars, using the January 2005 price index (190.7) and the August 2006 price index (203.9).

q. Grodsky et al. (2009).

r. Including the cost to society of the incremental increase in the number of students who are denied diplomas solely because exit exams raise the graduation bar.

s. Schweinhart et al. (1993), Table 13, annualized over 2-year period.

t. Barnett (1992), adjusted for inflation using the January 1985 price index (105.5) and the August 2006 price index (203.9).

u. Ramey et al. (2000), Figure 3, annualized over 5-year period.

v. Barnett & Masse (2007). Cost in a public school setting, minus the value of formal and informal childcare services provided to the control group and adjusted for inflation using the January 2002 price index (177.1) and the August 2006 price index (203.9). Note that since preschool costs are incurred in years prior to the K–6 years when rapid assessment is typically implemented, preschool costs are underestimated relative to the cost estimate for rapid assessment (costs for rapid assessment should be discounted to the time period when preschool costs are incurred).

w. Yeh (2007).

x. Average effect size across three randomized and lottery-based studies.

y. Significant costs to society would arise to the extent that voucher programs pull students at random from public school classrooms, making it difficult (if not impossible) for public schools and districts to reduce expenditures at the same rate that students use vouchers to switch to private schools. Private school costs were reduced by public school savings of 17.38 percent.

z. Average effect size across three large-scale studies controlling for student fixed effects after 5-year shakedown period.

*. To the extent that charter schools are not simply converted public schools, but instead are newly created schools that require new facilities, new administrators, and new teachers, and pull students from widely scattered classrooms in traditional public schools, society pays for the creation and maintenance of a larger number of school-level administrative units, facilities, and teachers. Charter school costs were reduced by regular school savings of 17.38 percent.
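
Notes a through c define the metric used throughout Table 8.1: the annualized effect size in SD units divided by the annual cost per student. The sketch below illustrates the arithmetic; the dollar cost is not stated in the notes and is backed out of note k's ratio purely for illustration.

```python
# Sketch of the effectiveness-cost metric defined in notes a-c.

def effectiveness_cost(effect_size_sd, annual_cost_per_student):
    return effect_size_sd / annual_cost_per_student

effect_size = 0.51    # note j: Expeditionary Learning Outward Bound
ratio = 0.002341      # note k
implied_cost = effect_size / ratio   # backed out here for illustration
print(round(implied_cost, 2))        # ~217.86 per student
print(round(effectiveness_cost(effect_size, implied_cost), 6))  # 0.002341
```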

Notes

Chapter 1

1. Scriven (1991), who coined the formative/summative distinction in 1967, points out that a major fallacy regarding formative assessment is the notion that it can be limited to information about correctness.

2. Reading Assessment, Math Assessment, and Rapid Assessment Corporation are pseudonyms for Accelerated Reader, Accelerated Math, and the Renaissance Learning Corporation, used to avoid the appearance that the author endorses the assessment software. The author is neither affiliated with, nor has received any funding from, the vendor.

Chapter 2

1. As noted by a reviewer, comparison of effect sizes across studies assumes consistency in underlying population variances. However, it seems unlikely that the magnitude of the differences in cost-effectiveness reported in this study is entirely attributable to underlying differences in population variances.

2. Average baseline scores of all students in the three studies ranged from the 19th to the 30th national percentile on the Iowa Tests of Basic Skills, or ITBS (Howell et al., 2002, Table 3, p. 199). In this range on the ITBS, each NPR point is equal to 0.636 NCE points; thus, 0.9 NPR points equal 0.572 NCE points (see www.usoe.k12.ut.us/eval/DOCUMENTS/IOWA_FAQ.pdf). Throughout this review, when gains were reported using the NCE metric, effect sizes were calculated by dividing the NCE gain score by the NCE standard deviation (21.06).

3. As noted by a reviewer, these studies typically examine a small fraction of charter school students—those who enter in 5th grade and fill a small number of slots created by exiting students. This may ignore heterogeneity by grade level and by years spent in the charter school. Second, and more importantly, students who switch into charter schools at late ages may be different from students who do not switch, in ways that cannot be controlled using student fixed effects.

4. The use of the student-level standard deviation, as opposed to the state-level standard deviation, would reduce the effect size to 0.08 SD (Lee, 2008).

5. Large fixed costs ($7,861.05 in reading and $14,948.55 in math) incurred at start-up create opportunity costs equal to the income that would otherwise be earned if this amount were instead expended in seven equal annual installments (the arbitrary lifetime of the program) and the remaining funds were invested in an interest-bearing account. Assuming a real interest rate of 3 percent and a discount rate of 3.5 percent, the foregone income is $819.55 in reading and $1,229.04 in math, or $0.23 per student per year in reading and $0.35 per student per year in math, in a school with 500 students, amortized over the 7-year lifetime of the program.

6. Cost figures reflect the actual operating experience of schools in a typical district, audited by the researcher through classroom observation of operating procedures as well as teacher and administrator interviews in 8 schools (spanning elementary, middle, and high school levels). While a school of 500 students taking 2 to 5 assessments per week in math and reading suggests that 2,000 to 5,000 assessments are processed weekly, the burden on teachers is minimal because students scan their own bubble sheets, the software scores each assessment, and summary reports are available to teachers and administrators electronically.
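The metric conversion in note 2 above is purely mechanical; the following Python sketch (an illustration, not the author's code) makes the two steps explicit under the stated ITBS assumptions:

def effect_size_from_npr_gain(npr_gain: float,
                              npr_to_nce: float = 0.636,  # ITBS conversion in the 19th-30th percentile range
                              nce_sd: float = 21.06) -> float:
    """Convert a gain in national percentile rank (NPR) points to an effect size in SD units."""
    return npr_gain * npr_to_nce / nce_sd

# A 0.9 NPR-point gain equals ~0.572 NCE points, or an effect size of ~0.027 SD.
print(round(effect_size_from_npr_gain(0.9), 3))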

Chapter 3

1. John Henry was a worker who outperformed a machine under an experimental setting because he was aware that his performance was being compared with that of a machine; thus, a "John Henry" effect is a type of Hawthorne effect.

2. An instrument is a variable (such as cohort size) that does not belong in the regression predicting student achievement, but happens to be correlated with the explanatory variable (class size) and uncorrelated with the error term (i.e., uncorrelated with omitted covariates) and, therefore, is substituted into the regression for the explanatory variable as an analytic strategy to avoid correlation with the error term.
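To make the logic of note 2 concrete, here is a minimal two-stage least squares simulation (an invented illustration; the coefficients and variable names are hypothetical, not estimates from any study cited here):

import numpy as np

rng = np.random.default_rng(0)
n = 50_000
ability = rng.normal(size=n)                 # omitted covariate (unobserved)
cohort = rng.normal(size=n)                  # instrument: shifts class size, unrelated to ability
class_size = 0.8 * cohort + 0.5 * ability + rng.normal(size=n)
achievement = -0.3 * class_size + ability + rng.normal(size=n)

# Naive OLS is biased because class_size is correlated with the omitted 'ability'.
ols_slope = np.polyfit(class_size, achievement, 1)[0]

# Stage 1: fit class_size from the instrument; Stage 2: regress achievement on the fitted values.
fitted = np.polyval(np.polyfit(cohort, class_size, 1), cohort)
iv_slope = np.polyfit(fitted, achievement, 1)[0]

print(f"OLS: {ols_slope:.2f} (biased), IV: {iv_slope:.2f} (near the true -0.3)")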

Chapter 4

1. Effect sizes calculated from studies using comparison groups are comparable to the effect sizes presented (below) for studies of rapid assessment feedback interventions, all of which involve effect sizes calculated from studies using comparison groups. Borman et al. (2003) also calculated and reported effect sizes for a larger group of studies, including studies where no comparison group was used and effect sizes were calculated from pretest to posttest for the treatment group only. However, this method fails to address threats to internal validity, including maturation and history.

2. Sikorski (1990) also conducted a cost-effectiveness analysis of Direct Instruction. However, she calculated the total, rather than incremental, increase in costs associated with the intervention. Therefore, her cost figures are not consistent with the incremental cost methodology used here. In addition, Sikorski only analyzed cost-effectiveness with regard to reading instruction.

3. Separate cost-effectiveness analyses suggest that Reading and Math Assessment are more cost-effective than the remaining CSR models in the Borman et al. meta-analysis, including those that offered only Promising Evidence of Effectiveness and those that required the Greatest Need for Additional Research. The model with the highest effect size reported by Borman et al. (2003) is Integrated Thematic Instruction (ITI). The effect size of 0.92 SD is based on a single matched-group quasi-experimental study that compared students in one school that used ITI with one school that did not. This unpublished doctoral dissertation compared 19 ITI students with 45 non-ITI students over a 2-year period, starting in 3rd grade, with regard to reading achievement, implying an annualized effect size of 0.46 SD. Average annual professional development costs over the first 3 years are $73,000, excluding the value of teacher time for required training workshops (American Institutes for Research, 2006). If 25 teachers participate in a 1-week training workshop and their time is valued at $282 per day, the annual cost is $35,250. Materials, including a library of professional books that each school is required to purchase, are estimated to cost approximately $5,000. Assuming that annual training costs in years 4, 5, and 6 are half the annual costs in years 1, 2, and 3, and if the cost of materials and high initial training costs are amortized over the average 6-year life of a CSR program, the total per-pupil cost is $164.04 per year, or $116.66 less than the lowest cost estimated by Odden (2000) for all of the CSR models in his cost analysis, after adjusting for inflation ($280.70). The effectiveness–cost ratio for Integrated Thematic Instruction is 0.002804. The CSR model with the second-highest effect size in Borman et al.'s (2003) meta-analysis is the Paideia model (effect size = 0.57 SD). Based on a sample of three sites and information from the developer, the first-year cost, adjusted for inflation using the January 1999 price index (164.3) and the August 2006 price index (203.9), is $181,189.29, including the salary of a facilitator, professional development, teacher release time, and materials (Herman, 1999). The annual inflation-adjusted cost in subsequent years includes the cost of a school facilitator ($62,051.13), teacher release time ($28,200 for 4 days for 25 teachers), and assessments ($43,435.79) (Herman, 1999). There are additional costs in year 2 for the implementation of coaching ($55,846.01) (Herman, 1999). Amortizing the high initial costs in years 1 and 2 over the expected 6-year life of a CSR program, the annual cost per student in a school with 500 students is $301.82, adjusted for inflation. The effectiveness–cost ratio is 0.001889. Effect sizes for the remaining CSR models in Borman et al.'s (2003) meta-analysis are smaller than the 0.46 SD effect size for Integrated Thematic Instruction, and cost data are either not available beyond the initial year of implementation, or suggest annual costs equal to or greater than the cost of Integrated Thematic Instruction, implying maximum effectiveness–cost ratios no larger than 0.002804, the ratio for Integrated Thematic Instruction. This ratio is smaller than the smallest ratios for Reading and Math Assessment.
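The ITI cost figures in note 3 can be reproduced directly from the stated assumptions; a minimal Python sketch (an arithmetic check, not code from the book):

# Integrated Thematic Instruction amortization (note 3): training plus teacher time
# in years 1-3, half that in years 4-6, plus a one-time materials purchase,
# spread over a 6-year program life in a 500-student school.
years, students = 6, 500
annual_y1_3 = 73_000 + 35_250          # professional development + 25 teachers x 5 days x $282
annual_y4_6 = annual_y1_3 / 2
materials = 5_000

total = 3 * annual_y1_3 + 3 * annual_y4_6 + materials
per_pupil = total / years / students
print(f"per-pupil cost: ${per_pupil:.2f}")                  # ~$164.04
print(f"effectiveness-cost ratio: {0.46 / per_pupil:.6f}")  # ~0.002804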


4. Adjusted for inflation using the June 1997 price index (160.3) and the August 2006 price index (203.9) (see U.S. Department of Labor, 2006b, for price indices).

Chapter 5

1. Aaronson et al. (2007) matched students and teachers for 783 Chicago public school 9th-grade math teachers, controlled for an array of covariates, and found that 6 of every 10 teachers in the top and bottom quartiles shifted out of those quartiles the following year. McCaffrey et al. (2008) investigated the intertemporal stability of teacher fixed effects using Florida data for over 400,000 students and found a comparable lack of stability. In the largest school district, for example, a minimum of 61 percent of teachers shifted out of each quintile the following year. Koedel and Betts (2007) compared the rankings of San Diego teachers and found that a minimum of 65 percent of teachers shifted out of the top and bottom quartiles the following year. Ballou (2005) compared the rankings of teachers in one Tennessee district and found that a minimum of 52 percent of teachers shifted out of the top and bottom quartiles the following year. These results suggest that value-added modeling lacks adequate reliability for compensation and retention decisions. The probability of correctly rewarding high-performing teachers and punishing low-performing teachers using value-added modeling was less than 50 percent; in other words, less reliable than flipping a coin. The results suggest the existence of factors beyond the control of teachers that are not adequately controlled through existing statistical techniques. One factor is that multiple teachers systematically influence the performance of students in each subject area. For example, Aaronson et al. (2007) found significant fixed effects of English teachers on math students' performance. While the spillover effect was not quite half as large as the fixed effect attributed to the students' math teachers, the size of the effect was startling, leading the researchers to recommend future studies to "explore why there are such large achievement effects estimated for teachers whom one would not expect to be the main contributors to a subject area's learning" (p. 125). Aaronson et al. (2007) concluded that any scheme that linked teacher pay to performance "would require serious attention to implementation problems (Murnane, Singer, Willett, Kemple, & Olsen, 1991), including, but far from limited to, important measurement issues associated with identifying quality" (p. 132). It is interesting that although Hanushek bases his recommendations on value-added performance models, his own results suggest a lack of stability in judgments based on these models. Of the 22 teachers who were observed in multiple grades and had at least 3 students in each grade, only 2 teachers consistently demonstrated superior performance, as measured by their students' reading and vocabulary performance (Hanushek, 1992). Of the 39 additional teachers who were observed in different years but in the same grade, only 4 teachers consistently demonstrated superior performance, as measured by their students' reading and vocabulary performance (Hanushek, 1992). The measured performance of the rest of the teachers varied across years and/or subjects. Hanushek's (1992) results, along with the results of the Aaronson et al. (2007), McCaffrey et al. (2008), Koedel and Betts (2007), and Ballou (2005) studies, suggest two possibilities: either actual performance varies from year to year for most teachers, or the value-added statistical models that measure that performance are flawed, as indicated by the results from the Aaronson et al. (2007) study. In either case, it would be inappropriate to use value-added statistical models for high-stakes decisions regarding teacher hiring, firing, compensation, and promotion. Those models assume that teacher quality is a stable trait that can be measured with high reliability, including high intertemporal reliability. However, it appears that teacher quality is either not a stable trait, or it cannot be measured with high reliability.

2. Suppose that worker W1 occupies Job 1 (J1) at firm F1, W2 occupies J2 at F2, W3 occupies J3 at F3, and so on. If W1 quits J1 in order to take a job as a teacher, W2 can then move into J1, W3 can move into J2, and so on. Thus, the recruitment of one teacher from currently employed workers potentially unleashes a chain of job-hopping that ends only when a replacement for a job-hopper is drawn from the ranks of the not-employed. The total cost to society is a multiple of the cost to one employer to replace one employee. The cost is layered on top of the job-hopping that would occur in the absence of Gordon et al.'s (2006) proposed policy.

3. If x is the present value cost (assumed to grow at an inflation-adjusted 3.5 percent annual rate, but discounted at the same rate for the purpose of calculating present value) to replace a single worker, and the probability that an employer hires a replacement who is currently working in another job is .83 for the teacher replacement and .50 for subsequent replacements, the sum of the expected costs of all worker replacements is given by Model 1, where each term represents the expected replacement cost for each individual job that may be expected to turn over, in chronological sequence:

Model 1: .83x + (.83)(.5)x + (.83)(.5)^2x + (.83)(.5)^3x + . . . + (.83)(.5)^Nx = Ax

In this model, the first term (.83x) is the expected cost incurred by F1 (the new teacher's former employer) to replace her. The second term [(.83)(.5)x] is the expected cost incurred by F2 (the replacement employee's former employer), the third term is the expected cost incurred by F3, and so forth, while N + 1 equals the number of positions that potentially turn over as a consequence of hiring an alternative certification teacher. The parameter N is constrained by labor dynamics. If the average duration of employment is 2 years, then each new hire is, on average, 2 years younger than the departing employee, implying that the chain of job-hoppers tends to become younger as the chain progresses. W1 quits, becomes a teacher, and is replaced at F1 with a younger W2 from F2. W2 is replaced at F2 by a younger W3 from F3. W3 is replaced at F3 by a younger W4 from F4, and so on. This information can be used together with the age distribution of alternative teacher certification candidates and longitudinal survey data collected by the U.S. Bureau of Labor Statistics (BLS) to determine a reasonable value for N. On average, individuals enter alternative teacher certification programs at age 36 (Feistritzer, 2005). Individuals who are age 36 have accumulated a job history involving an average of 6 jobs since college graduation (U.S. Department of Labor, 2006a), implying an average employment duration of 2 years. Thus, once a 36-year-old individual enters teaching, he or she potentially sets off a chain of worker replacements, each younger (on average) than the previous worker by 2 years. The chain terminates when any employer in the chain hires from the ranks of not-employed workers, with probability equal to .50. However, if the chain of replacements lasts 6 jobs, it is increasingly probable that the replacement for the 6th job is drawn from the ranks of fresh college graduates or other non-employed individuals, rather than employed workers. Thus, a conservative assumption is that the probability of drawing a replacement worker from the ranks of employed workers drops to zero after 6 terms, implying that N equals 5. The interpretation of N + 1, equal to 6, is that this is (on average) the maximum length of the chain of job replacements and, thus, the maximum number of jobs (on average) for which costs must be calculated. The parameter N + 1 should not be confused with average chain length, which is governed by the law of joint probability. In Model 1, x is the present cost to replace a worker who quits and becomes a new teacher. However, when these costs are calculated based on the average salary for beginning teachers in 2006 (see notes a and e to Table 5.1), replacement costs in early career positions for individuals destined to become teachers are likely to be overstated, because beginning teachers are paid more than the average liberal arts graduate and most of the search and replacement costs in Table 5.1 are scaled to income. Therefore, Model 2 gradually reduces x in every successive job replacement through coefficients that would reduce the average beginning teacher's starting salary ($40,049) to the average salary of new liberal arts college graduates ($30,958) (National Association of Colleges and Employers, 2006) by the sixth term, corresponding to the individual's first job upon graduation from college. This adjustment reflects the lower replacement costs associated with early career positions.

Model 2: .83x + .9546(.83)(.5)x + .9092(.83)(.5)^2x + .8638(.83)(.5)^3x + .8184(.83)(.5)^4x + .7730(.83)(.5)^5x = 1.567x
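The Model 1 and Model 2 multipliers can be checked by summing the terms directly; a minimal Python sketch (my own check, not part of the original analysis):

# Verifying the Model 1 and Model 2 multipliers in note 3. Each term is the
# expected replacement cost for one job in the chain. (Illustrative check only.)
p_first, p_next, N = 0.83, 0.5, 5

# Model 1: constant cost x per replacement.
model1 = sum(p_first * p_next**k for k in range(N + 1))

# Model 2: cost shrinks linearly from 1.0 toward 30,958/40,049 ~= 0.7730 along the chain.
scale = [1.0, 0.9546, 0.9092, 0.8638, 0.8184, 0.7730]
model2 = sum(s * p_first * p_next**k for k, s in enumerate(scale))

print(f"Model 1 multiplier: {model1:.3f}x")  # ~1.634x (the 'A' in Model 1)
print(f"Model 2 multiplier: {model2:.3f}x")  # ~1.567x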

4. The largest cost is the foregone output of the worker who quits in order to accept the new teaching job (Table 5.1, VI.B.). Appendix B lists the sources of alternative certification teachers, their proportional distributions (from Feistritzer, 2005), imputed annual time values, and expected time values (imputed annual time values multiplied by the proportional weights). The weighted average indicates that the annual value of the foregone labor of a worker who replaces a bottom-quartile teacher is $64,659.38 in 2006 dollars ($45,216.35 adjusted for a total compensation-to-salary ratio of 1.43 [U.S. Department of Labor, 2008a]). The present value of this stream over the expected career duration of a new teacher (9.11 years, see Yeh, 2010a), assumed to grow at 3.5 percent per year (see Sullivan, 1997, regarding long-term trends in real income growth) but discounted at 3.5 percent per year for the present value calculation, is $589,046.96 (Table 5.1, VI.B.). This cost is potentially offset by three benefits: (a) the gain to society of the terminated teacher's output once that individual has been retrained and has found a new position (Table 5.1, VII), (b) the psychic gain to the individual who voluntarily relinquishes a relatively high-paying job to pursue a teaching position (Table 5.1, VIII.A), and (c) psychic gains to replacement workers who voluntarily hop into new jobs (Table 5.1, VIII.B). The gain to society of the terminated teacher's output is limited by the time it takes for the terminated teacher to be retrained and to find a new job, as well as the lower productivity of that teacher as a novice in a new occupation, relative to the imputed time value of individuals who switch careers in order to become a teacher; such individuals typically have achieved a more mature level of productivity, in occupations where salaries are higher than the salary of beginning teachers, and, thus, their output is valued, in dollar terms, more highly than the output of teachers (Table 5.1, VI.B.). It is important to note that, in Gordon et al.'s (2006) scheme, young novice teachers are being replaced by mature, skilled workers coming from other occupations outside teaching. Since fired teachers would necessarily be novice teachers, they would generally be younger and less experienced, and their output would be less valuable to society than that of the older, more experienced individuals who replace them, arriving through the alternative teacher certification program at an average age of 36 years, with an average employment history of 14 years beyond college in occupations outside teaching. This is costly to society. Psychic gains for new teachers may be estimated by subtracting the annual compensation for beginning teachers ($57,270.07, equal to a starting salary of $40,049 adjusted for a total compensation-to-salary ratio of 1.43) from the average value of time for individuals who choose to switch to teaching ($64,659.38, equal to $45,216.35 adjusted for a total compensation-to-salary ratio of 1.43), multiplying this difference by 9.11 years (the expected career duration of a new teacher), and subtracting search and relocation costs of $7,935.13 (see Table 5.1, IV). Presumably, the psychic gain must be equal to the difference in salary minus search and relocation costs; otherwise, job-hoppers would not voluntarily switch jobs. The present value of this stream of psychic gains over the expected career duration of a new teacher, assumed to grow at 3.5 percent per year (see Sullivan, 1997, regarding long-term trends in real income growth) but discounted at 3.5 percent per year for the present value calculation, is $59,381.48 (Table 5.1, VIII.A.). Psychic gains to replacement workers who voluntarily hop into new jobs may be calculated in a similar fashion. Voluntary job-hoppers accrue psychic gains equal to the costs of job search and relocation (see Section II). Presumably, the psychic gains must equal these costs; otherwise, job-hoppers would not voluntarily switch jobs. The present value of this stream of psychic gains over the expected career duration of a new teacher, assumed to grow at 3.5 percent per year but discounted at 3.5 percent per year for the present value calculation, equals $12,434.34. The calculation of psychic gains depends on the strong assumption that large numbers of additional individuals who are currently employed elsewhere would voluntarily choose to leave their jobs and hop into the newly vacated teaching and replacement-worker positions that would open up as a result of Gordon et al.'s (2006) proposal. While the current flow of individuals into those positions is adequately stimulated by their calculation of the psychic gains, there is no obvious reason why large numbers of additional individuals would suddenly conclude that the psychic gains outweighed the costs, leading them to pursue teaching or the positions vacated by replacement workers. Instead, it is likely that filling those positions would require increases in salaries that ought to be counted as costs of Gordon et al.'s (2006) proposal. Thus, the psychic gains in Table 5.1 are overestimated, and the net present value cost to society of replacing one teacher ($153,570.94) is underestimated, to the extent that existing workers, given existing wage rates, decide that they do not wish to change jobs. The increase in teacher salaries required to attract a sufficient number of new individuals to the teaching profession may be estimated using conservative assumptions. If Q equals the annual supply of second-year teachers, Gordon et al.'s (2006) proposal implies that Q must be increased by one third (33.33 percent) to an amount equal to 4Q/3 (elimination of the bottom quartile of second-year teachers reduces 4Q/3 to 3Q/3). The cost is determined by the elasticity of teacher supply, defined as the percentage change in the annual supply of new teachers for every 1 percent change in average annual teacher salary: %ΔQ/%ΔS. We assume a supply elasticity of 3, which is near the top of the range of ordinary supply elasticities estimated by Manski (1987). However, the correct elasticity is likely to be lower because Gordon et al.'s (2006) proposal increases the risk of being fired, making teaching a less desirable career choice. Thus, a 1 percent salary increase is likely to be insufficient to induce a 3 percent increase in the supply of new teachers, implying that our estimate of the required increase in teacher salaries is likely to be a lower-bound estimate of the true cost. A supply elasticity of 3 implies that teacher salaries must increase by 11.11 percent (33.33/3) to elicit the number of new teachers required to replace the bottom quartile of novice teachers who would be fired under Gordon et al.'s (2006) proposal. Assuming an average teacher salary of $51,055.19 per year after adjusting for inflation (National Center for Education Statistics, 2005c), a compensation-to-salary ratio of 1.43 (U.S. Department of Labor, 2008a), and 20 students per teacher, it would cost an extra $405.56 per student per year to raise salaries sufficiently to attract the teachers necessary to replace the bottom quartile of novice teachers. In addition, employers of potential replacement workers may incur additional costs to the extent that the increased availability of teaching positions as a consequence of Gordon et al.'s (2006) proposal increases the compensation required to retain those workers, although some employers may instead choose to tolerate a higher level of employee turnover in order to reduce retention costs (Abelson & Baysinger, 1984). While some economists have suggested that turnover serves a useful purpose in improving match quality between workers and employers (Antel, 1986; Jovanovic, 1979; Liu, 1986), more recent research suggests that returns on human capital do not change as a consequence of job mobility and, thus, average match quality is unaffected by mobility (Widerstedt, 1998).
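The present-value and elasticity figures in note 4 follow from the stated assumptions; because income growth and the discount rate are both 3.5 percent, each present value reduces to the annual amount multiplied by the 9.11-year career duration. A minimal Python sketch (an arithmetic check, not the original code):

# Checking the figures in note 4. With growth and discount rates equal,
# present value = annual amount x career duration. (Illustrative check only.)
RATIO, YEARS = 1.43, 9.11            # compensation-to-salary ratio; expected career duration

forgone_output = 45_216.35 * RATIO * YEARS                 # ~$589,046.96 (Table 5.1, VI.B)
psychic_gain = (64_659.38 - 57_270.07) * YEARS - 7_935.13  # ~$59,381.48 (Table 5.1, VIII.A)

# Salary increase needed to expand the supply of new teachers by a third,
# given the assumed supply elasticity of 3 (33.33% / 3 = 11.11%).
extra_per_student = 51_055.19 * RATIO * (33.33 / 3 / 100) / 20   # ~$405.56

print(f"{forgone_output:,.2f}  {psychic_gain:,.2f}  {extra_per_student:,.2f}")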

5. Weighted by the proportion of students attending public and private institutions (National Center for Education Statistics, 2006a), 4-year colleges expend $34,683 per student annually (National Center for Education Statistics, 2006b). Thus, the cost to society for each new teacher includes the cost of an additional year of education: $34,683 of actual college expenditures, adjusted for inflation using the January 2005 price index (190.7) and the August 2006 price index (203.9), and lost compensation for new liberal arts college graduates of $44,269.94, based on lost wages of $30,958 (National Association of Colleges and Employers, 2006) and a total compensation-to-wage ratio of 1.43 (U.S. Department of Labor, 2008a), or a total of $78,952.94. The costs also include the opportunity costs of employing 5th-year college graduates, which equal the value of their foregone output in the next best use of their labor. This may be imputed based on the average starting salary of new liberal arts college graduates ($30,958) (National Association of Colleges and Employers, 2006). The present value of this stream over the expected career duration of a new teacher (9.11 years, see Yeh, 2010a), adjusted for a total compensation-to-salary ratio of 1.43, and assumed to grow at 3.5 percent per year (see Sullivan, 1997, regarding long-term trends in real income growth) but discounted at 3.5 percent per year for the present value calculation, is $403,299.15.

Chapter 6

1. The best available estimate of the career duration of the average teacher was derived using proportional-hazards modeling, which accounts for the difficulty of estimating career duration when some members of the research sample have not exited the teaching profession by the end of the research study period (Murnane et al., 1988). Proportional-hazards modeling incorporates information about the pattern of teacher attrition during the study period to predict the median length of each spell of teaching. Using data from Michigan covering a 12-year time period, Murnane et al. (1988) provided separate estimates, for six subject-area specialties, of the duration of the average teacher's first two spells of teaching. The authors reported the percentage distribution of teachers across the six subject-area specialties, as well as the percentage of teachers in each of the six subject areas who returned to teaching after a career interruption. I used this information to calculate the average career duration for an average teacher, weighted by the percentage distribution of teachers across the six subject-area specialties, and including the expected length of a second spell of teaching based on the probability of a second spell. This average (9.11 years) may be conceptualized as a weighted combination of two averages: (a) the average career duration for teachers who exit the teaching profession before they can be certified (3 years, based on a survival function that falls to zero by the end of year 5; since National Board certification is only available to teachers who have 3 or more years of teaching experience [Cavalluzzo, 2004], and since the certification process takes approximately 2 years after teachers become eligible and apply for certification [Goldhaber & Anthony, 2007; Sanders et al., 2005], it probably takes a minimum of 5 years from the date a teacher begins teaching to the date of certification), and (b) the average career duration for teachers who stay long enough to be certified (that is, all teachers who stay longer than 5 years). Based on the survival functions and proportional weights provided by Murnane et al. (1988), the average career duration of teachers who stay for at least 5 years is 12.86 years, suggesting that the average NBPTS-certified teacher could be expected to teach for 7.86 years (12.86 minus 5) after certification by NBPTS. Thus, the costs of the certification process may be amortized over 7.86 years, the average period over which the benefits would be realized. While Murnane et al. (1988) offer the most sophisticated estimate of the average career duration for all teachers, an alternative estimate may be derived from National Research Council (NRC) data regarding the career duration of elementary and secondary teachers in Michigan and North Carolina (Boe & Gilford, 1992). Based on the NRC data, the average career duration for elementary teachers is 8.45 years, while the average duration for secondary teachers is 5.85 years. Weighting by the total number of elementary (1,363,937.3) and secondary (1,033,065.5) teachers (U.S. Department of Education, 2005b) suggests an average teaching career of 7.33 years. This estimate is shorter than the estimate of 9.11 years used in the present analysis. An analysis using the shorter figure would effectively increase the annual cost of NBPTS certification by reducing the period over which the costs of certification would be amortized.
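The alternative NRC-based estimate reduces to a weighted average; a minimal Python check (mine, not from the book):

# Weighted average career duration from the NRC data in note 1. (Illustrative check only.)
elem_n, sec_n = 1_363_937.3, 1_033_065.5   # numbers of elementary and secondary teachers
elem_years, sec_years = 8.45, 5.85         # average career durations by level
avg = (elem_years * elem_n + sec_years * sec_n) / (elem_n + sec_n)
print(f"weighted average career duration: {avg:.2f} years")  # ~7.33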

2. The benefits of certified teachers would be gained very slowly. Students can gain the benefits of certified teachers only when building principals replace teachers who are performing below NBPTS standards with certified teachers (there is no benefit to students if existing teachers become certified). However, lack of NBPTS certification would not be grounds for firing existing teachers, so dismissal is not an option. Furthermore, releasing probationary teachers who are uncertified (but would otherwise be rehired) would not only create a glut of uncertified teachers, but would choke off the future supply of certified teachers, since NBPTS certification is not available to teachers until they have accumulated a minimum of 3 years of teaching experience. Thus, the primary process through which a building principal could replace teachers who are performing below NBPTS standards with NBPTS-certified teachers is through normal turnover of existing staff. While annual teacher turnover is 13.2 percent, approximately half of the replacements are new teachers (Ingersoll, 2001) who cannot be NBPTS certified, since certification occurs only after 3 years of teaching plus an application process that takes an average of 2 years (see note 1, above), suggesting that building principals could replace a maximum of 6.6 percent of their teaching staffs every year with NBPTS-certified teachers (assuming an adequate supply of certified teachers). If x is the size of the teaching staff, then .066x teachers may be replaced in the first year, and .066(x – .066x) teachers may be replaced in the second year with NBPTS-certified teachers, and so forth. The cumulative 5-year replacement rate is given by the following function:

.066x + .066(x – .066x)
+ .066[x – .066x – .066(x – .066x)]
+ .066[x – .066x – .066(x – .066x) – .066[x – .066x – .066(x – .066x)]]
+ .066[x – .066x – .066(x – .066x) – .066[x – .066x – .066(x – .066x)] – .066[x – .066x – .066(x – .066x) – .066[x – .066x – .066(x – .066x)]]]
= 5(.066)x – 10(.066)^2x + 10(.066)^3x – 5(.066)^4x + (.066)^5x
= .28922134x

Thus, the cumulative replacement function reaches 28.92 percent after 5 years.
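The cumulative replacement function reduces to 1 – (1 – .066)^5, the fraction of staff replaced when 6.6 percent of the remaining unreplaced teachers turn over each year; a minimal Python check (an illustration, not the author's code):

# Cumulative replacement rate from note 2: 6.6% of the still-unreplaced staff
# is replaced each year for 5 years. (Illustrative check only.)
rate, years, staff = 0.066, 5, 25
cumulative = 1 - (1 - rate) ** years
print(f"cumulative replacement rate: {cumulative:.8f}")                  # 0.28922134
print(f"teachers replaced in a {staff}-teacher building: {cumulative * staff:.2f}")  # ~7.23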

In practical terms, for a building with 25 teachers, the principal may replace (on average) 1.65 teachers in the first year, and a total of 7.23 teachers after 5 years, with NBPTS-certified teachers. The annual replacement rate declines every year because the principal has replaced a growing proportion of teachers with certified replacements. Thus, a building principal can replace at most 28.92 percent of the teaching staff (7.23 teachers in a building with 25 teachers) with NBPTS-certified teachers within 5 years, given the current rate at which experienced teachers turn over, and one additional NBPTS-certified teacher can be added in year 6. If normal turnover means that a constant 21.3 percent of teachers have less than 5 years of teaching experience, 78.7 percent of the teaching staff, or 19.68 experienced teachers in a building of 25 teachers, could potentially be replaced with NBPTS-certified teachers. However, based on an average NBPTS passing rate of 59.7 percent, 11.75 of these teachers are already performing at the NBPTS level. The principal can improve student achievement by replacing the remaining 7.93 experienced teachers who are performing below the NBPTS standard with Board-certified teachers. The cumulative replacement function suggests that a building principal could not achieve the objective of replacing the 7.93 teachers any earlier than 6 years, even under the optimistic assumption that all of the teachers who leave voluntarily are the teachers who are performing below the NBPTS level. If high-performing teachers are just as likely as low-performing teachers to leave the profession, the rate at which a building principal can replace low-performing teachers with Board-certified teachers is halved, and the process takes twice as long. Thus, any benefits of Board certification would be realized very slowly.

3. A study finding positive effects of teacher certification (Darling-Hammond et al., 2005) did not control for student or school fixed effects (Podgursky, n.d.).

Chapter 7

1. The assumption is that high-ability teachers scoring one SD above the mean are recruited and replace average teachers scoring at the mean. In contrast, if teachers who exit the profession are below average in ability, replacing those individuals with high-ability individuals would result in gains that are larger than one SD. However, while econometric evidence suggests that teachers who quit are below average in ability, as measured by student performance in the exit year, their performance during the exit year is noticeably worse than in the previous year: "This strongly suggests that those who exit are not systematically worse in a longer term sense but only in the year in question" (Hanushek et al., 2005, p. 26). While the precise reason for the temporary reduction in performance is unclear, the finding that it is a temporary phenomenon suggests that the ability of teachers who exit is average, rather than below average. Thus, the potential gain in cognitive ability as a consequence of the proposed policy of teacher replacement is limited to one SD.


Chapter 8

1. Note that the math and reading achievement measures for the Perry and Abecedarian programs were administered 2 to 3 years after the preschool treatment ended. Since the effect of a treatment may be expected to decay over time, the delay in measurement may disadvantage high-quality preschool, compared with rapid assessment and the other interventions where achievement was measured immediately after the treatments ended.

2. If the effects of preschool on earnings and reductions in crime are mediated entirely by increased student achievement, it is appropriate to infer that any intervention that improves student achievement by an equivalent amount would have similar effects on earnings and crime, and a cost-effectiveness analysis that ranks each intervention along the dimension of student achievement accurately predicts rankings along the dimensions of earnings and reductions in crime. If, however, the effects of preschool on earnings and reductions in crime involve a different path, it would not be appropriate to infer that rankings along the dimension of student achievement accurately predict rankings along the dimensions of earnings and reductions in crime. The best available data suggest that a significant portion of the effects of preschool is channeled through increased student achievement, but other mediators are also important (Reynolds, Ou, & Topitzes, 2004).

References

Aaronson, D., Barrow, L., & Sander, W. (2007). Teachers and student achievement in the Chicago public high schools. Journal of Labor Economics, 25(1), 95–135.
Abelson, M. A., & Baysinger, B. D. (1984). Optimal and dysfunctional turnover: Toward an organizational level model. The Academy of Management Review, 9(2), 331–341.
Allen, R., & Kickbusch, K. (1997, April 25). Reducing K–3 class sizes. Retrieved June 17, 2007, from http://www.weac.org/sage/research/reducingclasssize.htm
American Institutes for Research. (2006). CSRQ Center report on elementary school comprehensive school reform models. Washington, DC: American Institutes for Research.
Ames, C. (1992). Classrooms: Goals, structures and student motivation. Journal of Educational Psychology, 84(3), 261–271.
Amrein, A. L., & Berliner, D. C. (2002a). High-stakes testing, uncertainty, and student learning. Education Policy Analysis Archives, 10(18). Retrieved from http://epaa.asu.edu/epaa/v10n18/
Amrein, A. L., & Berliner, D. C. (2002b, December). The impact of high-stakes tests on student academic performance: An analysis of NAEP results in states with high-stakes tests and ACT, SAT, and AP test results in states with high school graduation exams. Retrieved June 5, 2004, from http://www.asu.edu/educ/epsl/EPRU/documents/EPSL-0211-126-EPRU.pdf
Anderman, E. M., & Maehr, M. L. (1994). Motivation and schooling in the middle grades. Review of Educational Research, 64, 287–309.
Angrist, J. D., & Guryan, J. (2004). Teacher testing, teacher education, and teacher characteristics. American Economic Review, 94(2), 241–246.



Angrist, J. D., & Guryan, J. (2008). Does teacher testing raise teacher quality? Evidence from state certification requirements. Economics of Education Review, 27(5), 483–503.
Antel, J. J. (1986). Human capital investment specialization and the wage effects of voluntary labor mobility. Review of Economics and Statistics, 68, 477–483.
Aos, S., Miller, M., & Mayfield, J. (2007). Benefits and costs of K–12 educational policies: Evidence-based effects of class size reductions and full-day kindergarten (Document No. 07-03-2201). Olympia: Washington State Institute for Public Policy.
Aouad, G., Lee, A., & Wu, S. (2010). Computer aided design guide for architecture, engineering and construction. New York: Taylor and Francis.
Avery, C., & Hoxby, C. M. (2004). Do and should financial aid packages affect students' college choices? In C. M. Hoxby (Ed.), College choices: The economics of where to go, when to go, and how to pay for it (pp. 239–301). Chicago: University of Chicago Press.
Ballou, D. (2005). Value-added assessment: Lessons from Tennessee. In R. Lissitz (Ed.), Value added models in education: Theory and applications (pp. 1–26). Maple Grove, MN: JAM Press.
Ballou, D., & Podgursky, M. (1995). Recruiting smarter teachers. The Journal of Human Resources, 30(2), 326–338.
Baltes, M. M., & Baltes, P. B. (1986). The psychology of control and aging. Hillsdale, NJ: Erlbaum.
Bangert-Drowns, R. L., Kulik, C. C., Kulik, J. A., & Morgan, M. (1991). The instructional effect of feedback in test-like events. Review of Educational Research, 61(2), 213–238.
Bangert, R. L., Kulik, J. A., & Kulik, C. C. (1983). Individualized systems of instruction in secondary schools. Review of Educational Research, 53(2), 143–158.
Barnett, W. S. (1992). Benefits of compensatory preschool education. The Journal of Human Resources, 27(2), 279–312.
Barnett, W. S. (1996). Economics of school reform: Three promising models. In H. F. Ladd (Ed.), Holding schools accountable: Performance-based reform in education (pp. 299–326). Washington, DC: The Brookings Institution.
Barnett, W. S., & Masse, L. N. (2007). Comparative benefit–cost analysis of the Abecedarian program and its policy implications. Economics of Education Review, 26, 113–125.
Baumol, W., & Blinder, S. (2003). Macroeconomics: Principles and policy (9th ed.). Mason, OH: Thomson South-Western.
Belfield, C. (2006). The use of cost-benefit analysis in guiding investments in human capital in elementary and secondary school. Flushing, NY: City University of New York.
Belfield, C. R. (2006). The evidence on education vouchers: An application to the Cleveland Scholarship and Tutoring Program. Flushing, NY: Queens College, City University of New York, and National Center for the Study of Privatization in Education, Teachers College.


Betts, J. R., Rueben, K. S., & Danenberg, A. (2000). Equal resources, equal outcomes? The distribution of school resources and student achievement. San Francisco: Public Policy Institute of California.
Bifulco, R., & Ladd, H. F. (2005). Results from the Tar Heel state. Education Next, 5(4), 60–66.
Bifulco, R., & Ladd, H. F. (2006). The impacts of charter schools on student achievement: Evidence from North Carolina. Education Finance and Policy, 1(1), 50–90.
Biggers, D. (2001). The argument against Accelerated Reader. Journal of Adolescent & Adult Literacy, 45(1), 72–75.
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education, 5(1), 7–74.
Blankenship, V. (1992). Individual differences in resultant achievement motivation and latency to and persistence at an achievement task. Motivation and Emotion, 16(1), 35–63.
Blau, D. M. (1992). An empirical analysis of employed and unemployed job search behavior. Industrial and Labor Relations Review, 45(4), 738–752.
Bliss, W. (2000). The business cost and impact of employee turnover. Retrieved November 28, 2006, from http://www.hrfocus.com/forum
Block, J. H., & Burns, R. B. (1976). Mastery learning. Review of Research in Education, 4, 3–49.
Boe, E. E., & Gilford, D. M. (Eds.). (1992). Teacher supply, demand, and quality: Policy issues, models, and data bases. Washington, DC: National Academies Press.
Bohrnstedt, G. W., & Stecher, B. M. (Eds.). (2002). What we have learned about class size reduction in California. Sacramento, CA: California Department of Education.
Bond, L., Smith, T., Baker, W. K., & Hattie, J. A. (2000). The certification system of the National Board for Professional Teaching Standards: A construct and consequential validity study. Greensboro, NC: Center for Educational Research and Evaluation, The University of North Carolina at Greensboro.
Borman, G. D., & Dowling, N. M. (2006). Longitudinal achievement effects of multiyear summer school: Evidence from the Teach Baltimore randomized field trial. Educational Evaluation and Policy Analysis, 28(1), 25–48.
Borman, G. D., & Hewes, G. M. (2002). The long-term effects and cost-effectiveness of Success for All. Educational Evaluation and Policy Analysis, 24(4), 243–266.
Borman, G. D., Hewes, G. M., Overman, L. T., & Brown, S. (2003). Comprehensive school reform and achievement: A meta-analysis. Review of Educational Research, 73(2), 125–230.
Borman, G. D., Hewes, G. M., Overman, L. T., & Brown, S. (2004). Comprehensive school reform and achievement: A meta-analysis. In C. T. Cross (Ed.), Putting the pieces together: Lessons from comprehensive school reform research (pp. 53–108). Washington, DC: The National Clearinghouse for Comprehensive School Reform.


Bouffard-Bouchard, T. (1989). Influence of self-efficacy on performance in a cognitive task. Journal of Social Psychology, 130, 353–363.
Bowen, W. G., Kurzweil, M. A., & Tobin, E. M. (2005). Equity and excellence in American higher education. Charlottesville, VA: University of Virginia Press.
Brady, H., Hout, M., Stiles, J., Gleeson, S., & Hui, I. (2005). Return on investment: Educational choices and demographic change in California's future. Berkeley, CA: Survey Research Center, University of California Berkeley.
Bratton, S. E., Jr., Horn, S. P., & Wright, S. P. (n.d.). Using and interpreting Tennessee's Value-Added Assessment System: A primer for teachers and principals. Retrieved December 1, 2006, from http://www.shearonforschools.com/documents/TVAAS.HTML#Let's
Braun, H. (2004). Reconsidering the impact of high-stakes testing. Education Policy Analysis Archives, 12(1). Retrieved from http://epaa.asu.edu/epaa/v12n1/v12n1.pdf
Brewer, D. J., Krop, C., Gill, B. P., & Reichardt, R. (1999). Estimating the cost of national class size reductions under different policy alternatives. Educational Evaluation and Policy Analysis, 21(2), 179–192.
Brookover, W. B., Beady, C. H., Flood, P. K., Schweitzer, J. G., & Wisenbaker, J. M. (1979). Schools, social systems and student achievement: Schools can make a difference. New York: Praeger.
Brookover, W. B., Schweitzer, J. G., Schneider, J. M., Beady, C. H., Flood, P. K., & Wisenbaker, J. M. (1978). Elementary school social climate and school achievement. American Educational Research Journal, 15, 301–318.
Buckingham, J. (2003). Reforming school education: Class size and teacher quality. Policy, 19(1), 15–20.
Bureau of Economic Analysis. (2007a). National income and product accounts, Table 1.10: Gross domestic income by type of income, and Table 4.1: Foreign transactions in the national income and product accounts. Retrieved December 2, 2007, from http://www.bea.gov/national/nipaweb/
Bureau of Economic Analysis. (2007b). National income and product accounts, Table 2.1: Personal income and its disposition. Retrieved December 2, 2007, from http://www.bea.gov/national/nipaweb/
Butler, R. (2006). Are mastery and ability goals both adaptive? Evaluation, initial goal construction and the quality of task engagement. British Journal of Educational Psychology, 76, 595–611.
California Department of Education. (2007). Frequently asked questions: K–3 class size reduction program. Retrieved August 31, 2007, from http://www.cde.ca.gov/ls/cs/k3/
California State Board of Education. (1999). New California mathematics adoptions. Retrieved March 15, 2007, from http://www.mathematicallycorrect.com/calbooks.htm
Carneiro, P., & Heckman, J. J. (2003). Human capital policy. In J. J. Heckman & A. Krueger (Eds.), Inequality in America: What role for human capital policies? (pp. 77–239). Cambridge, MA: MIT Press.


Carnoy, M. (2001). School vouchers: Examining the evidence. Washington, DC: Economic Policy Institute.
Carnoy, M., & Loeb, S. (2002). Does external accountability affect student outcomes? A cross-state analysis. Educational Evaluation and Policy Analysis, 24(4), 305–331.
Cavalluzzo, L. (2004). Is National Board certification an effective signal of teacher quality? Retrieved October 29, 2006, from http://www.cna.org/documents/CavaluzzoStudy.pdf
Center for Greater Philadelphia. (2004). Value-added assessment. Retrieved February 8, 2008, from http://www.cgp.upenn.edu/ope_value.html#9
Center on Education Policy. (2005). High school exit exams: Basic features (Exit Exams Policy Brief No. 1). Washington, DC: Center on Education Policy.
Chaszar, A. (Ed.). (2006). Blurring the lines: Computer-aided design and manufacturing in contemporary architecture. Hoboken, NJ: Wiley.
Choy, S. P., Chen, X., Bugarin, R., & Broughman, S. P. (2006). Teacher professional development in 1999–2000: What teachers, principals, and district staff report (Report No. 2006-305). Washington, DC: National Center for Education Statistics.
Cibulka, J. G. (1983). Response to enrollment loss and financial decline in urban school systems. Peabody Journal of Education, 60(2), 64–78.
Clayton, M. (2000, May 23). How a new math program rose to the top. Christian Science Monitor. Retrieved from http://www.csmonitor.com/sections/learning/mathmelt/p-2story052300.html
Clotfelter, C. T., Ladd, H. F., & Vigdor, J. L. (2006). Teacher-student matching and the assessment of teacher effectiveness. Journal of Human Resources, 41(4), 778–820.
Clotfelter, C. T., Ladd, H. F., & Vigdor, J. L. (2007). Teacher credentials and student achievement: Longitudinal analysis with student fixed effects. Economics of Education Review, 26(6), 673–682.
Cody, S., Gleason, P., Schechter, B., Satake, M., & Sykes, J. (2005). Food stamp program entry and exit: An analysis of participation trends in the 1990s, final report (Report No. 8823-600). Washington, DC: Mathematica Policy Research, Inc.
Cohen, P. A., Kulik, J. A., & Kulik, C.-L. C. (1982). Educational outcomes of tutoring: A meta-analysis of findings. American Educational Research Journal, 19(2), 237–248.
Coleman, J. S., Campbell, E., Hobson, C., McPartland, J., Mood, A., Weinfeld, R., et al. (1966). Equality of educational opportunity. Washington, DC: Government Printing Office.
College Board. (2006). SAT percentile ranks: Critical reading, mathematics, and writing. Retrieved February 17, 2007, from http://www.collegeboard.com/prod_downloads/highered/ra/sat/SATPercentileRanks.pdf
Congressional Budget Office. (1994). An analysis of the administration's proposed program for displaced workers. Washington, DC: Author.


Congressional Budget Office. (2001). Effective federal tax rates, 1979–1997. Washington, DC: Author.
Congressional Budget Office. (2004). Educational attainment and compensation of enlisted personnel. Washington, DC: Author.
Cooper, H., Charlton, K., Valentine, J. C., & Muhlenbruck, L. (2000). Making the most of summer school: A meta-analytic and narrative review (Serial No. 260). Monographs of the Society for Research in Child Development, 65(1).
Crandall, V., Katkovsky, W., & Crandall, U. (1965). Children's beliefs in their own control of reinforcements in intellectual academic achievement situations. Child Development, 36, 91–109.
Csikszentmihalyi, M. (1990). Flow: The psychology of optimal experience. New York: Harper and Row.
Cuban, L. (2001). Oversold and underused: Computers in the classroom. Cambridge, MA: Harvard University Press.
Cunningham, G. K., & Stone, J. E. (2005). Value-added assessment of teacher quality as an alternative to the National Board for Professional Teaching Standards: What recent studies say. In R. Lissitz (Ed.), Value added models in education: Theory and applications (pp. 209–232). Maple Grove, MN: JAM Press.
Currie, J., & Thomas, D. (2001). Early test scores, socioeconomic status, school quality and future outcomes. Research in Labor Economics, 20, 103–132.
Danner, F. W., & Lonsky, D. (1981). A cognitive-developmental approach to the effects of rewards on intrinsic motivation. Child Development, 52, 1043–1052.
Darling-Hammond, L. (2000). Teacher quality and student achievement: A review of state policy evidence. Education Policy Analysis Archives, 8(1). Retrieved from http://epaa.asu.edu/ojs/article/viewFile/392/515
Darling-Hammond, L., Holtzman, D. J., Gatlin, S. J., & Heilig, J. V. (2005). Does teacher preparation matter? Evidence about teacher certification, Teach for America, and teacher effectiveness. Education Policy Analysis Archives, 13(42). Retrieved from http://epaa.asu.edu/epaa/v13n42/
Day, J. C., & Newburger, E. C. (2002). The big payoff: Educational attainment and synthetic estimates of work-life earnings (Report No. P23-210). Washington, DC: U.S. Census Bureau.
Dee, T. S., & Jacob, B. A. (2007). Do high school exit exams influence educational attainment or labor market performance? In A. Gamoran (Ed.), Standards-based reform and the poverty gap: Lessons for No Child Left Behind (pp. 154–200). Washington, DC: Brookings Institution Press.
Developmental Center for Handicapped Persons. (1987). Early Intervention Research Institute, final report. Logan, UT: Utah State University. (ERIC Document Reproduction Service No. ED293242)
Dibbell, J. (2007, June 17). How the world of online gaming spawned a multimillion-dollar shadow economy measured in virtual coins, 80-hour workweeks and very real money. The New York Times Magazine, 36–41.


Diener, C. I., & Dweck, C. S. (1978). An analysis of learned helplessness: Continuous changes in performance, strategy, and achievement cognitions following failure. Journal of Personality and Social Psychology, 36, 451–462.
Dorans, N. J. (1999). Correspondences between ACT and SAT I scores (College Board Report No. 99-1, ETS RR No. 99-2). New York: College Entrance Examination Board.
Dougherty, M. R., & Harbison, J. I. (2007). Motivated to retrieve: How often are you willing to go back to the well when the well is dry? Journal of Experimental Psychology: Learning, Memory, and Cognition, 33(6), 1108–1117.
Drucker, P. M., Drucker, D. B., Litto, T., & Stevens, R. (1998). Relation of task difficulty to persistence. Perceptual and Motor Skills, 86, 787–794.
Dunbar, F. C. (2003). Testimony of Dr. Fredrick C. Dunbar, Senior Vice President, National Economic Research Associates, before the Committee on the Judiciary, United States Senate. Retrieved June 4, 2003, from http://judiciary.senate.gov/testimony.cfm?id=777&wit_id=2188
Dweck, C. S. (1991). Self-theories and goals: Their role in motivation, personality, and development. In R. A. Dienstbier (Ed.), Perspectives on motivation: Nebraska symposium on motivation, Vol. 38 (pp. 199–235). Lincoln: University of Nebraska Press.
Eccles, J. S., Wigfield, A., Flanagan, C. A., Miller, C., Reuman, D. A., & Yee, D. (1989). Self-concepts, domain values, and self-esteem: Relations and changes at early adolescence. Journal of Personality, 57, 283–310.
Eccles, J. S., Wigfield, A., & Schiefele, U. (1998). Motivation to succeed. In N. Eisenberg & W. Damon (Eds.), Handbook of child psychology (5th ed., Vol. 3, pp. 1017–1095). New York: John Wiley and Sons.
Education Commission of the States. (2006). Synthesis of reviews of "The value-added achievement gains of NBPTS-certified teachers in Tennessee: A brief report." Retrieved October 30, 2006, from http://www.ecs.org/html/special/nbpts/PanelReport.htm
Educational Testing Service. (2004). Where we stand on teacher quality. Retrieved October 29, 2006, from http://ftp.ets.org/pub/corp/position_paper.pdf
Ehrenberg, R. G., & Brewer, D. J. (1995). Did teacher verbal ability and race matter in the 1960s? Coleman revisited. Economics of Education Review, 14(1), 1–21.
Eide, E., Goldhaber, D., & Brewer, D. (2004). The teacher labour market and teacher quality. Oxford Review of Economic Policy, 20(2), 230–244.
Employee Relocation Council. (2006). Workforce mobility facts page. Retrieved December 21, 2006, from http://www.erc.org/who_is_ERC/facts.shtml
Fallick, B., & Fleischman, C. A. (2004). Employer-to-employer flows in the U.S. labor market: The complete picture of gross worker flows. Washington, DC: Federal Reserve Board.
Farrington, D. P., Langan, P. A., & Tonry, M. (2004). Cross-national studies in crime and justice: England and Wales, United States, Australia, Canada, Netherlands, Scotland, Sweden, Switzerland (Report No. NCJ 200988). Washington, DC: Bureau of Justice Statistics, U.S. Department of Justice.
Feistritzer, C. E. (2005). Profile of alternative route teachers. Washington, DC: National Center for Education Information.
Feistritzer, C. E. (2006). Alternative teacher certification: A state-by-state analysis. Retrieved December 22, 2006, from http://www.teach-now.org/booktoc2006.html
Ferguson, R. F. (1991). Paying for public education: New evidence on how and why money matters. Harvard Journal on Legislation, 28(2), 465–498.
Ferguson, R. F. (1998a). Can schools narrow the Black-White test score gap? In C. Jencks & M. Phillips (Eds.), The Black-White test score gap (pp. 318–374). Washington, DC: Brookings Institution Press.
Ferguson, R. F. (1998b). Teachers' perceptions and expectations and the Black-White test score gap. In C. Jencks & M. Phillips (Eds.), The Black-White test score gap (pp. 273–317). Washington, DC: The Brookings Institution Press.
Ferguson, R. F., & Ladd, H. F. (1996). How and why money matters: An analysis of Alabama schools. In H. F. Ladd (Ed.), Holding schools accountable: Performance-based reform in education (pp. 265–298). Washington, DC: The Brookings Institution.
Fetler, M. (1999). High school staff characteristics and mathematics test results. Education Policy Analysis Archives, 7(9). Retrieved from http://epaa.asu.edu/ojs/article/viewFile/544/667
Findley, M. J., & Cooper, H. M. (1983). Locus of control and academic achievement: A literature review. Journal of Personality and Social Psychology, 44(2), 419–427.
Finkelstein, N. W., & Ramey, C. T. (1977). Learning to control the environment in infancy. Child Development, 48, 806–819.
Finn, J. D., Gerber, S. B., Achilles, C. M., & Boyd-Zaharias, J. (2001). The enduring effects of small classes. Teachers College Record, 103(2), 145–183.
Fitz-enz, J. (1997). It's costly to lose good employees. Workforce, 76, 50–51.
Flores-Lagunes, A., & Light, A. (2004). Identifying sheepskin effects in the returns to education. Tucson, AZ: Department of Economics, University of Arizona.
Flores-Lagunes, A., & Light, A. (2007). Interpreting sheepskin effects in the returns to education. Unpublished manuscript, University of Arizona and Ohio State University.
Florida Department of Education. (2006, July). Opportunity Scholarship Program. Retrieved June 1, 2007, from http://www.floridaschoolchoice.org/Information/OSP/files/Fast_Facts_OSP.pdf
Fuchs, L. S., & Fuchs, D. (1986). Effects of systematic formative evaluation: A meta-analysis. Exceptional Children, 53(3), 199–208.
Gayler, K., Chudowsky, N., Hamilton, M., Kober, N., & Yeager, M. (2004). State high school exit exams: A maturing reform. Washington, DC: Center on Education Policy.

References    201

Gitomer, D. H., Latham, A. S., & Ziomek, R. (1999). The academic quality of prospective teachers: The impact of admissions and licensure testing. (Report No. RR-03-35). Princeton, NJ: Educational Testing Service. Goe, L. (2002). Legislating equity: The distribution of emergency permit teachers in California. Education Policy Analysis Archives, 10(42). Retrieved from http://epaa.asu.edu/ojs/article/viewFile/321/447 Goldhaber, D. (2007). Everyone’s doing it, but what does teacher testing tell us about teacher effectiveness? Journal of Human Resources, 42(4), 765–794. Goldhaber, D., & Anthony, E. (2007). Can teacher quality be effectively assessed? National Board certification as a signal of effective teaching. Review of Economics and Statistics, 89(1), 134–150. Goldhaber, D., & Brewer, D. J. (2000). Does teacher certification matter? High school certification status and student achievement. Educational Evaluation and Policy Analysis, 22, 129–145. Goldhaber, D., Brewer, D. J., & Anderson, D. (1999). A three-way error components analysis of educational productivity. Education Economics, 7(3), 199–208. Goldhaber, D., Perry, D., & Anthony, E. (2003). NBPTS certification: Who applies and what factors are associated with success? Seattle: University of Washington. Goldhaber, D. D. (2002). The mystery of good teaching: Surveying the evidence on student achievement and teachers’ characteristics. Education Next, 2(1), 50–55. Goodlad, J. I. (1984). A place called school. New York: McGraw Hill. Gordon, R., Kane, T. J., & Staiger, D. O. (2006). Identifying effective teachers using performance on the job (Discussion Paper 2006-01). Washington, DC: The Brookings Institution. Gottfried, A. E., Fleming, J. S., & Gottfried, A. W. (2001). Continuity of academic intrinsic motivation from childhood through late adolescence: A longitudinal study. Journal of Educational Psychology, 93(1), 3–13. Gottlob, B. J. (2004). The fiscal impacts of school choice in New Hampshire. Concord, NH: The Josiah Bartlett Center for Public Policy. Gottschalck, A. O. (2006). Dynamics of economic well-being: Spells of unemployment 2001–2003. (Report No. P70-105). Washington, DC: U.S. Census Bureau. Gramlich, E. M. (1997). A guide to benefit–cost analysis (2nd ed.). Long Grove, IL: Waveland Press. Green, A., & Pope, K. (2006). Trends report. Retrieved November 29, 2006, from http://www.alliancelibrarysystem.com/pdf/Trends2006.pdf Greene, J. P., Peterson, P. E., & Du, J. (1999). Effectiveness of school choice: The Milwaukee experiment. Education and Urban Society, 31(2), 190–213. Greene, J. P., & Winters, M. A. (2004). Competition passes the test: Vouchers improve public schools in Florida. Education Next, 4(3), 66–71. Greenwald, R., Hedges, L. V., & Laine, R. D. (1996). The effect of school resources on student achievement. Review of Educational Research, 66(3), 361–396.

202    The Cost-Effectiveness of 22 Approaches for Raising Student Achievement

Grissmer, D. (2002). Cost-effectiveness and cost-benefit analysis: The effect of targeting interventions. In H. M. Levin & P. J. McEwan (Eds.), Cost-effectiveness and educational policy (pp. 97–110). Larchmont, NY: Eye on Education. Grissmer, D. W., Flanagan, A., Kawata, J., & Williamson, S. (2000). Improving student achievement: What state NAEP test scores tell us. Santa Monica, CA: RAND Corporation. Grodsky, E., Warren, J. R., & Kalogrides, D. (2009). State high school exit examinations and NAEP long-term trends in reading and mathematics, 1971–2004. Educational Policy, 23(4), 589–614. Hakel, M. D., Koenig, J. A., & Elliott, S. W. (Eds.). (2008). Assessing accomplished teaching: Advanced-level certification programs. Washington, DC: The National Academies Press. Hanushek, E. A. (1986). The economics of schooling: Production and efficiency in public schools. Journal of Economic Literature, 24, 1141–1171. Hanushek, E. A. (1989). The impact of differential expenditures on school performance. Educational Researcher, 18(4), 45–50. Hanushek, E. A. (1992). The trade-off between child quantity and quality. Journal of Political Economy, 100(1), 84–117. Hanushek, E. A. (1996). A more complete picture of school resource policies. Review of Educational Research, 66(3), 397–409. Hanushek, E. A. (1997). Assessing the effects of school resources on student performance: An update. Educational Evaluation and Policy Analysis, 19(2), 141–164. Hanushek, E. A. (1999). Some findings from an independent investigation of the Tennessee STAR experiment and from other investigations of class size effects. Educational Evaluation and Policy Analysis, 21(2), 143–163. Hanushek, E. A., Kain, J. F., O’Brien, D. M., & Rivkin, S. G. (2005). The market for teacher quality. (Working Paper No. 11154). Cambridge, MA: National Bureau of Economic Research. Hanushek, E. A., Kain, J. F., & Rivkin, S. G. (1999). Do higher salaries buy better teachers? (Working Paper No. 7082). Cambridge, MA: National Bureau of Economic Research. Hanushek, E. A., Kain, J. F., Rivkin, S. G., & Branch, G. F. (2007). Charter school quality and parental decision making with school choice. Journal of Public Economics, 91(5–6), 823–848. Hanushek, E. A., & Raymond, M. E. (2005). Does school accountability lead to improved student performance? Journal of Policy Analysis and Management, 24(2), 297–327. Harris, D. N., & Sass, T. R. (2007). The effects of NBPTS-certified teachers on student achievement. Madison: University of Wisconsin. Harter, S. (1978). Pleasure derived from optimal challenge and the effects of extrinsic rewards on children’s difficulty level choices. Child Development, 49, 788–799.

References    203

Harter, S. (1981). A new self-report scale of intrinsic versus extrinsic orientation in the classroom: Motivational and informational components. Developmental Psychology, 17, 300–312. Hartley, S. S. (1978). Meta-analysis of the effects of individually paced instruction in mathematics. Unpublished doctoral dissertation. University of Colorado. Dissertation Abstracts International, 1978, 38(7-A), 4003. (University Microfilms No. 77-29,926). Hattie, J., & Timperley, H. (2007). The power of feedback. Review of Educational Research, 77(1), 81–112. Hawk, P., Coble, C. R., & Swanson, M. (1985). Certification: It does matter. Journal of Teacher Education, 36(3), 13–15. Hedges, L. V., Laine, R. D., & Greenwald, R. (1994). Money does matter somewhere: A reply to Hanushek. Educational Researcher, 23(4), 9–10. Heid, C., & Webber, A. (1999). School-level implementation of standards-based reform: Findings from the Public School Survey on Education Reform. Retrieved October 1, 2006, from http://www.ed.gov/pubs/SchoolSurvey/title.html Heinesen, E. (2004). Determinants of local public school expenditure: A dynamic panel data model. Regional Science and Urban Economics, 34, 429– 453. Herman, A. (2000). Report on the youth labor force. Washington, DC: U.S. Department of Labor, Bureau of Labor Statistics. Herman, R., Aladjem, D., McMahon, P., Masem, E., Mulligan, I., O’Malley, et al. (1999). An educator’s guide to schoolwide reform. Arlington, VA: American Institutes for Research. Hill, P. T., Angel, L., & Christensen, J. (2006). Charter school achievement studies. Education Finance and Policy, 1(1), 139–150. Hiroto, D. S., & Seligman, M. E. P. (1975). Generality of learned helplessness in man. Journal of Personality and Social Psychology, 31, 311–327. Hirsch, C. R. (1976). A review of research on individualized instruction in secondary mathematics. School Science and Mathematics, 76, 499–507. Holmes, G. M., Desimone, J., & Rupp, N. G. (2003). Does school choice increase school quality? (NBER Working Paper No. 9683). Cambridge, MA: National Bureau of Economic Research. Holmes, G. M., Desimone, J., & Rupp, N. G. (2006). Friendly competition: Does the presence of charters spur public schools to improve? Education Next, 6(1), 67–70. Hopkins, D., & Stern, D. (1996). Quality teachers, quality schools: International perspectives and policy implications. Teaching and teacher education, 12(5), 501–517. Horn, J., & Miron, G. (2000). An evaluation of the Michigan Charter School Initiative: Performance, accountability, and impact. Kalamazoo, MI: The Evaluation Center, Western Michigan University.

204    The Cost-Effectiveness of 22 Approaches for Raising Student Achievement

Howell, W. G., Wolf, P. J., Campbell, D. E., & Peterson, P. E. (2002). School vouchers and academic performance: Results from three randomized field trials. Journal of Policy Analysis and Management, 21(2), 191–217. Hoxby, C. M. (2000a). The effects of class size on student achievement: New evidence from population variation. The Quarterly Journal of Economics, 115(4), 1239–1285. Hoxby, C. M. (2000b). Testimony by Caroline M. Hoxby: The rising cost of college tuition and the effectiveness of government financial aid. U.S. Senate, 106th Congress, 2d Sess. Hoxby, C. M. (2002). The cost of accountability. In W. M. Evers & H. J. Walberg (Eds.), School accountability: An assessment by the Koret task force on K–12 education (pp. 47–73). Stanford, CA: Hoover Institution Press. Hoxby, C. M. (2002). How school choice affects the achievement of public school students. In P. T. Hill (Ed.), Choice with equity (pp. 141–177). Stanford, CA: Hoover Institution Press. Hoxby, C. M., & Rockoff, J. E. (2005). Findings from the city of big shoulders. Education Next, 5(4), 52–58. Hruz, T. (2000). The costs and benefits of smaller classes in Wisconsin: A further evaluation of the SAGE Program. Thiensville, WI: Wisconsin Policy Research Institute. Hummel-Rossi, B., & Ashdown, J. (2002). The state of cost-benefit and cost-effectiveness analyses in education. Review of Educational Research, 72, 1–30. Ingels, S. J., Burns, L. J., Charleston, S., Chen, X., Cataldi, E. F., & Owings, J. A. (2005). A Profile of the American high school sophomore in 2002: Initial results from the base year of the Education Longitudinal Study of 2002 (NCES 2005-338). Retrieved August 22, 2006, from http://nces.ed.gov/ pubs2005/2005338.pdf Ingersoll, R. M. (2001). Teacher turnover and teacher shortages: An organizational analysis. American Educational Research Journal, 38(3), 499–534. Isley, P. W., & Rosenman, R. (1998). Using revealed preference to evaluate the reservation wage and the value of leisure. Litigation Economics Digest, 3(1), 61–67. Jacobs, J. E., Lanza, S., Osgood, D. W., Eccles, J. S., & Wigfield, A. (2002). Changes in children’s self-competence and values: Gender and domain differences across grades one through twelve. Child Development, 73, 509–527. Jepsen, C., & Rivkin, S. (2002). What is the tradeoff between smaller classes and teacher quality? (Working Paper No. 9205). Cambridge, MA: National Bureau of Economic Research. Jovanovic, B. (1979). Job matching and the theory of turnover. Journal of Political Economy, 87, 972–990. Jürges, H., & Schneider, K. (2007). Fair ranking of teachers. Empirical economics, 32(2–3), 411–431. Kalechstein, A. D., & Nowicki, S., Jr. (1997). A meta-analytic examination of the relationship between control expectancies and academic achievement:

References    205

An 11-yr follow-up to Findley and Cooper. Genetic, Social, & General Psychology Monographs, 123(1), 27–56. Kan, K. (2002). Residential mobility with job location uncertainty. Journal of Urban Economics, 52, 501–523. Kane, T. J. (1999). The price of admission: Rethinking how Americans pay for college. Washington, DC: Brookings Institution Press. Kearns, D., & Anderson, J. (1996). Sharing the vision: Creating New American Schools. In S. Stringfield, S. Ross & L. Smith (Eds.), Bold plans for school restructuring (pp. 9–23). Mahwah, NJ: Erlbaum. Keeley, M. C., & Robins, P. K. (1985). Government programs, job search requirements, and the duration of unemployment. Journal of Labor Economics, 3(3), 337–362. Keith, T. Z., Pottebaum, S. M., & Eberhart, S. (1986). Effects of self concept and locus of control on academic achievement: A large sample path analysis. Journal of Psychoeducational Assessment, 4(1), 61–72. Kennelly, K. J., Dietz, D., & Benson, P. (1985). Reinforcement schedules, effort vs. ability attributions, and persistence. Psychology in the Schools, 22(4), 459–464. King, J. R. (1994). Meeting the educational needs of at-risk students: A cost analysis of three models. Educational Evaluation and Policy Analysis, 16(1), 1–19. Klein, D. C., Fencil-Morse, E., & Seligman, M. E. P. (1976). Learned helplessness, depression, and the attribution of failure. Journal of Personality and Social Psychology, 33, 508–516. Kluger, A. N., & DeNisi, A. (1996). The effects of feedback interventions on performance: A historical review, a meta-analysis, and a preliminary feedback intervention theory. Psychological Bulletin, 119(2), 254–284. Koedel, C., & Betts, J. R. (2007). Re-examining the role of teacher quality in the educational production function. (Working Paper No. 2007-03). Columbia: University of Missouri. Krashen, S. D. (2003). The (lack of) experimental evidence supporting the use of Accelerated Reader. Journal of Children’s Literature, 29(2), 9, 16–30. Krashen, S. D. (2005). Accelerated Reader: Evidence still lacking. Knowledge Quest, 33(3), 48–49. Krueger, A. (2002). Understanding the magnitude and effect of class size on student achievement. In L. Mishel & R. Rothstein (Eds.), The class size debate (pp. 7–35). Washington, DC: Economic Policy Institute. Krueger, A. B. (2003). Economic considerations and class size. The Economic Journal, 113(Feb), F34–F63. Krueger, A. B., & Whitmore, D. M. (2002). Would smaller classes help close the Black-White achievement gap? In J. E. Chubb & T. Loveless (Eds.), Bridging the Achievement Gap (pp. 11–46). Washington, DC: Brookings Institution. Kulik, J. A. (1981, April). Integrating findings from different levels of instruction. Paper presented at the annual meeting of the American Educational Re-

206    The Cost-Effectiveness of 22 Approaches for Raising Student Achievement

search Association, Los Angeles. (ERIC Document Reproduction Service No. ED 208 040) Ladd, H. F., Sass, T. R., & Harris, D. N. (2007, February 28). The impact of National Board certified teachers on student achievement in Florida and North Carolina: A summary of the evidence prepared for the National Academies Committee on the Evaluation of the Impact of Teacher Certification by NBPTS. Retrieved June 25, 2010, from http://www7.nationalacademies.org/BOTA/NBPTS-MTG4-Sass-paper.pdf Lavy, V. (2002). Evaluating the effect of teachers’ performance incentives on pupils’ achievements. Journal of Political Economy, 110, 1286–1317. Lavy, V. (2007). Using performance-based pay to improve the quality of teachers. The Future of Children, 17(1), 87–109. Lee, J. (2008). Is test-driven external accountability effective? Synthesizing evidence from cross-state causal-comparative and correlational studies. Review of Educational Research, 78(3), 608–644. Levin, H., Belfield, C., Muennig, P., & Rouse, C. (2007a). The costs and benefits of an excellent education for America’s children. Available from http:// www.cbcse.org/media/download_gallery/Leeds_Report_Final_Jan2007. pdf Levin, H., Belfield, C., Muennig, P., & Rouse, C. (2007b). The public returns to public educational investments in African American males. Economics of Education Review, 26, 700–709. Levin, H. M. (1988). Cost-effectiveness and educational policy. Educational Evaluation and Policy Analysis, 10(1), 51-69. Levin, H. M. (1991). Cost-effectiveness at quarter century. In M. W. McLaughlin & D. C. Phillips (Eds.), Evaluation and education at quarter century (pp. 188–209). University of Chicago Press. Levin, H. M. (1998). Educational vouchers: Effectiveness, choice, and costs. Journal of Policy Analysis and Management, 17(3), 373–392. Levin, H. M., & Driver, C. E. (1997). Costs of an educational vouchers system. Education Economics, 5(3), 265–283. Levin, H. M., Glass, G. V., & Meister, G. (1987). A cost-effectiveness analysis of computer-assisted instruction. Evaluation Review, 11(1), 50–72. Levin, H. M., & McEwan, P. J. (2001). Cost effectiveness analysis: Methods and applications (2nd ed.). Thousand Oaks, CA: Sage. Levin, J. (2001). For whom the reductions count: A quantile regression analysis of class size and peer effects on scholastic achievement. Empirical Economics, 26, 221–246. Lewin Group and the Nelson A. Rockefeller Institute of Government. (2004). Spending on social welfare programs in rich and poor states. Report prepared for the Assistant Secretary for Planning and Evaluation, Department of Health and Human Services. Lewis, M., Sullivan, M., & Brooks-Gunn, J. (1985). Emotional behavior during the learning of a contingency in infancy. British Journal of Developmental Psychology, 3, 307–316.

References    207

Little, J., & Bartlett, L. (2002). Career and commitment in the context of comprehensive school reform. Teachers and teaching: Theory and practice, 8(3– 4), 345–354. Liu, P.-W. (1986). Human capital, job matching and earnings growth between jobs: An empirical analysis. Applied Economics, 18, 1135–1147. Lochner, L., & Moretti, E. (2004). The effect of education on crime: Evidence from prison inmates, arrests, and self-reports. The American Economic Review, 94(1), 155–189. Ludwig, J., & Phillips, D. A. (2008). Long-term effects of Head Start on low-income children. Annals of the New York Academy of Sciences, 1136, 257–268. Lysakowski, R. S., & Walberg, H. J. (1982). Instructional effects of cues, participation, and corrective feedback: A quantitative synthesis. American Educational Research Journal, 19, 559–578. Mac Iver, D. (1988). Classroom environments and the stratification of pupils’ ability perceptions. Journal of Educational Psychology, 80(4), 495–505. Mac Iver, D. J., & Reuman, D. A. (1993/1994). Giving their best: Grading and recognition practices that motivate students to work hard. American Educator, 17(4), 24–31. Mac Iver, D. J., Reuman, D. A., & Main, S. R. (1995). Social structuring of the school: Studying what is, illuminating what could be. Annual Review of Psychology, 46, 375–400. Mac Iver, D. J., Stipek, D. J., & Daniels, D. H. (1991). Explaining within-semester changes in student effort in junior high school and senior high school courses. Journal of Educational Psychology, 83(2), 201–211. Maier, P., Molnar, A., Percy, S., Smith, P., & Zahorik, J. (1997). First-year results of the Student Achievement Guarantee in Education Program. SAGE Evaluation Team, Center for Urban Initiatives and Research, University of WisconsinMilwaukee. Manski, C. F. (1987). Academic ability, earnings, and the decision to become a teacher: Evidence from the National Longitudinal Study of the high school class of 1972. In D. Wise (Ed.), Public sector payrolls (pp. 291–312). University of Chicago Press. Marsh, H. W. (1989). Age and sex effects in multiple dimensions of self-concept: Preadolescence to early adulthood. Journal of Educational Psychology, 81, 417–430. Martin, J. A., Hamilton, B. E., Sutton, P. D., Ventura, S. J., Menacker, F., & Kirmeyer, S. (2006). National vital statistics reports, Vol. 55, Number 1, Births: Final data for 2004. Retrieved December 1, 2007, from http:// www.cdc.gov/nchs/data/nvsr/nvsr55/nvsr55_01.pdf Marvel, J., Lyter, D. M., Peltola, P., Strizek, G. A., Morton, B. A., & Rowland, R. (2007). Teacher attrition and mobility: Results from the 2004–05 teacher followup survey. (NCES 2007-307). Washington, DC: National Center for Education Statistics.

208    The Cost-Effectiveness of 22 Approaches for Raising Student Achievement

Mattingly, D., Prislin, R., McKenzie, T., Rodriquez, J., & Kayzar, B. (2002). Evaluating evaluations: The case of parent involvement programs. Review of Educational Research, 72(4), 549–576. Mayer, S. E., & Peterson, P. E. (1999). The costs and benefits of school reform. In S. E. Mayer & P. E. Peterson (Eds.), Earning and learning: How schools matter (pp. 341–354). Washington, DC: Brookings Institution Press. McCaffrey, D. F., & Rivkin, S. G. (2007). Empirical investigations of the effects of National Board of Professional Teacher Standards certified teachers on student outcomes: A report prepared for the National Research Council. Available from http://www7.nationalacademies.org/BOTA/NBPTSMTG4-McCaffrey-Paper.pdf McCaffrey, D. F., Sass, T. R., & Lockwood, J. R. (2008). The intertemporal stability of teacher effect estimates. Santa Monica, CA: RAND. McChrystal, M. K., Gleisner III, W. C., & Kuborn, M. J. (1999). Laptop litigation: The impact of technology on litigation. Wisconsin Lawyer, 72(9). Retrieved from http://www.wisbar.org/AM/Template.cfm? Section=Home&TEMPLATE=/CM/ContentDisplay.cfm&CONTENTID =48357 McColskey, W., Stronge, J. H., Ward, T. J., Tucker, P. D., Howard, B., Lewis, K., et al. (2005). Teacher effectiveness, student achievement, and National Board certified teachers: A comparison of National Board certified teachers and non-National Board certified teachers: Is there a difference in teacher effectiveness and student achievement? Retrieved October 30, 2006, from http://www.nbpts.org/UserFiles/File/Teacher_Effectiveness_Student_Achievement_and_National_Board_Certified_ Teachers_D_-_McColskey.pdf McMullin, D. J., & Steffen, J. J. (1982). Intrinsic motivation and performance standards. Social Behavior and Personality, 10(1), 47–56. Miles, K. H., & Roza, M. (2004). Understanding student-based budgeting as a means to greater school resource equity.Unpublished manuscript. Wayland, MA. Miller, R. L. (1976). Individualized instruction in mathematics: A review of research. Mathematics Teacher, 69, 345–351. Mishel, L., & Roy, J. (2006). Rethinking high school graduation rates and trends. Washington, DC: Economic Policy Institute. Mohanty, M. S. (2005). An alternative method of estimating the worker’s reservation wage. International Economic Journal, 19(4), 501–522. Molnar, A., Smith, P., & Zahorik, J. (1999). 1998–99 Evaluation results of the Student Achievement Guarantee in Education (SAGE) Program. University of Wisconsin-Milwaukee, School of Education. Molnar, A., Smith, P., & Zahorik, J. (2000). 1999–2000 Evaluation results of the Student Achievement Guarantee in Education (SAGE) Program. University of Wisconsin-Milwaukee, School of Education. Molnar, A., Smith, P., & Zahorik, J. (2001). 2000–2001 Evaluation results of the Student Achievement Guarantee in Education (SAGE) Program. University of Wisconsin-Milwaukee, School of Education.

References    209

Monk, D. H., & King, J. A. (1993). Cost analysis as a tool for education reform. In S. L. Jacobson & R. Berne (Eds.), Reforming education: The emerging systemic approach (pp. 131–152). Thousand Oaks, CA: Corwin Press. Murnane, R., Singer, J., Willett, J., Kemple, J., & Olsen, R. (1991). Who will teach? Policies that matter. Cambridge, MA: Harvard University Press. Murnane, R., Willet, J., & Levy, F. (1995). The growing importance of cognitive skills in wage determination. Review of Economics and Statistics, 77, 251–266. Murnane, R. J., Singer, J. D., & Willett, J. B. (1988). The career paths of teachers: Implications for teacher supply and methodological lessons for research. Educational Researcher, 17(6), 22–30. Musher Eizenman, D. R., Nesselroade, J. R., & Schmitz, B. (2002). Perceived control and academic performance: A comparison of high and low performing children on within-person change patterns. International Journal of Behavioral Development, 26(6), 540–547. National Academies. (2008). National Board certification identifies strong teachers, but many school systems are not using Board-certified teachers’ expertise. Retrieved July 12, 2009, from http://www8.nationalacademies. org/onpinews/newsitem.aspx?RecordID=12224 National Association of Colleges and Employers. (2006). A good job market sweetens salary offers to new graduates. Retrieved November 20, 2006, from http://www.jobweb.com/SalaryInfo/06_springupdate.htm National Board for Professional Teaching Standards. (2006a). About us. Retrieved October 29, 2006, from http://www.nbpts.org/about_us National Board for Professional Teaching Standards. (2006b). The certification process. Retrieved October 26, 2006, from http://www.nbpts.org/ become_a_candidate/assessment_process National Board for Professional Teaching Standards. (2006c). National Board certification: Eligibility verification forms and instructions. Retrieved October 26, 2006, from http://www.nbpts.org/UserFiles/File/2006_Eligibility_Verif.PDF National Center for Education Statistics. Digest of education statistics. (2007). Table 2: Enrollment in educational institutions, by level and control of institution: Selected years, fall 1980 through fall 2007. Retrieved March 1, 2008, from http://nces.ed.gov/programs/digest/d07/tables/dt07_002. asp?referrer=list National Center for Education Statistics. (2000). Characteristics of the 100 largest public elementary and secondary school districts in the United States: 1999–2000. Retrieved July 21, 2006, from http://nces.ed.gov/ pubs2001/100_largest/tableAA4.asp National Center for Education Statistics. (2001). Characteristics of the 100 largest public elementary and secondary school districts in the United States: 2000–2001. Retrieved July 21, 2006, from http://nces.ed.gov/ pubs2002/100_largest/table_02_1.asp

210    The Cost-Effectiveness of 22 Approaches for Raising Student Achievement

National Center for Education Statistics. (2005a). Common core of data. Retrieved September 24, 2006, from http://nces.ed.gov/ccd/districtsearch/ district_detail.asp?Search=1&details=1&InstName=Cleveland&City=cleve land&State=39&DistrictType=1&DistrictType=2&DistrictType=3&District Type=4&DistrictType=5&DistrictType=6&DistrictType=7&NumOfStuden tsRange=more&NumOfSchoolsRange=more&ID2=3904378 National Center for Education Statistics. (2005b). The condition of education. Retrieved March 13, 2007, from http://nces.ed.gov/programs/digest/ d05/tables/dt05_057.asp and http://nces.ed.gov/programs/digest/ d05/tables/dt05_033.asp National Center for Education Statistics. (2005c). Digest of education statistics. Retrieved October 26, 2006, from http://nces.ed.gov/programs/digest/ d05/tables/dt05_076.asp National Center for Education Statistics. (2005d). Digest of education statistics, Table 162. Total and current expenditure per pupil in public elementary and secondary schools: Selected years, 1919–20 through 2002–03. Retrieved September 8, 2006, from http://nces.ed.gov/programs/digest/ d05/tables/dt05_162.asp National Center for Education Statistics. (2005e). Digest of education statistics: 2005. Retrieved December 1, 2006, from http://nces.ed.gov/programs/ digest/d05/tables/dt05_004.asp National Center for Education Statistics. (2006a). Digest of education statistics: 2005. Retrieved December 1, 2006, from http://nces.ed.gov/programs/ digest/d05/tables/dt05_199.asp?referer=list National Center for Education Statistics. (2006b). Digest of education statistics: 2005. Retrieved December 1, 2006, from http://nces.ed.gov/programs/ digest/d05/tables/dt05_339.asp National Center for Education Statistics. (2007a). The condition of education: Table 34-2. Number and percentage distribution of school principals, by school level, school type, and selected professional characteristics: School years 1993–94, 1999–2000, and 2003–04. Retrieved September 9, 2007, from http://nces.ed.gov/programs/coe/2007/section4/table. asp?tableID=726 National Center for Education Statistics. (2007b). Digest of education statistics, 2006, Table 173: Enrollment, staff, and degrees conferred in postsecondary institutions participating in Title IV programs, by level and control of institution, sex of student, and type of degree: 2004–05 and fall 2005. Retrieved November 30, 2007, from http://nces.ed.gov/programs/digest/ d06/tables/dt06_173.asp?referrer=list National Center for Education Statistics. (2007c). Digest of education statistics, 2006, Table 354: Total expenditures of private for-profit degree-granting institutions, by purpose and type of institution: 2002–03 and 2003–04. Retrieved November 30, 2007, from http://nces.ed.gov/programs/digest/ d06/tables/dt06_354.asp?referrer=list

References    211

National Center for Education Statistics. (2007d). Digest of education statistics, 2006: Table 337: Revenues of public degree-granting institutions, by type of institution and source of revenue: 2003–04. Retrieved November 29, 2007, from http://nces.ed.gov/programs/digest/d06/tables/dt06_337. asp?referrer=list National Center for Education Statistics. (2007e). Digest of education statistics, 2006: Table 340: Total revenue of private not-for-profit degree-granting institutions, by source of funds and type of institution: 1996–97 through 2003–04. Retrieved November 29, 2007, from http://nces.ed.gov/programs/digest/d06/tables/dt06_340.asp?referrer=list National Center for Education Statistics. (2007f). Digest of education statistics, 2006: Table 343: Total revenue of private for-profit degree-granting institutions, by source of funds and type of institution: 2002–03 and 2003–04. Retrieved November 29, 2007, from http://nces.ed.gov/programs/digest/d06/tables/dt06_343.asp?referrer=list National Center for Education Statistics. (2007g). Digest of education statistics, 2006: Table 347: Expenses of public degree-granting institutions, by type of expense and type of institution: 2003–04. Retrieved November 29, 2007, from http://nces.ed.gov/programs/digest/d06/tables/dt06_347. asp?referrer=list National Center for Education Statistics. (2007h). Digest of education statistics, 2006: Table 353: Total expenditures of private not-for-profit degree-granting institutions, by purpose and type of institution: 2003–04. Retrieved November 29, 2007, from http://nces.ed.gov/programs/digest/d06/ tables/dt06_353.asp?referrer=list National Center for Education Statistics (2007i). Digest of education statistics, 2007, Table 2: Enrollment in educational institutions, by level and control of institution: Selected years, fall 1980 through fall 2007. Retrieved March 1, 2008, from http://nces.ed.gov/programs/digest/d07/tables/ dt07_002.asp?referrer=list National School Boards Association. (2003, May 20). GAO: NCLB testing will cost $1.9 billion. Retrieved July 26, 2006, from http://nsba.org/site/doc_ sbn.asp?TRACKID=&DID=11782&CID=1201 National Science Foundation. (undated). Education: Lessons about learning. Retrieved September 6, 2006, from http://www.nsf.gov/about/history/ nsf0050/pdf/education.pdf NBPTS. (2010). Choosing excellence: National Board certification Q & A guide for teachers and other educators. Retrieved August 14, 2010, from http://issuu.com/nbpts/docs/nbptsqa?mode=embed&layout=http%3A %2F%2Fskin.issuu.com%2Fv%2Fcolor%2Flayout.xml&backgroundColor =154174&showFlipBtn=true Neal, D., & Johnson, W. R. (1996). The role of pre-market factors in black-white wage differences. Journal of Political Economy, 104, 869–895. Neas, R. G. (2001). Five years and counting: A closer look at the Cleveland voucher program. Washington, DC: People for the American Way Foundation.

212    The Cost-Effectiveness of 22 Approaches for Raising Student Achievement

Nelson, F. H. (1997). How much thirty thousand charter schools cost. Retrieved July 27, 2006, from http://www.aft.org/topics/charters/downloads/howmuch.pdf Nelson, F. H., Muir, E., & Drown, R. (2003). Paying for the vision: Charter school revenue and expenditures. National charter school finance study. Washington, DC: American Federation of Teachers. Northwest Regional Educational Laboratory. (1998). Catalog of school reform models: First edition. Portland, OR: Author. (ERIC Document Reproduction Service No. ED439510) Northwest Regional Educational Laboratory. (2005). Catalog of school reform models. Retrieved July 5, 2006, from http://www.nwrel.org/scpd/catalog/index.html Northwest Regional Educational Laboratory. (2006, October 4). The catalog of school reform models. Retrieved March 5, 2007, from http://www.nwrel. org/scpd/catalog/ModelDetails.asp?ModelID=48 Nunnery, J. A., Ross, S. M., & McDonald, A. (2006). A randomized experimental evaluation of the impact of Accelerated Reader/Reading Renaissance implementation on reading achievement in grades 3 to 6. Journal of Education for Students Placed At Risk, 11(1), 1–18. Nye, B., Hedges, L. V., & Konstantopoulos, S. (2001). Are effects of small classes cumulative? Evidence from a Tennessee experiment. Journal of Educational Research, 94(6), 336–345. Nye, B., Konstantopoulos, S., & Hedges, L. V. (2004). How large are teacher effects? Educational Evaluation and Policy Analysis, 26(3), 237–257. Nygard, R. (1977). Personality, situation, and persistence: A study with emphasis on achievement motivation. Oslo: Universitetsforlaget. O’Neill, J. (1990). The role of human capital in earnings differences between black and white men. Journal of Economic Perspectives, 4(4), 25–46. Odden, A. (2000). The costs of sustaining educational change through comprehensive school reform. Phi Delta Kappan, 81(6), 433–438. OECD. (1994). Quality in teaching. Paris: OECD. Organisation for Economic Cooperation and Development. (2006). Education at a glance, Table A1.5: Educational attainment expressed in average number of years in formal education (2004). Retrieved December 1, 2007, from http://www.oecd.org/document/6/0,3343,en_2649_392632 38_37344774_1_1_1_1,00.html Orzechowski, S., & Sepielli, P. (2003). Net worth and asset ownership of households: 1998 and 2000. Washington, DC: U.S. Census Bureau. Parsad, B., & Jones, J. (2005). Internet access in U.S. public schools and classrooms: 1994–2003 (NCES 2005-015). Washington, DC: National Center for Education Statistics, U.S. Department of Education. Patrick, B. C., Skinner, E. A., & Connell, J. P. (1993). What motivates children’s behavior and emotion? Joint effects of perceived control and autonomy in the academic domain. Journal of Personality and Social Psychology, 65(4), 781–791.

References    213

Pedulla, J. J., Abrams, L. M., Madaus, G. F., Russell, M. K., Ramos, M. A., & Miao, J. (2003). Perceived effects of state-mandated testing programs on teaching and learning: Findings from a national survey of teachers. Chestnut Hill, MA: National Board on Educational Testing and Public Policy, Lynch School of Education, Boston College. Pennsylvania Economy League Eastern Division. (2001). Meeting the challenge: Managing the fiscal impact of charter schools in Pennsylvania. Greater Philadelphia Urban Affairs Coalition. Perlmutter, L. C., & Monty, R. A. (1977). The importance of perceived control: Fact or fantasy? American Scientist, 65, 759–765. Peterson, C., Maier, S. F., & Seligman, M. E. P. (1995). Learned helplessness: A theory for the age of personal control. New York: Oxford University Press. Phelps, R. P. (Ed.). (2005). Defending standardized testing. Mahwah, NJ: Lawrence Erlbaum. Podgursky, M. (2004). Teacher licensing in U.S. public schools: The case for simplicity and flexibility. University of Missouri, Columbia. Podgursky, M. (n.d.). Response to “Does teacher preparation matter?” Retrieved October 30, 2006, from http://www.nctq.org/nctq/research/1114011044390.pdf Quartz, K. H., Lyons, K. B., Masyn, K., Olsen, B., Anderson, L., Thomas, A., et al. (2004). Retention report series: A longitudinal study of career urban educators. (Document rrs-rr006-0704). University of California, Los Angeles. Ramey, C. T., Campbell, F. A., Burchinal, M., Skinner, M. L., Gardner, D. M., & Ramey, S. L. (2000). Persistent effects of early childhood education on high-risk children and their mothers. Applied Developmental Science, 4, 2–14. Redlich, J. D., Debus, R. L., & Walker, R. (1986). The mediating role of attribution and self efficacy variables for treatment effects on achievement outcomes. Contemporary Educational Psychology, 11, 195–216. Reichardt, R. E. (2000). The cost of class size reduction: Advice for policymakers. Santa Monica, CA: RAND Graduate School of Policy Studies. Reilly, R. (no date). There is a printer and a Xerox machine in your classroom that you can’t see! Retrieved August 5, 2006, from http://www.teachers. net/gazette/OCT02/reilly.html Renaissance Learning. (2002). Results from a three-year state-wide implementation of Reading Renaissance in Idaho. Madison, WI: Renaissance Learning. Renaissance Learning. (no date). STAR Reading: Understanding reliability and validity. Wisconsin Rapids, WI: Renaissance Learning. Reynolds, A. J., Ou, S. R., & Topitzes, J. W. (2004). Paths of effects of early childhood intervention on educational attainment and delinquency: A confirmatory analysis of the Chicago Child-Parent Centers. Child Development, 75(5), 1299–1328. Rice, J. K., & Hall, L. J. (2008). National Board certification for teachers: What does it cost and how does it compare? Education Finance and Policy, 3(3), 339–373.

214    The Cost-Effectiveness of 22 Approaches for Raising Student Achievement

Rivkin, S. G., Hanushek, E. A., & Kain, J. F. (2005). Teachers, schools and academic achievement. Econometrica, 73(2), 417–458. Robinson, S. L., DePascale, C., & Roberts, F. C. (1989). Computer delivered feedback in group based instruction: Effects for learning disabled students in mathematics. Learning Disabilities Focus, 5(1), 28–35. Rockoff, J. E. (2004). The impact of individual teachers on student achievement: Evidence from panel data. American Economic Review Papers and Proceedings (May), 247–252. Rosenshine, B. (2003). High-stakes testing: Another analysis. Education Policy Analysis Archives, 11(24). Retrieved from http://epaa.asu.edu/epaa/ v11n24/ Ross, C. E., & Broh, B. A. (2000). The roles of self esteem and the sense of personal control in the academic achievement process. Sociology of Education, 73(4), 270–284. Ross, S. M., & Gil, L. (2004). The past and future of comprehensive school reform: Perspectives from a researcher and practitioner. In C. T. Cross (Ed.), Putting the pieces together: Lessons from comprehensive school reform research (pp. 151–174). Washington, DC: The National Clearinghouse for Comprehensive School Reform. Ross, S. M., Nunnery, J., & Goldfeder, E. (2004). A randomized experiment on the effects of Accelerated Reader/Reading Renaissance in an urban school district: Final evaluation report. Memphis, TN: Center for Research in Educational Policy, University of Memphis. Rouse, C. E. (1998). Private school vouchers and student achievement: An evaluation of the Milwaukee Parental Choice Program. The Quarterly Journal of Economics, 113(2), 553–602. Rowan, B., Barnes, C., & Camburn, E. (2004). Benefiting from comprehensive school reform: A review of research on CSR implementation. In C. T. Cross (Ed.), Putting the pieces together: Lessons from comprehensive school reform research (pp. 1–52). Washington, DC: The National Clearinghouse for Comprehensive School Reform. Ryan, R. M., Connell, J. P., & Deci, E. L. (1985). A motivational analysis of selfdetermination and self-regulation in education. In C. Ames & R. Ames (Eds.), Research on motivation in education, Volume 2: The classroom milieu (pp. 13–51). Orlando, FL: Academic Press. Ryan, R. M., & Deci, E. L. (2000). Self-determination theory and the facilitation of intrinsic motivation, social development, and well-being. American Psychologist, 55, 68–78. Ryan, R. M., Mims, V., & Koestner, R. (1983). The relationship of reward contingency and interpersonal context to intrinsic motivation: A review and test using cognitive evaluation theory. Journal of Personality and Social Psychology, 45, 736–750. Sanders, W. L., Ashton, J. J., & Wright, S. P. (2005). Comparison of the effects of NBPTS certified teachers with other teachers on the rate of student academic progress. Cary, NC: SAS Institute, Inc.

References    215

Sanders, W. L., & Horn, S. P. (1994). The Tennessee value-added assessment system (TVAAS): Mixed-method methodology in educational assessment. Journal of Personnel Evaluation, 8, 299–311. Sass, T. R. (2006). Charter schools and student achievement in Florida. Education Finance and Policy, 1(1), 91–122. Schiller, Z. (no date). Cleveland school vouchers: Where the students come from. Cleveland, OH: Policy Matters Ohio. Schoen, H. L. (1976). Self-paced instruction: How effective has it been in secondary and postsecondary schools. The Mathematics Teacher, 69, 352–357. Schrag, P. (2007). Policy from the hip: Class size reduction in California. In T. Loveless & F. Hess (Eds.), Brookings papers on education policy: 2006/2007 (pp. 229–243). Washington, DC: Brookings Institution Press. Schunk, D. H. (1981). Modeling and attributional feedback effects on children’s achievement: A self-efficacy analysis. Journal of Educational Psychology, 74, 93–105. Schunk, D. H. (1991). Self-efficacy and academic motivation. Educational Psychologist, 26, 207–231. Schunk, D. H., & Swartz, C. W. (1991). Writing strategy instruction with gifted students: Effects of goals and feedback on self-efficacy and skills. Roeper Review, 15, 225–230. Schweinhart, L. J., Barnes, H. V., & Weikart, D. P. (1993). Significant benefits: The High/Scope Perry Preschool Study through age 27. Ypsilanti, MI: High/Scope Press. Scriven, M. (1967). The methodology of evaluation. In R. W. Tyler, R. M. Gagné & M. Scriven (Eds.), Perspectives of curriculum evaluation (pp. 39–83). Chicago: Rand McNally. Seligman, M. E. P. (1975). Helplessness: On depression, development, and death. San Francisco: Freeman. Sikorski, M. F. (1990). Cost-effectiveness analysis of mastery learning and direct instruction in reading. Unpublished doctoral dissertation. Dekalb: Northern Illinois University. Skinner, E. A., Wellborn, J. G., & Connell, J. P. (1990). What it takes to do well in school and whether I’ve got it: A process model of perceived control and children’s engagement and achievement in school. Journal of Educational Psychology, 82(1), 22–32. Skinner, E. A., Zimmer-Gembeck, M. J., Connell, J. P., Eccles, J. S., & Wellborn, J. G. (1998). Individual differences and the development of perceived control. Monographs of the Society for Research in Child Development, 63(2/3), 1–231. Slavin, R. E., & Fashola, O. S. (1998). Show me the evidence! Thousand Oaks, CA: Corwin Press. Smith, D. E. P., Brethower, D., & Cabot, R. (1969). Increasing task behavior in a language arts program by providing reinforcement. Journal of Experimental Child Psychology, 8(1), 45–62.

216    The Cost-Effectiveness of 22 Approaches for Raising Student Achievement

Smith, E. (2008). Raising standards in American schools? Problems with improving teacher quality. Teaching and teacher education, 24, 610–622. Smith, E., & Gorard, S. (2007). Improving teacher quality: Lessons from America’s No Child Left Behind. Cambridge Journal of Education, 37(2), 191– 206. Smith, P., Molnar, A., & Zahorik, J. A. (2003). Class size reduction in Wisconsin: A fresh look at the data. Tempe. AZ: Arizona State University, Education Policy Studies Laboratory. Sorensen, N. (1995). Measuring HR for success. Training and Development, 49, 49–51. Spies, R. (2001). The effect of rising costs on college choice. (Research Report Series No. 117). Princeton, NJ: Princeton University. Stancavage, F. B., Mitchell, J. H., Bandeira de Mello, V., Gaertner, F. E., Spain, A. K., Rahal, M. L., et al. (2006). National Indian education study part II: The educational experiences of fourth- and eighth-grade American Indian and Alaska Native students (NCES 2007-454). Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics. Stecher, B. M., & Bohrnstedt, G. W. (Eds.). (2000). Class size reduction in California: The 1998–99 evaluation findings. Sacramento: California Department of Education. Stephan, J. J. (2004). State prison expenditures, 2001. (Report No. NCJ 202949). Washington, DC: Bureau of Justice Statistics, U.S. Department of Justice. Stephan, J. J., & Karberg, J. C. (2003). Census of state and federal correctional facilities, 2000. (NCJ No. 198272). Washington, DC: U.S. Department of Justice. Stone, J. E. (2002). The value-added achievement gains of NBPTS-certified teachers in Tennessee: A brief report. Retrieved October 30, 2006, from http://www.education-consumers.com/briefs/stoneNBPTS.shtm Straus, E. W., & Straus, A. (2006). Medical marvels: The 100 greatest advances in medicine. Amherst, NY: Prometheus. Strauss, R. P., & Sawyer, E. A. (1986). Some new evidence on teacher and student competencies. Economics of Education Review, 5(1), 41–48. Sullivan, D. (1997, March). Trends in real wage growth. Chicago Fed Letter, (115), 1–3. Taylor, J. E. (2006). The struggle to survive: Examining the sustainability of schools’ comprehensive school reform efforts. Journal of Education for Students Placed At Risk, 11(3&4), 331–352. Teddlie, C., & Stringfield, S. (1993). Schools do make a difference: Lessons learned from a 10 year study of school effects. New York: Teachers College Press. Tenenbaum, G., & Goldring, E. (1989). A meta-analysis of the effect of enhanced instruction: Cues, participation, reinforcement and feedback and correctives on motor skill learning. Journal of Research and Development in Education, 22, 53–64.

References    217

Texas Center for Educational Research. (2000). The cost of teacher turnover. Austin, TX: Author. Traub, J. (1999). Better by design? A consumer’s guide to schoolwide reform. Washington, DC: Thomas B. Fordham Foundation. Tyack, D., & Cuban, L. (1995). Tinkering toward utopia: A century of public school reform. Cambridge, MA: Harvard University Press. Urban Institute-Brookings Institution Tax Policy Center. (2004). State and local government finance data query system. Retrieved January 31, 2007, from http://www.taxpolicycenter.org/slf-dqs/pages.cfm U.S. Census Bureau. (2006). Annual demographic survey, March supplement. Retrieved January 10, 2007, from http://pubdb3.census.gov/macro/032006/pov/new29_100_01.htm U.S. Department of Agriculture. (2007). Food stamp program participation and costs. Retrieved November 17, 2007, from http://www.fns.usda.gov/ pd/fssummar.htm U.S. Department of Education. (1997). Digest of education statistics. (NCES 98015). Washington, DC: National Center for Education Statistics. U.S. Department of Education. (1998, April 30). Goals 2000: Reforming education to improve student achievement. Retrieved July 6, 2006, from http:// www.ed.gov/PDFDocs/g2kfinal.pdf U.S. Department of Education. (2002a). Public Law 107-110, 107th Congress: An act to close the achievement gap with accountability, flexibility, and choice, so that no child is left behind. Retrieved October 1, 2007, from http://www.ed.gov/policy/elsec/leg/esea02/107-110.pdf U.S. Department of Education. (2002b). Scientifically based research and the Comprehensive School Reform (CSR) Program. Retrieved June 30, 2006, from http://www.ed.gov/programs/compreform/legislation.html U.S. Department of Education. (2002c). Teacher professional development in Title I schools: Recent evidence from the National Longitudinal Survey of Schools. Retrieved October 1, 2006, from http://www.ed.gov/offices/ OUS/PES/professional_develop03.pdf U.S. Department of Education. (2004). Implementation and early outcomes of the Comprehensive School Reform Demonstration (CSRD) Program. (No. 2004-15). Jessup, MD: Policy and Program Studies Service, U.S. Department of Education. U.S. Department of Education. (2005a). CCD public school district data. Retrieved August 12, 2007, from http://nces.ed.gov/ccd/districtsearch/ district_detail.asp?start=0&ID2=5509600 U.S. Department of Education. (2005b). Common core of data: State nonfiscal survey. Retrieved August 12, 2007, from http://nces.ed.gov/ccd/bat/ U.S. Department of Education. (2005c). Digest of education statistics. Retrieved December 1, 2006, from http://nces.ed.gov/programs/digest/ d05/tables/dt05_075.asp

218    The Cost-Effectiveness of 22 Approaches for Raising Student Achievement

U.S. Department of Education. (2005d, September). Projections of education statistics to 2014. Retrieved August 12, 2007, from http://nces.ed.gov/ pubs2005/2005074.pdf U.S. Department of Education. (2005e). The secretary’s fourth annual report on teacher quality: A highly qualified teacher in every classroom. Retrieved October 12, 2006, from https://www.title2.org/TitleIIReport05. pdf. U.S. Department of Education. (2005f). State education data profiles. Retrieved August 12, 2007, from http://nces.ed.gov/programs/stateprofiles/sresult.asp?mode=short&s1=12 U.S. Department of Education. (2006, March 6). Comprehensive school reform program: Funding status. Retrieved July 6, 2006, from http://www. ed.gov/programs/compreform/funding.html U.S. Department of Education. (2007). Digest of education statistics 2006. (NCES 2007-017). Washington, DC: National Center for Education Statistics. U.S. Department of Justice. (2006). Crime in the United States, 2006, Table 38: Arrests by age. Retrieved November 2, 2007, from http://www.fbi.gov/ ucr/cius2006/data/table_38.html U.S. Department of Labor. (March 12).(2008a) Employer costs for employee compensation: December 2007. Retrieved March 26, 2008, from http:// www.bls.gov/news.release/pdf/ecec.pdf U.S. Department of Labor. (2008b) Median weekly earnings of full-time wage and salary workers by detailed occupation and sex. Retrieved March 1, 2008, from http://www.bls.gov/cps/cpsaat39.pdf U.S. Department of Labor. (2004, July). National compensation survey: Occupational wages in the United States, Bulletin 2576. Retrieved March 1, 2008, from http://www.bls.gov/ncs/ocs/sp/ncbl0757.pdf U.S. Department of Labor. (2005). TAA benefits: Job search allowances. Retrieved November 28, 2006, from http://www.ctdol.state.ct.us/TradeAct/ benefits/JS-allowance.htm U.S. Department of Labor. (2006a). Average number of jobs started by individuals from age 18 to age 40 in 1978–2004 by age and sex. Retrieved November 18, 2006, from http://www.bls.gov/nls/y79r21jobsbyage.pdf U.S. Department of Labor. (2006b, September 15). Consumer price index. Retrieved October 5, 2006, from ftp://ftp.bls.gov/pub/special.requests/ cpi/cpiai.txt U.S. General Accounting Office. (2001). School vouchers: Publicly funded programs in Cleveland and Milwaukee. (Report Number GAO-01-914). Washington, DC: U.S. General Accounting Office. U.S. General Accounting Office. (2003). Title I: Characteristics of tests will influence expenses; Information sharing may help states realize efficiencies. (Report No. GAO-03-389). Washington, DC: Author. University of Chicago School Mathematics Project. (2003). Welcome to the UCSMP Everyday Mathematics Center. Retrieved September 6, 2006, from http://everydaymath.uchicago.edu/

References    219

Vallerand, R. J., & Reid, G. (1984). On the causal effects of perceived competence on intrinsic motivation: A test of cognitive evaluation theory. Journal of Sport Psychology, 6, 94–102. van Ark, B., Kuipers, S. K., & Kuper, G. H. (Eds.). (2000). Productivity, technology and economic growth. Boston: Kluwer Academic Publishers. Vandevoort, L. G., Amrein-Beardsley, A., & Berliner, D. C. (2004). National Board certified teachers and their students’ achievement. Education Policy Analysis Archives, 12(46). Retrieved from http://epaa.asu.edu/epaa/ v12n46/. Vernez, G., Karam, R., Mariano, L. T., & DeMartini, C. (2006). Evaluating comprehensive school reform models at scale: Focus on implementation. (Report No. MG-546). Santa Monica, CA: RAND. Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes. Cambridge, MA: Harvard University Press. Walberg, H. J. (1982). What makes schooling effective? Contemporary Education Review, 1, 1–34. Walberg, H. J., Strykowski, B. F., Rovai, E., & Hung, S. S. (1984). Exceptional performance. Review of Educational Research, 54(1), 87–112. Wang, H.-H., & Fwu, B.-J. (2007). In pursuit of teacher quality in diversity: A study of the selection mechanisms of new secondary teacher education programmes in Taiwan. International Journal of Educational Development, 27, 166–181. Wang, M. C., Haertel, G. D., & Walberg, H. (1997). What do we know? Widely implemented school improvement programs. Philadelphia: Temple University Center for Research in Human Development and Education. Warren, J. R., Jenkins, K. N., & Kulick, R. B. (2006). High school exit examinations and state-level completion and GED rates, 1975–2002. Educational Evaluation and Policy Analysis, 28(2), 131–152. Watson, J. (1972). Smiling, cooing and “the game.” Merrill-Palmer Quarterly, 4, 323–329. Watson, J. S. (1966). The development and generalization of “contingency awareness” in early infancy: Some hypotheses. Merrill-Palmer Quarterly, 12, 123–135. Webb, N. L., Meyer, R. H., Gamoran, A., & Fu, J. (2004). Participation in the Student Achievement Guarantee in Education (SAGE) Program and performance on state assessments at grade 3 and grade 4 for three cohorts of students—grade 1 students in 1996–97, 1997–98, and 1998–99. Wisconsin Center for Education Research, University of Wisconsin-Madison. Wenglinsky, H. (1997). The effect of school district spending on academic achievement. Sociology of Education, 70, 221–237. Wenglinsky, H. (2001). Teacher classroom practices and student performance: How schools can make a difference (Research Report RR-01-19). Princeton, NJ: Educational Testing Service. White, R. W. (1959). Motivation reconsidered: The concept of competence. Psychological Review, 66(5), 297–333.

220    The Cost-Effectiveness of 22 Approaches for Raising Student Achievement

Widerstedt, B. (1998). Job mobility, wage growth, and match quality in Sweden. (Umea Economic Studies No. 464). Umea, Sweden: Umea University. Wigfield, A., Eccles, J., Mac Iver, D., Reuman, D., & Midgley, C. (1991). Transitions at early adolescence: Changes in children’s domain-specific selfperceptions and general self esteem across the transition to junior high school. Developmental Psychology, 27, 552–565. WikEd (2006). Accelerated Math. Retrieved March 15, 2007, from http://wik. ed.uiuc.edu/index.php/Accelerated_Math Wilson, S. M., & Floden, R. E. (2003). Creating effective teachers: Concise answers for hard questions. An addendum to the report Teacher Preparation Research: Current Knowledge, Gaps and Recommendations. Denver, CO: Education Commission of the States; American Association of Colleges for Teacher Education. Winship, C., & Korenman, S. D. (1999). Economic success and the evolution of schooling and mental ability. In S. E. Mayer & P. E. Peterson (Eds.), Earning and learning: How schools matter (pp. 49–78). Washington, DC: Brookings Institution Press. Wisconsin Department of Public Instruction. (2006, August 28). Milwaukee Parental Choice Program (MPCP): MPCP facts and figures for 2005–2006. Retrieved September 24, 2006, from http://dpi.wi.gov/sms/choice.html Wisconsin Department of Public Instruction. (2007a, August 13). Changing the Wisconsin Knowledge and Concept Examinations norms from 1996 to 2000. Retrieved August 13, 2007, from http://www.dpi.state.wi.us/oea/ kcnorms02.html Wisconsin Department of Public Instruction. (2007b). SAGE Program guidelines: February 2007. Retrieved June 7, 2007, from http://dpi.state.wi.us/sage/ index.html Wong, K., & Meyer, S. (2001). Title I schoolwide programs as an alternative to categorical practices: An organizational analysis of surveys from the Prospects study. In G. D. Borman, S. C. Stringfield & R. E. Slavin (Eds.), Title I: Compensatory education at the crossroads (pp. 195–234). Mahwah, NJ: Erlbaum. Wong, K. K., & Meyer, S. J. (1998). Title I schoolwide programs: A synthesis of findings from recent evaluation. Educational Evaluation and Policy Analysis, 20(2), 115–136. Wößmann, L. (2007). International evidence on expenditures and class size: A review. In T. Loveless & F. Hess (Eds.), Brookings papers on education policy: 2006/2007 (pp. 245–272). Washington, DC: Brookings Institution Press. Wright, P., Horn, S., & Sanders, W. (1997). Teachers and classroom heterogeneity: Their effects on educational outcomes. Journal of Personnel Evaluation in Education, 11(1), 57–67. Yeh, S. S. (2006a). High stakes testing: Can rapid assessment reduce the pressure? Teachers College Record, 108(4), 621–661. Yeh, S. S. (2006b). Raising student achievement through rapid assessment and test reform. New York: Teachers College Press.

References    221

Yeh, S. S. (2007). The cost-effectiveness of five policies for improving student achievement. American Journal of Evaluation, 28(4), 416–436. Yeh, S. S. (2008). The cost-effectiveness of comprehensive school reform and rapid assessment. Education Policy Analysis Archives, 16(13). Retrieved from http://epaa.asu.edu/epaa/v16n13/ Yeh, S. S. (2009a). Class size reduction or rapid formative assessment? A comparison of cost-effectiveness. Educational Research Review, 4, 7–15. Yeh, S. S. (2009b). The cost-effectiveness of raising teacher quality. Educational Research Review, 4(3), 220–232. Yeh, S. S. (2010a). The cost-effectiveness of NBPTS teacher certification. Evaluation Review, 34(3), 220–241. Yeh, S. S. (2010b). Understanding and addressing the achievement gap through individualized instruction and formative assessment. Assessment in Education, 17(2), 169–182. Yeh, S. S., & Ritter, J. (2009). The cost-effectiveness of replacing the bottom quartile of novice teachers through value-added teacher assessment. Journal of Education Finance, 34(4), 426–451. Ysseldyke, J., & Bolt, D. M. (2007). Effect of technology-enhanced continuous progress monitoring on math achievement. School Psychology Review, 36(3), 453–467. Ysseldyke, J., & Tardrew, S. (2007). Use of a progress-monitoring system to enable teachers to differentiate math instruction. Journal of Applied School Psychology, 24(1), 1–28. Ysseldyke, J., Tardrew, S., Betts, J., Thill, T., & Hannigan, E. (2004). Use of an instructional management system to enhance math instruction of gifted and talented students. Journal for the Education of the Gifted, 27(4), 293– 310. Zahorik, J., Molnar, A., Ehrle, K., & Halbach, A. (2000). Effective teaching in reduced size classes. Center for Education Research, Analysis, and Innovation, University of Wisconsin-Milwaukee. Zuckerman, M., Porac, J., Lathin, D., Smith, R., & Deci, E. L. (1978). On the importance of self-determination for intrinsically motivated behavior. Personality and Social Psychology Bulletin, 4, 443–466.

This page intentionally left blank.

About the Author

Stuart Yeh is Associate Professor and Coordinator of the Evaluation Studies Program at the University of Minnesota. He received a B.A. in economics and a Master of Public Policy degree from the University of Michigan and a Ph.D. from Stanford University, and he held a postdoctoral fellowship at Harvard University. He has conducted numerous cost-effectiveness evaluations in the fields of education and human services.
