Economics of Means-Tested Transfer Programs in the United States, Volume II (ISBN 9780226392523)



Economics of Means-Tested Transfer Programs in the United States, Volume II

National Bureau of Economic Research Conference Report

Economics of Means-Tested Transfer Programs in the United States, Volume II

Edited by

Robert A. Moffitt

The University of Chicago Press Chicago and London

The University of Chicago Press, Chicago 60637
The University of Chicago Press, Ltd., London
© 2016 by the National Bureau of Economic Research
All rights reserved. Published 2016. Printed in the United States of America
25 24 23 22 21 20 19 18 17 16    1 2 3 4 5
ISBN-13: 978-0-226-39249-3 (cloth)
ISBN-13: 978-0-226-39252-3 (e-book)
DOI: 10.7208/chicago/9780226392523.001.0001

Library of Congress Cataloging-in-Publication Data
Names: Moffitt, Robert A., editor.
Title: Economics of means-tested transfer programs in the United States / edited by Robert Moffitt.
Other titles: National Bureau of Economic Research conference report.
Description: Chicago ; London : The University of Chicago Press, 2016– | Series: NBER conference report | Includes bibliographical references and index.
Identifiers: LCCN 2016006530 | ISBN 9780226370477 (cloth : alk. paper) | ISBN 9780226370507 (e-book) | ISBN 9780226392493 (cloth : alk. paper) | ISBN 9780226392523 (e-book)
Subjects: LCSH: Economic security—United States. | Income maintenance programs—United States. | Economic security—United States—Testing.
Classification: LCC HD7125 .E273 2016 | DDC 361/.05—dc23
LC record available at http://lccn.loc.gov/2016006530
♾ This paper meets the requirements of ANSI/NISO Z39.48–1992 (Permanence of Paper).

National Bureau of Economic Research

Officers
Martin B. Zimmerman, chairman
Karen N. Horn, vice chairman
James M. Poterba, president and chief executive officer
Robert Mednick, treasurer
Kelly Horak, controller and assistant corporate secretary
Alterra Milone, corporate secretary
Denis Healy, assistant corporate secretary

Directors at Large
Peter C. Aldrich, Elizabeth E. Bailey, John H. Biggs, John S. Clarkeson, Don R. Conlan, Kathleen B. Cooper, Charles H. Dallara, George C. Eads, Jessica P. Einhorn, Mohamed El-Erian, Linda Ewing, Jacob A. Frenkel, Judith M. Gueron, Robert S. Hamada, Peter Blair Henry, Karen N. Horn, John Lipsky, Laurence H. Meyer, Michael H. Moskow, Alicia H. Munnell, Robert T. Parry, James M. Poterba, John S. Reed, Marina v. N. Whitman, Martin B. Zimmerman

Directors by University Appointment
Timothy Bresnahan, Stanford; Pierre-André Chiappori, Columbia; Alan V. Deardorff, Michigan; Ray C. Fair, Yale; Edward Foster, Minnesota; John P. Gould, Chicago; Mark Grinblatt, California, Los Angeles; Bruce Hansen, Wisconsin–Madison; Benjamin Hermalin, California, Berkeley; Marjorie B. McElroy, Duke; Joel Mokyr, Northwestern; Andrew Postlewaite, Pennsylvania; Cecilia Rouse, Princeton; Richard L. Schmalensee, Massachusetts Institute of Technology; David B. Yoffie, Harvard

Directors by Appointment of Other Organizations
Jean-Paul Chavas, Agricultural and Applied Economics Association; Martin J. Gruber, American Finance Association; Arthur Kennickell, American Statistical Association; Jack Kleinhenz, National Association for Business Economics; William W. Lewis, Committee for Economic Development; Robert Mednick, American Institute of Certified Public Accountants; Alan L. Olmstead, Economic History Association; Peter L. Rousseau, American Economic Association; Gregor W. Smith, Canadian Economics Association; William Spriggs, American Federation of Labor and Congress of Industrial Organizations; Bart van Ark, The Conference Board

Directors Emeriti
George Akerlof, Jagdish Bhagwati, Carl F. Christ, Franklin Fisher, George Hatsopoulos, Saul H. Hymans, Rudolph A. Oswald, Peter G. Peterson, John J. Siegfried, Craig Swan

Relation of the Directors to the Work and Publications of the National Bureau of Economic Research

1. The object of the NBER is to ascertain and present to the economics profession, and to the public more generally, important economic facts and their interpretation in a scientific manner without policy recommendations. The Board of Directors is charged with the responsibility of ensuring that the work of the NBER is carried on in strict conformity with this object.

2. The President shall establish an internal review process to ensure that book manuscripts proposed for publication do not contain policy recommendations. This shall apply both to the proceedings of conferences and to manuscripts by a single author or by one or more co-authors but shall not apply to authors of comments at NBER conferences who are not NBER affiliates.

3. No book manuscript reporting research shall be published by the NBER until the President has sent to each member of the Board a notice that a manuscript is recommended for publication and that in the President’s opinion it is suitable for publication in accordance with the above principles of the NBER. Such notification will include a table of contents and an abstract or summary of the manuscript’s content, a list of contributors if applicable, and a response form for use by Directors who desire a copy of the manuscript for review. Each manuscript shall contain a summary drawing attention to the nature and treatment of the problem studied and the main conclusions reached.

4. No volume shall be published until forty-five days have elapsed from the above notification of intention to publish it. During this period a copy shall be sent to any Director requesting it, and if any Director objects to publication on the grounds that the manuscript contains policy recommendations, the objection will be presented to the author(s) or editor(s). In case of dispute, all members of the Board shall be notified, and the President shall appoint an ad hoc committee of the Board to decide the matter; thirty days additional shall be granted for this purpose.

5. The President shall present annually to the Board a report describing the internal manuscript review process, any objections made by Directors before publication or by anyone after publication, any disputes about such matters, and how they were handled.

6. Publications of the NBER issued for informational purposes concerning the work of the Bureau, or issued to inform the public of the activities at the Bureau, including but not limited to the NBER Digest and Reporter, shall be consistent with the object stated in paragraph 1. They shall contain a specific disclaimer noting that they have not passed through the review procedures required in this resolution. The Executive Committee of the Board is charged with the review of all such publications from time to time.

7. NBER working papers and manuscripts distributed on the Bureau’s web site are not deemed to be publications for the purpose of this resolution, but they shall be consistent with the object stated in paragraph 1. Working papers shall contain a specific disclaimer noting that they have not passed through the review procedures required in this resolution. The NBER’s web site shall contain a similar disclaimer. The President shall establish an internal review process to ensure that the working papers and the web site do not contain policy recommendations, and shall report annually to the Board on this process and any concerns raised in connection with it.

8. Unless otherwise determined by the Board or exempted by the terms of paragraphs 6 and 7, a copy of this resolution shall be printed in each NBER publication as described in paragraph 2 above.

Contents

Preface
Robert A. Moffitt

1. The Supplemental Security Income Program
Mark Duggan, Melissa S. Kearney, and Stephanie Rennane

2. Low-Income Housing Policy
Robert Collinson, Ingrid Gould Ellen, and Jens Ludwig

3. Employment and Training Programs
Burt S. Barnow and Jeffrey Smith

4. Early Childhood Education
Sneha Elango, Jorge Luis García, James J. Heckman, and Andrés Hojman

Contributors
Author Index
Subject Index

Preface

Robert A. Moffitt

This volume contains four chapters that were prepared as part of a research project sponsored by the National Bureau of Economic Research (NBER) on the economics of means-tested transfer and human-capital programs for the disadvantaged in the United States. Two of the chapters survey the history, policy issues, rules, caseloads, and research on major means-tested transfer programs (the Supplemental Security Income program and housing assistance programs), and two of the chapters cover the same topics for, respectively, employment and training programs and early childhood education programs, which are human-capital programs. In addition to these four chapters, four chapters appear in a first volume along with an introduction that summarizes the chapters in both volumes. The introduction in volume I also gives an overview of the current structure and historical trends in US means-tested programs. The chapters in both volumes are revised versions of papers presented at a conference convened by the NBER in Cambridge, Massachusetts, on December 4–5, 2014. The editor would like to thank the Smith Richardson Foundation for financial support.

Robert A. Moffitt is the Krieger-Eisenhower Professor of Economics at Johns Hopkins University and a research associate of the National Bureau of Economic Research. For acknowledgments, sources of research support, and disclosure of the author’s material financial relationships, if any, please see http://www.nber.org/chapters/c13684.ack.


1

The Supplemental Security Income Program

Mark Duggan, Melissa S. Kearney, and Stephanie Rennane

1.1 Introduction

Supplemental Security Income (SSI) is a federally administered, means-tested program that provides cash—and typically Medicaid—benefits to low-income individuals who meet a categorical eligibility requirement of age or disability status. The SSI program essentially operates three programs for distinct populations: blind or disabled children, blind or disabled nonelderly adults, and individuals age sixty-five and older (without regard for disability status). The program has a federally determined set of income, asset, and medical eligibility criteria and maximum benefit levels that do not vary across states. Nearly one-third of states supplement the federal benefit with state SSI benefits (paid for entirely by the individual states), though these payments account for just 6 percent of total SSI benefits paid. In 2013 the federal government paid $54 billion in SSI cash benefits, and in December 2013 there were 8.4 million SSI recipients. An additional $133 billion was paid for SSI recipients’ Medicaid benefits in 2011.1

Mark Duggan is the Wayne and Jodi Cooperman Professor of Economics at Stanford University, director of the Stanford Institute for Economic Policy Research, and a research associate of the National Bureau of Economic Research. Melissa S. Kearney is professor of economics at the University of Maryland and a research associate of the National Bureau of Economic Research. Stephanie Rennane is a PhD candidate in economics at the University of Maryland. This chapter was prepared for the 2015 volume of Means-Tested Programs in the United States, edited by Robert Moffitt. The authors are grateful to Robert Moffitt, David Autor, David Wittenburg, Kathy Ruffing, Donna Pavetti, Paul Van de Water, and participants at the NBER Means-Tested Program authors’ conference for helpful comments. For acknowledgments, sources of research support, and disclosure of the authors’ material financial relationships, if any, please see http://www.nber.org/chapters/c13487.ack.

More than half of SSI recipients in December 2013 received the maximum federal benefit of $710 per month (or more if supplemented by the state), with the rest having their benefits partially phased out due to relatively higher income. Approximately one in six current SSI recipients are under the age of eighteen, one in four are sixty-five or older, and the remaining 60 percent are between the ages of eighteen and sixty-four. The corresponding shares twenty-five years ago were 6, 44, and 50 percent, respectively, reflecting the substantial increase in SSI enrollment among children and nonelderly adults during this period. Total federal benefits paid for SSI disabled children and nonelderly adults nearly tripled over a twenty-five-year period, rising from $14.6 billion in 1988 to $44.4 billion in 2013 (SSA [2014c], all figures in real 2014$).

The SSI program has become an increasingly important part of the social safety net, especially for nonelderly adults and children. For the elderly, the SSI program typically supplements Social Security (OASDI) benefits, providing a transfer of income to individuals and households with very low incomes. The fraction of elderly individuals receiving SSI benefits has fallen steadily since the early 1980s, with this trend primarily driven by a corresponding increase in Social Security benefits.2 In 2013, approximately one in twenty-two elderly individuals received SSI benefits, versus one in fifteen thirty years earlier.

For nonelderly adults, the SSI program provides cash income to disabled individuals with limited earnings history. The rationale for these income transfers is to provide an income floor to individuals with disabilities who are unable to engage in substantial gainful activity (SGA).
Nearly one in four SSI disabled adults also qualify for benefits through the Social Security Disability Insurance (SSDI) program, which requires ten or more years of earnings history, while the rest do not have sufficient work history to qualify for SSDI. Both programs are administered by the US Social Security Administration (SSA) and have an identical set of medical eligibility criteria. The fraction of nonelderly adults receiving SSI benefits has increased substantially over time, from 1.5 percent in 1988 to 2.5 percent by 2013.3 In the 2003 means-tested programs volume, Daly and Burkhauser (2003) make the important observations that (a) “disability” is neither a precise nor a static concept and (b) societal expectations about work for those with disabilities have changed over time, as reflected, for example, in the 1990 Americans with Disabilities Act. These observations raise the issue of labor supply disincentives inherent in the SSI program, a point to which we return below.

1. This is the most recent year for which Medicaid spending data by eligibility category are available. The CMS reports $223 billion for 14.1 million aged and disabled Medicaid recipients. Because this exceeds the number of SSI aged and disabled recipients, we scale this down by the ratio of SSI aged and disabled to CMS aged and disabled.
2. The primary reason for this growth is that Social Security benefits are indexed to wages.
3. This 1.0 percentage point increase is less than half the corresponding enrollment change for the SSDI program. This difference is likely driven by the growth in labor supply among women over time, which has made more of them eligible for SSDI benefits and their level of SSDI benefits higher as well (Duggan and Imberman 2007). Because SSDI phases out SSI benefits one for one, an increase in SSDI benefits will tend to reduce SSI enrollment.

Supplemental Security Income also provides benefits to low-income children with disabilities. The fraction of children receiving SSI has increased by a factor of four since the late 1980s, from 0.4 percent in 1988 to 1.8 percent in 2013. This enrollment growth was primarily driven by two 1990 policy changes that expanded the program’s medical eligibility criteria (Duggan and Kearney 2007; GAO 1994). There is considerable overlap between the households with children served by this program and those served by the Temporary Assistance for Needy Families (TANF) program.4 But unlike TANF, SSI is a federal program and is not explicitly “temporary.” The program is not explicit about why a family with a disabled child should receive additional income relative to a family with a healthy child and the same level of income. One could rationalize that such families might have additional child care needs to support parental employment or additional health care costs for the child. Or, one could argue that families with a disabled child need occupational services designed to help the child improve and excel in school. But in practice, the program taxes parental earnings and does not explicitly tie benefits to child care or health care costs. Furthermore, if a child’s condition improves, the family risks losing its SSI benefits. All of these observations raise questions about the incentives of the program and whether it is optimally designed to serve families with disabled children. We return to these points below.

When considering SSI alongside the panoply of means-tested cash transfer programs, we note four defining features of the program.
These are features that stand in contrast to typical features of other means-tested income support programs in the United States, including the Earned Income Tax Credit (EITC), TANF, the Supplemental Nutrition Assistance Program (SNAP), and Medicaid. First, as we have noted above, for the nonelderly the SSI program includes a categorical requirement of demonstrated disability, specifically, a disability that hinders labor market or educational performance. Second, the program’s benefit levels are relatively generous, especially compared to TANF cash benefit awards in low-benefit states, and are indexed to inflation. Third, SSI benefits are paid for with federal dollars, which can amount to large net transfers to states with a disproportionate share of low-income Americans. Fourth, the program is not intended to be temporary, so any distortions in behavior resulting from the program can potentially be long lasting.

4. In 2001, households with at least one child on SSI were more than three times as likely as households with children not on SSI to receive some income from the TANF program (Duggan and Kearney 2007).
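The Medicaid scaling described in footnote 1 can be reproduced with a quick back-of-the-envelope calculation. The sketch below is our illustration, not the authors’ code; because the matching 2011 SSI recipient count is not quoted in the text, it uses the December 2013 count of 8.4 million as a stand-in.

```python
# Back-of-the-envelope check of footnote 1's scaling of Medicaid spending.
# CMS reports $223 billion of spending on 14.1 million aged and disabled
# Medicaid recipients; the authors scale this down by the ratio of SSI
# aged and disabled recipients to CMS aged and disabled recipients.
cms_spending_bn = 223.0   # Medicaid spending on aged/disabled, $ billions (2011)
cms_recipients_mn = 14.1  # CMS aged and disabled recipients, millions (2011)
ssi_recipients_mn = 8.4   # SSI recipients, millions (December 2013 stand-in;
                          # the authors' 2011 count is not given in the text)

ssi_share = ssi_recipients_mn / cms_recipients_mn
ssi_medicaid_bn = cms_spending_bn * ssi_share
print(f"Implied Medicaid spending on SSI recipients: ${ssi_medicaid_bn:.0f} billion")
```

With these inputs the scaling gives roughly $133 billion, consistent with the figure quoted in the introduction.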


These four features raise a particular set of theoretical issues. First, the categorical disability requirement is a form of “tagging,” so named in the seminal work of Akerlof (1978), in which the government imposes certain eligibility requirements to target funds to groups with especially high needs. The existence of a tag allows the government to redistribute more than if all individuals were potentially eligible for the benefit. It also may provide an incentive for some individuals to overstate the severity of their medical conditions in order to qualify for the program. Second, there exists the standard trade-off between income protection and distortions to the labor supply and savings decisions of benefit recipients. Third, the federal nature of this program raises the possibility of spillover effects to state and local programs such as TANF.

In the pages that follow, we review these issues in more depth and describe the relevant empirical evidence. We review recent empirical evidence on the determinants of caseloads and the effects of program participation as it exists for the working-age adult, elderly, and child SSI programs. In general, existing studies suggest that the growth in the working-age adult caseload is driven by three main factors: relaxed eligibility requirements, the aging of the baby boom generation, and increased stringency of other assistance programs. There is some evidence suggesting that the SSI program reduces labor force participation and savings among older adults in the years leading up to their eligibility for elderly SSI benefits. Studies that have focused on the SSI children’s program document the important role SSI plays as an antipoverty safety net program for families. These studies also highlight spillovers and interactions between SSI and other government programs, such as Aid to Families with Dependent Children (AFDC) and special education programs, although more evidence about the size and nature of spillovers across programs is needed. While there is now an informative body of evidence about the effects of child SSI benefits on child and parent outcomes, this is one of the most promising areas for future research. For example, more research is needed to understand how child SSI income is used in the household and how program rules affect the therapeutic and educational trajectory of child beneficiaries. Little is known about the effects of child SSI on later program participation, educational outcomes, or the consequences of labeling children as disabled. All of these questions are open and fruitful areas for future research. There are also a number of important remaining questions about optimal policy design.

The outline of the chapter is as follows: In section 1.2 we provide a brief summary of the history of the SSI program and discuss the most important features of the program today. Section 1.3 presents information about the caseload and caseload trends. Section 1.4 describes economic issues particular to the design and practical application of this program, along with a discussion of the relevant empirical evidence. A final section concludes.

1.2 Origins and Structure of the SSI Program

The federal Supplemental Security Income program began paying benefits in January 1974, replacing a combination of approximately 1,350 different state and local programs that provided benefits to low-income aged, blind, and disabled individuals (Berkowitz and DeWitt 2013). Many of these programs had been partially funded by the federal government, and the size of benefits varied across states (Wiseman 2011). In some cases, the uniform federal SSI benefit amount was lower than what had been paid by the previous programs. Because of this, a system of state supplements was introduced during the transition to SSI to ensure that no individual would receive lower benefits from the SSI program than they were already receiving from their state or local welfare program. Relatedly, because the medical and income eligibility criteria varied across geographic areas, recipients already enrolled in state programs by early 1973 were grandfathered into SSI, though anyone who enrolled in a state program after July 1973 would have their SSI eligibility determined according to the uniform medical eligibility standards in effect throughout the United States.

Since its inception, the SSI program has been administered by SSA, perhaps partly because of the overlap in the populations served by the OASDI and SSI programs. Supporters of the program also argued that there would be less stigma from receiving SSI benefits if the program were administered by SSA instead of local welfare offices. And because SSA already had a set of medical eligibility criteria defined for the SSDI program, it was well positioned to apply these same criteria to SSI applicants. The two programs have used the same medical eligibility criteria for disabled adults for the last forty years.

By December of 1974, there were 4.0 million US residents receiving SSI benefits, and more than 60 percent of SSI recipients were age sixty-five or older. Most of these elderly SSI recipients qualified solely due to low income and assets after reaching sixty-five, though a substantial number also qualified initially due to a disability and remained on SSI after reaching age sixty-five. Legislation that took effect in the summer of 1974 required that SSI benefits be indexed to the Consumer Price Index (CPI).

In contrast to SSDI, SSI has always paid benefits to disabled children.5 In the first full year of the program, 71,000 children received SSI benefits, and over the next ten years this number tripled to 212,000. During the debate that took place in both houses of Congress in the early 1970s as SSI legislation was considered, there was little discussion of whether children should receive benefits from the SSI program and what the medical eligibility criteria for them should be. Evidence from the historical record suggests that a congressional staffer inserted a phrase about benefits for disabled children into the 1971 version of the House bill. This phrase remained in the final version that passed both houses of Congress and was sent to President Nixon for his signature (Berkowitz and DeWitt 2013).

5. SSDI does pay benefits to children, but only as dependents of disabled workers. See Autor and Duggan (2006) for more background on the SSDI program.

The shifting age distribution of SSI recipients over the last four decades is striking. As incomes among the elderly have risen during that time period, a smaller share has been eligible for the program. The fraction of US residents age sixty-five and older receiving SSI stood at 11 percent in 1974 and had trended steadily downward to 4.7 percent by 2013. In contrast, the fraction of children and of nonelderly adults receiving SSI benefits has grown substantially during that same period. Perhaps the most important factor causing this growth has been an expansion in the program’s medical eligibility criteria, a subject to which we now turn.

1.2.1 Disability Determination

We begin our review of the structure of the SSI program with a discussion of the program’s disability determination process, considering first the process as it applies to adult applicants and subsequently to applicants under age eighteen. Income-eligible applicants over the age of sixty-five do not need to demonstrate the existence of a work-limiting disability: if they satisfy the income and asset tests, they are eligible for SSI. This discussion of disability determination therefore applies only to those under the age of sixty-five.6 In addition, individuals can meet the categorical requirement for SSI through blindness if they have 20/200 vision or less with the use of a correcting lens in their better eye, or if they have tunnel vision of 20 degrees or less (SSA 2014a). These objective standards stand in contrast to the more subjective criteria employed to determine eligibility under the disabled criteria, as described below.

6. About 45 percent of elderly SSI recipients first qualified for the program because of blindness or a disability. More specifically, in December 2013 there were 2.11 million SSI recipients age sixty-five and older, but there were only 1.16 million SSI recipients in the “aged” category.

Disability Determination for Adults

Nonelderly adults typically apply for SSI benefits through an SSA field office. Employees there determine whether the applicant meets nonmedical requirements, including sufficiently low income and assets. If monthly earnings exceed SSA’s definition of SGA, the applicant is deemed categorically ineligible.7 Applications that pass this initial screen are then forwarded to a state agency, where the disability determination process is usually carried out by a two-person team. The first person is a state disability examiner, who assembles both medical and nonmedical evidence and requests a consultative exam when the medical evidence is not sufficient to make a disability determination. The examiner also prepares (or, for more complicated cases, assists in the preparation of) an assessment of the applicant’s residual functional capacity. The second person on the team is a medical consultant, who reviews the medical evidence provided by the applicant and acquired through one or more additional consultative exams. The examiner prepares the final determination, which is then signed by the medical consultant.

7. The monthly substantial gainful activity amount increased from $500 to $700 in 1999 and has been indexed to inflation since. See http://www.socialsecurity.gov/oact/cola/sga.html for more information.

A nonelderly adult applying for SSI benefits must demonstrate that he or she has a medically determined physical or mental disability that limits his or her ability to engage in SGA and further demonstrate that this disability will last at least twelve months or result in death. The federal guidelines are the same across states and are identical to those used by the SSDI program. In practice there is variation in award rates, as the determination of disability status is made by individual examiners and inevitably involves subjective judgments. Indeed, recent research (Maestas, Mullen, and Strand 2013; French and Song 2014) has shown that there is considerable variation across examiners in the disability determination, even after controlling for the characteristics of applicants.

SSA’s disability determination process considers whether a medical impairment is severe and is expected to last for at least twelve months or to result in death. If the impairment passes this threshold and is on SSA’s list of medical impairments, then the applicant passes the disability determination. If the impairment is not on this list, then SSA considers whether the applicant can perform labor market tasks that he/she previously performed. If this is possible, then the applicant is found to be categorically ineligible. If the applicant is unable to do past work, then SSA considers whether there are other occupations in the economy that he/she could perform.
In this case, the examining team considers not only the applicant’s medical condition but also his/her age, education, and work experience.8

Applicants who are initially rejected may appeal the decision. A first-round appeal involves the application being considered by a second team of examiners. Applicants denied at this stage have the option to appeal to an administrative law judge (ALJ). When appearing before an ALJ, the applicant is often joined by a lawyer or some other representative. The hearings are somewhat unusual in that only one side is represented—SSA does not have anyone there explaining the reason for the initial decisions. Here, too, there is significant variation across judges: French and Song (2014) show systematic variation in denial rates across SSA appeals judges. Applicants denied through that second appeals stage can try again by appealing to the Social Security Appeals Council and then to their district court.

8. See Wixon and Strand (2013) for a more detailed explanation of this process.

In 2009, approximately 1.662 million individuals applied for SSI and met the initial income and asset screens. From this group, approximately 31.1 percent received an SSI award at this first stage. Of the 1.145 million rejected applicants, more than half (51.3 percent) appealed the decision. Only 10.2 percent received an award at the next stage, suggesting that employees at the state Disability Determination Services rarely overturn the decisions made by their colleagues. However, that is not the case for ALJs. Of the 413,000 rejected applicants appealing to an ALJ, the majority (57.9 percent) received an award from the ALJ or at a subsequent stage. The large number of appeals substantially increases the SSI award rate among nonelderly adults from 31.1 percent (considering just the first stage) to 49.6 percent.9 Put another way, more than one in three SSI awards to nonelderly adults are made on appeal. The average time from initial application to the first decision is four months, while those appealing to the ALJ level or higher typically wait more than two years for a decision (OIG 2008).

Disability Determination for Children

The process of determining categorical disability eligibility for children has undergone substantial change since the program’s inception. Like adult applicants, in order to be eligible for the program, a child has to be determined to have a disability lasting at least twelve months or resulting in death. Initially this was done by establishing that a child applicant had a medical impairment that appeared on the SSA list of qualifying medical conditions. Two policy changes in the early 1990s introduced a greater emphasis on a child’s functioning rather than a strict focus on medical conditions alone. First, the landmark legal case of Sullivan v. Zebley (full case name Louis Wade Sullivan, Secretary of Health and Human Services v. Brian Zebley, et al., 493 US 591) resulted in the addition of a functional assessment for children.
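Returning to the adult appeals figures above, the 2009 award-rate arithmetic can be checked with a short script. This is our illustrative reconstruction from the numbers quoted in the text, not SSA code; rounding and the still-pending applications noted in footnote 9 mean it reproduces the 49.6 percent figure only approximately.

```python
# Reconstruction of the 2009 SSI adult award-rate arithmetic from the
# figures quoted in the text (an illustrative check, not SSA data).
applicants = 1_662_000                       # passed initial income/asset screens
stage1_awards = 0.311 * applicants           # 31.1% awarded at the first stage
rejected = 1_145_000                         # denied at the first stage
recon_appeals = 0.513 * rejected             # 51.3% appeal the denial
recon_awards = 0.102 * recon_appeals         # 10.2% awarded on reconsideration
alj_appeals = 413_000                        # denied again and appeal to an ALJ
alj_awards = 0.579 * alj_appeals             # 57.9% awarded by the ALJ or later

total_awards = stage1_awards + recon_awards + alj_awards
award_rate = total_awards / applicants
appeal_share = (recon_awards + alj_awards) / total_awards

print(f"overall award rate: {award_rate:.1%}")            # close to the 49.6% in the text
print(f"share of awards made on appeal: {appeal_share:.1%}")  # more than one in three
```

The small gap between this reconstruction and the 49.6 percent reported in the text is consistent with footnote 9's note that 14,189 applications were still in process.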
In this case, the Supreme Court ruled for the plaintiffs, finding that SSA’s listing-only methodology for determining SSI child claims was inconsistent with the statutory standard of “comparable severity” to adult limitations set forth in the Social Security Act. The argument was that the program rules at the time did not provide SSI child claimants with an individualized functional assessment similar to the functional analysis considered in many adult claims. Second, prompted by the Zebley decision, in December of 1990 the SSA issued new regulations in accordance with the Disability Benefits Reform Act (DBRA) of 1984 that revised and expanded SSA’s medical listings for childhood mental impairments. The new medical listings for mental impairments provided more detailed and specific guidance on how to evaluate mental disorders in children than did the former regulations, which had been in place since 1977 (GAO 1995).

The Supplemental Security Income Program

Over the early 1990s, use of the individual functional assessment (IFA), as well as the new DBRA criteria emphasizing functioning in determining mental disabilities, led to a large expansion in the number of children determined to be categorically eligible for SSI, many of whom had less severe disabilities than previous generations of SSI child recipients. In the three years prior to this change, the number of children receiving SSI benefits was growing by about 3 percent per year, from 241,000 in 1986 to 264,000 by 1989. In the seven years following these changes, the number of children on SSI increased from 265,000 in 1989 to 955,000 in 1996, an increase of 260 percent. As a share of all children from birth to age seventeen, this reflects an increase from 0.4 percent to 1.4 percent (Duggan and Kearney 2007). In response to this caseload expansion, Congress revised the SSI eligibility rules for children as part of the 1996 welfare reform legislation. The revised provisions eliminated the IFA, but preserved the spirit of the functional limitation idea: to be determined categorically eligible, a child must demonstrate “a medically determined physical or mental impairment which results in marked and severe functional limitations, which can be expected to lead to death or which has been or can be expected to last for a continuous period of not less than 12 months” (SSA 2014d). This change resulted in nearly 100,000 children being terminated from the program in 1997, and the share of children receiving SSI remained at 1.2 percent from 1997 through 2000. The new provisions further required children reaching age eighteen to be reevaluated to determine whether a child SSI recipient would continue to receive benefits as an adult.

9. Left out of this calculation are the 14,189 applications still in process in the most recent data.
As a result, the current determination process for children is less restrictive than it was during the “listing-only” paradigm in effect before the Zebley decision, but more restrictive than it was during the early 1990s (Berkowitz and Dewitt 2013; Wittenburg 2011; Wiseman 2011). Despite this, SSI enrollment has grown steadily since 2000, with 1.8 percent of children receiving SSI benefits in 2013.10 In practice, the change in child disability determination since the early 1990s means that a child’s disability status is frequently based on a subjective determination about his performance in school relative to peers his age. This has led to concerns that the program’s eligibility criteria may increase the chance that a child is labeled with a learning disability, is placed on medication in an effort to be deemed disabled, or receives inappropriate treatment therapies (or fails to receive appropriate ones) (Wen 2010; Wittenburg 2011). On the point of medication, a report by the US Government Accountability Office (GAO) found little evidence to suggest that medication use increased the chance that a child would be awarded SSI benefits (GAO 2012). These are issues to which we return later in the chapter.

10. During this same 2000 to 2013 period, the fraction of children in families with incomes below the poverty line also increased, from 16.2 percent to 19.9 percent. While this may have contributed to the increase in child SSI enrollment, recent research suggests that changes in poverty do not have a significant effect on SSI enrollment (Aizer, Gordon, and Kearney 2014).


Continuing Disability Reviews

Continuing disability reviews (CDRs) have been required by law since the beginning of SSI. In practice, the frequency and stringency of CDRs have not been consistent over time, in many cases due to administrative backlogs and budget constraints (GAO 2006, 2014). The frequency with which SSA is expected to conduct CDRs on a disability beneficiary is set at the time the individual begins receiving benefits, according to the likelihood that the individual’s condition will improve: “improvement expected” (CDR every six to eighteen months); “improvement possible” (CDR every three years); and “improvement not expected” (CDR every five to seven years) (GAO 2006). For children, CDRs are required every three years, except for benefits awarded for low birth weight, for which CDRs should be conducted every twelve months (GAO 2012). Reviewers are required to begin a CDR with a neutral opinion about the beneficiary’s disability status, rather than presuming the beneficiary still has a disability. The standards of improvement for disability are often unclear (GAO 2006). This is particularly true in cases where the original disability determination was decided on appeal, or when an individual’s improvement is contingent on Medicaid benefits received as a result of participation in SSI. Despite these challenges, however, an SSA quality assessment of CDRs in 2005 found a 95 percent accuracy rate in CDR decisions. The CDRs are conducted at two levels in order to maintain cost effectiveness and efficiency: a mailer survey sent to all beneficiaries asking about their condition, and a full examination for select beneficiaries. The SSA uses a statistical “profiling” method based on age, condition, and previous CDR results to determine how thoroughly to conduct the CDR. Beneficiaries who are unlikely to improve are more likely to receive just the mailer.
If the information about the respondent’s medical condition on the mailer suggests improvement, then SSA will conduct a full medical examination; if not, the mailer completes the CDR requirement. Certain cases skip the mailing process and are subject to a full medical examination from the beginning (GAO 2006). As of 2014, the mailer process was not used for children (GAO 2014). When SSA determines that an individual’s benefits should be terminated, the beneficiary has a three-month grace period during which he/she can appeal the decision. When budget constraints limit the number of CDRs that SSA can conduct in a given time frame, SSA prioritizes CDRs in the following manner: (a) maintaining CDR currency, (b) age eighteen redeterminations, and (c) cost effectiveness. The priority on cost effectiveness often means that SSA prioritizes SSDI CDRs over SSI CDRs, since SSDI beneficiaries on average receive larger benefits than SSI beneficiaries. While this focus is potentially more cost effective in the short run, SSA has acknowledged that focusing on CDRs for children and younger beneficiaries may yield higher savings in


the long run (GAO 2014). As of August 2011, approximately 435,000 children on SSI were overdue for CDRs, more than one-third of the total child caseload (GAO 2012). In September 2011, SSA’s inspector general estimated that “$1.4 billion in SSI benefits (had been paid) to approximately 513,000 recipients under age 18 who should have not received them” (GAO 2014). Additionally, since 1996 child SSI cases have been required to be reevaluated at the child’s eighteenth birthday according to adult eligibility rules. Following the Zebley decision, child cases have been determined based on the child’s ability to function at a level comparable to nondisabled children, while adult cases have always been determined based on an individual’s ability to work or participate in SGA (Hemmeter 2012). The transition from child to adult benefits leads to many terminations, and continuing beneficiaries are often reassigned to a different diagnosis category. In 1997, just following the introduction of age eighteen redetermination, 54 percent of eighteen-year-olds lost their benefits. This number fell to 46 percent by 2006 (Hemmeter and Gilby 2009). Additionally, 30 percent of eighteen-year-olds who kept their benefits were assigned to a new diagnosis group (Hemmeter 2012). While children whose benefits are terminated may be able to work, recent research finds that their earnings from work do not fully replace the benefits they lose. Deshpande (2014a) finds that young adults whose benefits were terminated earned only one-third of what they would have received in benefits, and suggests that these former beneficiaries experience significant volatility in their earnings over time.

1.2.2 Means Testing and Benefit Levels

To qualify for the SSI program, individuals must have sufficiently low income and assets. In the case of children, a portion of parental and sibling income affects both SSI eligibility and the potential benefit amount for those who are eligible. For married adult applicants and beneficiaries, spousal income is considered in eligibility and award determination. Other family members’ income and assets are counted toward an applicant’s income and assets through a process called deeming. As deemed income and assets increase, a person’s potential SSI benefits decline; we discuss the specifics of this below. This raises the standard incentive concern—that an SSI recipient and his/her family members may have a lower incentive to work and save due to program rules (Hubbard, Skinner, and Zeldes 1995). In 2015, the federal benefit rate (FBR)—the maximum monthly benefit level—was $733 for individuals and $1,100 for couples. While the federal benefit rate is the same for recipients of all ages, the average actual monthly benefit varies substantially across age groups. In December 2014, the average benefit was $633 for child beneficiaries, $550 for nonelderly adult beneficiaries, and $426 for elderly beneficiaries. An SSI recipient’s monthly benefit falls below the FBR if the recipient or a family


member has earned or unearned income. The FBR receives a cost-of-living adjustment (COLA) each year based on the consumer price index (CPI-W). However, the values of the earned and unearned income exclusions for the SSI recipient—which define the threshold at which benefits begin to phase out—have not changed since the program began (Daly and Burkhauser 2003), and the asset limits were last updated in 1989.

Adults Age Eighteen to Sixty-Four

The means testing for SSI is based on income—both earned and unearned—as well as assets. In order to be eligible for SSI, a nonelderly adult must not have assets exceeding $2,000 if filing as an individual, or $3,000 if filing as a couple. The value of the individual’s home and the value of one vehicle, as well as several smaller items including grants and scholarships for educational purposes, personal effects (e.g., wedding rings), and life insurance policies, are excluded from the calculation of assets. In terms of income, an eligible adult’s benefit amount is equal to the difference between the maximum federal benefit rate (FBR) and “countable income.” In general, if an applicant is determined to have countable income greater than or equal to the maximum benefit of $733 a month, then the applicant is not eligible for an SSI award. Similarly, if an SSI recipient’s countable income rises above $733 in a month, his/her SSI benefit for that month falls to zero, and his/her benefits may be terminated if this persists. Countable income for a single adult SSI recipient is approximately equal to the sum of unearned income and one-half of earned income. There is a general (either earned or unearned) income exclusion of $20 per month and an earned income exclusion of $65 per month. Thus an adult SSI recipient with $300 per month in unearned income but no earned income would have countable income of $280.
An adult SSI recipient with $300 per month in earned income but no unearned income would have countable income of $107.50. In other words, unearned income phases out the SSI benefit one-for-one, while there is a (lower) 50 percent marginal tax rate on earned income. In principle, the adult SSI recipient’s earnings would need to exceed $1,500 per month to fully phase out the SSI benefit. Under the Section 1619 waivers enacted in 1987, beneficiaries may be eligible to receive cash payments until the SSI benefit is fully phased out, even after earnings exceed the SGA. In practice, this is relatively rare: Ben-Shalom and Stapleton (2015) find that 10.4 percent of the 2001 SSI award cohort were allowed to earn above SGA for at least one month over the six-year period from 2001 to 2007. Over the same time frame, 8.4 percent had earnings exceeding the phase-out threshold in at least one month, but maintained eligibility for Medicaid due to a Section 1619(b) waiver.11

11. See https://secure.ssa.gov/poms.nsf/lnx/0502302010 for more details on Section 1619 waivers. In practice, these waivers have a similar purpose as the trial work period for SSDI, allowing beneficiaries to test their work ability while temporarily maintaining eligibility for benefits and Medicaid.
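The countable-income arithmetic described above can be sketched in a few lines of Python. This is a simplified illustration of the federal rules in the text—the $20 general exclusion, the $65 earned income exclusion, and the 50 percent rate on remaining earnings—and ignores state supplements and the various special exclusions; the function names are ours.

```python
FBR = 733  # 2015 federal benefit rate for an individual, dollars per month


def countable_income(earned, unearned):
    """Simplified SSI countable income for a single adult recipient.

    The $20 general exclusion applies to unearned income first; any
    unused portion carries over to earned income, which also gets a $65
    exclusion and a 50 percent rate on the remainder.
    """
    countable_unearned = max(0, unearned - 20)
    unused_general = max(0, 20 - unearned)
    countable_earned = max(0, earned - 65 - unused_general) / 2
    return countable_unearned + countable_earned


def ssi_benefit(earned, unearned):
    """Monthly federal SSI benefit: the FBR less countable income, floored at zero."""
    return max(0, FBR - countable_income(earned, unearned))


# Examples from the text:
#   countable_income(0, 300)  -> 280     ($300 unearned, no earned income)
#   countable_income(300, 0)  -> 107.50  ($300 earned, no unearned income)
# The benefit reaches zero once monthly earnings exceed $1,551
# (= $85 in exclusions + 2 x $733), consistent with the "exceed
# $1,500 per month" statement in the text.
```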


The share of SSI recipients with earned income is quite small: in 2013, less than 5 percent of the nonelderly adult beneficiary population reported having earned income (SSA 2014b). This makes clear that earned income is not generally the reason for benefit amounts falling below the FBR. Main sources of unearned income include transfer payments from Social Security, Unemployment Insurance, or a household TANF award, as well as income brought into the household by other family members. Income from tax refunds, grants, and scholarships is not counted toward unearned income, nor are noncash benefits such as food assistance through the SNAP program.12 In addition to the standard exclusions for earned and unearned income, there is also a student income exclusion, which allows full-time students to exclude a substantial amount of earned income from being counted toward SSI. In 2015, students age eighteen to twenty-two could exclude up to $1,780 per month of their own earned income. When an adult SSI recipient is married, the spouse’s income may be “deemed” to the SSI recipient. Thus even if the SSI recipient has no income, a spouse with substantial income can substantially lower the SSI benefit. There is a 50 percent tax rate on the earnings of the spouse in the phase-out range, and spousal earnings can be substantial before the SSI recipient’s benefits begin to phase out. More specifically, if the applicant has no income, the spouse of an SSI recipient could earn $819 per month in 2015 before the SSI benefit begins to decline,13 and the spouse’s earnings would have to exceed $2,285 per month before the SSI benefit would be fully phased out. Given a federal poverty level of $15,930 for a two-person family, this suggests that the family’s income could reach almost 175 percent of the FPL before SSI benefits would be fully phased out.
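The spousal thresholds above can be replicated with a stylized calculation. The sketch below is our simplification—it skips the full SSA deeming sequence and any allocations for ineligible children—but it reproduces the $819 and $2,285 figures: the spouse’s earnings get the combined $85 exclusion and the 50 percent rate, and the recipient is paid the lesser of the individual FBR and the couple FBR net of that countable income.

```python
FBR_INDIVIDUAL = 733  # 2015 federal benefit rate, individual
FBR_COUPLE = 1100     # 2015 federal benefit rate, couple


def benefit_with_ineligible_spouse(spouse_earned):
    """Stylized SSI benefit for a recipient with no income of his/her own
    whose ineligible spouse has monthly earnings `spouse_earned`.

    Simplification: apply the $85 exclusions and the 50 percent rate to
    the spouse's earnings, then pay the lesser of the individual FBR and
    the couple FBR net of that countable income.
    """
    countable = max(0, spouse_earned - 85) / 2
    return max(0, min(FBR_INDIVIDUAL, FBR_COUPLE - countable))


# The benefit is unaffected until spousal earnings reach $819 per month,
# since (819 - 85) / 2 = 367 = couple FBR (1,100) - individual FBR (733),
# and is fully phased out at $2,285, since (2,285 - 85) / 2 = 1,100.
```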
If there are one or more ineligible children in the household, then earnings of the spouse can be even higher before SSI benefits are taxed. In 2015, the spouse of an SSI recipient could earn $1,186 per month, rather than $819 per month, before the phase-out of benefits begins if there is one child present in the household. Figures 1A.1–1A.4 provide examples of the thresholds at which SSI benefits start to phase out under different income and family situations.

12. For more information, see http://www.ssa.gov/ssi/text-income-ussi.htm.
13. The spouse receives the same $85 income exclusion ($65 earned and $20 either earned or unearned) that the SSI recipient does. Additionally, SSI benefits are calculated as the lower of the amount that the person on SSI would receive if the spouse’s income were ignored and the amount that the couple would receive if both were on SSI and it was included.

Children Less Than Age Eighteen

Child applicants are, by definition, under age eighteen and not married or a head of household. If these conditions are not met, the applicant is evaluated as an adult. As with adults, the means testing involved in child eligibility determination is based on both assets and income. Child eligibility is based on the same asset limit as individual adult eligibility ($2,000), and includes


both assets in the child’s name and parental assets deemed to the child for the sake of eligibility determination. Applicants may subtract the amount of the adult asset limit ($2,000 for a single parent, $3,000 for a married couple) from total parental assets; the remaining balance of parental assets is deemed to the child. This means that children in households where a single parent has more than $4,000, or a married couple has more than $5,000, in assets—net of excludable assets including a house, one vehicle, or educational grants, among others—are ineligible for SSI.14 Countable income for child applicants is based in part on parental income deemed to the child. This deeming process is somewhat different from the deeming of spousal income discussed above for adult recipients. If a child applicant’s parent(s) would be eligible for SSI based on their own income, then none of the parental income is deemed to the child. But if parental income exceeds the threshold for adult SSI eligibility, any income that is not used to “exhaust” the parent’s hypothetical eligibility for SSI is deemed to the child as unearned income.15 The unearned and earned income exclusions are applied to parental income, as are deductions for other children in the household who are not receiving SSI or TANF benefits. If there is more than one SSI-eligible child in the household, the remaining income to be deemed is divided equally among all eligible children in the household. The deemed income from parents is added to any additional earned or unearned income the child may have. Any public income maintenance payments made to other members of the household are not included in countable income.16 Then, the standard earned and unearned exclusions are applied, and the remaining countable income amount is compared to the FBR.
An eligible child’s SSI benefit amount is determined as the amount by which the FBR exceeds countable income.17 As was true for adult SSI recipients, there is an effective 50 percent marginal tax rate on SSI benefits in the phase-out range. However, parental earnings can be substantial before a child’s SSI benefits begin to phase out. Consider a family with one parent and one child on SSI. In 2015, the parent’s earnings must exceed $1,591 per month before the child’s SSI benefits begin to phase out. If there are two parents with one child on SSI, parental earnings must exceed $2,322 per month before the phaseout begins. This is a very high level of earnings before benefit phaseout begins relative to adult SSI recipients or to other means-tested transfer programs such as TANF or food stamps.

14. Source: http://www.socialsecurity.gov/ssi/text-resources-ussi.htm, last accessed November 11, 2014.
15. The deeming rules changed in 1992 in a way that led to a more generous treatment of parental income for deeming purposes (see Hannsgen and Sandell 1996).
16. Source: https://secure.ssa.gov/poms.nsf/lnx/0501320100, last accessed November 11, 2014.
17. Source: https://secure.ssa.gov/poms.nsf/lnx/0500820510, last accessed November 11, 2014.
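The one-parent, one-child threshold above can likewise be reproduced with a stylized sketch. This is our simplification, not the full SSA deeming procedure: we assume a living allowance for the single ineligible parent equal to the individual FBR, apply the $85 exclusions and the 50 percent rate to the parent’s earnings, and treat any income above the allowance as unearned income deemed to the child (subject to the child’s own $20 general exclusion).

```python
FBR = 733               # 2015 federal benefit rate (also the child's maximum benefit)
PARENT_ALLOWANCE = 733  # assumed living allowance for one ineligible parent
                        # (set equal to the individual FBR in this sketch)


def child_benefit_one_parent(parent_earned):
    """Stylized SSI benefit for one eligible child whose single parent's
    only income is monthly earnings `parent_earned`.
    """
    # Parental countable income: $85 in exclusions, then half of the rest
    parental_countable = max(0, parent_earned - 85) / 2
    # Income above the parent's living allowance is deemed to the child
    deemed = max(0, parental_countable - PARENT_ALLOWANCE)
    # Deemed income is unearned income to the child ($20 general exclusion)
    child_countable = max(0, deemed - 20)
    return max(0, FBR - child_countable)


# The child's benefit is unaffected until parental earnings exceed $1,591
# per month, since (1,591 - 85) / 2 = 753 = allowance (733) + the child's
# $20 exclusion; beyond that, benefits fall one-for-one with deemed income.
```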


According to data from SSA, more than two-thirds of children on SSI were living with only one parent in December 2013. An additional 12 percent reside with no parents, with most of these children likely living with other relatives or in foster care. Of the 1.163 million children on SSI residing with one or both parents, parental earnings were nonzero for 479,000 (41 percent), and average parental earnings for this group were $1,789 per month. However, given the relatively generous income exclusions described above, these earnings resulted in deemed income for just 160,000 children. SSI benefits were more frequently reduced because of the child’s own (usually unearned) income from absent parents, Social Security, or some other source.

1.2.3 Citizenship and Residency Requirements

Since passage of the Personal Responsibility and Work Opportunity Reconciliation Act (PRWORA) in August 1996, resident aliens are eligible for SSI only if they were living in the United States prior to August 1996 and (a) were receiving SSI prior to August 1996, (b) are blind or disabled, or (c) are on active duty in or are a veteran of the armed forces. Refugees, asylees, and certain other small categories of immigrants who arrived after August 22, 1996, are eligible for benefits during their first seven years in the United States with refugee/asylee status.18 Lawfully admitted permanent residents (LAPRs) with substantial work history (forty quarters of work) may be eligible to apply for SSI after five years. If an LAPR applicant does not have sufficient work history but his/her spouse does, the spouse’s work history can count toward determining eligibility.19 Similarly, an LAPR child is eligible if her parents have sufficient work history. As a result of these restrictions, noncitizen beneficiaries declined by nearly half, from 12.1 percent of the SSI population in 1995 to 6.7 percent in 2013. Throughout this period, noncitizen beneficiaries have been disproportionately elderly. Noncitizens accounted for 31.8 percent of all aged beneficiaries in 1995, declining to 22.6 percent in 2013. The corresponding fractions for blind and disabled SSI recipients were 6.3 percent and 4.2 percent, respectively (SSA 2014b).

1.2.4 State Supplementation of SSI Benefits

In 2011—the most recent year for which state supplement data are available for all states—all but six states (Arizona, Arkansas, Mississippi, North Dakota, Tennessee, and West Virginia) supplemented the federal SSI benefit for at least some of their SSI recipients.20 Of the remaining forty-five states, most administer the optional SSI supplements themselves, though the federal government administers the supplement for almost one-third of them. As shown in table 1.1, states vary substantially with respect to the fraction of SSI recipients receiving a state supplement. For example, in Texas and New Mexico, just 0.3 percent and 0.1 percent of beneficiaries, respectively, received a state supplement in January 2011. In contrast, in a handful of states, including California, Massachusetts, New Jersey, and New York, among others, more than 95 percent of SSI recipients receive a state supplement. In some states (e.g., Alaska), there are actually more recipients of state supplements than of federal benefits, because in some cases the federal benefit phases out entirely while the person still has sufficiently low income to qualify for the state supplement. In January 2011, there were 3.4 million individuals receiving state SSI supplements. Given that there were 7.66 million total SSI recipients, this suggests that about four in nine of those on SSI have a state supplement.

While some states provide supplements to all SSI beneficiaries, other states provide supplements only to select groups of beneficiaries, such as blind beneficiaries or beneficiaries in assisted-living arrangements. Additionally, states determine the size of the supplement, which ranges between approximately $10 and $350 per month (SSA 2011). For example, California’s average supplement of $167 per month is about twice as high as New York’s ($77 per month) and Massachusetts’s ($79 per month), and more than three times the average in New Jersey ($46), Vermont ($54), or Rhode Island ($45).21 The other six states with a federally administered SSI supplement provide it to less than one in four of their SSI recipients. In 2011, federally administered state supplements accounted for 6 percent of total federally administered SSI expenditures. Because 70 percent of SSI recipients with a supplement receive it from SSA, we estimate that total SSI supplements are 8 to 9 percent of total SSI expenditures.

18. Source: https://secure.ssa.gov/apps10/poms.nsf/lnx/0500502100, last accessed November 11, 2014.
19. Source: https://secure.ssa.gov/apps10/poms.nsf/lnx/0500502135.
20. Four of these six states do supplement the benefit for the small number of SSI recipients enrolled since 1973. Several states (such as Michigan and Pennsylvania) are a mix in that the state administers the supplement for some recipients and the federal government for others.

Table 1.1  Percentage of state SSI caseload receiving a state supplement

State                   Share     State                   Share
Alaska                  138.6     North Carolina           10.5
Alabama                   0.1     North Dakota              0.0
Arkansas                  0.0     Nebraska                 20.5
Arizona                   0.0     New Hampshire            54.8
California               97.9     New Jersey               96.0
Colorado                 42.0     New Mexico                0.1
Connecticut              17.3     Nevada                   24.3
District of Columbia      5.5     New York                 95.7
Delaware                  4.0     Ohio                      0.5
Florida                   2.6     Oklahoma                 93.0
Georgia                   1.2     Oregon                    2.4
Hawaii                   10.8     Pennsylvania             84.5
Iowa                     10.8     Rhode Island             96.5
Idaho                    50.5     South Carolina            3.4
Illinois                  9.7     South Dakota             27.3
Indiana                   2.8     Tennessee                 0.0
Kansas                   15.8     Texas                     0.3
Kentucky                  2.1     Utah                      8.2
Louisiana                 2.5     Virginia                  3.4
Massachusetts            98.2     Vermont                  96.8
Maryland                  3.1     Washington               23.3
Maine                   101.7     Wisconsin               101.3
Michigan                 88.8     West Virginia             0.0
Minnesota                47.7     Wyoming                  48.6

Source: Data are from SSA (2011).

1.2.5 Interactions with Other Government Programs

The vast majority of SSI recipients obtain health insurance through the Medicaid program. While most states automatically grant Medicaid coverage to all of their SSI recipients, enrollment is not 100 percent, for two reasons. First, some eligible enrollees do not complete the necessary paperwork to enroll in the program. Second, twelve states have different and potentially more restrictive Medicaid eligibility requirements, so that some SSI recipients are ineligible for Medicaid. Despite this, a recent study using administrative data from SSA and the Centers for Medicare and Medicaid Services showed that more than 85 percent of SSI recipients are also enrolled in Medicaid (Riley and Rupp 2012). Approximately one in three SSI recipients received Social Security (OASDI) benefits in 2013. As discussed above, Social Security benefits are treated as unearned income and phase out SSI benefits one for one. Thus, an SSI recipient with a $300 monthly Social Security benefit but no other income would receive an SSI benefit that is $280 lower (recall the $20 income exclusion) than the maximum SSI benefit. More than half (56 percent) of elderly SSI recipients receive Social Security benefits, and the average benefit among those who do is $493 per month. Thirty percent of nonelderly adult SSI recipients also receive Social Security benefits, and virtually all of these benefits are paid through the SSDI program. Disabled applicants qualify for both SSI and SSDI if their work history is sufficient to qualify for SSDI but their SSDI benefit is low enough that it does not completely offset their SSI benefit. The average monthly SSDI benefit among SSI recipients with income from both programs was $534 in December 2013. Only 7.5 percent of SSI-enrolled children also received Social Security benefits in that same month, with most obtaining this as a dependent of a retired, disabled, or deceased worker.

21. The average benefit amount is not readily available for the thirty-three states that administer the state supplement directly.


SSI and Medicaid also play an important role for many SSDI awardees, who must wait five months from the onset of their disability before their SSDI benefits “kick in” and twenty-nine months before their Medicare benefits take effect (Riley and Rupp 2012). Some individuals awarded SSDI will receive SSI benefits for the first five months after the onset of disability if they satisfy the means test. Once the five-month waiting period is over, SSDI benefits take effect in month six and begin to offset the SSI benefit, often lowering it to zero. As a result, the number of individuals exiting the SSI rolls each year is artificially high, because many are on the program only temporarily until SSDI payments begin. Participation in the Supplemental Nutrition Assistance Program (SNAP) is especially high among SSI recipients. According to recent data from the Survey of Income and Program Participation (SIPP), approximately three in five households with some SSI income also receive SNAP benefits. In contrast, only 8 percent of SSI households have any income from TANF and just 4 percent have any unemployment insurance benefits. As SSI benefits increase, a household’s SNAP benefits will typically decline. Adult SSI recipients living alone are categorically eligible for SNAP benefits, though eligibility becomes more complicated when there are additional household members. Much previous research has examined the relationship between SSI and AFDC/TANF (e.g., Garrett and Glied 2000). While some households have income from both programs, an individual cannot receive benefits from both. Thus if one of two children in a one-parent family is on SSI, the relevant family size for AFDC/TANF benefit computation would be just two. TANF is administered by the states, and benefit levels vary dramatically across states. For example, the maximum benefit in California is more than five times greater than in Mississippi.
Previous research has shown that SSI enrollment is much higher in states with low AFDC/TANF benefits, no doubt partly because these states tend to have a higher fraction of people in or near poverty. The growth in SSI enrollment during the 1990s cushioned the effects of the dramatic decline in AFDC/TANF enrollment during the same period. Data from the SIPP indicate that children are now twice as likely to reside in a household with some SSI income as in a household with some TANF income (6.9 percent versus 3.4 percent).

1.3 Program Caseloads

There have been substantial changes in SSI caseload growth and the composition of the SSI caseload since the program began in 1974. Initially, SSI primarily paid benefits to the elderly; however, their share of the caseload has declined throughout the life of the program. Nonelderly adults’ share of the SSI caseload started to increase rapidly in the mid-1980s following a liberalization of the program’s medical eligibility criteria that we describe


below. The number of children on SSI also increased rapidly during the early 1990s as a result of similar expansions in the medical eligibility criteria, and while welfare reform temporarily reduced the rate of child participation in SSI, child participation has again grown over the past decade. In addition to changes in the numbers of participants, there is significant variation in participation across states and disability categories within each of these three age groups.

1.3.1 Caseload Trends

Figure 1.1 shows the trends in total caseload over time for each of the three age groups during the last forty years. The total caseload actually declined during the first ten years of the program, though it has more than doubled since 1983, increasing from 3.9 million in that year to nearly 8.4 million in 2013. The elderly caseload has remained at a stable level of about two million beneficiaries but has declined as a share of the total caseload from approximately 60 percent in 1974 to less than one-quarter in 2013. Over the same time frame, nonelderly adults increased from less than 40 percent of the total caseload to nearly 60 percent of the caseload, and children on SSI increased from less than 2 percent of the total caseload to over 15 percent of the total caseload.

Fig. 1.1 Total SSI caseload, 1974–2013
Source: Data from SSA (2014b).

Mark Duggan, Melissa S. Kearney, and Stephanie Rennane

Fig. 1.2A Percent of elderly population on SSI, 1975–2013
Source: Data from SSA (2014b) and US Census Bureau (2014).

These changes in the percentage composition of the SSI caseload are mirrored by similar trends in SSI participants as a percentage of the total population in their age group. Figure 1.2A shows the steady decline in the elderly SSI population as a percentage of the total population age sixty-five and older, and figure 1.2B shows the substantial increase in SSI enrollment among nonelderly adults and children in the mid-to-late 1980s and early 1990s. Additionally, figure 1.2B demonstrates that while participation has increased for nonelderly adults of all ages, younger adults ages eighteen to forty-nine have experienced a larger relative increase in participation. Enrollment growth for all nonelderly groups slowed in the mid-1990s, though it has picked up (especially for children) since 2000. By 2013, SSI enrollment among children, adults eighteen to forty-nine, adults fifty to sixty-four, and the elderly stood at 1.8 percent, 2.0 percent, 3.6 percent, and 4.7 percent, respectively.22 The fraction of individuals living in a household with one or more SSI recipients is, of course, substantially higher. For example, according to data from the Survey of Income and Program Participation (SIPP), more than 6.5 percent of children are either on SSI or have a family member on the program.

22. Part of the increase in SSI enrollment among nonelderly adults during this period reflects the aging of the baby boom generation. However, there were substantial increases in enrollment even within age groups. For example, the share of adults ages thirty to forty-nine on SSI increased from 1.0 to 2.0 percent during the 1985 to 2013 period, and the increases were similar for the eighteen to twenty-nine (0.8 to 2.0 percent) and fifty to sixty-four (2.3 to 3.6 percent) age groups.

Fig. 1.2B Percent of nonelderly and child population on SSI, 1975–2013
Source: Data from SSA (2014b) and US Census Bureau (2014).

Because the child caseload has increased so significantly, in particular since 2000, we devote special attention to examining trends in the child caseload. While increases in the caseload during the early to mid-1990s were driven by loosening medical eligibility criteria in the wake of the Zebley decision, the more recent caseload growth occurred after the eligibility criteria for children were tightened during welfare reform. Furthermore, figures 1.1 and 1.2B show that even during a period of constant SSI eligibility criteria between 2002 and 2012, the child caseload increased 43 percent, growing from 915,000 to more than 1.3 million beneficiaries. Separating the caseload into physical disabilities, intellectual disabilities, and other mental disabilities (e.g., autism and ADHD) reveals that the caseload growth has been driven predominantly by the mental disability caseload. The caseload for mental disability diagnoses increased from 340,000 in 2002 to more than 700,000 in 2012. Over the same period, the physical disability caseload increased by only 24 percent (from 337,000 to 416,000). The number of SSI-enrolled children with intellectual disability as the primary diagnosis declined by 47 percent, falling from 240,000 in 2002 to 127,000 in 2012, but this decline was not enough to offset the increases in the rest of the mental disability caseload (Aizer, Gordon, and Kearney 2013).

While growth in the caseload has been driven by nonelderly participants, SSI still supports a substantially larger share of the elderly population. For example, less than 1 percent of children under age five are on SSI, and approximately 2 percent of children age five to seventeen and adults between eighteen and forty-nine are on SSI. However, approximately 3.6 percent of adults ages fifty to sixty-four participate in SSI, and more than 4 percent of adults over sixty-five are on SSI. The gender composition of enrollees also varies substantially by age. Among children, boys are about two times more likely than girls to be enrolled in SSI. However, enrollment rates are approximately equal among adults in their thirties, forties, and fifties. There are about twice as many elderly women as elderly men on SSI, though this partially reflects the longer life expectancy of women.

Table 1.2 examines award rates by age in 2013 and reveals a more nuanced picture. Among children, award rates are highest among those under the age of five, with nearly 50 percent of applications for children under five being accepted, compared to 30 percent of applications for children thirteen to seventeen. Award rates are relatively low among adults in their twenties and thirties, with approximately 20 percent of applications being accepted. However, award rates increase substantially for applicants in their forties and fifties, with the award rate in the fifty to fifty-nine age range nearly twice that of the twenty-two to twenty-nine age range.
This sharp increase could partially reflect the role of education and vocational factors in the disability determination process, which makes it somewhat easier to qualify when an applicant reaches age fifty.

Table 1.2  Percent of applications awarded benefits by age category, 2013

Age category    Total applications    Award rate (%)
Under 5                157,736             49.8
5–12                   219,915             32.5
13–17                   80,965             30.8
18–21                  134,823             35.9
22–25                  109,576             23.7
26–29                  110,090             22.9
30–39                  314,498             23.8
40–49                  451,106             27.9
50–59                  598,354             43.3
60–64                  160,883             39.6

Source: Data from SSA (2014e).

The Supplemental Security Income Program

Fig. 1.3 Percent of 2013 SSI disability caseload diagnosed with a physical disability
Source: Data from SSA (2014e).

1.3.2 Qualifying Diagnoses

The composition of disabilities also varies substantially across age groups. Figure 1.3 shows that more than half of beneficiaries in the youngest and oldest age groups are eligible primarily on the basis of a physical disability—70 percent of children under age five and 65 percent of adults age sixty to sixty-four. In contrast, less than 30 percent of recipients between the ages of five and thirty-nine had a physical disability as their primary diagnosis. Mental and intellectual disabilities accounted for 57 percent of the total working-age adult caseload in 2013.23

23. By comparison, new awards for mental and intellectual disabilities accounted for only 30 percent of adult awards (SSA 2014e), suggesting that the average duration of SSI enrollment is higher for beneficiaries with these conditions.

Table 1.3  Distribution of disability diagnosis for child and nonelderly caseloads

Primary diagnosis                                Age birth–17 (%)   Age 18–64 (%)
Congenital anomalies                                    5.5              0.8
Endocrine, nutritional, and metabolic diseases          0.7              2.6
Infectious and parasitic diseases                       0.1              1.3
Injuries                                                0.5              2.6
Mental disorders (subtotal)                            68.3             57.4
  Autistic disorders                                   10.2              1.8
  Developmental disorders                              21.2              0.7
  Childhood and adolescent disorders not
    elsewhere classified                               19.5              1.0
  Intellectual disability                               9.1             18.9
  Mood disorders                                        3.2             16.4
  Organic mental disorders                              2.2              3.9
  Schizophrenic and other psychotic disorders           0.3              8.9
  Other mental disorders                                2.6              5.7
Neoplasms                                               1.2              1.3
Diseases of the:
  Blood and blood-forming organs                        1.1              0.4
  Circulatory system                                    0.5              4.3
  Digestive system                                      1.3              1.0
  Genitourinary system                                  0.3              1.0
  Musculoskeletal system and connective tissue          0.8             13.2
  Nervous system and sense organs                       7.8              7.7
  Respiratory system                                    2.8              2.1
  Skin and subcutaneous tissue                          0.2              0.2
Other                                                   7.2              0.3
Unknown                                                 1.9              3.6

Source: SSA (2014b, table 35).

As shown in table 1.3, intellectual disabilities constitute the largest category of nonphysical disabilities for adults in 2013, representing approximately 19 percent of the total nonelderly adult caseload. Mood disorders and schizophrenic disorders comprise the majority of the remaining mental disability caseload, accounting for 16 and 9 percent of the total caseload, respectively. The main categories of physical disabilities for adults include musculoskeletal conditions, which constitute 13 percent of the total caseload and over 20 percent of the total caseload for adults over fifty. Nervous system/sensory disorders account for approximately 8 percent of the total caseload and have higher concentrations among younger adults, accounting for over 10 percent of the total caseload for adults ages eighteen to twenty-nine.

For children, nonphysical disabilities comprise approximately 68 percent of the 2013 caseload, with developmental, autistic, and other adolescent disorders accounting for 21, 10, and 19 percent of the total caseload, respectively. Another 9 percent of children have an intellectual disability as their primary condition. The largest categories of physical disabilities are congenital anomalies and nervous system/sensory disorders, representing approximately 5.5 and 8 percent, respectively, of the total caseload (SSA 2014b).

Diagnoses and caseload size also vary substantially by gender and race. In 2013, men accounted for 47 percent of the working-age adult caseload. Adult men and women were about equally likely to receive SSI on the basis of a mental or intellectual disability, with 59 and 56 percent of male and female recipients, respectively, receiving SSI for mental or intellectual disabilities. By contrast, approximately two-thirds of the child caseload in 2013 was male, and 73 percent of boys received SSI for a mental or intellectual disability, relative to 58 percent of girls. Based on estimates from the SIPP, 54 percent of child SSI beneficiaries were minorities in 2013, as compared to approximately 25 percent of nonbeneficiaries. Slightly less than 40 percent of adult and elderly SSI beneficiaries were minorities in 2013, compared to approximately 20 and 13 percent of nonelderly adult and elderly nonbeneficiaries, respectively.24

In terms of raw counts, boys are disproportionately likely to have a mental disorder as their primary condition. However, the rate of growth in the mental disability caseload was similar for girls and boys over the past decade. The caseload for boys increased by 110 percent, from 6.7 cases per 1,000 in 2002 to 14.1 cases per 1,000 in 2011. The caseload for girls increased by 116 percent, from 2.5 cases per 1,000 in 2002 to 5.4 cases per 1,000 in 2011. Perhaps as a result of the similar rates of growth across gender, the composition of the mental caseload for children has remained relatively constant across the age and gender distribution over the past decade (Aizer, Gordon, and Kearney 2013).

Despite the growth in the child SSI caseload over the past decade, new SSI allowances for children with mental disabilities have remained relatively constant. While applications to child SSI increased between 2002 and 2011, there were approximately 104,000 initial allowances for mental disabilities among children in 2002 and approximately 106,000 in 2007 (Aizer, Gordon, and Kearney 2013). While the number of allowances increased to nearly 132,000 in 2011, applications also increased by nearly 100,000 over the decade.
As a result, the allowance rate for mental disabilities declined from 48 percent in 2002 to 41 percent in 2011 (GAO 2012). These trends suggest that caseload growth is likely driven by fewer children exiting the program, rather than more children entering SSI.

Another important determinant of the size and growth of the SSI caseload is the rate of exit from SSI. In 2013, the median duration of SSI participation among nonelderly adults was approximately nine years, and the exit rate for nonelderly adults was approximately 10 percent (SSA 2014e). Among those who left SSI, 60 percent left because of excess income or assets,25 22 percent left due to death, and approximately 7 percent left because they no longer met the disability criteria. Among children, the exit rate was only 5 percent of the caseload. Approximately 37 percent of children exiting SSI left due to excess income, 6 percent left due to death, and approximately 27 percent left because they no longer met the eligibility criteria (SSA 2014e).

Variance in the frequency and thoroughness of continuing disability reviews (CDRs) also contributes to these trends in program exit. Between 2001 and 2011, the number of annual adult CDRs fell from 584,000 to 179,000, and the number of annual child CDRs fell from 150,000 to 45,000 (GAO 2014). As of January 2014, SSA estimated that it had a backlog of approximately 1.3 million CDRs (GAO 2014). The low rate of program exit due to disability eligibility in both the adult and child caseloads has been an issue of increasing concern for administrators and policymakers.

24. Authors’ calculations are from the 2008 Survey of Income and Program Participation, Wave 15 (2013 data).
25. This component of the exit rate may be artificially high because it may include some SSI recipients who switch to SSDI after the five-month waiting period.

1.3.3 Geographic Variation in SSI Enrollment

The fraction of people enrolled in SSI varies substantially both across and within states, ranging from a low of 1 percent in North Dakota to a high of greater than 5 percent in West Virginia. Some of this is accounted for by differences across states in income levels, which we do not attempt to adjust for in the figures that follow. Figure 1.4 groups states into quartiles of the nonelderly adult participation rate distribution. The map reveals that states with the highest rates of SSI enrollment tend to be in the South, while many of those with low enrollment are in the West. Appendix table 1.A1 lists the fraction of nonelderly adults enrolled in SSI by state. There is also substantial variation within states in SSI enrollment. For example, in California, 2.6 percent of nonelderly adults receive SSI benefits. This state average masks considerable variation across counties: 1.0 percent of nonelderly adults in San Mateo County receive SSI benefits, as compared to 8.3 percent of their counterparts residing in Del Norte County (source data from SSA [2014f] and AHRF 2013). Exploring within-state variation to determine how much is driven by population characteristics versus factors such as program awareness or disability determination procedures would be a useful research endeavor. Participation in the child SSI program also exhibits substantial geographic variation, as displayed in figure 1.5. While most of the states with high adult participation also have high child participation, there are some differences. For example, while Texas is in the top quartile of child SSI participation, it is below the median for nonelderly adult SSI participation. The elderly caseload—mapped in figure 1.6—has a similar range and geographical pattern with the exception of two outliers: California and New York. 
In these two states, the elderly SSI caseload was approximately 13 and 9 percent of the total elderly population, respectively, which are the two highest state-specific enrollment rates. This likely reflects the more generous supplementation of SSI benefits in these states, so that Social Security benefits are less likely to fully phase out the SSI benefit.

Fig. 1.4 Nonelderly adult SSI population as percent of state nonelderly adult population, 2013
Sources: Data from SSA (2014f) and US Census Bureau (2014). Patterns on the map represent quartiles of the participation distribution.

Fig. 1.5 Child SSI population as percent of state child population, 2013
Sources: Data from SSA (2014f) and US Census Bureau (2014). Patterns on the map represent quartiles of the participation distribution.

Fig. 1.6 Elderly SSI population as percent of elderly adult population, 2013
Sources: Data from SSA (2014f) and US Census Bureau (2014). Patterns on the map represent quartiles of the participation distribution.

In addition to variation in SSI enrollment rates across states, there is significant variation in caseload growth across states. While the majority of states with high caseload levels also experienced high growth, this is not true for all states. For example, consider the child SSI caseload. Texas had a relatively small child caseload in 2002 of approximately 9 cases per 1,000 children, compared to a high of 32 cases per 1,000 children in the District of Columbia and a low of 4 cases per 1,000 children in Hawaii. However, the child caseload in Texas increased by approximately 120 percent between 2002 and 2011, while it grew by approximately 50 percent in the District of Columbia and approximately 30 percent in Hawaii (Aizer, Gordon, and Kearney 2013).

In an attempt to understand how the drivers of this growth relate to state characteristics, Strand (2002) examines variation in application and allowance rates across states for adult DI and SSI applications and finds that approximately half of the variation in allowance rates can be explained by economic, demographic, and health factors. Similarly, Rutledge and Wu (2013) find that poor health is a significant predictor of the state SSI caseload and application rate. By contrast, Aizer, Gordon, and Kearney (2013) examine state-level variation in the child SSI caseload and do not find a significant relationship between caseload growth and state-level variation in population diagnosis rates, health insurance coverage, poverty, or unemployment rates. They find some evidence that participation in special education is positively related to child SSI caseload growth. Wittenburg et al. (2015) come to a similar conclusion, finding no single state or local factor that explains this variation. Future research could contribute to a better understanding of these geographic participation patterns.

1.3.4 Enrollment in Other Government Programs and Intergenerational Connection in SSI Receipt

An examination of data from the 2008 SIPP reveals that many SSI recipients also obtain benefits from other safety net programs. Table 1.4 shows that more than half of child, adult, and elderly SSI beneficiaries receive food assistance from SNAP. Approximately 67 percent of children receiving SSI also receive SNAP, compared to just 22 percent of children not on SSI. Similarly, 58 and 56 percent of nonelderly adult and elderly beneficiaries receive SNAP, compared to 11 and 5 percent of nonbeneficiaries, respectively. Nearly all beneficiaries in each age group receive health insurance through Medicare or Medicaid. The high rates of participation in other means-tested programs are reflected in the income of households with SSI beneficiaries. Between 50 and 60 percent of all SSI households have incomes at or below 150 percent of the poverty line, compared to approximately 25 percent of nonbeneficiary households.26

26. Author calculations from the 2008 Survey of Income and Program Participation.

Table 1.4  Individual SSI beneficiaries compared to others in the age cohort, 2013

                                          Child < 18     Adults 18–64     Adult 65+
                                         No SSI   SSI    No SSI   SSI    No SSI   SSI
SSDI (ages 18–64)                          n/a    n/a     0.03    0.29     n/a    n/a
SS retirement (ages 62+)                   n/a    n/a     0.31    0.21    0.85    0.67
Medicaid                                  0.35   0.83     0.08    0.93    0.04    0.95
Medicare                                  0.00   0.00     0.03    0.29    0.97    0.99
SNAP                                      0.22   0.67     0.11    0.58    0.05    0.56
TANF                                      0.03   0.06     0.01    0.05    0.00    0.01
WIC                                       0.07   0.15     0.03    0.03    0.00    0.00
UI                                        0.00   0.00     0.02    0.01    0.00    0.00
Any noncash benefit                       0.52   0.99     0.31    0.97    0.15    0.98
Any cash benefit                          0.08   1.00     0.05    1.00    0.03    1.00
Any housing benefit                       0.06   0.31     0.03    0.25    0.03    0.36
Obs. (unweighted)                       16,387    302   41,932   1,509   11,782    562
Percent of total pop. (weighted)         0.232  0.004    0.604   0.020   0.133   0.006
Percent of age category pop. (weighted)  0.982  0.018    0.968   0.032   0.958   0.042

Source: Data from Wave 15 of the 2008 Survey of Income and Program Participation.
Note: Statistics calculated using SIPP reference month person weights (wpfinwgt). All respondents are in only one category above.

Furthermore, a significant fraction of the SSI caseload participates in other Social Security programs, either disability (SSDI) or retirement (OASI). Approximately 30 percent of adult SSI beneficiaries also receive SSDI, while two-thirds of elderly adults on SSI in the SIPP also report receiving OASI retirement benefits.27 Comparing households with a beneficiary in a given age category reveals substantial overlap in SSI participation across ages, in particular between nonelderly adults and children. For example, nearly 30 percent of households with a child on SSI also have a nonelderly adult on SSI. Similarly, 22 percent of households with an adult SSI beneficiary include a child on SSI, conditional on also having a child in the household.

1.4 Economic Issues

1.4.1 Conceptual Issues

The SSI program for nonelderly adults provides a transfer of income targeted to disabled individuals who are presumed to have limited capacity to obtain financial security through their own paid employment. The SSI program for children provides a transfer of income to families who have to contend with the burden of caring for a disabled child. As outlined in the introduction, there are four sets of theoretical issues that are of primary importance when it comes to the SSI program. First, there are conceptual questions related to the advantages and disadvantages of categorical eligibility requirements. Second, there are issues related to systematic disincentives to accumulate earnings and assets inherent to most means-tested transfer programs. Third, there are questions about long-term benefits and costs to program participants, in terms of whether the program adequately and appropriately serves the needs of disabled individuals and their family members. And fourth, there are important issues about program spillovers, both across programs and across federal and state levels of government. In this section, we describe each of these sets of issues. We review empirical evidence on these issues later in the chapter.

27. According to the SSA Statistical Supplement, approximately 56 percent of aged SSI beneficiaries also receive OASI. The higher dual participation rate reported in the SIPP could reflect respondents confusing the two programs.

Categorical Eligibility

SSI eligibility is based in part on an applicant’s successful demonstration of a disability that renders the individual unable to perform adequately in the labor market. But defining what it means to be unable to work, or unable to work at a sufficient level of earnings, is not a precise exercise. The ideal design of an income-support program balances the social benefit of income redistribution against the social costs of labor supply disincentives. A key justification for a program with a categorical disability requirement is that by targeting such individuals, the program can transfer more resources to truly “needy” individuals, achieving greater targeting efficiency at a lower cost in productive efficiency.

Akerlof (1978) and Nichols and Zeckhauser (1982) showed that by requiring a categorical “tag,” an income-redistribution program can more effectively screen out individuals who would “masquerade” as being in need of government assistance when they simply have a high disutility of work, but not an actual impediment to work. When a tag works as it should, the likelihood of Type II errors is reduced, meaning that fewer “undeserving” individuals will qualify, which leaves more resources available for those who are truly in need of income assistance. This comes at a trade-off with Type I errors, whereby some individuals who truly do need income assistance are erroneously labeled as not sufficiently disabled or, as Kleven and Kopczuk (2011) point out, are discouraged from applying.

In their seminal paper on the design of optimal disability insurance, Diamond and Sheshinski (1995, 10) aptly noted that “any attempt to evaluate abilities to work will be subject to two types of error: admission of people ideally omitted and exclusion of people ideally admitted.” The authors describe how, in the design of a disability benefit program, the challenge of balancing income redistribution and labor supply disincentives is even more complicated than in a typical income-maintenance program because of the imperfect nature of defining disability. They note that blindness automatically qualifies an individual for a disability benefit in the United States, even though many blind people choose to work instead. So the challenge is not simply that the severity of the medical condition is difficult to measure, but rather that the medical problem alone is not a sufficient guide to the disutility of work. They show that in a scheme where health status is costlessly but imperfectly observable, it is still optimal to provide a disability benefit program that screens on the basis of health such that the probability of being accepted onto the program increases with the level of disability.

Parsons (1996) extends this framework to consider the optimal benefit structure of social insurance programs in the presence of two-way misclassification error, whereby some members of the target group do not have the tag and some members of the nontarget group do. This leads to a four-way payment system, in contrast to the three-way payment system of Diamond and Sheshinski (1995). Parsons concludes that a dual-negative income tax system is optimal, with transfer payments that are more generous for nonworkers with the tag as compared to those without, and with a premium paid to program-eligible individuals who work. Parsons further observes that the design of social insurance programs in the United States omits one of these prices, namely, work incentives for individuals assessed as program eligible.

Kleven and Kopczuk (2011) develop a model that builds on the Diamond and Sheshinski (1995) model by considering what happens to the optimal benefit design when it is costly to observe health status. Their model explicitly considers complexity in social programs as a byproduct of costly efforts to screen between deserving and undeserving applicants.
The authors observe that while a more rigorous screening technology may have desirable effects on targeting efficiency, the associated complexity introduces transaction costs into the application process and may induce incomplete take-up.

An additional, related problem not addressed in the Diamond and Sheshinski framework is that the link between a medical condition and labor supply will vary with economic conditions. For example, consider an individual with limited education and a verified condition of extreme back pain. Such an individual might not be able to perform physical labor, but could perform a desk job. However, the availability of desk work for an individual with limited education will depend crucially on local economic conditions. How should the design of SSI or SSDI requirements respond to these varying linkages between health status, economic conditions, and ability to work? This is an issue that warrants focused attention and, to date, has not received a thorough treatment, either theoretically or empirically.

Another important consideration relevant to the categorical eligibility requirement is the possibility that disability status is mutable, and individuals might distort their behavior to select into the “disabled” category. To the extent that individuals distort their health or behavior so as to qualify as disabled—or to have their child labeled as disabled—the loss in social welfare might exceed the benefits of the income transfer to such individuals. As the SSI caseload has become increasingly composed of difficult-to-verify conditions, namely pain and mental disabilities, the possibility of less precise categorical labeling has increased. Furthermore, because the program is not meant to be temporary, any distortions in behavior resulting from the program can potentially be long lasting.

Work and Savings Disincentives

As is common to all income-support programs that establish benefits as a decreasing function of earnings and assets, there is a trade-off between income protection and distortions to the labor supply and savings decisions of benefit recipients. As described above, SSI enrollment affects the incentive to work through an increase in the effective marginal tax rate in the phase-out region. This effect is not limited to the SSI recipient but can extend to other family members, including spouses and parents. Of course, a program that is predicated on the concept of inability to work would not have labor supply disincentives if that inability to work were a fixed or precise concept. For this reason, when one considers the effects of the SSI program on nonelderly adult beneficiaries, the issue is perhaps more appropriately considered one of imperfect categorical labeling than a typical labor supply disincentive.

When it comes to the child SSI program, we return to the paradigm of more typical labor supply disincentives. In that program, there is a question about whether other members in the household are discouraged from earning income, since additional income can cause a child in the family to lose SSI eligibility, and because SSI child benefits are a function of family income. This leads to the classic labor supply disincentives introduced by any means-tested income transfer program. The large income exclusions described above may substantially reduce the efficiency costs for families with children on SSI.
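The phase-out arithmetic can be sketched as follows. This is an illustrative computation, not the chapter’s own, assuming the standard federal rules: a $20 monthly general income exclusion, a $65 earned income exclusion, and a 50 percent benefit reduction rate on remaining earnings, with the approximate 2013 federal benefit rate of $710 for an individual and no state supplement.

```python
# Illustrative sketch of the SSI benefit formula for an individual,
# assuming the standard federal rules: a $20/month general income
# exclusion, a $65/month earned income exclusion, and a 50 percent
# benefit reduction rate on remaining earnings. FBR is the approximate
# 2013 federal benefit rate; state supplements are ignored.

FBR = 710.0  # approximate 2013 federal benefit rate, dollars/month

def ssi_benefit(earned: float, unearned: float) -> float:
    """Monthly SSI benefit given earned and unearned income."""
    # The $20 general exclusion applies to unearned income first;
    # any unused portion carries over to earned income.
    general = 20.0
    countable_unearned = max(0.0, unearned - general)
    leftover_general = max(0.0, general - unearned)
    countable_earned = max(0.0, earned - 65.0 - leftover_general) / 2.0
    return max(0.0, FBR - countable_unearned - countable_earned)

# Effective marginal tax rate on earnings in the phase-out region:
lo = ssi_benefit(earned=300.0, unearned=0.0)
hi = ssi_benefit(earned=400.0, unearned=0.0)
print(lo - hi)  # 50.0: an extra $100 of earnings lowers the benefit by $50
```

Under these assumptions, each additional dollar of earnings in the phase-out region lowers the benefit by fifty cents, the effective marginal tax rate referred to above.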
In addition, SSI has asset eligibility requirements for all three groups: children, nonelderly adults, and the elderly. These asset limits raise the possibility that individuals are discouraged from saving or accumulating assets in order to qualify for the program. Hurst and Ziliak (2006) provide a recent examination of this theoretical possibility in the context of welfare reform policies that relaxed asset restrictions in state Temporary Assistance for Needy Families (TANF) programs, finding no evidence of savings responses to the relevant policy changes. We review the evidence on savings and the SSI program below, which focuses primarily on the incentives facing adult SSI recipients. The reduced incentive to save may be especially harmful for children on SSI. Consider a family that wants to save for future educational or health care costs for a disabled child. Even a modest amount of savings by the parents can lead to the termination of the child’s SSI benefits.


Benefits and Costs to Participating Individuals

The typical benefit of a short-term, means-tested income support program, such as unemployment insurance, is consumption smoothing. By providing income support through a period of temporary economic hardship, a transfer program allows an individual or family to maintain a consumption floor and a smoother consumption trajectory. But SSI is different from a typical program in that it is explicitly not intended to be temporary. The more relevant question regarding the program's benefits is: What would an individual's income and consumption be in the absence of this explicit disability benefits program? In addition, are there health benefits that accrue to an individual who qualifies for SSI that would not be obtained if income were obtained through other means, whether through work or other sources of unearned income? In this subsection, we raise a number of other conceptual issues related to the benefits and costs of program participation.

First, when considering the benefits of the SSI program to families with a child SSI recipient, one returns to the issue of justifying the payment of additional income to low-income families with a disabled child. One potential justification is that the presence of a disabled child in a family makes it more difficult for a parent to work outside the home. An empirical examination by Powers (2001) confirms this to be true. Using data from the School Enrollment Supplement to the October 1992 Current Population Survey, the author finds large negative effects of having a disabled child on the probability that a wife or female head of household participates in the labor force, controlling for family- and individual-level characteristics. The size of the effect is substantial, comparable to that of having a child under the age of five in the house. Another possibility is that families with a disabled child incur greater health care expenses. Related research by Buescher et al. (2014), Stabile and Allin (2012), and Rupp and Ressler (2009) further suggests that parents of children with disabilities confront substantial financial costs and additional challenges in the labor market. These observations raise two important questions. First, is the income received from the SSI program sufficient to make up for the income losses and higher expenses experienced by families with a disabled child? Second, do families use the additional income received from SSI to pay for goods or services that lead to improved parental work outcomes or improved health conditions for the disabled child? Both of these questions are open for research.

A second conceptual issue is whether the current structure of SSI is optimally designed to serve families with disabled children. Recall from section 1.2 that, conditional on qualification, the level of SSI benefits is the same for disabilities of different severities. It is therefore plausible that the income support from the program more than offsets potential losses
of income experienced by individuals (or families of children) with a fairly mild disability, but is not sufficient to support individuals (or families of children) with a severe disability. Furthermore, an individual or a child maintains SSI eligibility only if his or her condition does not show dramatic signs of improvement. This raises the possibility that individuals do not pursue paths to improvement, or that parents withhold intervention treatments from their children, in order to maintain eligibility.

A third issue, especially relevant to a child's experience on SSI or experience trying to qualify, is whether the labeling of the disability has positive or negative consequences. On the one hand, the existence of the SSI program provides a financial incentive for families and administrators to evaluate a child for a disability and label that child with the qualifying diagnoses.28 For children whose limitations might otherwise have gone unrecognized, this could have the beneficial effect of awareness and treatment. On the other hand, the label itself could lead to hindered educational opportunities or a reduced sense of urgency on the part of the parent or older child to overcome the limitation. These are conceptual considerations, with little rigorous empirical evidence to date.

A fourth and final issue is that SSI enrollment may lead to long-term dependency, both for children and for nonelderly adults. Perhaps some qualifying individuals, with the proper individualized attention, would overcome a less severe disability. But one consequence of the SSI program is that parents and family advocates might be inclined to hold onto the disability label in order to maintain eligibility for program benefits. This is an interesting question for future research to explore.

Program Spillovers

The federal nature of the SSI program serves a broad redistributive purpose, but it also imposes fiscal externalities between state and federal governments and programs.
Benefit levels of the federal SSI program are relatively generous, especially compared to TANF cash benefit awards in low-benefit states. Thus, the award of SSI can amount to large transfers of federal dollars to individual states. Researchers have considered the extent to which individuals and states substitute SSI program benefits for state-funded transfer programs and how program features make this shifting more or less likely. We review this evidence below.

28. The notion that rates of child disability diagnoses would vary with financial incentives is not to be dismissed. Cullen (2003) presents evidence from school districts in Texas showing that a 10 percent increase in the supplemental revenue received by a district for having a disabled student leads to an approximately 2 percent increase in the fraction of students classified as disabled. As would be expected, she finds that this responsiveness is larger for disability categories that are milder and less precise, such as learning disability and speech impairment.

1.4.2 A Review of the Evidence

Some of the most convincing evidence on the effect of the SSI program on individual and family outcomes has taken advantage of specific policy changes, such as those following the 1990 Sullivan v. Zebley decision or the changes to SSI around the time of welfare reform in 1996. These analyses use difference-in-differences or regression discontinuity approaches to capture the causal effect of SSI participation on outcomes of interest. Other studies exploit variation in other programs, including AFDC/TANF, health care eligibility, or special education programs, to study interactions between SSI and these programs. A third empirical approach found in the literature uses panel data on individuals before and after they are determined eligible for SSI to examine the effect of SSI participation on individual and family outcomes, controlling for individual-level fixed effects.

Researchers have relied on a combination of public-use survey data and program administrative data to tackle these questions. There are, of course, trade-offs to each of these data sources. Surveys often contain the relevant information to answer important questions in this literature, but have limited sample sizes. Administrative data sources provide large samples and detailed information on earnings and program participation, though they may not include other information that would allow richer investigations, such as information about the use of other programs or about other family members. Increased linkages between various administrative data sources, or further linkages between administrative and survey data, would provide valuable opportunities for researchers to answer many of the questions we highlight here.

The Impact of Child SSI Participation on Short-Term Outcomes

There is some evidence that the receipt of child SSI income leads to a net increase in family income and a decrease in poverty rates.
Duggan and Kearney (2007) consider how a child's enrollment in the SSI program affects short-term family outcomes, including poverty, household earnings, and health insurance coverage. The authors make use of the longitudinal nature of the SIPP to identify a change in household outcomes at precisely the time that the household begins receiving child SSI benefits, controlling for unobserved differences across households and for observed outcomes in these same households in the months leading up to and immediately following a child's first enrollment in SSI. They find that child SSI participation increases total household income by an average of approximately $316 per month, or 20 percent. The estimates suggest that for every $100 in SSI income transferred to a family, total income increases by more than $72. The enrollment of a child in the SSI program appears to lead to a small offset of other transfer income but very little, if any, impact on parental earnings.
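The household fixed-effects logic described above can be illustrated with a small simulation. This is a hedged sketch, not the authors' code: the panel structure, enrollment timing, and noise are all invented, and the $316 monthly effect is borrowed from the text as a stylized parameter. Demeaning income and enrollment status within each household absorbs persistent unobserved differences across households, so the remaining variation identifies the within-household change at enrollment.

```python
import numpy as np

# Stylized sketch (simulated data, assumed parameters) of a household
# fixed-effects estimate of the income change at child SSI enrollment.
rng = np.random.default_rng(0)

n_households, n_months = 500, 24
hh_effect = rng.normal(2000, 500, n_households)   # persistent income differences
enroll_month = rng.integers(6, 18, n_households)  # month the child enrolls in SSI
TRUE_EFFECT = 316                                 # assumed net monthly gain (from the text)

rows = []
for h in range(n_households):
    for t in range(n_months):
        on_ssi = 1.0 if t >= enroll_month[h] else 0.0
        income = hh_effect[h] + TRUE_EFFECT * on_ssi + rng.normal(0, 100)
        rows.append((h, on_ssi, income))
hh, d, y = (np.array(c) for c in zip(*rows))

def demean_by(group, x):
    """Subtract the group mean within each household (the 'within' transform)."""
    out = x.astype(float)
    for g in np.unique(group):
        m = group == g
        out[m] -= x[m].mean()
    return out

y_t, d_t = demean_by(hh, y), demean_by(hh, d)
beta = (d_t @ y_t) / (d_t @ d_t)  # OLS on demeaned data = fixed-effects estimate
print(round(beta))                # prints an estimate close to 316
```

Because the within transform removes each household's average income, the estimate is driven only by households observed both before and after enrollment, which is the essence of the SIPP-based strategy sketched in the text.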


Duggan and Kearney (2007) additionally find that for every one hundred children who enroll in SSI, twenty-two children and thirty-seven people in total are lifted out of poverty, and an additional twenty-eight people see their incomes increase to more than twice the poverty line. These results suggest that the increase in child SSI enrollment over recent decades has potentially played a large role in lowering child poverty rates below what they otherwise would have been. Providing further evidence of the antipoverty effects of the SSI program, Schmidt, Shore-Sheppard, and Watson (2013) find that SSI program participation leads to a reduction in the likelihood that a family reports being food insecure.

In a more recent investigation of the parental labor supply effects of child SSI participation, Deshpande (2014b) estimates the effect of removing children from the SSI program on parental earnings and household income. The author uses administrative data from the Social Security Administration and implements regression discontinuity and difference-in-differences designs that exploit SSA budget cuts for child medical reviews. As mentioned in section 1.2, most children on SSI are scheduled to have their cases reviewed every three years to determine whether they are still medically eligible for the program. In recent years, however, budget cuts have prevented SSA from conducting all the reviews that were scheduled. In fiscal year 2005 there was a large cut in the budget for these medical reviews, and as a result there was a sharp decline in the probability of a child being removed from SSI at the beginning of the fiscal year. Deshpande's analysis takes advantage of this discrete change in the probability of removal at the beginning of fiscal year 2005. Her estimates suggest that a loss of $1,000 in a child's SSI payment is fully offset by increases in parental earnings, driven entirely by intensive-margin responses.
The large earnings response is somewhat at odds with previous estimates from the welfare literature that suggest smaller parental labor supply elasticities with respect to child benefits, in particular the SSI results of Duggan and Kearney (2007) described above. Deshpande suggests that the discrepancy might reflect asymmetric responses to benefit gains (which is what Duggan and Kearney 2007 observe) and benefit losses (which is what Deshpande 2014b observes). An additional finding of the study by Deshpande (2014b) is that the removal of a child from the SSI program leads to lower rates of DI applications among parents and siblings. This finding is consistent with recent work by Dahl, Kostol, and Mogstad (2014) demonstrating family spillovers in the likelihood of applying for Disability Insurance; those authors find that, in the context of Norway, individuals are more likely to apply for DI if they have a parent on the program.

A remaining question for future research is how families use the additional income that they receive from the SSI program, and to what effect. There is some evidence from other programs on this topic, but not specifically for SSI. For example, Meyer and Sullivan (2004) explore the effect of changes in
welfare reform and tax policy on measures of consumption, Dahl and Lochner (2012) examine the impact of EITC receipt on educational outcomes for children, and Evans and Garthwaite (2014) examine the impact of the EITC on maternal mental health. To the best of our knowledge, there has been virtually no work of this kind specific to SSI. Future research should consider how families make use of the additional income brought into the home by SSI and whether it is spent disproportionately on the recipient child. To fully understand the benefits of the SSI program, it would be useful to know whether the resources are used to fund additional consumption or parental leisure, to purchase market-provided childcare that allows parents to work outside the home, or to make investments in education or health at either the child or family level.

Future research is also needed on the extent to which the incentives that the SSI program creates for families to obtain a disability diagnosis for their child lead to beneficial outcomes (say, by raising parents' awareness of need and ability to pursue helpful interventions). We also need evidence about the extent of harmful reactions to this incentive. For example, the 2010 Boston Globe series written by Patricia Wen described, with compelling and troubling anecdotes, an unintended side effect of SSI: the overmedication of children with psychotropic drugs in order to qualify for SSI benefits. However, the more systematic study by the GAO suggests that overmedication is not a widespread phenomenon among SSI recipients.

The Impact of Child SSI Participation on Long-Term Outcomes

To better appreciate the normative implications of SSI participation among children with disabilities, we need an understanding of the long-term outcomes for SSI recipients. One way to learn about this issue is to study the transition to adulthood for child SSI recipients.
Are relatively many child SSI recipients able to transition productively into employment after age eighteen? Or do they remain dependent on government transfer programs, whether SSI or another program? Does SSI participation enhance, impede, or have no impact on their long-term opportunities and human capital development?

Loprest and Wittenburg (2005) provide a descriptive look at the transition experiences of child SSI recipients just prior to and after age eighteen. They use year 2000 data from the National Survey of Children and Families (NSCF) to study the work preparation activities and family circumstances of a pretransition cohort of youth age fourteen to seventeen and a posttransition cohort of individuals age nineteen to twenty-three, comparing the income, work, and personal and family circumstances of those receiving SSI benefits after age eighteen to those who no longer receive these benefits. The data indicate that only a minority of pretransition SSI recipients had ever participated in vocational training or vocational rehabilitation (VR), and many had never heard of SSI work-incentive provisions. Their findings for the posttransition cohort show that those who no longer receive SSI at age eighteen tend to be in better health and are more likely to be working than those who continue on benefits. They also find that among those who are removed from the SSI program at age eighteen, most continue to have incomes below the poverty line, about one-half dropped out of school, and one-third have been arrested. As the authors note, these findings are relevant to ongoing efforts to improve the transition process for child SSI recipients and to understand the circumstances of young people after the age-eighteen redetermination.

Additional descriptive evidence from Rupp, Hemmeter, and Davies (2015) examines the long-term receipt of SSI and DI among child SSI recipients from a variety of award cohorts. They find that, in general, child recipients from more recent cohorts receive benefits for a shorter period of time: ten years after the SSI award, approximately 45 percent of the 2000 child SSI award cohort received neither SSI nor DI, compared to only 25 percent of the 1980 cohort. They note a sharp break in the trends in transitions off disability benefits between cohorts that likely were not affected by the introduction of age-eighteen redeterminations and other eligibility restrictions in 1996 (i.e., cohorts from the 1980s and early 1990s) and cohorts that likely were affected (i.e., the 1995 and later award cohorts). The authors also conduct a decomposition analysis that is consistent with their hypothesis that the change in trends is driven by policy changes rather than by observed changes in the characteristics of the child SSI caseload over time.
Additionally, they find that relatively few SSI child recipients transition to DI as adults: ten years after the award, approximately 9 percent of the 1980 cohort received DI alone or concurrently with SSI, but this fraction falls to 3 percent for the 2000 cohort.29

29. While fewer cohorts can be compared over longer time frames, the fraction of recipients receiving DI continues to increase over time. For example, 23 and 18 percent of the 1980 and 1995 cohorts, respectively, received DI benefits twenty years after their initial child SSI award.

Deshpande (2014a) builds on this descriptive work with a carefully designed empirical analysis. Her empirical approach exploits a policy change, implemented as part of the 1996 PRWORA legislation, that increased the number and stringency of medical reviews for eighteen-year-olds. The law was written such that a child with an eighteenth birthday after the law's enactment on August 22, 1996, experienced a discontinuous increase in the probability of being removed from the program, as compared to a counterpart with an earlier eighteenth birthday. This sets up the conditions for a regression discontinuity approach to examining the relationship between program removal and subsequent outcomes. To conduct her analysis, Deshpande makes use of confidential SSA files. She links data from the Supplemental Security Record (SSR), which provides demographic information on SSI children, to the CDR Waterfall File, which gives information on all medical reviews for children and their outcomes. She links these child records to long-term outcomes using several additional SSA data sets, including the Master Earnings File (MEF) and the Master Beneficiary Record (MBR).

Deshpande (2014a) finds that SSI youth who are removed from the program earn on average $4,000 per year, an increase of $2,600 relative to the earnings of those who remain on the program, but not enough to make up for the $7,700 lost in annual SSI benefits. She finds that those who were removed from the program spend on average nearly sixteen years (the entire posttreatment period observed) with observed income below 50 percent of the poverty line, as compared to five years for those who are not removed from SSI at age eighteen. Importantly, these average effects mask heterogeneous responses. For some individuals, removal from the program spurs increased work effort: the likelihood of maintaining earnings above $15,000 is 11 percent higher among those removed from the program, and this difference grows over time. An additional important finding is that income volatility increases for those who do not maintain program eligibility.

The insight gained from Deshpande's work is important for understanding the economic hardship faced by SSI recipients who are terminated from the program at age eighteen. An important limitation of this work, however, is that it does not answer the question of how those individuals would have fared had they not spent earlier years on SSI. There exists the possibility that a child who is raised on SSI, or spends his or her teenage years receiving SSI, develops a different set of aspirations and invests less in human-capital accumulation. Alternatively, the additional income from SSI could lead to more investment in the child and better educational outcomes. Either scenario would likely have an effect on long-term outcomes.
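The birthday-cutoff design described above can be made concrete with a stylized sketch. This is not Deshpande's specification: the data are simulated, and the roughly $2,600 earnings response is borrowed from the text as an assumed parameter. The sketch fits a local linear regression on each side of the cutoff, recovers the jump in removal probability (the first stage) and in earnings (the reduced form), and scales one by the other in the usual fuzzy-RD fashion.

```python
import numpy as np

# Stylized regression discontinuity sketch (simulated data) around an
# eighteenth-birthday cutoff that raises the probability of SSI removal.
rng = np.random.default_rng(1)

n = 20_000
x = rng.uniform(-365, 365, n)                    # birthday in days relative to cutoff
after = (x >= 0).astype(float)                   # turned 18 after enactment
removed = rng.random(n) < (0.1 + 0.3 * after)    # assumed jump in removal probability
# Assumed outcome: removal raises annual earnings by about $2,600 (from the text),
# on top of a smooth trend in the running variable.
earnings = 1_400 + 2_600 * removed + 2.0 * x + rng.normal(0, 1_000, n)

def rd_jump(x, y, bandwidth=180):
    """Local linear fit on each side of the cutoff; return the jump at zero."""
    def fit_at_zero(mask):
        X = np.column_stack([np.ones(mask.sum()), x[mask]])
        coef, *_ = np.linalg.lstsq(X, y[mask], rcond=None)
        return coef[0]                           # intercept = prediction at x = 0
    left = (x < 0) & (x > -bandwidth)
    right = (x >= 0) & (x < bandwidth)
    return fit_at_zero(right) - fit_at_zero(left)

itt = rd_jump(x, earnings)                       # reduced-form jump in earnings
first_stage = rd_jump(x, removed.astype(float))  # jump in removal probability
print(round(first_stage, 2), round(itt / first_stage))  # ratio near the assumed $2,600
```

The design's appeal is visible in the sketch: nothing about a child other than the birthday changes discontinuously at the cutoff, so the jump in outcomes can be attributed to the jump in removal risk.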
What we learn from the Deshpande (2014a) evidence is that individuals who are removed at age eighteen are not readily able to transition into stable employment. One potential policy implication is that more transition support programs and work-training programs for individuals with (mild) disabilities would be beneficial. But whether those individuals would have had improved long-term outcomes if they had not received child SSI income at all, or had received it for a shorter length of time, remains an open question.30

A related question is how SSI participation as a child affects the likelihood of government transfer receipt as an adult.

30. Coe and Rutledge (2013) use data from the National Health Interview Survey linked to Social Security Administration data to compare short- and long-term outcomes of children who enrolled in the SSI program during three eras that they define as pre-Zebley (1987–1990), Zebley (1991–1996), and post-Zebley (1997–1999). They observe that recipients are less likely to report care limitations as children, accumulate more work experience and spend less time on welfare as adults, and are slightly less likely to have health insurance as adults. It is hard to draw strong conclusions from this analysis, however, since these differences presumably reflect (to some unknown degree) differences in sample composition. It is not surprising that children who entered SSI during the "lenient" years would be less disabled on average, and thus ultimately experience better outcomes.


Does participation in this long-term form of assistance foster dependency on government transfers? Research is needed that both describes the associations between SSI program participation and later outcomes and empirically identifies the causal impact of child SSI receipt on later-life program participation. Another way to pose this question is to ask whether a child with a similar condition who received TANF instead of SSI is less likely to "graduate" into government assistance at age eighteen. And importantly, how does any such difference translate into differences in labor force participation, future educational investment, and total earnings and economic well-being? Of course, this presents a significant challenge for researchers, because the selective process by which individuals apply for, receive, and continue to receive SSI benefits suggests they are quite different from those not on the program.

SSI and Boys

An important demographic issue that arises in the context of the child SSI program is the disproportionate medical qualification of boys, and minority boys in particular. Duggan and Kearney (2007) examine pooled SIPP data from 1992, 1993, 1996, and 2001 to explore the predictors of SSI participation and how these compare to the demographic predictors of AFDC/TANF enrollment. They find that family structure, parental education, and race/ethnicity relate to program participation in similar ways across the two programs. In particular, children from single-parent families and children of less educated parents are more likely to enroll in both SSI and AFDC/TANF, as compared to children from two-parent families or children of more highly educated parents. Black children are more likely to enroll than either Hispanic or white children, other characteristics held constant. A notable departure between the two programs is that, conditional on other background characteristics, families with relatively more boys are significantly more likely to participate in the SSI program.
This is consistent with the disproportionate presence of boys among the SSI caseloads and the disproportionate likelihood that boys are diagnosed with mental disabilities and behavioral disorders. What should we make of the disproportionate participation in SSI of boys, and of black boys in particular? Does this reflect under-, over-, or accurate placement? Is the system "optimally" diagnosing boys? The biological and medical literatures provide overwhelming evidence that boys are more likely to have mental and behavioral disorders, something economists have recently come to study in terms of a "noncognitive deficit." What metrics would we use to evaluate whether medical and disability determinations are accurate and medically, rather than socially, based? In other words, to what extent are boys' social or behavioral issues being diagnosed as medical problems, and what does this imply for the optimal design of the SSI program?

A separate question is whether the SSI program is particularly important
for boys from single-parent, low-income homes, and whether enhanced program features would have even greater benefits for qualifying boys. Bertrand and Pan (2013) build on the literature about the importance of noncognitive skills for educational and labor market success and the deficit that boys appear to experience along this dimension. The descriptive picture they present about the "trouble with boys" (from the title of their paper) is based mainly on data from the Early Childhood Longitudinal Study–Kindergarten cohort. They document that boys do especially poorly in broken families and that the early school environment has little impact on the noncognitive functioning of boys, in contrast to girls. They further demonstrate that boys appear to be particularly responsive (in a negative way) to the lack of parental resources experienced in a single-parent home. An important question is to what extent the SSI program does, and could, mitigate these challenges facing boys from single-parent, low-income homes.

Program Interactions: Child SSI

Low-income individuals with a qualifying disability, or with a child with a qualifying disability, will often have a financial preference for the SSI program over TANF. As noted above, the SSI program is not time limited and does not involve work requirements. In states with low levels of cash benefits for TANF, this financial incentive is relatively stronger. Furthermore, states have a financial incentive to shift TANF recipients or applicants to the SSI program, since SSI benefits are paid for by the federal government. The gap between TANF and SSI benefits has tended to grow over time, since SSI benefit levels are automatically adjusted for cost-of-living changes while TANF benefits are not and have been declining in real terms. Existing research has documented significant interactions between SSI and the Aid to Families with Dependent Children (AFDC) program in the years prior to welfare reform.
Garrett and Glied (2000) use administrative data on the total number of child SSI participants in each state and examine how the generosity of child SSI payments relative to that of AFDC payments affected child SSI participation before and after the Zebley decision. They find that states with the highest AFDC benefits saw the smallest increases in SSI participation among children after the Zebley decision was implemented. Using similar variation, Kubik (1999) examines individual-level survey data from the Current Population Survey and the National Health Interview Survey and finds that families with lower potential SSI payments were less likely to identify disabilities in their child and were also less likely to receive an SSI payment, although the data do not distinguish whether the SSI payment was received for a child or an adult. In one of the few studies examining this interaction after welfare reform, Schmidt and Sevak (2004) demonstrate that single women living in states that were early adopters of welfare reform policies—which generally tightened the eligibility criteria for welfare benefits—were more likely to
report SSI receipt. This set of findings across papers implies that individuals respond to differences in benefits across programs in a way consistent with utility-maximizing behavior.

There is an additional, perhaps even more interesting, dimension to the shifting of AFDC and TANF caseloads to the SSI program: this shift moves the financial burden of benefit payments from states to the federal government. Recall that SSI benefits are paid for entirely by the federal government, except in the case of state supplementation. In contrast, the cost of AFDC benefits was shared between states and the federal government, a difference now amplified because states are essentially given block grants for their TANF programs. This means that states benefit financially from shifting the AFDC caseload onto the federal SSI program. In a paper confirming that states respond to this financial incentive, Kubik (2003) uses state-level data on AFDC and SSI caseloads and shows that states experiencing unexpected negative revenue shocks saw larger increases in the size of their child SSI caseloads relative to their AFDC caseloads. This finding can be interpreted as evidence of fiscal spillovers between different levels of government and has implications for the optimal design of programs in terms of state and federal cost sharing.

There are two other potentially important program interactions relevant to the child SSI caseload: interactions with Medicaid and health insurance more generally, and interactions with special education programs. Work by Anna Aizer (2008) using the Early Childhood Longitudinal Study–Kindergarten cohort (ECLS-K) shows that gaining access to health insurance through state-level expansions of the Children's Health Insurance Program has a sizable impact on the likelihood of a child reporting a mental disorder diagnosis and treatment.
This raises questions about how access to health insurance affects the likelihood that a child will obtain a qualifying SSI determination. Whereas Duggan and Kearney (2007) consider how SSI participation affects health insurance coverage rates, it would be useful to explore the reverse relationship: how health insurance access affects SSI participation. Aizer, Gordon, and Kearney (2013) find little relationship between state-level changes in health insurance coverage and SSI caseload growth, but additional exploration of this potential relationship is warranted, especially following implementation of the Affordable Care Act.

In addition to the link with health insurance, it is important to understand how the SSI program and the educational system interact in terms of establishing disability, school needs, and SSI and special education eligibility. As reported in table 1.3, a striking 68 percent of the child SSI caseload has a primary diagnosis of a mental disorder. Given this diagnostic composition of the SSI caseload, it stands to reason that SSI eligibility determinations overlap with special education determinations. Such conditions often show up in the educational system as learning disabilities or behavioral problems, frequently recognized through poor classroom performance. Survey data indicate that
approximately 70 percent of child SSI recipients participate in special education at some point during their school years (Rupp et al. 2006). As an empirical matter, it is difficult to disentangle the causal pathway running from special education assignment to SSI participation from the causal pathway running from SSI enrollment to special education assignment. An unpublished 2007 working paper by Jessica Cohen presents evidence suggesting that increases in the SSI caseload brought about by the Zebley decision led to a significant increase in special education classification. Thinking about the relationship in the other direction, we note that special education determinations are made at a local level, depend greatly on the discretion of staff at the school level, and are guided by policy set at the state level. The prevalence of special education classification varies widely across states, including variation in whether students need a diagnosed disability to be classified as eligible for special education. Aizer, Gordon, and Kearney (2013) provide evidence of an association between the prevalence of special education in a state-year and state-year SSI caseloads. Specifically, they find that special education prevalence is predictive of initial allowances, but not of application rates. It could be that participation in special education contributes to caseload growth by increasing the likelihood of application acceptance, for example by lending greater credibility to the claim of disability. Cullen and Schmidt (2011) provide additional evidence of a link between these programs. Building on the observation in Cullen (2003) that localities in Texas with greater fiscal incentives to label children as disabled experienced relative increases in special education caseloads, Cullen and Schmidt (2011) find larger relative increases in SSI caseloads in such localities. Exploring these linkages in greater depth is an area worthy of additional research.
Evidence on the Effect of SSI Participation for Working-Age and Elderly Adults Previous research suggests that the rise in SSI enrollment among nonelderly adults that began in the mid-1980s was driven by three main factors. First, there was a liberalization of the program’s medical eligibility criteria in 1984 that made it easier for individuals with more subjective conditions such as back pain and mental disorders to qualify for the program (Rupp and Stapleton 1995; Autor and Duggan 2003). Second, given that SSI enrollment rates rise with age, the aging of the baby boom generation led to a mechanical increase in SSI enrollment (Duggan and Imberman 2009). And finally, cutbacks in state general assistance programs increased the number of individuals applying for and ultimately receiving SSI benefits (Rupp and Stapleton 1995). Nonelderly adults who participate in SSI have very low labor force attachment, with just 4 percent having nonzero earnings in 2013. Because of this, the issue of work disincentives is perhaps not as pertinent as it is for other means-tested transfer programs. This likely explains why there are not as


Mark Duggan, Melissa S. Kearney, and Stephanie Rennane

many studies of the effect of SSI program participation on outcomes for nonelderly adults. One exception is a study by Bound, Burkhauser, and Nichols (2003), who use panel data from the SIPP linked to SSA disability determination records to trace earnings and income for adult SSDI and SSI participants. They find that the earnings of applicants decline around the time of SSI application, but in terms of absolute changes these reductions are quite small, since labor income is very low for SSI applicants. The data indicate that the increase in benefit income received by SSI awardees in the months after initial application is largely offset by reductions in spousal income and other transfer income. Their findings suggest that SSI program participation does not lead to a sizable increase in household income for SSI adult awardees, on average. However, presumably there is underlying heterogeneity, and for some SSI recipients who do not have access to spousal income or AFDC benefits from other family members, benefits from this program constitute a sizable increase in income. In a series of studies, Neumark and Powers have investigated the behavioral responses of older adults to potential SSI eligibility under elderly categorical eligibility.31 Recall that for elderly applicants, eligibility is based on income and assets and does not require a disability determination. Neumark and Powers (2000) examine the preretirement labor supply of men as they near age sixty-five, using SIPP data. Their analysis uses a triple-difference strategy and finds that in states with more generous state supplementation of federal SSI benefits, there is a somewhat larger reduction in labor supply before age sixty-five among men who are likely to be eligible for SSI. They additionally find that this response is more pronounced among men who qualify for early Social Security benefits, which might be used to offset the reduction in labor earnings. 
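The triple-difference logic can be written out explicitly. In a generic regression form (the notation here is ours for illustration; the authors' published specification includes additional controls), the labor-supply effect is identified by the coefficient on the three-way interaction of an eligibility indicator, a near-age-sixty-five indicator, and a generous-supplement-state indicator:

```latex
% Illustrative triple-difference (DDD) specification; notation is ours,
% not Neumark and Powers's exact model.
%   Y_{ist} : labor supply of man i in state s and age group t
%   E_i = 1 if likely SSI-eligible;  N_t = 1 if nearing age 65;
%   G_s = 1 if state s offers a generous SSI supplement.
\begin{equation*}
Y_{ist} = \alpha + \beta_1 E_i + \beta_2 N_t + \beta_3 G_s
        + \beta_4 E_i N_t + \beta_5 E_i G_s + \beta_6 N_t G_s
        + \delta \, E_i N_t G_s + \varepsilon_{ist},
\end{equation*}
```

where the triple-interaction coefficient captures the differential pre-sixty-five reduction in labor supply among likely-eligible men in generous-supplement states, net of all two-way group, age, and state differences.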
In subsequent work, the authors confirm the finding of an anticipatory reduction in labor supply using CPS data and exploiting within-state changes in SSI supplementation levels (Powers and Neumark 2005). Powers and Neumark (2006) confirm that these findings are not driven by cross-state migration related to SSI awards. The same authors have also found evidence of dissaving among likely eligible individuals as they approach age sixty-five (Powers and Neumark 2003). On the issue of program spillovers, Linder and Nichols (2012) present intriguing results suggesting that enrollment in temporary assistance programs might serve as a “gateway” to more permanent reliance on assistance. Looking at a sample of workers in the SIPP, the authors find that UI claimants tend not to apply for SSI, but do apply for DI at increased rates. Workers who are more likely to receive SNAP benefits are more likely to subsequently apply for SSI benefits. The authors are careful to note that while these results
31. Using data from the Health and Retirement Survey linked to SSA administrative records, Coe and Wu (2014) confirm that a higher expected SSI benefit is associated with a higher rate of take-up among adult and elderly individuals.

The Supplemental Security Income Program


might imply a causal relationship between participation in temporary assistance programs and subsequent enrollment in a disability program, they could also reflect selection on health and income. Further research is needed into this issue. It is also important to note that the efficiency effects of such a causal pathway—should one exist—are unclear. If temporary programs serve in part to increase awareness of SSI among eligible individuals who are ideally admitted—to use the language of Diamond and Sheshinski (1995)—then this could be welfare enhancing. If, on the other hand, they serve to bring individuals onto SSI who would otherwise return to work at fairly low levels of disutility of work, the social welfare implications are less clear. In another study of program spillovers, Maestas, Mullen, and Strand (2014) examine what happened to SSDI and SSI applications in Massachusetts shortly following the 2006 state health insurance reform. The effect of the reform—a precursor to the 2010 federal Affordable Care Act—was to expand health insurance access to individuals through the implementation of a statewide insurance exchange and provision of subsidies. Theoretically, the effect of this expansion on SSDI and SSI applications could have gone either way. Recall that SSI recipients immediately qualify for Medicaid when they enter the SSI program. SSDI applicants, by contrast, qualify for Medicare only after a two-year waiting period. In the pre-health-reform paradigm, individuals with a work-limiting condition might have been hesitant to separate from an employer and apply for SSDI or SSI because if their application was unsuccessful they would have given up their employer-provided health insurance and risked being uninsured. The 2006 reform would mitigate this issue of “job lock” and potentially lead to increased applications for both SSDI and SSI. 
However, with the expansion of affordable health insurance, the value of SSDI or SSI falls due to a reduction in the relative value of the health insurance benefits that come with program enrollment—either Medicare or Medicaid, respectively. Using administrative application data from SSA, the authors find that SSDI applications increased throughout the state postreform, consistent with state incentives to shift health insurance costs to the federal program. For SSI, applications increased in counties with high baseline health insurance coverage rates—consistent with a job lock story—and decreased in counties with low baseline insurance coverage rates—consistent with a decline in the relative value of the SSI Medicaid award. These results speak to the interaction of health insurance coverage and SSDI and SSI, and to the fiscal externalities between programs paid for by state versus federal funds. An early paper by Yelowitz (2000) similarly considered the interaction between health insurance provision and SSI caseloads, focusing on elderly individuals. That work considers the introduction of the Qualified Medicare Beneficiary (QMB) program during the 1987 to 1992 period; the program provides supplemental health insurance to Medicare seniors without requiring SSI enrollment. Consistent with the idea that part of the benefit of SSI
enrollment is the Medicaid award, Yelowitz (2000) finds that the introduction of QMB led to a decline in SSI participation rates.

Evaluations of Demonstration Programs Designed to Increase Work among SSI Beneficiaries

Since the early 1980s, there have been a number of government-run, large-scale demonstrations designed to evaluate the work incentives inherent in SSI and SSDI and to determine how to promote employment and self-sufficiency among current beneficiaries.32 In 1985, the Social Security Administration introduced the Transitional Employment and Training Demonstration, the first large-scale intervention focused on SSI recipients. In thirteen communities, working-age adult beneficiaries with intellectual disabilities were randomly assigned to treatment and control groups, where the treatment group received job placement, training, and prevention services. After six years, those who received the intervention were 21 percent more likely to be employed than the control group, although on average, earnings in the treatment group did not increase enough to offset SSI and SSDI benefits. Other interventions in the 1990s, including Project NetWork and the State Partnership Initiative, provided a combination of case management, benefit counseling, benefit waivers, and employment assistance. These interventions all increased employment in the treatment group by a few percentage points, but again, not by enough to offset benefits (Wittenburg, Mann, and Thompkins 2013). Following these demonstrations in the 1980s and 1990s, SSA launched the Ticket to Work (TTW) program early in the twenty-first century. Over three phases, this experimental program provided SSDI and SSI beneficiaries with vouchers that they could exchange for employment support and rehabilitation services. Though the intervention was found to result in an increased use of employment services, research has not found any subsequent increases in beneficiaries’ employment or earnings. 
Two possible reasons for this lack of an impact could be the limited number of employment service providers and the fact that the intervention was not targeted to specific subpopulations among SSDI and SSI beneficiaries. This is an area ripe for additional program experimentation and evaluation. A recent randomized demonstration experiment sheds some light on the effectiveness of interventions designed to promote work and education among youth SSI beneficiaries, with the goal of reducing the youth disability caseload. The SSA launched the Youth Transition Demonstration (YTD) project in 2003. In six sites across the country, SSI and SSDI beneficiaries ages fourteen to twenty-five were randomly assigned to treatment and control groups, where treatment groups received education and
32. For a detailed description of the most relevant interventions, see Wittenburg et al. (2013).
employment services, as well as a reduced benefit offset schedule in order to encourage more work activity. The intensity of service provision varied across the six sites, and the results of the demonstration, evaluated by Mathematica Policy Research, suggest that effects varied with that intensity. In the most successful site, youth employment nearly doubled from 23 percent to 42 percent, while there was no increase in employment in sites with less intensive service provision (Wittenburg, Mann, and Thompkins 2013). However, due to relatively small increases in earnings, the increased employment among participating youth did not reduce disability benefits. In addition to employment outcomes, researchers find some evidence that YTD reduced criminal activity among beneficiaries in locations with more comprehensive services and in locations with more intense services focused on employment (Fraker et al. 2014).33

1.5 Conclusion
The SSI program provides cash assistance and health insurance to some of the nation’s most vulnerable elderly, blind, and disabled residents. In December 2014 the program paid benefits to 8.5 million US residents. Beyond the direct effects of the program on the recipient population, the program also has effects on the economic incentives and income security of beneficiaries’ spouses, parents, and children. Additionally, the program affects incentives for potential future SSI applicants. In this chapter, we have briefly summarized the history of the SSI program since it was created forty years ago, including important changes in the program’s medical eligibility criteria. We have presented descriptive evidence on caseload composition and caseload trends, showing that the overall caseload has shifted toward younger recipients and nonphysical disability diagnoses. Our discussion of conceptual issues and relevant evidence focused on four key issues. First, we described conceptual questions related to the advantages and disadvantages of categorical eligibility requirements, and we showed that the SSI caseload has become increasingly composed of difficult-to-verify conditions, namely pain and mental disabilities. Second, we described the issues related to systematic disincentives to accumulate earnings and assets inherent in the SSI program design, as in most means-tested transfer programs. Notably, there are far fewer studies of the employment and earnings incentives of the SSI program as compared to the SSDI program because the SSI population tends to have close to no work experience. The more relevant set of questions for the SSI population
33. In 2014, the SSA and the Departments of Education, Labor, and Health and Human Services began PROMISE, a new demonstration designed to promote education and employment among SSI youth and their families. See http://www.ssa.gov/disabilityresearch/promise.htm for more information.
are related to the full disability requirement for eligibility and whether there are ways to increase the employability of those with less severe disabilities. Third, we described the questions and research about long-term benefits and costs to program participants, in terms of whether the program adequately and appropriately serves the needs of disabled individuals and their family members. And fourth, we presented information and evidence about program spillovers, both across programs and across federal and state levels of government. Throughout this chapter we have made numerous explicit references to areas where further study is warranted and open research questions remain. In addition to the open research questions, there are a number of program design questions that warrant policy consideration. One critical issue is that of a full- versus partial-disability scheme. As described above, SSI eligibility is a dichotomous status and benefits are not dependent on disability severity. This stands in contrast to the disability systems of many other countries, as well as the Veterans’ Disability Compensation Program, where benefit awards are an increasing function of disability severity. A partial system could accommodate functional limitations that do not preclude productive market-based work, and thus would allow individuals to combine the receipt of benefits with earnings. A partial system would also avoid the undesirable program “cliff” in which eligibility ends and all benefits are lost at once if sufficient recovery is observed. Another policy design issue that should be considered is the justification for two separate federal disability programs: SSI and SSDI. In the case of adults, the disability determination uses a similar set of criteria, but eligibility for SSI is additionally based on income and eligibility for SSDI is additionally based on work history. 
They also have different waiting periods: zero months for SSI (and Medicaid) and five months for SSDI (twenty-four months for Medicare). In addition, the financing schemes are separate, with federal SSI payments financed by general revenue and SSDI payments financed by payroll taxes and the Social Security trust fund. Is this efficient from an operational standpoint, or would administrative costs and complications be substantially reduced by the streamlining that would come from one federal disability program? Supplemental Security Income is an important part of the US safety net, but particular features of the program and the way it operates in practice raise questions and concerns about whether there is a more effective way to provide income support for individuals with work-limiting disabilities and families with disabled children. We have attempted to systematically present these issues here for scholars and policymakers to consider and explore.
Appendix

Table 1A.1    Percent of population on SSI, by state and age (2013)

State    Percent under 18 on SSI    Percent 18–64 on SSI    Percent 65+ on SSI
AL       2.58                       3.98                    4.29
AK       0.69                       1.75                    5.66
AZ       1.30                       1.77                    3.15
AR       4.26                       3.77                    3.55
CA       1.29                       2.64                    13.05
CO       0.80                       1.42                    2.88
CT       1.09                       1.74                    2.82
DE       1.80                       1.84                    2.01
DC       4.14                       4.17                    6.47
FL       2.66                       2.28                    5.41
GA       1.85                       2.53                    4.90
HI       0.56                       1.73                    4.43
ID       1.34                       2.18                    2.01
IL       1.38                       2.14                    3.84
IN       1.59                       2.18                    1.61
IA       1.16                       1.89                    1.55
KS       1.34                       1.86                    1.80
KY       2.82                       4.74                    5.49
LA       3.30                       3.94                    5.68
ME       1.56                       3.31                    2.51
MD       1.40                       1.96                    3.61
MA       1.70                       2.76                    5.17
MI       1.85                       3.10                    3.08
MN       1.08                       1.79                    2.91
MS       3.19                       4.28                    6.23
MO       1.68                       2.66                    2.29
MT       1.17                       2.11                    2.02
NE       0.91                       1.69                    1.73
NV       1.42                       1.57                    3.80
NH       0.91                       1.75                    1.11
NJ       1.28                       1.76                    4.72
NM       1.85                       2.96                    6.23
NY       2.07                       2.96                    9.19
NC       1.93                       2.43                    3.51
ND       0.69                       1.35                    1.60
OH       1.89                       3.03                    2.59
OK       1.96                       2.80                    2.84
OR       1.24                       2.29                    3.11
PA       2.74                       3.00                    3.27
RI       2.12                       3.16                    4.60
SC       1.92                       2.60                    3.42
SD       1.26                       1.86                    2.60
TN       1.70                       3.16                    3.73
TX       2.14                       2.20                    6.60
UT       0.64                       1.25                    1.98
VT       1.35                       2.83                    2.75
VA       1.29                       1.85                    3.45
WA       1.16                       2.28                    3.98
WV       2.18                       5.03                    3.97
WI       1.70                       2.17                    2.13
WY       0.80                       1.40                    1.24

Sources: The SSI participation counts are from “SSI Recipients by State and County, 2013” (SSA publication no. 13-11976). Population totals are from the US Census Bureau.

Fig. 1A.1  Adult SSI benefit with and without unearned income, 2015
Fig. 1A.2  Adult SSI benefit based on applicant versus spouse income, 2015
Fig. 1A.3  Child SSI benefit based on parental earnings, with and without unearned income, 2015
Fig. 1A.4  Child SSI benefit for different family types, 2015

Notes: Calculations in figures 1A.1–1A.4 based on formulas for benefit schedules outlined in SSA (2014d) and correspondence with representatives from the Social Security Administration.
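As an illustration of the schedules plotted in figures 1A.1–1A.4, the federal adult benefit in figure 1A.1 can be sketched from the standard SSI countable-income rules: the payment equals the federal benefit rate minus countable income, where the first $20 of any income and the first $65 plus one-half of remaining earnings are excluded. This is only a simplified sketch, not the authors' calculations — the figures rest on the full SSA (2014d) formulas, and state supplements and deeming rules are omitted here.

```python
# Illustrative sketch of the 2015 federal adult SSI benefit schedule.
# Simplified: state supplements and deeming rules are ignored.

FBR_2015 = 733.0  # 2015 federal benefit rate for an individual, per month

def adult_ssi_benefit(earned, unearned=0.0, fbr=FBR_2015):
    """Monthly federal SSI payment for an adult with given monthly income."""
    # The $20 general exclusion applies first to unearned income;
    # any unused portion carries over to earned income.
    general_exclusion = 20.0
    countable_unearned = max(unearned - general_exclusion, 0.0)
    leftover_exclusion = max(general_exclusion - unearned, 0.0)
    # Earnings beyond the exclusions are counted at a 50 percent rate.
    countable_earned = max(earned - leftover_exclusion - 65.0, 0.0) / 2.0
    return max(fbr - countable_unearned - countable_earned, 0.0)

print(adult_ssi_benefit(0))     # no income: full FBR of 733.0
print(adult_ssi_benefit(1000))  # earnings partially offset the benefit
```

The implicit 50 percent benefit-reduction rate on earnings above $85 per month produces the downward-sloping segment in figure 1A.1, while unearned income beyond the $20 exclusion reduces the benefit dollar for dollar, yielding the steeper schedule plotted alongside it.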

References

Aizer, Anna. 2008. “Peer Effects and Human Capital Accumulation: The Externalities of ADD.” NBER Working Paper no. 14354, Cambridge, MA.
Aizer, Anna, Nora Gordon, and Melissa Kearney. 2013. “Exploring the Growth of the Child SSI Caseload in the Context of the Broader Policy and Demographic Landscape.” NBER Disability Research Center Paper no. NB 13-02, Cambridge, MA.
Akerlof, George A. 1978. “The Economics of ‘Tagging’ as Applied to the Optimal Income Tax, Welfare Programs, and Manpower Planning.” American Economic Review 68 (1): 8–19.
Area Health Resources Files. 2013. US Department of Health and Human Services, Health Resources and Services Administration, Bureau of Health Professions, Rockville, MD.
Autor, David, and Mark Duggan. 2003. “The Rise in the Disability Rolls and the Decline in Unemployment.” Quarterly Journal of Economics 118 (1): 157–206.
———. 2006. “The Growth in the Social Security Disability Rolls: A Fiscal Crisis Unfolding.” Journal of Economic Perspectives 20 (3): 71–96.
Ben-Shalom, Yonatan, and David C. Stapleton. 2015. “Long Term Work Activity
and Use of Employment Supports among New Supplemental Security Income Recipients.” Social Security Bulletin 75 (1): 73–95.
Berkowitz, Edward D., and Larry DeWitt. 2013. The Other Welfare: Supplemental Security Income and US Social Policy. Ithaca, NY: Cornell University Press.
Bertrand, Marianne, and Jessica Pan. 2013. “The Trouble with Boys: Social Influences and the Gender Gap in Disruptive Behavior.” American Economic Journal: Applied Economics 5 (1): 32–64.
Bound, John, Richard V. Burkhauser, and Austin Nichols. 2003. “Tracking the Household Income of SSDI and SSI Applicants.” In Research in Labor Economics, vol. 22, edited by Sol W. Polachek, 113–59. Stamford, CT: JAI Press.
Buescher, Ariane V. S., Zuleyha Cidav, Martin Knapp, and David S. Mandell. 2014. “Costs of Autism Spectrum Disorders in the U.S. and the U.K.” Journal of the American Medical Association, Pediatrics 168 (8): 721–28.
Coe, Norma, and Matthew Rutledge. 2013. “What is the Long-Term Impact of Zebley on Adult and Child Outcomes?” Working Paper no. 2013-3, Center for Retirement Research at Boston College, Chestnut Hill, MA.
Coe, Norma, and April Y. Wu. 2014. “What Impact Does Social Security Have on the Use of Public Assistance Programs among the Elderly?” Working Paper no. 2014-5, Center for Retirement Research at Boston College, Chestnut Hill, MA.
Cohen, Jessica L. 2007. “Financial Incentives for Special Education Placement: The Impact of SSI Benefit Expansion on Special Education Enrollment.” Unpublished Manuscript, Massachusetts Institute of Technology, Cambridge, MA.
Cullen, Julie Berry. 2003. “The Impact of Fiscal Incentives on Student Disability Rates.” Journal of Public Economics 87:1557–89.
Cullen, Julie Berry, and Lucie Schmidt. 2011. “Growth in the Supplemental Security Income Program for Children: The Role of Local Jurisdictions and Fiscal Incentives.” Working Paper, University of California, San Diego, and Williams College.
Dahl, Gordon B., and Lance Lochner. 2012.
“The Impact of Family Income on Child Achievement: Evidence from the Earned Income Tax Credit.” American Economic Review 102 (5): 1927–56.
Dahl, Gordon B., Andreas Ravndal Kostol, and Magne Mogstad. 2014. “Family Welfare Cultures.” Quarterly Journal of Economics 129 (4): 1711–52.
Daly, Mary C., and Richard V. Burkhauser. 2003. “The Supplemental Security Income Program.” In Means-Tested Transfer Programs in the United States, edited by Robert A. Moffitt, 79–139. Chicago: University of Chicago Press.
Deshpande, Manasi. 2014a. “Does Welfare Inhibit Success? The Long-Term Effects of Removing Low-Income Youth from Disability Insurance.” Unpublished Manuscript, Massachusetts Institute of Technology, Cambridge, MA.
———. 2014b. “The Effect of Disability Payments on Household Earnings and Income: Evidence from the Supplemental Security Income Children’s Program.” Unpublished Manuscript, Massachusetts Institute of Technology, Cambridge, MA.
Diamond, Peter, and Eytan Sheshinski. 1995. “Economic Aspects of Optimal Disability Benefits.” Journal of Public Economics 57:1–23.
Duggan, Mark, and Scott Imberman. 2009. “Why Are the Disability Rolls Skyrocketing? The Contribution of Population Characteristics, Economic Conditions, and Program Generosity.” In Health at Older Ages: The Causes and Consequences of Declining Disability among the Elderly, edited by David Cutler and David Wise. Chicago: University of Chicago Press.
Duggan, Mark, and Melissa Kearney. 2007. “The Impact of Child SSI Enrollment on Household Outcomes.” Journal of Policy Analysis and Management 26 (4): 861–86.
Evans, William N., and Craig L. Garthwaite. 2014. “Giving Mom a Break: The
Impact of Higher EITC Payments on Maternal Health.” American Economic Journal: Economic Policy 6 (2): 258–90.
Fraker, Thomas, Arif Mamun, Todd Honeycutt, Allison Thompkins, and Erin Jacobs Valentine. 2014. “Final Report on the Youth Transition Demonstration Evaluation.” Mathematica Policy Research, Washington, DC. http://www.mdrc.org/publication/final-report-youth-transition-demonstration-evaluation.
French, Eric, and Jae Song. 2014. “The Effect of Disability Insurance Receipt on Labor Supply.” American Economic Journal: Economic Policy 6 (2): 291–337.
Garrett, Bowen, and Sherry Glied. 2000. “Does State AFDC Generosity Affect Child SSI Participation?” Journal of Policy Analysis and Management 19 (2): 275–95.
Hannsgen, Greg P., and Steven H. Sandell. 1996. “Deeming Rules and the Increase in the Number of Children with Disabilities Receiving SSI: Evaluating the Effects of a Regulatory Change.” Social Security Bulletin 59 (1): 43–51.
Hemmeter, Jeffrey. 2012. “Changes in Diagnostic Codes at Age 18.” Research Statistics Note no. 2012–14, Office of Research, Evaluation and Statistics, Social Security Administration, Washington, DC.
Hemmeter, Jeffrey, and Elaine Gilby. 2009. “The Age 18 Redetermination and Post Redetermination Participation in SSI.” Social Security Bulletin 69 (4): 1–25.
Hubbard, R. Glenn, Jonathan Skinner, and Stephen P. Zeldes. 1995. “Precautionary Saving and Social Insurance.” Journal of Political Economy 103 (2): 360–99.
Hurst, Erik, and James P. Ziliak. 2006. “Do Welfare Asset Limits Affect Household Saving?” Journal of Human Resources 41 (1): 46–71.
Kleven, Henrik Jacobsen, and Wojciech Kopczuk. 2011. “Transfer Program Complexity and the Take-Up of Social Benefits.” American Economic Journal: Economic Policy 3 (1): 54–90.
Kubik, Jeffrey D. 1999. “Incentives for the Identification and Treatment of Children with Disabilities: The Supplemental Security Income Program.” Journal of Public Economics 73 (2): 187–215.
———. 2003.
“Fiscal Federalism and Welfare Policy: The Role of States in the Growth of Child SSI.” National Tax Journal 56 (1): 61–79.
Linder, Stephan, and Austin Nichols. 2012. “The Impact of Temporary Assistance Programs on Disability Rolls and Re-Employment.” Program on Retirement Policy Working Paper, Urban Institute, Washington, DC.
Loprest, Pamela, and David Wittenburg. 2005. “Choices, Challenges and Options: Child SSI Recipients Preparing for the Transition to Adult Life.” Working Paper, Urban Institute, Washington, DC.
Maestas, Nicole, Kathleen Mullen, and Alexander Strand. 2013. “Does Disability Insurance Receipt Discourage Work? Using Examiner Assignment to Estimate Causal Effects of SSDI Receipt.” American Economic Review 103 (5): 1797–1829.
———. 2014. “Disability Insurance and Health Insurance Reform: Evidence from Massachusetts.” American Economic Review: Papers and Proceedings 104 (5): 329–55.
Meyer, Bruce D., and James X. Sullivan. 2004. “The Effects of Welfare and Tax Reform: The Material Well-Being of Single Mothers in the 1980s and 1990s.” Journal of Public Economics 88:1387–420.
Neumark, David, and Elizabeth T. Powers. 2000. “Welfare for the Elderly: The Effects of SSI on Pre-Retirement Labor Supply.” Journal of Public Economics 78 (1–2): 51–80.
Nichols, Albert L., and Richard J. Zeckhauser. 1982. “Targeting Transfers through Restrictions on Recipients.” American Economic Review 72 (2): 372–77.
Office of the Inspector General (OIG). 2008. “Disability Claims Overall Processing Times.” Report no. A-01-08-18011, Social Security Administration, Washington, DC.
Parsons, Donald. 1996. “Imperfect Tagging in Social Insurance Programs.” Journal of Public Economics 62 (1–2): 183–207.
Powers, Elizabeth T. 2001. “New Estimates of the Impact of Child Disability on Maternal Employment.” American Economic Review Papers and Proceedings 91 (2): 135–39.
Powers, Elizabeth T., and David Neumark. 2003. “Interaction of Public Retirement Programs in the United States.” American Economic Review Papers and Proceedings 93 (2): 261–65.
———. 2005. “The Supplemental Security Income Program and Incentives to Claim Social Security Retirement Early.” National Tax Journal 58:5–26.
———. 2006. “Supplemental Security Income, Labor Supply, and Migration.” Journal of Population Economics 19:447–80.
Riley, Gerald F., and Kalman Rupp. 2012. “Longitudinal Patterns of Medicaid and Medicare Coverage among Disability Cash Benefit Awardees.” Social Security Bulletin 72 (3): 19–35.
Rupp, Kalman, Paul S. Davies, Chad Newcomb, Howard Iams, Carrie Becker, Shanti Mulpuru, Stephen Ressler, Kathleen Romig, and Baylor Miller. 2006. “A Profile of Children with Disabilities Receiving SSI: Highlights from the National Survey of SSI Children and Families.” Social Security Bulletin 66 (2): 21–48.
Rupp, Kalman, Jeffrey Hemmeter, and Paul S. Davies. 2015. “Longitudinal Patterns of Disability Program Participation and Mortality among Childhood SSI Award Cohorts.” Social Security Bulletin 75 (1): 35–64.
Rupp, Kalman, and Steve Ressler. 2009. “Family Employment and Caregiving among Parents of Children with Disabilities on SSI.” Journal of Vocational Rehabilitation 30:153–75.
Rupp, Kalman, and David Stapleton. 1995. “Determinants of the Growth in SSA’s Disability Programs: An Overview.” Social Security Bulletin 58 (4): 43–70.
Rutledge, Matthew S., and April Y. Wu. 2013. “Why Do SSI and SNAP Enrollments Rise in Good Economic Times and in Bad?” Working Paper no. 2014-10, Center for Retirement Research at Boston College, Chestnut Hill, MA.
Schmidt, Lucie, and Purvi Sevak. 2004.
“AFDC, SSI, and Welfare Reform Aggressiveness: Caseload Reductions vs. Caseload Shifting.” Journal of Human Resources 39 (3): 792–812.
Schmidt, Lucie, Lara Shore-Sheppard, and Tara Watson. 2013. “The Effect of Safety Net Programs on Food Insecurity.” NBER Working Paper no. 19558, Cambridge, MA.
Social Security Administration. 2011. “State Assistance Programs for SSI Recipients, January 2011.” SSA Publication no. 13-11975, Office of Research, Evaluation and Statistics, Washington, DC.
———. 2014a. “2014 Red Book.” SSA Publication no. 64-030, Office of Research, Demonstration and Employee Support, Washington, DC.
———. 2014b. “Annual Statistical Supplement, 2013.” SSA Publication no. 13-11700, Office of Research, Evaluation and Statistics, Washington, DC.
———. 2014c. “Annual Report of the Supplemental Security Income Program.” https://www.ssa.gov/OACT/ssir/SSI14/index.html.
———. 2014d. “Program Operations Manual System.” Accessed November 22, 2014. https://secure.ssa.gov/apps10/poms.nsf/.
———. 2014e. “SSI Annual Statistical Report 2013.” SSA Publication no. 13-11827, Office of Research, Evaluation and Statistics, Washington, DC.
———. 2014f. “SSI Recipients by State and County, 2013.” SSA Publication no. 13-11976, Office of Research, Evaluation and Statistics, Washington, DC.
Stabile, Mark, and Sara Allin. 2012. “The Economic Costs of Childhood Disability.” Future of Children 22 (1): 65–96.
Strand, Alexander. 2002. “Social Security Disability Programs: Assessing the Variation in Allowance Rates.” Working Paper no. 98, Office of Research, Evaluation and Statistics, Washington, DC.
US Census Bureau. Various years. “Survey on Income and Program Participation 2008 Panel.” http://www.nber.org/data/survey-of-income-and-program-participation-sipp-data.html.
———. 2014. “Population and Housing Unit Estimates.” http://www.census.gov/popest/index.html.
US Government Accountability Office (GAO). 1994. “Rapid Rise in Children on SSI Disability Rolls Follows New Regulations.” Report no. GAO-HEHS 94-225, Washington, DC.
———. 1995. “New Functional Assessments for Children Raise Eligibility Questions.” Report no. GAO-HEHS 95-66, Washington, DC.
———. 2006. “Clearer Guidance Could Help SSA Apply the Medical Improvement Standard More Consistently.” Report no. GAO 07-08, Washington, DC.
———. 2012. “Better Oversight Management Needed for Children’s Benefits.” Report no. GAO 12-497, Washington, DC.
———. 2014. “SSA Could Take Steps to Improve Its Assessment of Continued Eligibility.” Report no. GAO 14-492T, Washington, DC.
Wen, Patricia. 2010. “The Other Welfare.” The Boston Globe, December 12–14. http://www.boston.com/news/health/specials/New_Welfare/.
Wiseman, Michael. 2011. “Supplemental Security Income for the Second Decade.” Poverty & Public Policy 3 (1): 1–18.
Wittenburg, David. 2011. “Testimony for Hearing on Supplemental Security Income Benefits for Children.” Submitted to Committee on Ways and Means, US House of Representatives, Mathematica Policy Research, Washington, DC.
Wittenburg, David, David R. Mann, and Allison Thompkins. 2013. “The Disability System and Programs to Promote Employment for People with Disabilities.” IZA Journal of Labor Policy 2 (4): 1–25.
Wittenburg, David, John Tambornino, Elizabeth Brown, Gretchen Rowe, Mason DeCamillis, and Gilbert Crouse. 2015.
“The Child SSI Program and the Changing Safety Net.” Office of the Assistant Secretary for Planning and Evaluation, US Department of Health and Human Services, Washington, DC.
Wixon, Bernard, and Alexander Strand. 2013. “Identifying SSA’s Sequential Disability Determination Steps Using Administrative Data.” Research and Statistics Note no. 2013-01, Social Security Administration, Washington, DC.
Yelowitz, Aaron. 2000. “Using the Medicare Buy-In Program to Estimate the Effect of Medicaid on SSI Participation.” Economic Inquiry 38 (3): 419–41.

2
Low-Income Housing Policy
Robert Collinson, Ingrid Gould Ellen, and Jens Ludwig

2.1 Introduction

The United States federal government devotes around $40 billion each year to means-tested housing programs, plus another $6 billion or so each year in tax expenditures on the Low-Income Housing Tax Credit (LIHTC). This is well over twice the level of federal spending on either cash welfare or the Title I compensatory program in education, four times what is spent on the children’s health insurance fund (Falk 2012), and five times what is spent on Head Start.1 What exactly do we spend this money on, why, and what does it accomplish? Those are the overarching questions at the heart of our chapter. We should note these programs are just a modest share of the total subsidies government provides to subsidize housing for American households.
Robert Collinson is a PhD candidate in public policy at New York University and a doctoral fellow at NYU’s Furman Center. Ingrid Gould Ellen is the Paulette Goddard Professor of Urban Policy and Planning at the Wagner School, New York University. Jens Ludwig is the McCormick Foundation Professor of Social Service Administration, Law and Public Policy at the University of Chicago and a research associate and codirector of the Economics of Crime Working Group at the National Bureau of Economic Research. This chapter was prepared for the 2014 NBER conference on means-tested transfer programs organized by Robert Moffitt. We thank the Kreisman Initiative on Housing Law and Policy at the University of Chicago Law School for financial support and Benjamin Keys, Robert Moffitt, Edgar Olsen, Barbara Sard, Alex Schwartz, our discussant Lawrence Katz, other conference participants, and two anonymous reviewers for helpful comments. Rob Collinson remained an unpaid employee of the US Department of Housing & Urban Development (HUD) during the writing of this chapter. Any errors and all opinions are ours alone and do not represent those of HUD. For acknowledgments, sources of research support, and disclosure of the authors’ material financial relationships, if any, please see http://www.nber.org/chapters/c13485.ack.
1. https://eclkc.ohs.acf.hhs.gov/hslc/standards/pdf/PDF_PIs/PI2013/ACF-PI-HS-13-03.pdf.



Most of the government's spending on housing, or roughly $195 billion of an estimated $270 billion,2 goes toward subsidizing homeowners through the tax code (e.g., the Mortgage Interest Deduction). Sinai and Gyourko (2004) estimate that total subsidies for homeownership are on the order of $600 billion.3 We do not consider these subsidies in this chapter, not because they are economically unimportant, but rather because the focus of this volume is means-tested transfer programs and most of these tax subsidies are not means tested—and indeed the vast majority of these subsidy dollars go to nonpoor households.4

Public concern about housing conditions among the poor dates back at least to the "muckraking" of Jacob Riis and the publication in 1890 of his book, How the Other Half Lives, which described living conditions in the Lower East Side tenements of New York City. However, as we note in section 2.2 of our chapter, the federal government did not get involved with low-income housing in earnest until the passage of the Housing Act of 1937. Economic stimulus played a large role in motivating the government's initial move into housing. This rationale does not come up much in current housing policy discussions, but it is perhaps not surprising when one considers the macroeconomic conditions at the time the Housing Act was passed. Another important motivation was the concern of advocates about the substandard quality and inadequate supply of low-income housing (e.g., see Hunt 2009, 9), and the desire to promote "slum clearance." Given these rationales, for the first several decades the government was mostly involved in directly supplying housing in the form of federal subsidies to local public housing authorities (PHAs) for the construction of public housing developments. Over time the number of separate means-tested housing programs in the United States has proliferated, due more to political forces than to any coherent overall plan or policy motivation.

Perhaps the most striking change has been the decline in the share of total low-income housing assistance provided by the US Department of Housing and Urban Development (HUD) that is delivered in the form of government built-and-operated housing. Beginning in the 1960s and 1970s, HUD shifted to rely more on subsidies both to private developers to build and operate housing developments for low-income families and to low-income households to rent in the private market (housing choice vouchers). The growth in the Low Income Housing Tax Credit (LIHTC) has reinforced this change within HUD's program portfolio. The long-term effect of this shift is that the government now plays more of a role in just subsidizing housing for low-income families rather than also directly supplying it.

Section 2.3 summarizes what is currently known about the number of people participating in different means-tested housing programs in the United States, their characteristics, and how these figures have changed over time. Compared to most of the other means-tested programs run by the US government that are considered in this volume, means-tested housing programs are quite generous on a per-participant basis. Indeed, average benefit levels per participant are high enough that even with $40 billion in annual spending, only around 23 percent of low-income renters receive assistance from any of these programs (Fischer and Sard 2013). While all of these different housing programs focus on serving low-income people, the rules governing tenant selection have cycled back and forth over time—sometimes favoring the poorest of the poor and other times prioritizing instead working-poor households or those believed to be temporarily poor. This "policy cycling" reflects a key tension in the design of low-income housing programs. On the one hand, the usual assumption of declining marginal utility of consumption motivates the desire to prioritize helping the most disadvantaged families. On the other hand, because housing programs—at least supply-side programs—essentially condition program participation on living in a certain geographic location, many policymakers wish to avoid creating housing developments with high concentrations of very poor households. Changes over time in housing policies and/or program rules reflect changes in the emphasis that policymakers place on the different aspects of this trade-off. Section 2.4 discusses the different conceptual issues related to means-tested housing programs in the United States.

2. See Analytical Perspectives FY2014 from the Office of Management and Budget (OMB 2014). https://www.whitehouse.gov/sites/default/files/omb/budget/fy2014/assets/spec.pdf.
3. Their estimate was for the year 2000 and reported in 1999 dollars as $420 billion.
4. Our chapter also focuses on the largest means-tested housing programs, which tend to be those run by the US Department of Housing and Urban Development and the Low-Income Housing Tax Credit. As noted below, the US Department of Agriculture also runs some low-income housing programs, but these are fairly small relative to the others.
One set of issues has to do with the changing rationales for these programs over time. During the 1930s, when the Housing Act was passed, the desire to use means-tested housing programs as a tool for macroeconomic policy (stimulus) was much stronger than it is today. The belief that government-supported housing programs are needed to address supply-side problems and stimulate housing production has also waned over time, though there remains some debate about the value of this strategy in some of our current high-cost, growing cities. To the extent that economists today worry about the supply of private housing in the United States, they more often focus on the role that government regulations like local land use and building restrictions play in restricting supply (Glaeser and Gyourko 2002; Quigley and Raphael 2004, 2005).

Perhaps the most important motivation for means-tested government housing programs today in the United States is concern about housing affordability. The quality of America's housing stock increased dramatically over the twentieth century, but at the same time it also became more expensive both in real terms and relative to the earnings of low-income households. As a result, the focus of much current housing-policy discussion is the desire to subsidize poor households to help them meet their housing needs. This motivation raises standard questions about the trade-off between transferring resources to the poor and reducing work effort, which we consider in section 2.4. The challenge in balancing this trade-off can be seen in some of the different design choices that have been made across means-tested housing programs. For example, the rules of HUD programs like public housing or housing vouchers require participants to contribute 30 percent of income to rent, while the program rules for the LIHTC charge a flat rent to residents. The flat-rent model has the disadvantage of making LIHTC units unaffordable to a large share of the low-income households targeted by HUD programs. On the other hand, flat rents have the advantage of avoiding the large increase in effective marginal tax rates on earnings that faces participants in HUD programs, which, all else equal, will reduce labor supply through standard substitution effects. The 30 percent effective marginal tax on earnings in HUD programs is actually moderate compared to the UK Housing Benefit program, which has a taper rate of more than 60 percent (Brewer, Brown, and Wenchao 2012). Of course, these work disincentives are most relevant for nonelderly, nondisabled adults, who at present comprise only about one-third of all participants in HUD's means-tested housing programs.

The goal of addressing problems of housing affordability also raises the question of why government should help poor families meet their housing needs by providing in-kind housing assistance rather than simply cash transfers. One obvious answer is donor preferences—that is, taxpayers prefer to support low-income housing rather than simple cash transfers.
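The contrast drawn above between HUD's income-based rents and LIHTC's flat rents can be made concrete with a minimal sketch in Python. The 30 percent figure is the statutory tenant contribution described in the text; the $700 flat rent and the simple linear formula are illustrative assumptions, not the actual program formulas (which involve income adjustments and payment-standard caps).

```python
def hud_tenant_rent(monthly_income: float) -> float:
    """Tenant contribution under the standard HUD rule:
    30 percent of (adjusted) monthly income goes toward rent."""
    return 0.30 * monthly_income

def flat_rent(monthly_income: float, rent: float = 700.0) -> float:
    """LIHTC-style flat rent: the same regardless of income.
    The $700 figure is purely illustrative."""
    return rent

def implicit_tax(rent_rule, income: float, delta: float = 1.0) -> float:
    """How much of an extra dollar of earnings is lost to higher rent."""
    return (rent_rule(income + delta) - rent_rule(income)) / delta

# Under the HUD rule, each extra dollar earned raises rent by 30 cents;
# under a flat rent, rent does not respond to earnings at all.
print(round(implicit_tax(hud_tenant_rent, 1000.0), 2))  # 0.3
print(round(implicit_tax(flat_rent, 1000.0), 2))        # 0.0
```

This is the substitution-effect channel discussed above: the HUD rule dampens the return to work at the margin, while the flat rent does not, at the cost of pricing the poorest households out of LIHTC units.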
Another candidate answer is the belief that housing consumption has either "internalities" that program participants may not fully understand, such as beneficial effects on the ability of people to get and keep a job, or externalities, for example, in the form of improved health or schooling. Implicit here is the idea that in-kind housing programs generate higher levels of housing consumption than would similarly costly cash transfers, although this need not be true as a conceptual matter given the complicated budget constraints created by these programs.

A different type of motivation for having in-kind housing programs instead of cash transfers is to help reduce the disparities in neighborhood conditions experienced by households of different races and incomes. Specifically, government-supported housing developments could, in principle, bring poor families to less disadvantaged neighborhoods, or actually directly improve the economic or social conditions of distressed neighborhoods. Low-income households that are given cash instead could potentially be hindered in their efforts to move to better neighborhoods by information failures and discrimination by landlords. Local politics could also adversely affect the ability of either government programs or cash transfers to help poor families move into less distressed neighborhoods, either by constraining the selection of sites for government-provided housing or by making it more difficult for private-sector developers to build low-cost housing in higher-income areas, effectively limiting private development to poor and minority areas.

Another key implication of the current subsidy structure is that because families are able to receive housing subsidies for as long as they continue to meet income and other program eligibility criteria, most means-tested housing programs are implicitly addressing the problem of low permanent income rather than income variability or helping cushion families against negative income "shocks." In principle, US housing policy could shift toward a system of providing either more modest subsidies or time-limited subsidies to a larger share of eligible people.

In section 2.5 of our chapter we review the available empirical evidence in this area, which is limited in important ways, as we discuss in detail below. The evidence suggests that while housing programs do indeed increase housing consumption and quality for poor families and improve affordability relative to not receiving a subsidy, surprisingly little is currently known about the effects on these outcomes of housing programs relative to cash transfers. There is also not overwhelming evidence to date to support the idea of large externalities from housing consumption by the poor. For example, the means-tested housing programs that HUD operates seem, on net, to reduce labor supply and earnings for program participants. This suggests that whatever beneficial effects extra housing consumption might have on work may be outweighed by the standard income and substitution effects induced by these programs.
Another sort of externality argument that has often been made is the possibility that inadequate housing or neighborhood conditions adversely affect productivity, health, well-being, and behavioral outcomes, or what Rosen (1985) refers to as the "social cost of slums."5 While there is little evidence that housing conditions within the range that we currently see in the United States generate major externalities, there is some indication that investments in government housing programs can improve the condition and desirability of surrounding neighborhoods under some circumstances. Further, there is some evidence that families and children benefit when living in more advantaged neighborhoods. Arguably the best evidence on this question comes from HUD's Moving to Opportunity (MTO) experiment, which offered housing vouchers to move into low-poverty areas to some residents of high-poverty housing projects, but not others. While a large nonexperimental literature in the social sciences has reported important neighborhood effects on a wide range of outcomes, MTO indicates some important gains from living in less distressed areas as well, but on a more limited set of outcomes than what one would conclude from previous studies. Specifically, MTO suggests that moving into a lower-poverty area improves physical and mental health and overall well-being, but does not change educational outcomes for children or economic outcomes for adults. The only gains in earnings we see from MTO are limited to those program participants who were young children at the time their families moved.

While these MTO findings suggest that low-income families would experience some important benefits if we could help them live in less racially or economically segregated neighborhoods, most large-scale, low-income housing programs do not seem to do much to change the neighborhood conditions families experience. The public housing program appears, if anything, to lead families to live in more disadvantaged neighborhoods than they would otherwise. Giving housing vouchers to families who were previously unsubsidized does not lead them to neighborhoods that are substantially different from the ones they were living in previously without a subsidy.6 It is possible that modifications to the design of the housing voucher program could induce or assist families in moving to more advantaged areas; HUD is currently experimenting with such modifications, as we discuss below.7 The final section concludes with some thoughts about the most pressing questions that might be addressed in future research in this area.

5. For example, one of the initial motivations for housing programs in the 1930s was the potential effect of slum conditions on delinquency by children (Hunt 2009). And in announcing the War on Poverty in 1964, President Lyndon B. Johnson argued: "Very often a lack of jobs and money is not the cause of poverty, but the symptom. The cause may lie deeper in our failure to give fellow citizens a fair chance to develop their own capacities, in a lack of education and training, in a lack of medical care and housing" (Olsen and Ludwig 2013, 207).

2.2 History of the Programs and Current Rules

Federal low-income housing programs can be broadly divided into three categories: (a) public housing; (b) privately owned, subsidized housing; and (c) tenant-based vouchers.8 In this section we begin with a history of means-tested housing programs in the United States, which started with public housing. Over the years, government rhetoric surrounding housing has shifted away from publicly owned housing toward privately owned housing, and more recently from place-based subsidies toward tenant-based support. In practice the flow of dollars has changed less dramatically than has the rhetoric, largely due to the growth in the LIHTC program. But there is no question that the private housing market has come to play a much more central role in federal housing assistance. After describing the history of these programs, we then turn to a discussion of their key features and rules.

6. Note that MTO examines the effects of giving housing vouchers to families who were living in public housing initially, rather than studying the effects of giving housing vouchers to previously unsubsidized households. The distinction is important because, as noted in the text, public housing leads families to live in more distressed areas than they would otherwise. Moreover, the MTO demonstration provided many of the participating families with a special type of voucher that could only be redeemed in a low-poverty area, which is different from the large-scale voucher program that does not include such a requirement.
7. As we discuss below, the current housing voucher program sets the fair market rent (FMR)—which, speaking loosely, could be thought of as something like a rent "cap" for voucher holders—at the 40th percentile of rents in the metropolitan statistical area. Using smaller geographic areas to define the FMR essentially reduces the amount of housing unit quality poor families with vouchers need to give up for an improvement in neighborhood amenities.
8. The federal government also provides block grants to states and localities to use for a wide range of housing-related activities. The HOME program awards funds annually to jurisdictions to support rehabilitation programs for homeowners, programs to create and rehabilitate affordable rental housing, or tenant-based rental assistance. The Community Development Block Grant program provides block grants to support community development goals, including housing rehabilitation.

2.2.1 Program History

Public Housing

Public housing, the federal government's first major low-income housing program, was established by the Housing Act of 1937. Although largely funded by the federal government, public housing developments are owned and operated by housing authorities established by local governments, which have control over siting, design, and tenant selection. The original model was that the federal government would pay for construction costs (through covering debt service on bonds issued to finance development costs), but that local housing authorities would cover the operating costs through rental revenues. Over time, buildings aged, utility costs rose, and rental revenues fell far short of what was needed to cover the costs of operations and maintenance. In response, the federal government started to provide substantial subsidies for operations and improvements in the early 1970s (HUD 1974).

The enactment of the public housing program was highly contested, as the private real estate industry feared competition and conservatives resisted public ownership as well as long-term subsidies (Mitchell 1985). In fact, the program might never have emerged if not for the crisis in the national economy. In the middle of the 1930s, the country was still reeling from the Great Depression, with a national unemployment rate of 25 percent. Public housing was sold partly as a way to increase construction employment and stimulate the economy. As Senator Robert Wagner, the cosponsor of the bill, poetically put it, "The whole country awaits the time when the sound of the rivet and the saw are joined more loudly in the chorus of economic recovery" (Mitchell 1985, 245).

Wagner's testimony reveals a second motivation for public housing as well: slum clearance. Wagner and many housing reformers were convinced that poor-quality housing generated social and economic externalities. As Wagner declared: "It is not necessary to prove here that millions of people in America live in homes that are injurious to their health and not conducive to their safety. . . . Nor do I need to elaborate on the fact that bad housing leaves its permanent scars upon the minds and bodies of the young, and thus is transmitted as a social liability from generation to generation" (Mitchell 1985, 245).

Neighborhood externalities were raised as a concern as well. The US Conference of Mayors, key supporters of the bill, passed a resolution at their 1935 annual meeting stating that "the disgraceful conditions in the city slums . . . have a directly detrimental effect on the social well-being of these areas and the surrounding communities" (Mitchell 1985, 248). Notably, there is less in the official congressional debate suggesting a motivation to simply help the poor. While members of Congress did make the case that the program would increase the supply of low-rent housing for low-income households, the targeting of the program to low-income households seems to have been justified more as a way to restrict government investment to a segment of the market that private developers would not serve, in order to protect private owners from competition (Meehan 1979; Schill 1993). Further, the program was set up in a way that had housing authorities screen tenants carefully, favoring those viewed to be temporarily poor (Friedman 1968; Vale 2000).

After its contested enactment, the public housing program never grew to become fully popular. At a national level, it has always faced loud opposition from the real estate industry and market advocates who have questioned the efficiency of public ownership. At a local level, residents have often fiercely opposed the construction of developments within their communities, charging that they would undermine the architectural character of their community, increase crime, and reduce property values. Of course, many liberal housing advocates originally supported the program, and tenants, at least initially, were quite happy with their homes and communities. Most residents found their housing units to be far superior to their previous homes (Wright 1981). Even the notorious Pruitt-Igoe and Robert Taylor Homes developments were initially popular among residents (Vale 2013).

But by the late 1950s, even those sympathetic to the need for direct housing subsidies were starting to question the success of the public housing model. One issue was design. Housing officials tended to build large developments that were architecturally distinct from the surrounding neighborhoods. They felt such developments would help not only to reduce costs but also to create order and "discourage regression" to the slums that they had replaced (Wright 1981, 235). Liberal critics soon charged that this design approach had been a mistake, and that the large, standardized buildings that made up public housing developments, together with their placement on "super blocks" set apart from the regular street grid, both stigmatized tenants and isolated them from their neighbors (Bauer 1957).

A second concern with the public housing program as it was implemented concerned siting. While there are many good arguments for local control, it also allowed local jurisdictions, especially those located in the suburbs, to opt out of participating in the program, ensuring the concentration of public housing in central cities. Further, it permitted city governments to build developments in areas already occupied by poor, and typically minority, residents, further concentrating poverty and deepening racial segregation (Schill and Wachter 1995). The extreme case, perhaps, was Chicago, where of the thirty-three projects constructed in the 1950s and early 1960s, all but one was built in a neighborhood that was at least 85 percent black (Hirsch [1983] 1998).

In part due to the lack of popularity of public housing, the pace of construction never matched the goals set out in the various housing acts. At its peak in the early 1990s, the program reached 1.4 million units. Today, the number of public housing units has fallen to 1.1 million, as new public housing developments are no longer being created and many have been demolished. Most of the demolitions have occurred through the HOPE (Housing Opportunities for People Everywhere) VI program, which aimed to replace distressed public housing developments with lower-density, mixed-income developments (Schwartz 2014). Between 1993 and 2007, HOPE VI supported the demolition of more than 150,000 units of public housing, equal to 11 percent of the nation's total public housing stock at its high point. These demolished units have been fairly geographically concentrated; 60 percent of them are located in just thirty-three cities. To the chagrin of housing advocates, the program did not include a one-for-one replacement rule (that is, a guarantee that each public housing unit that was demolished would be replaced), and only about 55 percent of the demolished units will be replaced with public housing.
The other side of this argument is that many of the original units were vacant and uninhabitable at the time of demolition (Schwartz 2014).

Privately Owned Subsidized Housing

While there is just one public housing program, the federal government has created numerous programs to subsidize the creation of privately owned, low-income housing. The programs emerged in the 1960s and 1970s, as criticism of the public housing program and optimism about the potential of public-private partnerships to solve social problems grew. Policymakers were also motivated by a desire to create a program that would serve households with incomes too high to qualify for public housing but too low to find stable, sound housing through the private market (Hays 1995). In the typical model, the private organization would agree to provide housing with reduced rents for a specified number of years in return for a below-market interest rate loan. Initially only nonprofits were allowed to participate, but soon for-profits were invited as well.


The initial programs did not provide rent subsidies; instead, they attempted to ensure that low- and moderate-income households could occupy the developments by limiting construction costs. But the newly constructed developments were expensive, and the initial subsidies were not sufficient to write down the rents to a level affordable to low-income households. Thus, occupants tended to have moderate incomes. The rent supplement program was later developed to write down rents of low-income tenants who lived in these developments to 25 percent of adjusted income (Olsen 2003). Annual commitments grew considerably, and research showed that much of the subsidy was going to cover administrative expenses as well as tax benefits to investors (Frieden 1980).

In 1974, the federal government introduced a new, and more generous, approach to subsidizing low-income housing in the private market. In addition to providing subsidies for construction or rehabilitation, the Section 8 New Construction and Substantial Rehabilitation programs provided a direct rental subsidy to tenants. Developers were also able to take advantage of accelerated depreciation allowances (allowing owners to claim deductions that are larger than actual economic depreciation, and thereby pay lower taxes, in early years of ownership). As a result of these generous subsidies, the program was expensive. Indeed, these programs turned out to be so expensive that Congress essentially terminated all of HUD's construction programs in 1983.9

But just a few years after it ended HUD's production programs, Congress created the Low Income Housing Tax Credit (LIHTC) as part of the Tax Reform Act of 1986, which has now become the largest subsidy for the production of rental housing in the United States.10 Unlike many other tax credits, low-income housing tax credits are limited in supply and allocated annually to states based on their population.
Initially, each state was given a per capita allocation of $1.25.11 This amount increased to $1.75 in 2002 and has since been adjusted for inflation, reaching $2.25 in 2013. (The justification for these per capita allocation formulas from the perspective of economic theory is not clear.) Each annual allocation authorizes a ten-year stream of tax credits, which is estimated to reach nearly $7 billion in 2014.12 By the end of 2012, the program had supported the creation of nearly 2.5 million housing units, surpassing both the public housing program and other HUD-supported, subsidized housing.13

The LIHTC program is administered by state allocating agencies, which determine the priorities for the LIHTC program and award credits to developers to support the construction and rehabilitation of low-income rental housing. Projects are eligible for tax credits if at least 20 percent of their tenants have incomes below 50 percent of the area median income (AMI) or at least 40 percent have incomes below 60 percent of AMI. (Since many readers may find the poverty rate a more intuitive benchmark than AMI, it may be helpful to note that in 2014, for a family of four, the annual poverty level is 39 percent of the average area median income.)14 In practice, the vast majority of LIHTC projects contain only low-income units, or units affordable to households earning under 60 percent of area median income or lower, with 95 percent of units in tax credit projects qualified as low-income units. (While the credit sets a minimum share of units within developments that are deemed affordable, the amount of tax credits available for a project increases with the share of units that is affordable.) Projects must meet these requirements for a minimum of thirty years to qualify for the ten-year stream of tax credits.15

Each state agency is required to issue a qualified allocation plan (QAP) that outlines the selection criteria it will use when awarding tax credits. Some criteria are required by the federal government, such as setting aside at least 10 percent of credits for nonprofit developers and using the minimum amount of tax credit financing feasible. But states are also allowed to adopt additional priorities, such as providing set-asides for developments in rural areas, or awarding bonus points for locating developments in geographic areas within the state with the greatest need (based on low vacancy rates and/or high rents).

9. Notably, Congress did not end the Section 515 program for rural housing, which was administered by the US Department of Agriculture and provides developers with long-term, low-interest loans and rent subsidies to ensure that low-income tenants pay no more than 30 percent of their adjusted income on rent (Schwartz 2014). The structure of the program was similar to HUD subsidy programs.
10. The program replaced other tax incentives for rental housing that were not means tested.
11. While 9 percent credits are capped, 4 percent tax credits are not capped and are available for any low-income housing development financed with tax-exempt bonds.
12. According to the Joint Committee on Taxation, the LIHTC will cost $6.7 billion in foregone revenue in 2014. http://crfb.org/blogs/tax-break-down-low-income-housing-tax-credit.
As the competition for credits has increased, these criteria may play a greater role in the final distribution of tax credit projects.16 Many LIHTC developments also receive other sources of funding to cover construction costs, such as low-interest loans from state and local governments and rental-assistance payments for very low-income tenants. A recent analysis of ten states found that half of LIHTC tenants were also receiving some form of government rental assistance, either project or tenant based (O'Regan and Horn 2013).
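The two minimum set-aside tests described above (at least 20 percent of tenants below 50 percent of AMI, or at least 40 percent below 60 percent of AMI) can be written as a short check. This is a sketch only: actual qualification rests on certified unit-level incomes and rent restrictions, not a simple head count, and the income figures below are invented for illustration.

```python
def meets_minimum_set_aside(tenant_incomes: list, ami: float) -> bool:
    """Simplified federal minimum set-aside test: at least 20 percent of
    tenants below 50 percent of area median income (AMI), or at least
    40 percent below 60 percent of AMI."""
    n = len(tenant_incomes)
    below_50 = sum(income < 0.50 * ami for income in tenant_incomes)
    below_60 = sum(income < 0.60 * ami for income in tenant_incomes)
    return below_50 / n >= 0.20 or below_60 / n >= 0.40

# With an AMI of $60,000, 2 of 10 tenants earning under $30,000
# satisfies the 20-50 test.
incomes = [25_000, 28_000] + [55_000] * 8
print(meets_minimum_set_aside(incomes, 60_000))  # True
```

Since the credit amount rises with the share of qualifying units, developers typically go well beyond this floor, which is consistent with the 95 percent qualifying-unit share reported above.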

13. http://www.huduser.org/portal/datasets/lihtc.html.
14. http://www.ocpp.org/poverty/2014-median-income/.
15. The original requirement was fifteen years.
16. The LIHTC projects that are financed through tax-exempt bonds can automatically qualify for LIHTC credits of 4 percent. While these credits must meet all LIHTC restrictions, they are not allocated through a competitive process and do not count toward the state yearly per capita cap.


Housing Vouchers

Partly motivated by the high costs of construction programs, Congress created the Section 8 Existing Housing Program in 1974 (now the Housing Choice Voucher program), which awarded vouchers to low-income households to rent apartments on the private market. While slightly different variants of the program have evolved over the years, the basic structure has remained the same. Tenants generally pay 30 percent of their income toward rent, while the federal government covers the difference between this payment and the rent, up to a specified maximum payment standard (see below). To qualify for the voucher program, housing units must meet certain quality and size standards, and participation by landlords is voluntary, though thirteen states and several localities have now passed source-of-income discrimination laws that prohibit landlords from discriminating against voucher holders.17 (Owners of LIHTC housing are also prohibited from discriminating against voucher holders.) The voucher program is now HUD's largest housing subsidy program for low-income households.

2.2.2 Program Rules

In this section we discuss the rules for the most important housing programs described above. Rather than discuss the rules program by program, we contrast how the different program rules operate with respect to income eligibility and rent requirements.

Income Eligibility

While all of these programs were designed to provide rental housing for low-income households, income eligibility rules vary across programs and have varied over time within programs. These fluctuations reflect changing attitudes over time about how to balance the desire to serve the most disadvantaged families against other program objectives, such as generating sufficient rental income to support operating and maintenance costs, avoiding disincentives to work, and avoiding large concentrations of very low-income families. From the start, the public housing program was designed to target low-income families, but the expectation that rents would largely cover operating costs gave local housing authorities an incentive not to target the very lowest-income households. The Housing Act of 1937 simply stated that public housing tenants could earn no more than five times the rent they paid for their homes. Many public housing authorities appear to have used the leeway they had to screen tenants to choose working-poor families (Schwartz 2014; Vale 2000).

17. For a list of these states and localities and a description of the laws, see Poverty and Race Research Action Council (2005).

Low-Income Housing Policy


Over time, perhaps due in part to the aging of the stock or the availability of subsidized homeownership, the median income of public housing tenants fell from 57 percent of the national median in 1950 to just 29 percent in 1970 (Schwartz 2014). (Since most readers of this volume will probably find the poverty rate to be a more intuitive metric than share of area median income, we note that 29 percent of the national median in 1970 amounted to just $2,460, or about 80 percent of the poverty threshold for a family of three.) In 1974, due to concern about the concentration of poverty in public housing developments, Congress required PHAs to establish tenant selection criteria that would allow for "families with a broad range of incomes" and "avoid concentrations of low-income and deprived families with serious social problems" (Housing and Community Development Act 1974). Seven years later, Congress completely changed course and adopted stringent targeting requirements, mandating that 90 percent of occupants in existing public housing buildings and 95 percent in newly constructed buildings have incomes below 50 percent of the area's median (Schill 1993). Further, Congress introduced requirements that housing authorities give preferences to households that were involuntarily displaced, living in substandard housing or shelters, or paying more than 50 percent of their income on rent. The combination of these rules meant that virtually all households entering public housing now had incomes at the very low end of the local distribution. By 1990, the median income of public housing residents had fallen to less than 20 percent of the national median (Schwartz 2014), and the proportion of public housing tenants with incomes below 10 percent of the area median had risen to 20 percent (Spence 1993). In 1998, the pendulum swung back again, at least partly, and Congress sought to limit concentrations of poor households living in public housing.
The Quality Housing and Work Responsibility Act of 1998 mandated that 40 percent of households admitted into public housing have incomes below 30 percent of the area median and let the threshold fall to 30 percent in some developments in high-poverty areas. Table 2.1 shows that public housing tenants can technically earn up to 80 percent of the area median income. But in practice, due to preferences and also demand, most tenants fall far below this limit. In 2013, 76 percent of public housing tenants earned incomes below 30 percent of their local area median income (HUD 2013). Similarly, tenants in Section 8 New Construction and Substantial Rehabilitation developments can also technically earn up to 80 percent of the area median income (AMI); again, few do so in practice, as shown in section 2.3. The official income limits for the LIHTC program are lower than those for public housing and project-based Section 8; LIHTC tenants can technically earn only up to 60 percent of the area median income upon initial occupancy. In practice, however, LIHTC tenants turn out to have higher average incomes because the program does not provide rental assistance to tenants, and proj-


Table 2.1 Current program income eligibility and rent rules

| Program | Income limit upon occupancy | Ongoing income requirements | Tenant rent |
|---|---|---|---|
| Public housing | 80% of AMI, but 40% must earn < 30% AMI | PHAs have discretion to evict tenants if their incomes rise above eligibility limit. | 30% of a tenant's adjusted income. Families can choose a flat rent, based on comparable market rent. |
| Section 8 New Construction and Substantial Rehabilitation | 80% of AMI | Tenants can stay as incomes rise, but rents will rise accordingly. | 30% of a tenant's adjusted income. |
| Low-Income Housing Tax Credit program | 60% of AMI | Tenants can stay as incomes rise, but next available unit must be filled by income-eligible household. | Flat rent. |
| Housing Choice Voucher program | 50% of AMI,^a but 75% must earn < 30% AMI | Tenants lose voucher if 30% of their adjusted income exceeds the payment standard for six months. | 30% of a tenant's adjusted income if unit rents below payment standard; any amount if rents are above payment standard. |

^a Families with incomes up to 80 percent of the area median income are eligible for vouchers if they have been displaced from subsidized units by public housing demolition or expiring project-based Section 8 developments.
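To make the voucher rules in table 2.1 concrete, here is a minimal sketch of the tenant/subsidy split and the allowable payment-standard range (function names and example figures are ours; real HUD calculations use an adjusted-income definition with deductions not modeled here):

```python
def voucher_split(adjusted_monthly_income, gross_rent, payment_standard):
    """Split a unit's rent under the Housing Choice Voucher rules sketched
    in the text: the tenant pays 30% of adjusted income, the subsidy covers
    the gap up to the payment standard, and any rent above the payment
    standard falls entirely on the tenant."""
    tenant_base = 0.30 * adjusted_monthly_income
    subsidy = max(0.0, min(gross_rent, payment_standard) - tenant_base)
    tenant_pays = gross_rent - subsidy
    return tenant_pays, subsidy

def payment_standard_range(fair_market_rent):
    """PHAs may set the payment standard between 90% and 110% of the FMR
    without seeking an exception from HUD."""
    return 0.90 * fair_market_rent, 1.10 * fair_market_rent

# A household with $1,500/month adjusted income leasing a $1,000 unit under
# a $950 payment standard pays 30% of income ($450) plus the $50 by which
# the rent exceeds the standard; the voucher covers the remaining $500.
tenant_pays, subsidy = voucher_split(1_500, 1_000, 950)
```

The sketch makes the contrast with the LIHTC visible: under the voucher formula the tenant's payment moves with income, whereas the tax credit's flat rents do not.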

ects are typically underwritten to rents affordable to households earning 60 percent of AMI. So unless tenants have other rent subsidies, like housing vouchers, they typically earn incomes at about this level. Although vouchers were initially aimed at households earning up to 80 percent of AMI, over time the program has been targeted to lower-income households. Today, other than in a few special cases, tenants can earn no more than 50 percent of AMI, and the 1998 Quality Housing and Work Responsibility Act mandated that 75 percent of new voucher households must earn less than 30 percent of AMI. Despite this deeper targeting, section 2.3 shows that incomes of voucher holders appear to be about the same as those of tenants in public housing and project-based Section 8 housing.

Rent Requirements

As for rules about rents, rents in the public housing program were not initially tied to tenant income—they were flat rents set at levels that would enable local authorities to cover their operating costs. But costs grew faster than tenant incomes, and Congress responded by passing a series of amendments (the Brooke amendments) between 1969 and 1971 that set rents at 25 percent of a tenant's income to protect tenants (HUD 1974). This percentage was raised to 30 percent in the early 1980s (Olsen 2003). Housing authorities typically recertify tenants' incomes every year. The 1998 Housing Act required that housing authorities give families the option of paying a flat rent based on local market rents, though relatively few accept the offer. As of 2005, an estimated 10 percent of public housing tenants were paying a flat rent or a ceiling rent (a capped rent amount on the income-based rent) (Finkel and Lam 2008).18 Tenants also generally pay 30 percent of their income in both the voucher program and the Section 8 New Construction/Substantial Rehabilitation program, just as in the public housing program. In the voucher program, the federal government covers the difference between the tenant payment and the rent, up to a specified maximum payment standard.19 In the first year of the program, tenants must pay no more than 40 percent of their income toward rent; after initial lease-up, families can pay more than 40 percent for units with rents above the payment standard. Housing authorities recertify the income of voucher holders every year, though housing agencies participating in the Moving to Work (MTW) demonstration program, which exempts them from many of HUD's standard rules, are permitted to recertify less frequently. Housing authorities can set payment standards between 90 and 110 percent of the fair market rent (FMR) in the metropolitan area (or nonmetro county), which is defined as either the 40th or 50th percentile of rents, depending on the cost of housing.20 The Department of Housing and Urban Development uses metropolitan areas to define the local market, as they are believed to capture the full set of housing options available to a household in that area. The drawback of using such large areas to define FMRs is that units that rent below the 40th percentile within a metropolitan area tend to be concentrated in the lowest-income neighborhoods within that area. So HUD is currently experimenting with letting a few housing authorities set fair market rents at the ZIP Code level, with the aim of providing voucher holders with access to a broader range of neighborhoods.21

18. The New York City Housing Authority accounted for about one-third of all flat-rent units nationwide in 2005 (Finkel and Lam 2008).
19. The Department of Housing and Urban Development has rent reasonableness rules that prohibit PHAs from paying the FMR in neighborhoods where the market rents are less than the FMR, but it is unclear how well these rules are enforced.
20. Public housing authorities can apply to HUD for "exception payment standards" above or below this range.
21. At this time, six housing authorities are operating "Small Area Fair Market Rents." The Housing Authority of Cook County (IL), Chattanooga Housing Authority (TN), the City of Long Beach Housing Authority (CA), Laredo Housing Authority (TX), and the Town of Mamaroneck Housing Authority (NY) have voluntarily joined the demonstration. Dallas Housing Authority (TX) continues to operate with ZIP Code FMRs resulting from a lawsuit.

In contrast to the HUD programs, tax credit rents are flat and not tied to a tenant's income. The flat rents can be no higher than the rent that would be affordable to a household earning the maximum income allowed


for the low-income units in a tax credit development (typically 60 percent of the area median income).22 Developers may charge less than the maximum allowable rents, but rents charged for a unit are the same regardless of the income of the household who actually lives there. As a result, tenants in tax credit developments can pay considerably more than 30 percent of their income toward rent. Technically, there is no cap on rent burdens, though many owners impose minimum income requirements on applicants to ensure reasonable burdens. Further, LIHTC households are allowed to stay in developments even if their incomes rise, suggesting burdens may be lower than 30 percent for some households. In a study of LIHTC developments in eighteen states, O'Regan and Horn (2013) find that a majority of tenants were rent burdened according to standard definitions: 41 percent of LIHTC tenants paid between 30 and 50 percent of their incomes for rent, and 16 percent paid over half of their incomes for rent. The fact that the average incomes of LIHTC residents are higher than those of participants in HUD-sponsored programs highlights a tension in the design of means-tested housing programs: setting tenant rent contributions equal to 30 percent of income has the downside of greatly increasing the effective marginal tax rates on earnings facing households. On the other hand, setting flat rents, combined with the need for projects to reach revenue targets to be economically viable, means that the flat rent in practice can wind up pricing out many low-income households.

2.3 Program Statistics

This section considers data on low-income housing assistance over time and the characteristics of families receiving federal low-income housing subsidies. We review statistics on aggregate housing assistance receipt relative to eligibility, and then describe trends in federal spending on low-income housing subsidies.

2.3.1 Trends in Low-Income Housing Assistance

The largest federal low-income housing assistance programs serve approximately six million households today. Figure 2.1 displays the number of units or households by program over time. For over thirty years, government-managed public housing was the only major form of federal low-income housing assistance. The mid-1970s gave rise to privately owned and managed properties as an important source of federal low-income housing assistance with the development of the Section 8 tenant-based and new construction programs. By the early 1990s, the public housing, tenant-based, and project-based Section 8 programs were roughly equal in size, with each serving about 1.4 million households. The LIHTC program was introduced as part of the Tax Reform Act of 1986 and has grown rapidly since. During the past twenty years, the public housing stock has shrunk by about 300,000 units, with about 150,000 of the most distressed public housing units demolished through the HOPE VI program. While the project-based Section 8 and public housing stock have declined by nearly 600,000 units in the last twenty years, this has been more than offset by an additional 2.2 million households served through the Housing Choice Voucher program and the LIHTC (table 2.2).23 Today, privately owned and operated properties house roughly three-quarters of assisted households.

22. Rents are considered affordable to a household if they amount to no more than 30 percent of a household's pretax income.

Fig. 2.1 Assisted housing units and households, 1940–2012 (thousands)
Source: Table 2.2.

2.3.2 Characteristics of Households Served

Table 2.3 provides a picture of households assisted by HUD rental assistance programs. The approximately 4.5 million households served through public housing, housing choice vouchers, project-based Section 8, and smaller programs include roughly ten million persons. In this section we focus on HUD-assisted households because of the absence of data on households served by the LIHTC. Until very recently, no federal agency was responsible for collecting these data. HUD is now beginning to collect this information, but clean, nationally representative data are not yet available.

23. Of course, the number of households in the United States has grown by some 24 million over the last twenty years.

Table 2.2 Number of units eligible for assistance and assisted households

| Year | Public housing units | LIHTC placed in service^a | Tenant-based Section 8 | Project-based Section 8 |
|---|---|---|---|---|
| 1941 | 23,783 | | | |
| 1942 | 58,459 | | | |
| 1943 | 89,250 | | | |
| 1944 | 101,951 | | | |
| 1945 | 141,569 | | | |
| 1946 | 144,095 | | | |
| 1947 | 144,095 | | | |
| 1948 | 144,803 | | | |
| 1949 | 145,785 | | | |
| 1950 | 146,549 | | | |
| 1951 | 145,703 | | | |
| 1952 | 156,084 | | | |
| 1953 | 204,815 | | | |
| 1954 | 259,116 | | | |
| 1955 | 304,383 | | | |
| 1956 | 343,907 | | | |
| 1957 | 365,896 | | | |
| 1958 | 374,172 | | | |
| 1959 | 401,467 | | | |
| 1960 | 425,481 | | | |
| 1961 | 465,481 | | | |
| 1962 | 482,714 | | | |
| 1963 | 511,047 | | | |
| 1964 | 539,841 | | | |
| 1965 | 577,347 | | | |
| 1966 | 608,554 | | | |
| 1967 | 639,631 | | | |
| 1968 | 687,336 | | | |
| 1969 | 767,723 | | | |
| 1970 | 830,454 | | | |
| 1971 | 892,651 | | | |
| 1972 | 989,419 | | | |
| 1973 | 1,047,000 | | | |
| 1974 | 1,109,000 | | | |
| 1975 | 1,151,000 | | | |
| 1976 | 1,172,000 | | 162,085 | 111,181 |
| 1977 | 1,174,000 | | 297,256 | 155,879 |
| 1978 | 1,173,000 | | 427,331 | 263,583 |
| 1979 | 1,178,000 | | 521,329 | 377,112 |
| 1980 | 1,192,000 | | 599,122 | 554,189 |
| 1981 | 1,204,000 | | 650,817 | 668,110 |
| 1982 | 1,224,000 | | 690,643 | 836,040 |
| 1983 | 1,313,816 | | 728,406 | 1,021,498 |
| 1984 | 1,340,575 | | 748,543 | 1,161,269 |
| 1985 | 1,355,152 | | 797,383 | 1,212,923 |
| 1986 | 1,379,679 | | 892,863 | 1,250,476 |
| 1987 | 1,390,098 | 16,091 | 956,181 | 1,283,322 |
| 1988 | 1,397,907 | 50,889 | 1,024,689 | 1,307,773 |
| 1989 | 1,403,816 | 93,138 | 1,089,598 | 1,330,265 |
| 1990 | 1,404,870 | 135,227 | 1,137,244 | 1,363,218 |
| 1991 | 1,410,137 | 181,085 | 1,166,257 | 1,381,738 |
| 1992 | 1,409,191 | 225,151 | 1,326,250 | 1,396,227 |
| 1993 | 1,407,923 | 282,454 | 1,391,794 | 1,420,214 |
| 1994 | 1,409,455 | 345,581 | 1,486,533 | 1,439,426 |
| 1995 | 1,397,205 | 427,456 | 1,413,311 | 1,498,381 |
| 1996 | 1,388,746 | 514,268 | 1,464,588 | 1,493,574 |
| 1997 | 1,372,260 | 595,896 | 1,460,899 | 1,482,735 |
| 1998 | 1,295,437 | 681,070 | 1,605,898 | 1,395,037 |
| 1999 | 1,273,500 | 790,507 | 1,681,774 | 1,386,533 |
| 2000 | 1,266,980 | 889,211 | 1,837,428 | 1,358,797 |
| 2001 | 1,219,238 | 990,651 | 1,966,171 | 1,343,574 |
| 2002 | 1,208,730 | 1,092,739 | 1,997,733 | 1,328,532 |
| 2003 | 1,206,721 | 1,212,844 | 2,051,967 | 1,319,632 |
| 2004 | 1,188,649 | 1,329,343 | 2,087,344 | 1,309,427 |
| 2005 | 1,162,808 | 1,445,625 | 2,056,430 | 1,306,740 |
| 2006 | 1,172,204 | 1,563,639 | 2,084,917 | 1,287,529 |
| 2007 | 1,155,377 | 1,672,788 | 2,110,000 | 1,286,662 |
| 2008 | 1,140,294 | 1,754,222 | 2,071,195 | 1,285,331 |
| 2009 | 1,128,891 | 1,820,279 | 2,091,700 | 1,279,383 |
| 2010 | 1,060,392 | 1,883,129 | 2,142,668 | 1,179,298 |
| 2011 | 1,082,393 | 1,944,374 | 2,183,276 | 1,179,327 |
| 2012 | 1,091,758 | 1,984,137 | 2,207,724 | 1,174,914 |
| 2013 | 1,090,471 | na | 2,193,545 | 1,171,092 |

Sources: Public housing, tenant-based voucher, and project-based Section 8 counts come from HUD's Annual Performance Report 1999–2013. Pre-1998 numbers for HUD programs come from Olsen (2003). The LIHTC counts are low-income units placed in service from HUD's Low Income Housing Tax Credit Database.
^a Reports the number of low-income units in LIHTC projects reported to be placed in service.
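The twenty-year changes cited in the text can be checked directly against the entries of table 2.2 (2012 is used for the LIHTC, its last reported year):

```python
# Counts copied from table 2.2 (units or households).
public_housing = {1993: 1_407_923, 2013: 1_090_471}
project_s8 = {1993: 1_420_214, 2013: 1_171_092}
vouchers = {1993: 1_391_794, 2013: 2_193_545}
lihtc = {1993: 282_454, 2012: 1_984_137}  # "na" for 2013

# Public housing shrank by roughly 300,000 units over the period...
ph_decline = public_housing[1993] - public_housing[2013]
# ...and the combined public housing / project-based Section 8 decline
# is nearly 600,000 units...
combined_decline = ph_decline + project_s8[1993] - project_s8[2013]
# ...which is more than offset by growth in vouchers and the LIHTC.
offset = (vouchers[2013] - vouchers[1993]) + (lihtc[2012] - lihtc[1993])
```

The computed declines (about 317,000 and 567,000 units) match the text's rounded figures, and the roughly 2.5 million added voucher and LIHTC units are consistent with the "more than 2.2 million" claim.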

The HUD-assisted households are on average quite disadvantaged, with incomes of $12,000–$14,000 across HUD's major programs. Three-quarters of households earn less than 30 percent of their area median income (AMI). Although there is no national database of LIHTC tenants, a recent analysis shows that LIHTC tenants have considerably higher incomes than households participating in HUD programs. Analyzing tenant income data for eighteen states, O'Regan and Horn (2013) report that 45 percent of LIHTC tenants earned less than 30 percent of the area median income in 2009, and about one-fifth earned over 50 percent of the local AMI. (Indeed, a significant minority of LIHTC tenants earn above 60 percent of AMI because there is no requirement for households whose income grows above that limit to move out of LIHTC housing.)

Table 2.3 Characteristics of HUD-subsidized households, 2013

| Variables | All HUD programs | Housing choice vouchers | Public housing | Project-based Section 8 | Other multifamily programs | Section 236 | Moderate rehab program |
|---|---|---|---|---|---|---|---|
| Subsidized units (thousands) | 5,256 | 2,386 | 1,151 | 841 | 656 | 127 | 22 |
| Subsidized people (thousands) | 10,077 | 5,360 | 2,335 | 1,247 | 946 | 156 | 33 |
| Percent occupied | 94 | 92 | 94 | 96 | 95 | 93 | 89 |
| Subsidized households reported (thousands) | 4,553 | 2,113 | 1,071 | 785 | 493 | 63 | 28 |
| Average rent/month, inc. utilities | 304 | 346 | 275 | 274 | 255 | 211 | 153 |
| Average household income/year | 12,890 | 13,138 | 13,724 | 12,172 | 11,135 | 14,347 | 8,899 |
| Average people/household | 2.1 | 2.4 | 2.2 | 1.5 | 1.8 | 1.7 | 1.6 |
| Income as % of area median | 0.23 | 0.22 | 0.25 | 0.24 | 0.21 | 0.24 | 0.18 |
| Neighborhood poverty rate | 0.25 | 0.22 | 0.32 | 0.23 | 0.27 | 0.26 | 0.31 |
| Percent 62+, head or spouse | 0.33 | 0.22 | 0.31 | 0.56 | 0.44 | 0.47 | 0.23 |
| Percent LT62 w/disability, head or spouse | 0.34 | 0.36 | 0.31 | 0.44 | 0.26 | 0.25 | 0.44 |
| Percent single parent | 0.35 | 0.43 | 0.35 | 0.18 | 0.30 | 0.25 | 0.24 |
| Percent 2+ adults with children | 0.04 | 0.05 | 0.05 | 0.03 | 0.04 | 0.04 | 0.02 |
| Percent with children under 18 | 0.39 | 0.48 | 0.40 | 0.21 | 0.34 | 0.29 | 0.26 |
| Percent LT 50% area median income | 0.95 | 0.96 | 0.91 | 0.96 | 0.98 | 0.93 | 0.99 |
| Percent LT 30% area median income | 0.75 | 0.76 | 0.72 | 0.73 | 0.78 | 0.71 | 0.88 |
| Percent minority total | 0.64 | 0.67 | 0.71 | 0.45 | 0.63 | 0.59 | 0.62 |
| Percent black | 0.44 | 0.48 | 0.48 | 0.29 | 0.45 | 0.42 | 0.34 |
| Percent Hispanic | 0.17 | 0.15 | 0.23 | 0.14 | 0.16 | 0.15 | 0.28 |
| Minority as % of neighborhood | 0.56 | 0.57 | 0.62 | 0.45 | 0.58 | 0.56 | 0.59 |

Source: A Picture of Subsidized Housing data (HUD 2013).
Notes: This table reports summary statistics on households, persons, and units by program. Units and households are reported for the program under which units were initially constructed. This means that many households under the "other multifamily programs" category (which includes Section 8 Loan Management, Rental Assistance Program (RAP), Rent Supplement (SUP), Property Disposition, Section 202/811 capital advance, and preservation) may be receiving project-based rental assistance (but are not double counted).


Roughly 40 percent of all HUD-assisted households have children. A large share of HUD-assisted households are headed by an elderly member (33 percent) or a head or spouse with a disability (33 percent). Across HUD programs, about 24 percent of HUD-assisted households have earned income; this number is slightly higher for public housing and vouchers, where there are more working-age adults. The majority of HUD-assisted residents are racial or ethnic minorities; slightly over one-third of households are non-Hispanic white. There are some important differences across programs. Residents of project-based Section 8 developments tend to be whiter and older relative to participants in other HUD programs, in part because of the inclusion of rental assistance delivered to Section 202 developments, which house the low-income elderly. Housing voucher households are on average younger, more likely to include children, and larger than households served by public housing or project-based Section 8. Single-female-headed households with children make up about 40 percent of voucher households and roughly one-third of public housing households. When utilities are included, voucher households pay more on average for their housing than either public housing or project-based Section 8 residents (not adjusting for quality). Given that some justification for housing assistance is based on the presence of neighborhood externalities, the average neighborhood characteristics across programs are also relevant. On average, public housing households reside in census tracts where 32 percent of residents are poor. This is considerably higher than the poverty rate in the neighborhoods occupied by the average voucher tenant (22 percent poor) or residents of project-based Section 8 (23 percent).
Of course, because the program populations are slightly different, these differences in neighborhood environments could partly reflect differences in the constraints or preferences of the participants in the different programs. Below we review the available evidence on how neighborhood environments change when the type of subsidy a given family receives changes.

2.3.3 Overlap with Other Subsidy Programs

How do housing subsidies overlap with other transfer programs? The interaction of housing subsidies with other federal and local subsidy programs is in general an understudied issue, and one that is relevant to, among other things, understanding whether participating in multiple programs reduces the benefit amounts families receive from each individual program. The HUD rent calculations exclude certain benefits but include others in the determination of income, and thus rents. Benefits that count toward income and rent calculations include UI, SSDI, SSI, and TANF; HUD excludes most benefits tied to medical expenses from the calculation of adjusted income used to set rents. Importantly, HUD excludes SNAP benefits, LIHEAP, earnings


Fig. 2.2 Assisted households with earnings versus EITC schedule (one earner, one child)
Source: Authors' calculations, HUD PIC 2014, Tax Policy Center 2014.
Note: Sample is households with positive wage earnings, one wage earner, and one dependent.
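The interaction plotted in figure 2.2 can be illustrated with simple arithmetic: for a household in the EITC phase-in region, an extra dollar of earnings raises rent by 30 cents under HUD's rules but adds 34 cents of credit (the 2014 phase-in rate for a single filer with one child), and because EITC refunds are excluded from HUD's income definition the credit itself does not raise rent. A minimal sketch using only the parameters stated in the text:

```python
def net_gain_per_dollar(hud_rent_share=0.30, eitc_credit_rate=0.34):
    """Change in disposable income per extra dollar earned for a
    HUD-assisted household: +$1.00 of earnings, -$0.30 rent increase,
    plus the applicable EITC credit rate on that dollar."""
    return 1.0 - hud_rent_share + eitc_credit_rate

# In the phase-in region each extra dollar earned raises disposable
# income by about $1.04, so the EITC more than offsets the rent rule;
# with no credit (rate 0), the household keeps only about $0.70.
```

This abstracts from all other taxes and transfers; in the flat and phase-out regions of the EITC schedule the offset shrinks or reverses.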

from or payments from participation in WIA programs, and EITC refunds, in the income calculation.24 As noted earlier, HUD rent calculations implicitly tax earnings at a marginal rate of 30 percent. This tax rate interacts in economically meaningful ways with other subsidy program rules. Figure 2.2 plots the distribution of wage-earning, single-earner households (with one child) against the EITC schedule in 2014. Roughly 94 percent of voucher households in this group and about 89 percent of public housing households fall within the EITC range. Approximately 30 percent of these wage earners are in the "phase-in" region, another 30 percent are in the flat portion of the EITC schedule, and about one-third are in the "phase-out" region. In 2014, the credit rate for the phase-in region was 34 percent for single filers with one child.25 This means that for households with incomes in the phase-in region, the EITC more than offsets the increase in rents brought about by the HUD rules when earnings rise. More attention should be given to examining how federal subsidy programs overlap and connect.

2.3.4 Eligible Households and Housing Affordability

The number of households served by federal rental assistance has increased over time in absolute terms, and households receiving federal assistance are quite disadvantaged by standard measures. Two important lingering questions are: How has the need for low-income housing assistance evolved over time? What proportion of eligible households receive federal rental assistance?

24. See HUD Occupancy Handbook, chapter 5. http://portal.hud.gov/hudportal/documents/huddoc?id=DOC_35649.pdf.
25. See Tax Policy Center, Briefing Book 2014. http://taxpolicycenter.org/.

The concept of "need" for housing assistance depends on the precise justification. As we will discuss in section 2.6, the motivation for present-day housing programs tends to focus on the issue of affordability, expressed as the share of income spent on rent. Of course, affordability measured in this way can worsen in response to falling income, even as housing costs remain constant. By most measures, housing costs have increased in real terms in the postwar era. The relevant question is whether rental prices have outpaced income growth. Figure 2.3 plots changes in real rental prices against changes in 25th percentile household incomes since 1980 (base year, 1983). Over this period, rents largely tracked income until the Great Recession. Since 2007, incomes have plummeted while rents have rebounded after a brief dip. Table 2.4 also suggests that low-income households spend more today on rent than they did fifty years ago. The median renter household in 1960 spent about 18 percent of its income on rent; today, it spends 29 percent. Renters who were in the bottom fifth of the income distribution devoted about 47 percent of their income to rent in 1960, compared to 63 percent today. This trend is at least in part due to stagnant real incomes for renters over this period, but their housing expenses seem to have risen in real (inflation-adjusted) terms as well. Given the large improvements in

Fig. 2.3 Real rent versus real income growth (base = 1983), 1980–2013
Source: Authors' calculation. Data: CPI City Average Urban Consumers, CPS PUMS.


Table 2.4 Rent burdens of renter households by income quintile and year

A. Rent as a percentage of household income

| | 1960 | 1970 | 1980 | 1990 | 2000 | 2012 |
|---|---|---|---|---|---|---|
| All renters | 0.19 | 0.20 | 0.25 | 0.26 | 0.26 | 0.29 |
| First quintile | 0.47 | 0.51 | 0.53 | 0.53 | 0.55 | 0.63 |
| Second quintile | 0.23 | 0.23 | 0.24 | 0.28 | 0.29 | 0.33 |
| Third quintile | 0.17 | 0.16 | 0.20 | 0.21 | 0.20 | 0.23 |
| Fourth quintile | 0.14 | 0.13 | 0.15 | 0.16 | 0.15 | 0.18 |
| Fifth quintile | 0.10 | 0.10 | 0.11 | 0.12 | 0.11 | 0.13 |
| Poor renters | 0.44 | 0.57 | 0.63 | 0.63 | 0.64 | 0.67 |

B. Percentage of renters devoting more than 30 percent of income to rent

| | 1960 | 1970 | 1980 | 1990 | 2000 | 2012 |
|---|---|---|---|---|---|---|
| All renters | 0.23 | 0.26 | 0.34 | 0.37 | 0.40 | 0.49 |
| First quintile | 0.62 | 0.67 | 0.69 | 0.72 | 0.79 | 0.83 |
| Second quintile | 0.21 | 0.23 | 0.37 | 0.42 | 0.44 | 0.59 |
| Third quintile | 0.04 | 0.04 | 0.09 | 0.14 | 0.12 | 0.24 |
| Fourth quintile | 0.01 | 0.01 | 0.02 | 0.05 | 0.03 | 0.08 |
| Fifth quintile | 0.00 | 0.00 | 0.00 | 0.00 | 0.02 | 0.02 |
| Poor renters | 0.55 | 0.64 | 0.68 | 0.71 | 0.77 | 0.82 |

Source: Data from IPUMS Decennial PUMS extracts 1960–2000; ACS PUMS 2012.

housing conditions over this time, it remains somewhat unclear how much of the increase in real rents is due to improvements in housing quality. As noted above, one very striking feature of federal low-income housing assistance is the small share of eligible households that actually receive a subsidy. As discussed in section 2.2, most federal low-income housing programs allow households earning up to 80 percent of area median income to remain eligible, but target households with lower incomes. The 2011 American Housing Survey (AHS) shows nearly 19 million renters with incomes below 50 percent of area median income, with 4.6 million of them reporting receiving some kind of rental assistance. These figures suggest that slightly fewer than one in four eligible households currently receive a housing subsidy. To be fair, these numbers may not count many low-income households in tax credit developments who do not also receive a rent subsidy, but these households receive a much smaller effective subsidy. In just about all parts of the country housing assistance is oversubscribed, and local housing authorities have developed a number of different systems to prioritize households on program waitlists. According to a 2012 survey of about 80 percent of housing authorities, covering about 85 percent of assisted housing, there are more than 4.9 million households on waitlists for housing vouchers and 1.6 million households on public housing waiting lists.


These numbers may slightly overstate the number of unique households on waitlists, because households could conceivably be on multiple waiting lists. Low-income housing assistance is the only major federal welfare program rationed in this way, a point to which we return below.

2.3.5 Budget

Table 2.5 shows direct federal spending on housing programs since 1980 in nominal and real (2013) dollars. Expenditures on housing assistance to low-income families rose substantially in real terms through the late 1980s and early 1990s, were stable through much of the first decade of the twenty-first century, spiked in 2010 and 2011 partly due to investments from the federal stimulus package (ARRA), and then declined recently as a result of cuts triggered by budget sequestration. These outlay figures do not capture the opportunity costs of using land for low-income housing, or any spillover costs (positive or negative) of the programs. Most federal subsidies for low-income housing can be found in HUD's budget, though the United States Department of Agriculture (USDA) operates a few smaller rural housing programs, including project-based rental assistance to 270,000 households through the Section 521 program at a cost of roughly $1 billion annually.26 Importantly, table 2.5 excludes tax expenditures, including subsidies that come through the Low-Income Housing Tax Credit. Table 2.6 provides estimates of the subsidies for the LIHTC for the past ten years. The LIHTC cost over $6 billion in 2013.

2.4 Review of Issues Surrounding the Programs

In this section we discuss several different conceptual issues that are raised in the design and operation of means-tested housing programs. First, we discuss the justification for these programs. The growing emphasis on affordability as a motivation for the existence of such programs raises the obvious question for economists of why government relies on in-kind housing programs instead of just providing cash transfers. Many of the rationales for in-kind housing programs rest on the assumption that such programs will lead to greater consumption of housing than would a cash transfer of equal cost to the government, but as we note below this need not be the case. We also discuss the potential "internalities" as well as externalities associated with housing consumption, which are frequently cited as key justifications for housing programs. We pay particular attention to potential effects on labor supply, and how housing programs balance the general tension that arises with all poverty programs between supporting poor households and trying not to discourage work. The third issue we discuss is how different housing programs affect the neighborhoods that participants live in, as well

26. The USDA also operates a mortgage subsidy program, Section 515 (< $1 billion/year).

Table 2.5  Federal outlays for housing assistance, 1980–2013 (millions)

Year    Outlays    Outlays (2013 dollars)a
1980     5,480     13,179
1981     6,861     15,091
1982     8,064     16,702
1983     9,449     18,831
1984    10,048     19,337
1985    11,402     21,262
1986    11,441     20,912
1987    11,278     20,102
1988    12,727     21,918
1989    13,979     23,171
1990    15,481     24,744
1991    16,958     26,233
1992    18,776     28,399
1993    21,397     31,611
1994    23,804     34,434
1995    27,438     38,879
1996    26,660     37,100
1997    27,693     37,889
1998    28,686     38,826
1999    27,645     36,853
2000    28,788     37,523
2001    30,067     38,315
2002    33,046     41,475
2003    35,306     43,448
2004    36,574     43,804
2005    37,710     43,756
2006    38,002     42,779
2007    39,436     43,242
2008    40,245     43,278
2009    41,405     44,191
2010    46,628     49,167
2011    47,743     49,324
2012    43,801     44,453
2013    42,376     42,376

Source: Data from the OMB historical tables, table 8.7: outlays for discretionary programs. http://www.whitehouse.gov/omb/budget/Historicals.
Notes: Housing assistance includes the following major programs: Tenant-Based Rental Assistance, Project-Based Rental Assistance, Public Housing Operating Fund, Public Housing Capital Fund, Home Investment Partnership Program, Homeless Assistance Grants, RHS Rental Assistance Program, Housing for the Elderly, Native American Housing Block Grant, Housing Certificate Fund, Housing Opportunities for Persons with AIDS, Housing for Persons with Disabilities, Revitalization of Severely Distressed Public Housing (HOPE VI), Self-Help Homeownership Opportunity Program, Rural Housing Assistance Grants, and Choice Neighborhoods.
a Real values adjusted using BEA annual GDP Implicit Price Deflator.

Table 2.6  Estimated budgetary cost of low-income housing tax credit (billions), 2004–2013

Year    Cost    Cost ($2013)a
2004    4.3     5.1
2005    4.7     5.5
2006    4.8     5.4
2007    5.1     5.6
2008    5.2     5.6
2009    8.3     8.9
2010    5.1     5.4
2011    5.4     5.6
2012    6.0     6.1
2013    6.4     6.4

Source: Joint Committee on Taxation, Estimates of Federal Tax Expenditures.
a Real values adjusted using BEA annual GDP Implicit Price Deflator.

as how they shape neighborhood environments for others. The final issue addressed here is the logic of the current system's approach of providing very generous subsidies to just a small subset of income-eligible households.

2.4.1 Justifications for Low-Income Housing Subsidies

The justifications for government involvement in housing markets have shifted over the years, and they varied even at their beginning. The development of the nation's public housing program was motivated partly by a desire to stimulate the economy after the Depression, but also partly by perceived failures on the supply side of the private housing market. Early housing reformers maintained that private enterprise only constructed adequate quality homes for households near the top of the income distribution, and that government involvement was critical to rescue low-income households from dangerous slums, which they believed bred social ills (Von Hoffman 1996). This view seems to ignore the important role of "filtering" in supplying housing to low-income households, whereby a given housing unit becomes less expensive over time as its condition declines. Whatever the initial arguments were in the 1930s around supply problems in the private housing market, the supply of quality housing has clearly changed dramatically since then. Census measures of substandard housing, such as units lacking complete plumbing facilities, have declined dramatically from a little less than half of all housing units (45.3 percent) in 1940 to a tiny share of all housing (2 percent) in 2012. Overcrowding has shown similarly large decreases since 1940, from nearly one in five households with more than one person per room to one in twenty households today. The share of units without a septic or sewer connection has similarly declined. The incidence of housing problems has fallen substantially in nearly every


Robert Collinson, Ingrid Gould Ellen, and Jens Ludwig

measurable way. Mirroring these declines in problems has been an improvement in the availability of housing amenities. Less than half of all housing units in 1973 had some form of air conditioning (central air or room units). In 2011, more than 85 percent of housing had either central air or room-unit air conditioning.27 As housing conditions have improved, affordability has worsened. The median renter household in 1960 was paying approximately 18 percent of its total family income in rent; today, the equivalent figure is 29 percent. As with health care, in the area of housing American households are now paying more but getting more as well. Today's motivation for low-income housing subsidies is much more about affordability and less about quality, though current housing programs still impose some quality restrictions.28 Implicit in the argument that households are paying too much for housing is the idea that they then lack income to cover consumption of other goods. But if the aim of housing subsidies is to reduce underconsumption of nonhousing goods, why provide in-kind housing subsidies rather than cash transfers? There are a few arguments to justify such in-kind support, many of them generic to the whole issue of in-kind programs rather than specific to housing per se. The first potential justification is donor preferences. In-kind transfers typically are preferred only on paternalistic grounds. Currie and Gahvari (2008) provide a thorough review of the theory justifying in-kind transfers (see also Aaron 1972). Probably the most realistic model of this paternalism for low-income housing policy is to allow for an interdependence of preferences among donors (taxpayers) and recipients, whereby donors derive some utility from seeing recipients consume a particular good, in this case housing.
Another related possibility is that donors have preferences to restrict the consumption choice set of low-income households to rule out consumption of particular goods such as alcohol, cigarettes, or luxury goods. Whether it is a preference for consumption of housing or against consumption of other goods, in-kind housing transfers do indeed seem to have more political support than cash transfers. For example, a 2003 survey found that just 39 percent of Americans support cash payments to the poor when there is no barrier to employment, while 89 percent of Americans support low-income housing assistance (Lennen et al. 2003). On the other hand, 90 percent of Americans do not necessarily support the specific low-income housing programs administered by the federal government. This may or may not be related to the federal government's decision to continue to administer a bewildering variety of housing programs, even though most of them are no longer producing new units. Forty years ago the authors of Housing in the Seventies, the 1974 HUD report summarizing the findings of the National Housing Policy Review, identified twenty different subsidized housing programs and called the nation's housing laws "a hodge podge of accumulated authorizations," which "contain internal inconsistencies, numerous duplications, cross-purposes, and overlaps as well as outright conflicts and gimmickry" (HUD 1974, 22). The authors of the report attribute the proliferation to the multiplicity of goals that housing programs are designed to achieve, ranging from stimulating the economy to removing slums, assisting the poor, and furthering economic and racial integration. It may also be that the fragility of the political support for low-income housing contributes to the proliferation. It is far from certain that voters would support a simple, broad-based housing entitlement program. It is perhaps no coincidence that the Low-Income Housing Tax Credit Program is a tax expenditure program that does not require annual appropriations. A second argument for having means-tested housing programs is that providing in-kind subsidies rather than cash payments reduces the risk of fraud, since ineligible households will be less motivated to try to secure housing than cash (Nichols and Zeckhauser 1981). This argument is also generic to the whole issue of in-kind subsidies, not housing specific. A third argument is that housing is a "merit good"—that Americans believe that all residents of the United States, as stated in the Housing Act of 1949, deserve a "decent home and suitable living environment."29 However, if housing is a normal good, providing low-income households with more cash would increase their housing consumption.

27. A different type of argument noted by Aaron (1972, 18) is the possibility that the private supply of housing responds to changes in housing demand with a lag that policymakers view as "too long."
28. The US Department of Housing and Urban Development biannually tracks a measure called "worst-case housing needs," which are unsubsidized low-income households paying more than 50 percent of their income toward rent or occupying "severely inadequate" housing as measured by the American Housing Survey. In 2011, there were 8.45 million of these households, of which just 3 percent resided in substandard housing (HUD 2013).

This raises a critical question for low-income housing programs: Do housing subsidies increase housing consumption more than an equivalently sized cash transfer? The federal low-income housing programs produce complicated budget sets for participating households, with varying rules determining income deductions and exemptions. We limit our focus here to general cases corresponding to the larger unit-based assistance programs: public housing and the Low-Income Housing Tax Credit (LIHTC), and tenant-based rental assistance (housing vouchers). For a more detailed theoretical analysis of federal low-income housing programs, see Olsen (2003, 2008). First, we consider the simple unit-based assistance case. For the public housing program, a participating household is simply offered a fixed quality of housing Q_PH at a rent equal to 30 percent of their income (Y) after adjustments. Ignoring the distortions of the transfer on labor supply, the static budget set is represented in figure 2.4. The vertical axis is the quantity of all other goods
This raises a critical question for low- income housing programs: Do housing subsidies increase housing consumption more than an equivalently sized cash transfer? The federal low- income housing programs produce complicated budget sets for participating households, with varying rules determining income deductions and exemptions. We limit our focus here to general cases corresponding to the larger unit- based assistance programs: public housing and the Low-Income Housing Tax Credit (LIHTC), and tenant- based rental assistance (Housing Vouchers). For a more detailed theoretical analysis of federal low- income housing programs, see Olsen (2003, 2008). First, we consider the simple unit- based assistance case. For the public housing program, a participating household is simply offered a fixed quality of housing QPH at a rent equal to 30 percent of their income (Y) after adjustments. Ignoring the distortions of the transfer on labor supply, the static budget set is represented in figure 2.4. The vertical axis is the quantity of all other goods 29. Housing Act of 1949, 42 U.S.C §§ 1441– 1490.


Fig. 2.4  Budget set under public housing

Notes: The figure above illustrates a static view of public housing assistance on the budget set of participants, ignoring labor supply effects. Public housing amounts to an offer of a fixed quality of housing; accepting this offer could result in lower housing consumption with more of other goods consumed, more of both housing and other goods, or only more housing. For example, a household could be optimizing at point L in the absence of the public housing offer—and thus would consume more housing but less of other goods if it were to accept the public housing unit, due to its increased rent contribution. A household selecting point M would consume more housing and more of other goods if it were to accept the public housing unit, while a household consuming at point H would consume less housing and more of other goods were it to accept the public housing unit.
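The rent rule behind this budget set is mechanical and can be sketched in a few lines. The figures below (income and market rent) are hypothetical, and "market rent" stands in for the housing-quality index Q_PH priced at the market rate; this is an illustration of the 30-percent-of-income rule only, not HUD's full income-adjustment procedure.

```python
def public_housing_offer(annual_income, market_rent):
    """Monthly tenant contribution and implied subsidy for a household
    offered a public housing unit whose private-market rent would be
    `market_rent`, under the 30-percent-of-(adjusted)-income rent rule."""
    tenant_rent = 0.30 * annual_income / 12.0   # monthly rent contribution
    subsidy = market_rent - tenant_rent         # value of the in-kind transfer
    return tenant_rent, subsidy

# A hypothetical household with $12,000 in adjusted income offered a unit
# that would rent for $800/month pays $300 and receives an implicit $500 subsidy.
rent, subsidy = public_housing_offer(12_000, 800)
print(rent, subsidy)  # 300.0 500.0
```

Whether accepting such an offer raises or lowers the household's housing consumption depends, as the figure shows, on what it would have chosen without the program.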

consumed (Q_x), and the horizontal axis is a scalar index of housing quality consumed (Q_H). As noted by Olsen (2003), economic theory does not yield an unambiguous prediction as to whether a household offered public housing will increase its housing consumption. Public housing admission produces a take-it-or-leave-it offer of a fixed quality of housing. In practice, public housing probably offers a better housing alternative for families applying for assistance; however, it may be that some households would consume better housing in the absence of a subsidy, and so enrollment in public housing would mainly lead them to increase consumption of other (nonhousing) goods. Like public housing, the LIHTC program provides a take-it-or-leave-it offer of a fixed quality of housing. Unlike public housing, which lets program participants increase consumption of other goods by fixing the tenant's rent contribution at 30 percent of income (less than what most unsubsidized households pay toward rent), the LIHTC program may have less of an impact on the consumption of other goods (absent additional subsidies) because rents are not tied to a given tenant's income. Instead, rents are


set to 30 percent of 60 percent of the area median income (roughly twice the poverty line in most parts of the country). Thus, it is possible that the LIHTC program does not offer substantially lower rents in many markets than low-income households would pay in the absence of the program, but does deliver higher quality for a given rent level than most housing alternatives for low-income households. In high-cost markets, households in LIHTC units are likely spending less on rent than their unsubsidized counterparts and getting higher-quality units. Housing vouchers are more flexible than project-based subsidies. Recipients get a capped rent subsidy to lease any decent unit on the private market—conditional on the owner accepting the voucher. Recipients pay 30 percent of their income, and the voucher pays the difference in rent, up to a locally defined rent ceiling. The structure of the voucher subsidy produces a complex budget set for voucher households (illustrated in figure 2.5). The rules of the voucher program are such that it does not guarantee increased

Fig. 2.5  Budget set under housing vouchers

Notes: The figure above illustrates the budget set created by receipt of a housing voucher. Without a voucher, a low-income household faces budget constraint AZ. The provision of a voucher changes the budget set. A voucher recipient must pay at least 30 percent of their adjusted income (Y) toward rent, and must reside in a unit surpassing a minimum quality threshold (Q_min). The voucher pays up to a maximum subsidy ceiling (S); this is often the fair market rent (FMR), but local PHAs have discretion to set the subsidy ceiling at 90–110 percent of FMR. In the first year of the program, the budget constraint is ABCDZ. First-year tenants can rent a unit that is more expensive than the maximum subsidy (S), but cannot pay more than 40 percent of their adjusted income on rent; thus the highest quality of housing they can occupy is (S + 0.1Y)/P_H. The presence of this maximum expenditure ceiling means that theoretically the voucher could lead to reduced housing consumption in the first year of the program. After the first year of the program, tenants can pay more than 40 percent of their adjusted income toward rent, so they face budget constraint ABCDE and should increase housing consumption.
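The voucher rules described above can be sketched as a simple calculation. All dollar figures below are hypothetical, and the function is a stylized version of the rules in the figure notes (subsidy capped at the payment standard, 30 percent minimum tenant share, 40 percent first-year affordability test), not the full program regulations.

```python
def voucher_tenant_share(gross_rent, monthly_income, payment_standard,
                         first_year=True):
    """Monthly out-of-pocket rent for a voucher holder under the stylized
    rules above: the subsidy equals the lesser of the gross rent and the
    payment standard, minus 30 percent of adjusted income (never below
    zero); in year one the tenant share may not exceed 40 percent of
    income, so sufficiently expensive units cannot be leased at all."""
    subsidy = max(0.0, min(gross_rent, payment_standard) - 0.30 * monthly_income)
    tenant_share = gross_rent - subsidy
    if first_year and tenant_share > 0.40 * monthly_income:
        return None  # unit fails the first-year affordability test
    return tenant_share

# Hypothetical household: $1,250/month adjusted income, $1,000 payment standard.
print(voucher_tenant_share(1_100, 1_250, 1_000))                    # 475.0
print(voucher_tenant_share(1_150, 1_250, 1_000))                    # None
print(voucher_tenant_share(1_150, 1_250, 1_000, first_year=False))  # 525.0
```

Note that in this example the most expensive leasable first-year unit rents for $1,125, which is exactly the S + 0.1Y corner in the figure (here $1,000 plus 10 percent of $1,250 in monthly income).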


housing consumption. Families receiving a voucher can occupy the same unit they resided in prior to voucher receipt, so long as that unit meets the voucher program's quality standards. While "leasing in place" essentially leaves money on the table for the many poor households living in units with monthly rents below (perhaps far below) what is effectively the voucher's maximum allowable rent (the FMR), around 20 percent of newly issued voucher holders use the voucher to lease their previous unit (Finkel and Buron 2001). One reason households do this is that they have a limited amount of time to find housing (typically around ninety days) when first issued a voucher, after which the voucher offer is rescinded by the local housing authority so that some other household can use it. Presumably some of these families who lease in place will "reoptimize" with future moves. If housing is a normal good, as we would expect for most households, the design of the program should eventually increase housing consumption. It remains unclear whether the current system of housing subsidies increases housing consumption more than equivalently sized cash transfers. Unfortunately, to our knowledge, no experiment has been run to answer this particular question. Hanushek (1986) reviews results on housing expenditures from the Negative Income Tax Experiment and the Housing Allowance Demand Experiment and concludes that they are quite similar. If simple cash transfers produce similar impacts on housing consumption, it is difficult to argue that in-kind housing subsidies are efficient. This is an important unresolved question that should be a priority for future research. A final justification for in-kind housing programs arises if housing consumption generates externalities. We turn to this issue next.

2.4.2 Externalities from Housing Consumption

The Progressive-era reformers argued that slums caused disease and social pathologies, and that these maladies could spread to the larger population (Von Hoffman 2012). This early motivation, which focused on externalities resulting from poor-quality housing and neighborhood conditions, identified poor housing as a cause rather than a simple consequence of poverty and social problems. If poor housing conditions imposed some external costs on society, then efficiency gains could be realized by increasing housing consumption to the socially optimal level. While little rigorous research has explored how inadequate housing might generate negative externalities in practice, there are some obvious candidate channels such as physical housing quality, overcrowding, and residential mobility (Leventhal and Newman 2010). Housing quality could influence outcomes—particularly health—through the presence of toxins or hazards such as lead paint or asbestos (Fisk, Lei-Gomez, and Mendell 2007). If information about the presence of these hazards is asymmetric or imperfect, then landlords may not know about potential toxins or may choose not to report them, and households would not fully internalize the cost of locating in housing units that contain these hazardous materials.


As for crowding, the limited personal space in crowded apartments could facilitate the transmission of disease (Goux and Maurin 2008) and create stress and physiological distress (Evans 2003). Overcrowding could in principle also hinder children's academic performance or other schooling outcomes by restricting opportunities for concentrated study (Goux and Maurin 2008). Housing subsidies could potentially also help to increase residential stability. Theoretically, involuntary moves could have damaging collateral effects on individuals. These moves could induce acute stress on parents from events such as eviction, which could in turn affect parenting and children's outcomes (Desmond 2012). Residential moves might also force children to change schools, which could potentially lower achievement for moving students as well as students at receiving schools (Hanushek, Kain, and Rivkin 2004). Involuntary moves could disrupt social networks, which could be important to adult labor market attachment. If housing instability carries large social costs and results principally from an inability to meet rental payments, then housing subsidies could be efficient if they reduce these external costs substantially. Whether these externalities exist and are economically meaningful is an empirical matter, which we discuss in section 2.5. Of particular interest to economists has been the question of whether and how housing consumption and means-tested housing programs affect productivity and labor supply. Simple static models suggest housing assistance should reduce labor supply through income and substitution effects. The majority of current federal housing subsidies require that 30 percent of recipient income be devoted to rent (the LIHTC being one notable exception in charging a flat rent instead). This means that the size of the subsidy declines linearly with income, or that income is effectively "taxed" at a rate of 30 percent.

In a simple static labor supply model, the impact of housing assistance can be understood in terms of basic income and substitution effects, which are illustrated in figure 2.6. In the absence of a subsidy, the household faces budget constraint AZ and optimizes at U, where the wage rate equals the marginal rate of substitution between consumption and leisure. The housing subsidy modifies the budget constraint to ABC. Due to the program rules, the subsidy lowers the relative price of leisure and induces a substitution effect equal to SE in figure 2.6. The additional income from the subsidy has an income effect denoted IE as the household shifts to U'. In this simple model, the housing subsidy reduces labor supply and the effective value of the subsidy shrinks from G to S. This static model is useful, but in reality labor supply decisions play out over multiple periods and households wait months or years to receive a subsidy, which complicates the picture considerably. For example, a family that is on the waiting list for a program may realize that their effective marginal tax rate will be much higher in the future and so decide to shift work effort toward current periods when effective rates are lower. Some suggestive


Fig. 2.6  Federal housing assistance income and substitution effects

Notes: The figure illustrates the impact of housing assistance on labor supply in a simple one-period model. In the absence of a subsidy, the household faces budget constraint AZ and optimizes at U, where the wage rate equals the marginal rate of substitution between consumption (Q_C) and leisure (Q_L). The subsidy expands the budget to ABC but changes the relative price of leisure by requiring recipients to contribute 30 percent of income as rent, producing the substitution effect (SE). The additional income from the subsidy has an income effect (IE) as the household shifts to U'. The housing subsidy reduces labor supply because the effective subsidy value decreases with work effort, shrinking here from G to S.
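The 30 percent implicit tax underlying this substitution effect can be made concrete with a small numeric sketch. The payment standard and income levels below are hypothetical, and the subsidy formula is the stylized voucher-like rule used throughout this section.

```python
def annual_housing_subsidy(annual_income, monthly_payment_standard):
    """Annual subsidy for a voucher-like program in which the tenant
    contributes 30 percent of income toward a unit renting at the
    monthly payment standard (all figures hypothetical)."""
    return max(0.0, 12 * monthly_payment_standard - 0.30 * annual_income)

s_low  = annual_housing_subsidy(10_000, 800)   # $9,600 - $3,000 = $6,600
s_high = annual_housing_subsidy(12_000, 800)   # $9,600 - $3,600 = $6,000

# Earning $2,000 more shrinks the subsidy by $600: a 30 percent implicit tax.
print((s_low - s_high) / 2_000)  # 0.3
```

This is the sense in which the subsidy is "taxed away" at 30 cents per additional dollar of earnings, on top of any payroll taxes or phaseouts of other benefits the household faces.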

evidence that at least some people do indeed respond this way is presented in Jacob and Ludwig (2012). Other complications to the standard static model raise the possibility that housing programs could actually increase labor supply in a more persistent way. For example, housing subsidies could increase labor supply if they cause households to increase housing consumption and additional work is needed to maintain a given level of nonhousing consumption. The residential stability created by housing assistance could also increase labor supply in the long run by stabilizing families' housing circumstances and allowing them to devote more time and attention to job search and training. What the available data tell us about the net effect of housing programs on labor supply is discussed in detail below.

2.4.3 Neighborhood Access and Neighborhood Externalities

One possible contributor to the proliferation of low-income housing programs is the close connection between these programs and the policy goals of reducing racial/ethnic segregation and, more generally, of promoting access to "better neighborhoods." The link between housing and neighborhood conditions was referenced in the landmark Housing Act of 1949,


which established as a national goal a decent home and "a suitable living environment for every American." Economists have long understood that the rent price of a housing unit is directly linked to the neighborhood conditions around it; that is, surrounding neighborhood conditions are an amenity that is capitalized into the price of an apartment or house. If neighborhoods affect life chances, and housing subsidies change the neighborhood conditions of the poor, then housing subsidies could be justified on the grounds that they improve outcomes for low-income families through their effect on neighborhoods. Of course, policymakers may care about the level of racial or income segregation of American neighborhoods for its own sake, even if neighborhood conditions do not change people's behavior or long-term outcomes. The design of low-income housing programs may have a large bearing on the degree to which they deliver "better neighborhoods." One critical distinction is between place-based subsidies, like public housing and the LIHTC program, and tenant-based subsidies like housing vouchers. If place-based subsidies such as public housing change where subsidized households would otherwise live in the absence of a subsidy, then such programs affect the neighborhood conditions experienced by subsidized families by changing both the site they live in themselves and the composition of the other tenants around them. For public housing, the immediate tenants in the same building or development are likely to be very disadvantaged—and may be more disadvantaged than the neighbors these households would have had if they had not been admitted to public housing. This problem was compounded historically by the political decisions made by local housing authorities to locate public housing developments in some of the most racially and economically isolated areas of their cities, as noted in section 2.2.

Housing programs could also change the neighborhood environments that low-income families experience by directly changing the neighborhoods themselves, as opposed to changing which neighborhoods poor families live in. The effect of place-based subsidies on the conditions of surrounding neighborhoods is more theoretically ambiguous. If high-income households view subsidized housing as a disamenity, then they may choose to avoid living near it, which could result in lower-quality public services. Some suggestive evidence for the idea that affluent families view living near low-income households as a disamenity comes from hedonic regressions, which tend to show that a neighborhood's poverty rate or share low income is associated with reduced home prices, holding unit characteristics constant (e.g., see Dubin 1988; Bayer, Ferreira, and McMillan 2007). Low-income housing programs could also reduce property values if subsidized housing is perceived as introducing crime or disorder to a neighborhood. On the other hand, housing programs could improve neighborhood conditions and property values if they help remove disamenities such as blighted


structures and vacant lots. A second mechanism through which such programs could change property values is by increasing the total population living in an area, which can support more commercial activity and potentially promote safety (Ellen et al. 2002). A large literature has empirically investigated the relationship between investments in subsidized housing and neighboring property values, which we consider in the next section. Concern about the possible effects of project-based subsidies in geographically concentrating poor families has led to growing policy interest in public housing or LIHTC developments that are designed as "mixed-income" developments (Joseph 2013). These are projects in which local authorities include "market-rate" units in redeveloped or new public or LIHTC housing. Casual empiricism about where people choose to live in the US context seems to suggest that most nonpoor households, all else equal, would prefer to live among other nonpoor households. If this is indeed true, mixed-income developments might need to offer nonpoor households some sort of subsidy to choose units in these developments instead of somewhere else. So "market rate" may be something of a misnomer. A different potential consequence associated with mixed-income buildings is that overall construction costs for low-income units may be higher in a building that is mixed income (because of higher land costs and amenities needed to attract market-rate tenants) than in a building housing entirely low-income households. Tenant-based subsidies such as housing vouchers have long been thought to be a better policy mechanism for improving the neighborhood quality of poor households, since individual voucher holders can choose where to live rather than face a take-it-or-leave-it offer of a public housing unit in a development that houses large numbers of other poor families (Olsen 2008).

However, landlord discrimination against voucher holders—while outlawed in some states—is routinely found in audit studies (Lawyers Committee for Better Housing 2002). Further, finding rental units in better neighborhoods may entail substantial search costs for tenants who are issued a voucher, who may have limited transportation, child care, or information to access less disadvantaged neighborhoods (Rosen 2014). As noted above, what is essentially the housing voucher's maximum rent (the FMR) is set at the metropolitan-area level; given that rental units in better neighborhoods tend to be more expensive (holding all else equal), voucher families will either be required to accept some reduction in unit quality in exchange for living in more affluent neighborhoods or in some cases may even be priced out of more affluent neighborhoods altogether. Household preferences could also attenuate the degree to which housing-voucher receipt translates into changes in neighborhood conditions. For example, social ties also likely play a role in limiting the neighborhoods considered by voucher recipients (Desmond 2012). If people choose to locate near family and friends, and disadvantaged individuals tend to have


disadvantaged social networks located in higher-poverty neighborhoods, then this may restrict where voucher families look for housing. Section 2.5 reviews the empirical evidence on this question. This discussion highlights two open empirical questions that are critical for low-income housing policy. The first is whether, compared to in-kind housing programs, providing low-income families with cash transfers would lead to larger or smaller changes in the housing and neighborhood environments in which they live. The second has to do with the ongoing debate in the housing policy community about project-based versus tenant-based subsidies in terms of the relative risks of "government failure" versus "market failure." In practice these relative risks may vary according to local conditions, such as the tightness of the local housing market, and could also vary according to what type of place-based program is being considered as an alternative to tenant-based programs.30 Understanding more about the relative performance of project-based versus tenant-based programs under different situations (and for different types of program participants) would be very useful in informing future policy debates.

2.4.4 Concentrating versus Dispersing Subsidy Resources

As noted above, the current system of means-tested housing programs is unusual among current US social programs in its narrow distribution of resources: fewer than one-quarter of income-eligible households receive benefits from HUD programs, but those who do can receive subsidies worth roughly $8,000 per year,31 which in more expensive cities can be worth $12,000 per year or even more (e.g., see Jacob and Ludwig 2012). In principle one could imagine making our housing programs more like our other social programs, increasing the share of income-eligible people who benefit by reducing the per-participant subsidy value. Here we consider the conceptual trade-offs that would be associated with such a change in policy. For starters, it is worth considering the degree to which path dependence in housing policies has contributed to the current distribution of resources. As noted in section 2.2, when housing programs began in earnest in the 1930s, these subsidies were delivered by just one program—public housing. In many ways the government essentially backed into large per-participant subsidies by building developments that were of higher quality than most of the slum buildings from which families were originally drawn (which set the cost per housing unit), then setting subsidy amounts with an eye toward keeping units affordable to low-income households. The only way to reduce the subsidy amount would have been either to reduce the quality of the housing, which would be problematic, or to increase the rent contributions required by residents, which would undermine the goal of helping the poorest families afford better units.
30. For example, in the LIHTC program, developers rather than government officials propose the location of projects, so the risk of government failure in the location of those developments might be lower than with traditional public housing.
31. The average monthly HUD subsidy was $647 in 2013 (HUD Congressional Justification FY2015).
Robert Collinson, Ingrid Gould Ellen, and Jens Ludwig

Over time the shift toward other subsidy programs has, in principle, created more options for how the $40 billion per year in housing programs is allocated across households. In particular, the housing-voucher program rules could be set in any number of ways that would reduce the subsidy amount per household. One reason this has not happened is the opposition of housing advocates, who seek to grow the budgets of housing programs over time to a level where all income-eligible households can receive the large subsidy amounts that current housing-program participants receive. Since fewer than one-quarter of eligible households currently participate in such programs, this would require something like a quadrupling of the annual budget outlays for such programs. In the meantime, what are the trade-offs of the current approach versus spreading available resources out across more households? Some advocates argue that any level of housing subsidy below some threshold is insufficient to help poor families. One version of this argument is that in many housing markets, landlords would resist renting to poor households with only shallow subsidies, viewing them as too risky. A different version of the argument is that high housing costs combined with local housing codes and HUD quality standards sharply limit the pool of decent, lower-rent units. As a result, shallower subsidies would still leave poor families paying unsustainable shares of their incomes in rent or living in poor housing, and so would not improve their well-being. It is worth noting that this assumption stands in contrast to the usual assumption within economics of diminishing marginal utility of consumption.
It also stands in contrast to findings elsewhere in the social policy literature that there are diminishing marginal benefits of additional household resources for other outcomes that policy cares about, such as children's life outcomes (Løken, Mogstad, and Wiswall 2012). In principle, a different way that the government could distribute subsidies more widely would be to provide time-limited subsidies, which could potentially allow a large share of all low-income families to receive subsidies at some point in their lifetimes. At present families can keep their subsidies indefinitely, so long as their incomes remain low enough and they do not violate any of the program's behavioral rules (such as the prohibition on drug offenses). This pattern departs from the original conception of the public housing program, which was intended to serve households who were temporarily suffering as a result of the Depression (Vale 2013). The idea was not to provide long-term, permanent subsidies to the poor. But as public housing and vouchers have shifted to serve the very poorest households, many tenants have stayed for longer tenures than the original framers of the programs envisioned. The change over time toward conceptualizing public housing as providing
longer-term subsidies should, in retrospect, not be surprising. The original design of the program failed to take into account how difficult it is to take assistance away from households who are receiving it. In the political arena, the plight of subsidized tenants who have lost their assistance is far more salient and visible than the plight of low-income households who have never received assistance at all. Moving to more time-limited subsidies might make the program less effective from the perspective of addressing problems related to low levels of permanent income (for one thing, landlords might be far less willing to accept poor tenants with a time-limited subsidy), but it would make the program more effective in helping address the problem of income volatility (e.g., see O'Flaherty 2011). Part of the problem is that when families experience negative income shocks, they can reduce their spending on expenses like food, clothing, or transportation, but housing is an expensive durable good that is not easily divisible, and so spending on housing may be hard to adjust. Most renters sign annual lease agreements, which stipulate monthly payments of a fixed amount. Failure to meet agreed-upon rental payments may lead to eviction, which can lead to a spell of homelessness that in turn carries significant social costs (Desmond 2012). The only way to substantially reduce spending on housing is to move. The current structure of housing assistance programs makes them ill-suited to address the problem of income volatility.32 Vouchers are not well designed to mitigate contemporaneous risk of homelessness because they are such a scarce subsidy—rationed to a small number of households from waiting lists numbering in the tens of thousands in many large cities (O'Flaherty and Ellen 2010). Reliable estimates of average waiting list times are difficult to find, but it is not uncommon for housing voucher waiting times to exceed two years.
Anecdotal evidence also points to similarly lengthy admission waiting lists for public housing and other project-based programs such as the LIHTC. This means that households experiencing sudden income loss rarely receive timely federal housing assistance that might prevent eviction. One way that liquidity-constrained households might try to adjust housing expenses in response to a sudden loss of income is to sublet a portion of their housing—that is, to rent out a room. But for households that are already overcrowded this might be a difficult proposition. And the circumstances of sudden poverty might themselves tax the cognitive bandwidth poor families need to find sublets (Mani et al. 2013). Moreover, well-intended rules for federal assistance programs that restrict adjustments to household composition may make it more difficult for some low-income households to weather economic shocks by subletting or by having other relatives or significant others cohabit. O'Flaherty and Ellen (2007) examine the programmatic rules and regulations that govern a variety of transfer programs serving low-income households (SNAP, SSI, and cash welfare) and find that many of these programs are structured with strong disincentives for multiple-adult households. They also note that subsidized housing is among the most restrictive of these programs, establishing minimum unit sizes for occupancy (as measured by persons per room). This means that many low-income households are unable to optimize household size the way that more affluent households often do in response to changing economic conditions—as evidenced by the large number of college graduates returning to live with their parents during the recent recession (Lee and Painter 2013).
32. Some localities creatively use different federal funding sources to provide short-term rental assistance to help families stay in their homes. New York City's Human Resources Administration operates a "one-shot" emergency assistance program—funded with TANF and local dollars—that is available for individuals or families who are facing eviction, escaping domestic violence, facing utility disconnection, and other extreme circumstances.

2.4.5 Targeting of Housing Assistance

In 2012, housing authorities nationwide reported more than 6.5 million households on their waiting lists for housing vouchers or public housing.33 In light of this enormous demand for housing assistance, an important question is whether these scarce subsidies are targeted to the households most in need. Systematic evidence on local targeting policies is limited, but a 2012 HUD survey of housing authorities offers some insights. Housing authorities typically organize their waiting lists for housing assistance in one of three ways (or some combination of them): (a) first come, first served; (b) random lottery; or (c) local subgroup preferences (such as recently homeless, domestic abuse victims, working families, high rent burden, or overcrowding). Approximately 62 percent of PHAs, covering roughly 77 percent of all housing vouchers nationwide, use some type of preference for their housing voucher waiting list, sometimes in addition to a first-come, first-served rule. The numbers are similar for public housing admissions preferences: about 62 percent of PHAs with public housing use some admission preference, and these PHAs account for about 81 percent of public housing units. The most commonly reported preferences are given to homeless families and individuals, households involuntarily displaced (by natural disaster or government activity), victims of domestic violence, residents of the local jurisdiction, applicants living in substandard housing, households who are rent burdened, and veterans. Unfortunately, this survey did not gather data on how many households come in under each of these categories.34
33. This number may include some duplicate households if households are on multiple housing authorities' waiting lists (HUD 2012).
34. We know relatively little about the waiting list practices for LIHTC developments. In some cases private developers or property managers may apply their own waiting list practices, but they may also coordinate with the relevant city agencies to organize their waiting lists.
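To make the three ordering rules concrete, the short Python sketch below ranks a toy applicant pool under each rule. Everything here is invented for illustration: the applicant fields, the preference categories, and the tie-breaking scheme are one plausible stylization, not actual PHA policy or data from the 2012 survey.

```python
import random

# Toy applicant pool; names, fields, and values are all invented.
applicants = [
    {"name": "A", "homeless": False, "rent_burden": 0.65, "applied_day": 3},
    {"name": "B", "homeless": True,  "rent_burden": 0.40, "applied_day": 7},
    {"name": "C", "homeless": False, "rent_burden": 0.30, "applied_day": 1},
]

def first_come_first_served(apps):
    # (a) Order purely by application date.
    return sorted(apps, key=lambda a: a["applied_day"])

def lottery(apps, seed=2012):
    # (b) Horizontally equitable: every eligible applicant has equal odds.
    rng = random.Random(seed)
    shuffled = list(apps)
    rng.shuffle(shuffled)
    return shuffled

def local_preferences(apps):
    # (c) One hypothetical preference scheme: homeless applicants first,
    # then severely rent-burdened applicants (paying over half of income
    # in rent), with ties broken by application date.
    return sorted(
        apps,
        key=lambda a: (not a["homeless"], a["rent_burden"] <= 0.5, a["applied_day"]),
    )

print([a["name"] for a in first_come_first_served(applicants)])  # ['C', 'A', 'B']
print([a["name"] for a in local_preferences(applicants)])        # ['B', 'A', 'C']
```

Note that the lottery changes only who is drawn, not the composition of the eligible pool, which is why it achieves no targeting gains beyond the initial eligibility criteria.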

Among the different approaches for organizing and managing a waiting list, first come, first served might be the most problematic in terms of targeting the neediest households. It seems plausible that familiarity with waiting list opening periods and the ability to arrive at housing authority offices on a timely basis would wind up being negatively correlated with underlying need among income-eligible families. Lottery-based systems have the advantage of being fair the way most people would probably define it (horizontally equitable), but achieve no gains in targeting beyond the initial eligibility criteria. Preferences appear the most promising of the three methods for targeting needy households, provided households cannot easily manipulate their status to fit the stated preferences. The more general question of how to target scarce low-income housing assistance is an important one that has received little rigorous attention. One possible framework is to ask for which populations the cost-benefit calculus of housing assistance might be most favorable. For example, if housing assistance is most beneficial to the homeless or to young children in disadvantaged neighborhoods, perhaps housing programs should more explicitly target these groups. An obvious challenge to establishing these types of strong preferences is that doing so can create perverse incentives for households to change their status in order to receive housing assistance. Another dimension of targeting is how low-income housing programs respond to changes in underlying needs. Historically, the number of housing units constructed or households served in HUD's three major rental assistance programs has not been closely connected to underlying macroeconomic conditions. The need for housing assistance can change as a result of tightening housing-market conditions or falling incomes. These two forces tend to pull in opposite directions, but not always.
The Great Recession led to sustained reductions in income, but rents rose quickly after an initial drop. The size of HUD's programs is primarily a function of budgetary decisions rather than any direct link to outside economic conditions. The PHAs administering the voucher program generally have a budgetary cap, meaning that if falling tenant incomes are not sufficiently offset by falling rents, then they have to issue fewer vouchers. As for the LIHTC, the credit itself is relatively fixed, but it behaves procyclically, as investor interest in the credit has tended to track the general economic outlook. For those households already receiving assistance, the federal low-income housing program rules are well designed for times of economic downturn. Falling tenant incomes mean that tenants are required to pay less in rent, thus providing some buffer against negative economic shocks. For voucher holders, the size of the voucher depends on the FMR, which is tied to local rent data from the census; these data tend to lag, but in theory they capture some of the effects of rent increases (or decreases).

2.5 Review of Results of Research on Programs

In this section we review the available empirical research about the different means-tested housing programs described above. The available empirical research is limited, particularly high-quality empirical evidence that is capable of isolating causal relationships between housing programs and different outcomes of policy interest. For some programs there is essentially no available evidence whatsoever. In what follows we focus on high-quality evidence and, in every case, try to be clear about the strength of the empirical evidence we do have on these questions. Our emphasis on experimental studies reflects uncertainty about the nature of selection bias in the context of housing programs. It is generally presumed that take-up of housing assistance may be correlated with unobservable factors that themselves may affect most outcomes of interest. What is much less clear is the magnitude or even direction of this bias. Households that successfully take up housing assistance may be negatively selected on the basis of difficult-to-observe targeting preferences (such as being a domestic violence survivor or at risk of homelessness), or they could be positively selected on characteristics such as the ability to find and lease up private rental units or savviness in accessing government benefits. We consider what is known about the effects of means-tested housing programs on housing quality, affordability, access to different neighborhoods, residential mobility, and what one might call "indirect" outcomes of housing programs related to labor supply, health, children's outcomes, and overall well-being. Within each outcome domain we consider what is known about the effects of different programs relative to not participating in any program at all. Where available, we also discuss research about the relative effectiveness of different programs compared to one another, given the trend over time from project-based to tenant-based subsidies.
Since there is so little research right now on the LIHTC, we focus on studies of the relative effects of public housing versus vouchers.

2.5.1 Effects of Housing Programs on Housing Consumption (Quality and Quantity)

For many people, means-tested housing programs probably conjure up high-rise public housing projects of the sort that were built during the 1950s and 1960s, which have become synonymous with terrible living conditions and high rates of crime and racial segregation: Pruitt-Igoe in St. Louis, Jordan Downs in the Watts neighborhood of Los Angeles, Magnolia Projects in New Orleans, and of course the Robert Taylor Homes and Cabrini-Green in Chicago (Olsen and Ludwig 2013). But the focus on the very difficult living conditions in these projects overlooks the fact that most public housing projects are not so distressed, and the worst of them have been torn down in recent years through HUD's HOPE VI program.

Perhaps even more importantly, it is easy to forget just how bad the slum conditions were that, for many families, constituted the alternative to living in public housing, particularly as the developments were initially built. For example, in 1940 fully 45 percent of all housing units in the United States lacked complete plumbing facilities (defined as having a flush toilet, sink, and hot water).35 This figure is under 1 percent today, even for housing units rented by households in the bottom quintile of the income distribution (Quigley and Raphael 2004). Crowding and many other measures of housing quality have also improved. Most of the available research suggests that public housing improved the housing conditions of low-income residents. The challenge for empirical work on this question is carrying out an appropriately apples-to-apples comparison of housing conditions, since the type of family that winds up participating in the public housing program may be systematically different from other low-income families in a variety of ways—including along dimensions that are difficult to adequately measure in social science data sets. A number of studies using data from the 1960s and 1970s show that relative to unsubsidized households, people living in public housing increased their housing consumption by between 20 and 80 percent (Olsen 2003, table 6.8). The most recent of these, by Olsen and Barton (1983), uses Census Bureau survey data on the housing market in New York City in the 1960s to estimate what are essentially hedonic regressions that try to price public housing units. They estimate that families in public housing consume 10–70 percent more housing, with a dollar value equal to 20–25 percent of the average income for these families. Their study suggests that it cost the New York City Housing Authority $1.14 to produce each extra $1 in housing consumption for families.
Whether other types of means- tested housing subsidies could provide such services more efficiently is a topic to which we will return at the end of this subsection. One of the best studies comparing the housing conditions of families in and outside of public housing is by Currie and Yelowitz (2000), whose research design takes advantage of the fact that the number of bedrooms to which a family would be entitled within a housing project depends on the gender mix of children in the family. Specifically, children of the same gender have to share a bedroom but those of opposite genders do not, so that a family with one adult and two children will be eligible for a three- bedroom apartment if the household has a boy and a girl, but just a two-bedroom if the children are of the same sex. Using child- gender composition as an instrument for the likelihood of being in public housing (families eligible for larger apartments are 24 percent more likely to live in housing projects), they find participation in public housing reduces the likelihood a family is

35. https://www.census.gov/hhes/www/housing/census/historic/plumbing.html.

102

Robert Collinson, Ingrid Gould Ellen, and Jens Ludwig

overcrowded by 16 percentage points. This finding holds for both blacks and whites.36 The same general finding seems to hold for the housing- voucher program as well. What is very clear is that the market rents of the housing units in which people reside is dramatically higher for people with vouchers compared to similar people without them. For example, Jacob and Ludwig’s (2012) study of housing vouchers in Chicago is able to compare similar types of families who do and do not receive vouchers because the city randomly assigned applicants to the voucher program waiting list. That lottery study shows that vouchers enable recipients to live in units with rents that are about 50 percent higher than the rents for the units in which they would otherwise live (this change in unit rent is equal to 25– 30 percent of average income for these households). If the housing market is working at all well this should be expected to translate into improved unit quality, although some observers have noted that landlords are aware of the rent limits in the voucher program and may artificially raise the rent of a unit to meet the tenant’s new ability to pay (Mallach 2007; Collinson and Ganong 2015). Some evidence that vouchers also improve direct measures of housing conditions, not just a total measure of housing consumption like rent, comes from the randomized welfare- to-work voucher study by Mills et al. (2006) that finds voucher receipt increases housing unit quality and size.37 This same study also shows that vouchers increase by over 20 percentage points the rate at which recipients and their children live on their own rather than with other relatives, which may reduce crowding and enable people to get away from difficult or even abusive relationships. Similarly, nonexperimental evidence from Carlson et al. (2012) finds that households receiving a voucher report fewer adult members following voucher receipt than a matched control group of welfare recipients. 
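The logic of the Currie and Yelowitz (2000) design can be illustrated with a small simulation. The sketch below uses entirely invented data and coefficients; it shows only the mechanics of the identification strategy, in which a child-sex-composition indicator serves as an instrument, and with a single binary instrument the two-stage least squares estimate reduces to a Wald ratio. The "true" effect of -0.16 is set only to echo the 16-percentage-point estimate discussed in the text, not to reproduce their analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Instrument: 1 if the two children are of opposite sexes, entitling the
# family to a larger apartment and (here, hypothetically) raising take-up.
z = rng.integers(0, 2, n).astype(float)

# Unobserved disadvantage raises both take-up and overcrowding, which is
# exactly what biases a naive comparison of participants to nonparticipants.
u = rng.normal(size=n)
d = ((0.5 * z + 0.8 * u + rng.normal(size=n)) > 1.0).astype(float)  # in public housing

# Outcome: overcrowding, with an assumed program effect of -0.16.
y = 0.3 - 0.16 * d + 0.1 * u + 0.05 * rng.normal(size=n)

# With one binary instrument, 2SLS reduces to the Wald ratio cov(y,z)/cov(d,z).
iv_estimate = np.cov(y, z)[0, 1] / np.cov(d, z)[0, 1]
ols_estimate = np.cov(y, d)[0, 1] / np.var(d, ddof=1)

# The IV estimate should land near the assumed -0.16, while the naive OLS
# estimate is pulled toward zero by the confounder u.
print(f"naive OLS: {ols_estimate:.3f}  IV (Wald): {iv_estimate:.3f}")
```

The instrument works in the simulation for the same reason it works in their paper: child-sex composition shifts program participation but is plausibly unrelated to the unobserved disadvantage that contaminates the naive comparison.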
Many voucher recipients report in qualitative interviews that they value this new independence for its own sake as well.

If both public housing and housing vouchers improve housing conditions for poor families, a natural follow-up question for public policy is which program is more effective in achieving this goal. Some of the best available evidence on this question comes from HUD's Moving to Opportunity (MTO) experiment. Between 1994 and 1998, MTO enrolled 4,600 low-income families with children living in very distressed public housing projects in five US cities (Baltimore, Boston, Chicago, Los Angeles, and New York City). It is important to keep in mind that families in the MTO study were coming from some of the most distressed public housing projects in the country, such as the Robert Taylor Homes on the south side of Chicago. Families were randomly assigned to one of three groups: a low-poverty voucher (LPV) group, which received extra mobility counseling assistance and a housing voucher that could only be redeemed in a census tract with a 1990 poverty rate below 10 percent; a traditional voucher (TRV) group that received a standard housing voucher (which was, at the time, the Section 8 program); and a control group that did not receive any extra help moving out of public housing. For present purposes what is most relevant is the contrast between the TRV and control groups in MTO.38 Data from the MTO five-year ("interim") follow-up showed that relative to the control group, moving with a housing voucher increased the share of respondents rating their housing unit as good or excellent by 7 percentage points (the treatment-on-the-treated or "TOT" effect), compared to a control mean of 52 percent (Orr et al. 2003, exhibit 3.5, 66). By the time of the long-term follow-up, which measured outcomes ten to fifteen years after random assignment, the control mean had risen to 57 percent, although the TOT effect on this measure had declined to 5 percentage points and was no longer statistically significant (Sanbonmatsu et al. 2011, exhibit 2.5, 56).
36. The sample mean for the census overcrowding measure they use in their paper is about 1 percent for whites and 10 percent for blacks (Currie and Yelowitz 2000, table 6), but these sample means are not quite the right benchmark for judging the size of the public housing effects, since the relevant mean would be the one for the set of families who would have been in public housing had the gender mix of children in the home been different (or, in the language of Angrist, Imbens, and Rubin [1996], the "compliers").
37. For example, the share of families in the control group that live in crowded housing conditions (more than one person per bedroom) at the time of their follow-up survey is about 39 percent, while the effect of voucher use (the treatment-on-the-treated effect) is minus 22 percentage points (p < .05); see Mills et al. (2006, table 5.3, 139). Similarly, the share of control group families reporting two or more housing problems is 13.5 percent, and the TOT is again about one-half that (minus 7 percentage points), although not quite significant.
At the time of the long-term follow-up, however, MTO did reduce the rate of self-reported specific housing problems related to things like vermin (TOT effect of minus 14 percentage points, compared to a control mean of 52 percent, p < .05), heating or plumbing (TOT of minus 8 percentage points, control mean 37 percent, p < .10), and peeling paint or plaster (TOT of minus 19 percentage points, control mean of 47 percent, p < .05).

The question of how project-based versus tenant-based subsidies change housing quality is closely related to, but slightly different from, the question of which program costs less to deliver a given level of housing quality, because the former question ignores the possibility that the programs differ in their costs per participant. The answer to the question of which program is more cost effective is not obvious as a conceptual matter. As noted above, many housing advocates are concerned that landlords overcharge voucher holders in the private market, capitalizing partly on the fact that some landlords reportedly refuse to take housing vouchers. There has also been long-standing concern about the possibility that many families are in disequilibrium in the housing market because of the large transaction costs associated with changing housing units (e.g., see Rosen 1985). On the other hand, the fact that federal subsidies to local PHAs and private builders artificially distort the relative price of initial construction and operating costs suggests the possibility of inefficiency in project-based programs. And the fact that unit rents in public housing are far below market levels will mean there may be excess demand for such units, even if they are not well maintained. While theory is ambiguous, the empirical research is fairly consistent in suggesting that tenant-based programs are able to deliver a given level of housing-unit quality at lower cost compared to project-based programs, or at least compared to HUD-sponsored, project-based programs such as public housing. Deriving reliable empirical evidence on this question is not entirely straightforward because it requires, among other things, some attempt to estimate the market rents of project-based housing units.39 Nor is it entirely straightforward to even estimate program costs, which are particularly challenging to calculate in the context of housing projects that receive subsidies from multiple sources (free land, favorable tax treatment, or loan terms) and have large fixed costs. With these caveats in mind, perhaps the best evidence on this question comes from the Pittsburgh and Phoenix sites of the Experimental Housing Allowance Program (EHAP), which was a large demonstration project carried out in the 1970s. Mayo et al. (1980) estimate that the ratio of total costs to market rent ranged from 1.8 to 2.2 for public housing, from 1.5 to 2.0 for Section 236 (the new construction and rehabilitation program in effect at the time), and from 1.09 to 1.15 for EHAP housing allowances (like housing vouchers). Put differently, the tenant-based subsidy appears to be far more cost effective in producing housing units of a given quality.
38. Put differently, the LPV treatment in MTO adds the constraint to the normal voucher program that families could only redeem the vouchers initially in low-poverty census tracts, and so the findings for the LPV versus control contrast (recall all MTO families were living in public housing at baseline) are not directly relevant for the larger question of the relative effects of public housing versus the regular voucher program without the additional mobility constraints or supports. For this reason we also do not emphasize discussion of the Gautreaux mobility program in Chicago (see Rubinowitz and Rosenbaum 2000).
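The Mayo et al. (1980) comparison reduces to a simple ratio: total program cost per unit divided by the unit's estimated market rent, so a ratio of 2.0 means two dollars spent for every dollar of housing services delivered. The sketch below restates that arithmetic with hypothetical dollar figures, not numbers from the study.

```python
# Hypothetical monthly figures for one subsidized unit; only the ratio of
# total cost to estimated market rent matters for this style of comparison.
def cost_per_dollar_of_housing(total_monthly_cost, estimated_market_rent):
    """Dollars of program cost per dollar of housing services delivered."""
    return total_monthly_cost / estimated_market_rent

# A ratio of 2.0 sits at the upper end reported for 1970s public housing;
# a ratio near 1.1 is in the range reported for tenant-based allowances.
public_housing_like = cost_per_dollar_of_housing(1400, 700)
allowance_like = cost_per_dollar_of_housing(770, 700)
print(public_housing_like, allowance_like)  # 2.0 1.1
```

The gap between the two ratios is what the text means by the tenant-based subsidy being more cost effective: the same $700 of housing services costs roughly half as much to deliver.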
Similarly, Wallace et al. (1981) estimate that the Section 8 new construction program cost 44–78 percent more than the Section 8 tenant-based subsidy (see also Shroder and Reiger 2000; GAO 2001, 2002). It is worth noting that this evidence only covers two metropolitan areas and is now more than forty years old, so it is reasonable to ask whether the same cost differences would hold in different market conditions. Unfortunately, despite the fact that it has become the largest federal place-based housing program, there is no research of which we are aware that examines the effects of the LIHTC on the housing conditions of low-income families. There is unfortunately also very little evidence on the question of whether in-kind housing programs increase consumption more than cash transfers. As noted above, this question is relevant for judging the efficiency of using in-kind housing programs to address problems of housing affordability rather than other types of transfers. The one finding we know of is Hanushek's (1986) examination of changes in housing consumption in the Negative Income Tax experiment (a cash transfer) versus the Experimental Housing Allowance Program. While he concluded the change in housing consumption was similar, this is too important an issue on which to rely on just a single data point.
39. Most analyses ignore other potential spillover effects, like those on the surrounding communities. For an excellent discussion of these issues see Olsen (2009).

2.5.2 Effects on Housing Affordability

Housing assistance could advance well-being by reducing the share of family income used to pay for housing and freeing up additional resources for other critical needs such as high-quality day care, healthy food, and preventative health care. Both public housing and housing vouchers also appear to greatly increase housing affordability, defined as the share of family income devoted to housing. Unfortunately, the excellent paper by Currie and Yelowitz (2000) on public housing relies on census data on rental payments that respondents seem to interpret as the rental value of their unit rather than what they actually pay out of pocket, and so that study with its strong research design is not able to address the effects of public housing on affordability. But other studies, such as Olsen and Barton (1983) using New York City data, find that public housing enables families to enjoy levels of nonhousing consumption that are 14–18 percent higher than those of observably similar families outside of public housing, because the rent contribution for those living in public housing is equal to 30 percent of adjusted income, which is much less than the share of total income that unsubsidized families, on average, pay. This gain in nonhousing consumption from Olsen and Barton (1983) expressed as a share of total income (or, put differently, the reduction in the share of income families have to pay as rent) is equal to about 12 percent.40 The available evidence about how the voucher program affects affordability is stronger, as we now have several randomized lottery studies on the question.
In the voucher study in Chicago carried out by Jacob and Ludwig (2012), the average family at baseline (without a subsidy) was paying about 58 percent of its reported income toward rent. Voucher receipt enabled families to reduce their out-of-pocket spending on rent to about 27 percent of reported income.41 Similarly, in the HUD Welfare to Work (WtW) voucher experiment, the average control group family spent about $529 on rent per month (including utilities), equal to roughly one-quarter of reported

40. Olsen and Barton (1983, table 5) show that in 1968 families had average incomes of about $5,000 absent the housing program, and were able to increase consumption of nonhousing goods by about $600.

41. It is possible that some, or perhaps even many, families have unreported income (e.g., see Edin and Lein 1997). Because the same income denominator is used to calculate the share of spending on housing for families both with and without vouchers, the Jacob and Ludwig study should still get the sign of the effect of vouchers on housing affordability right. But because the denominator will be too small under both the voucher and no-voucher conditions, the “levels” (shares of income spent on housing) will be too high in both cases, and the percentage point change in the share of income spent on housing will be too large.


Robert Collinson, Ingrid Gould Ellen, and Jens Ludwig

monthly income.42 Voucher receipt reduces out-of-pocket spending on rent by $211 per month, or about 40 percent (Mills et al. 2006, exhibit 5.3, 139). While the public housing and voucher programs have similar rules about required rent contributions from participants, the two programs do have some differences that could affect housing affordability for households. For example, utilities are handled differently in the two programs, with many public housing projects simply including utilities in rent rather than billing families separately. In the five-year MTO follow-up, while overall housing costs did not differ across randomized groups, families that received regular housing vouchers were 12 percentage points more likely than controls to report having problems paying their utilities (control mean 27 percent, p < .05) (Orr et al. 2003, exhibit 3.3, 61). A similar pattern appears in the ten-to-fifteen-year MTO follow-up: families who used a traditional housing voucher were 8 percentage points more likely than controls (24 percent) to have received a shut-off notice for some utility due to nonpayment in the past twelve months (p < .05) (Sanbonmatsu et al. 2011, exhibit 2.4, 55). Relatively little is currently known about the effects of the LIHTC on housing affordability. The LIHTC rent limits mean that the program tends to reach low-income households, but not the very poorest (Desai, Dharmapala, and Singhal 2010). There is research documenting the number of units produced under the program (e.g., see Cummings and DiPasquale 1999; Desai, Dharmapala, and Singhal 2010), and research showing that the LIHTC increases the total number of rental units in an area (Baum-Snow and Marion 2009).43 All else equal, we would expect such an outward shift in rental housing supply to reduce rents, but the magnitude of this effect is, as far as we are aware, not currently known.
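As a check on these magnitudes, the rent-burden arithmetic for the WtW control group can be reproduced in a few lines (a sketch; the dollar figures are the approximate values Mills et al. 2006 report in exhibits 4.10, 4.16, and 5.3):

```python
# Rent-burden arithmetic for the WtW control group, using approximate figures
# from Mills et al. (2006): monthly TANF benefits of $1,325 (exhibit 4.16),
# quarterly earnings of $1,863 (exhibit 4.10), and monthly rent including
# utilities of $529 (exhibit 5.3).
monthly_tanf = 1_325
monthly_earnings = 1_863 / 3                       # about $621 per month
monthly_income = monthly_tanf + monthly_earnings   # about $1,946

rent = 529
print(f"Control-group rent burden: {rent / monthly_income:.0%}")  # ~27%

voucher_reduction = 211   # TOT reduction in monthly out-of-pocket rent
print(f"Reduction from voucher: {voucher_reduction / rent:.0%}")  # ~40%
```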
O’Regan and Horn (2013) report that tenants in LIHTC housing who earn less than 30 percent of the area median income face lower rent burdens than unassisted renters, but far higher burdens than households with similar incomes who are participating in HUD programs.

2.5.3 Effects of Housing Programs on Residential Mobility

Conceptually, the effects of means-tested housing programs on residential mobility are ambiguous, at least in the short run. Programs could reduce residential instability by cushioning subsidized families against having to move as a result of income shocks. On the other hand, to get a public housing subsidy families need to move into public housing, and because most

42. Exhibit 4.16 of Mills et al. (2006) reports monthly TANF cash benefits during the first period after random assignment of $1,325 for the control group, while exhibit 4.10 reports quarterly earnings of $1,863, or about $621 per month.

43. Baum-Snow and Marion (2009) find more crowd-out by new LIHTC units (that is, LIHTC units displace some private-market housing that would have been built anyway) in gentrifying areas. Malpezzi and Vandell (2002) do not find a detectable effect of the LIHTC on the supply of rental housing, but their research design is not nearly as convincing as that of Baum-Snow and Marion (2009).


housing-voucher applicants are living in housing units with rents far below the FMR (e.g., see Jacob and Ludwig 2012), families offered vouchers will also have a strong incentive to use the voucher to move into a new unit. Low-income renters in the United States are also a fairly mobile population in general; it could be that subsidy receipt simply changes the timing of moves that would have happened anyway. Unfortunately, there is no experimental evidence that we know of on the effects of public housing on residential mobility.44 However, the evidence we have suggests that public housing tenants tend to remain in their units longer than other renters. Lubell, Shroder, and Steffen (2003) find that the average public housing resident stays in their unit for 8.5 years and the median resident stays for 4.9 years. As a comparison, the median renter in the United States had lived in their home for 2.2 years as of 1998 (Hansen 1998).45 Of course, at least part of this difference in residential stability between public housing residents and unsubsidized households could be due to differences in the types of families who live in public housing. Experimental evidence from the voucher program is mixed. The welfare-to-work voucher experiment by Mills et al. (2006) finds that the average control-group family moved roughly twice over the five-year follow-up period; voucher receipt reduced the total number of moves by about 0.9 (the TOT effect). About 53 percent of the control group had moved out of their baseline census tract in the Mills et al. study; voucher receipt increased that share by 11 percentage points. In contrast, the Chicago voucher study by Jacob and Ludwig (2012) finds that the average number of moves over the follow-up period was about 2.7 for the control group; the effect of moving with a voucher (TOT) was to increase the number of moves by just 0.12.
Few nonexperimental studies of vouchers are able to track voucher and control families’ moves across units. An exception is Carlson et al. (2012), who find that households with vouchers are more likely to move than matched controls at one year and four years after voucher receipt, with magnitudes qualitatively similar to Mills et al. (2006). What about the relative effects on residential mobility of participation in the public housing program versus the housing voucher program? The ten-to-fifteen-year MTO follow-up found that the average control group family

44. The data used by Currie and Yelowitz (2000) do not allow them to apply their natural-experiment (IV) research design to measures of residential mobility. However, they do present some ordinary least squares (OLS) results showing that residence in public housing is correlated with a higher rate of changing schools among children in public housing families relative to their non-public-housing counterparts. (We recognize that changing schools mixes together the effects of residential moves with other reasons why children might change schools over time.) In any case, Currie and Yelowitz argue that OLS results are likely biased in the direction of overstating any negative effects of public housing, so as the authors note, it is not clear what to make of that correlation.

45. Note that these tenure patterns are fairly consistent over time. Mateyka and Marlay report a median stay of 2.2 years for US renters in 2008.


moved 2.2 times over the follow-up period; families that used a housing voucher to move out of public housing wound up moving an extra time over the study period (Sanbonmatsu et al. 2011, exhibit 2.2, 53). Relative to the MTO control group, voucher recipients also experienced an increased likelihood of ever having been “doubled up,” equal to 7 percentage points versus a control mean of 19 percent.

2.5.4 Effects of Housing Programs on Access to Different Neighborhoods

Historically, America’s public housing program reinforced residential segregation by race and income, in part by reserving some housing developments for blacks and some for whites. Data from the 1960s showed that 72 percent of public housing projects at the time were inhabited by people of a single race (Bonastia 2006, 74). Part of the issue also stemmed from the role that local politicians played in deciding where housing developments would be built.46 Opposition to new public housing was weakest in those communities where initial housing conditions were most distressed, so projects were disproportionately likely to be built in predominantly low-income and minority neighborhoods (e.g., see Hunt 2009). Additional evidence comes from the EHAP Housing Assistance Demand Experiment in the 1970s (Mayo et al. 1981, chapter 5). For example, families moving into public housing in the Pittsburgh EHAP site moved from neighborhoods with poverty rates of 37 percent to neighborhoods with poverty rates of 50 percent. Black participants in the public housing program saw the minority share of their census tract rise from 52 percent to 69 percent. Newman and Schnare (1997, table 3) show that in the mid-1990s public housing units were much more concentrated in extreme-poverty areas than were the units occupied by other low-income people (defined in their study as other welfare recipients). Fully 36 percent of public housing tenants lived in census tracts with poverty rates over 40 percent, versus just 12 percent of other low-income households. In addition, 38 percent of public housing residents were in census tracts with minority shares over 80 percent, compared to 18 percent of other low-income households. More recent evidence suggests that households in place-based assisted housing live in neighborhoods with low-performing public elementary schools (Horn, Ellen, and Schwartz 2014).
The expected effects of tenant-based subsidies on the neighborhood characteristics of recipients are not clear as a conceptual matter. To the extent that neighborhood disadvantage is a “disamenity” that is capitalized into housing prices, subsidies that enable families to rent more expensive units

46. Hunt (2009, chapter 4) provides an excellent account for the city of Chicago. More generally, in order for a public housing project to be built in a political jurisdiction, the jurisdiction must establish a public housing authority (PHA). Many jurisdictions chose not to create one. Furthermore, because the PHA had to obtain the local government’s cooperation, the local government had veto power over the location of projects.


should expand their choice set to include more units in more advantaged neighborhoods. On the other hand, racial discrimination may constrain where families can use their housing vouchers, and some families may choose to stay in poor, racially isolated areas even after receiving a generous tenant-based subsidy because of proximity to family, friends, jobs, religious organizations, and so on. In practice, however, the available evidence suggests that tenant-based subsidies have relatively modest impacts on the types of neighborhoods in which low-income families reside, at least when we contrast subsidy recipients with unsubsidized households. For example, data from the 1970s EHAP programs showed that housing allowances similar to the current housing voucher program, as well as the Section 236 rental housing program and the Section 23 leased housing program, had very small effects on the neighborhood conditions experienced by families. Similar findings come from more recent randomized lottery studies of the current-day housing voucher program. For example, Jacob and Ludwig (2012) find that the average unsubsidized family who applied for a housing voucher in the late 1990s in Chicago was living in a tract with a poverty rate of 26 percent; families randomly assigned good positions on the voucher program waiting list who moved with a voucher were in tracts with poverty rates just 1 percentage point lower (the “control mean” for share black was 78 percent, with a TOT effect also of about 1 percentage point). Similarly, the control group in the Mills et al.
(2006) study of HUD’s welfare-to-work voucher experiment was in tracts with an average poverty rate of 27 percent; the TOT effect was 2 percentage points (exhibit 3.6).47 As for the quality of local schools, in a reanalysis of data from HUD’s welfare-to-work study, Ellen, Horn, and Schwartz (2014) find that families randomly assigned vouchers reached neighborhoods whose schools had the same proficiency rates as the schools near control group families. The strongest available evidence thus suggests that the voucher program, as it is widely implemented, does not have large effects on the average neighborhood conditions of recipients. However, the design of the voucher does appear to influence tenants’ neighborhood choices. MTO illustrates how restricting vouchers to leases in low-poverty census tracts for a year, coupled with some counseling assistance, affects tenant location decisions. Unsurprisingly, households that received the restricted low-poverty voucher initially moved to neighborhoods with much lower poverty than households with a conventional, unconstrained voucher. This difference persists through the ten-to-fifteen-year follow-up, though it shrinks considerably. The low-poverty restriction also had the effect of substantially

47. Carlson et al. (2012) use a propensity-score-matching design and find that housing voucher recipients do not live in significantly different neighborhoods from nonrecipients in the short term; the effect on tract poverty is only about one-half of a percentage point four years after receipt.


reducing voucher take-up rates. Shroder (2002) finds that while counseling intensity was positively related to lease-up rates, the geographic restriction still lowered lease-up rates among the experimental group by 14 percentage points. Galiani, Murphy, and Pantano (forthcoming)—who specify a structural model of neighborhood choice and use MTO data to identify its key parameters—similarly find that the locational restrictions substantially lowered take-up, with counseling partially offsetting this effect. A less prescriptive approach to stimulating moves to better neighborhoods, part of a HUD demonstration known as “Small Area Fair Market Rents,” entails raising the maximum voucher subsidy in high-rent neighborhoods and lowering it in low-rent neighborhoods. Collinson and Ganong (2015) evaluate this policy change from a single metro-wide maximum subsidy to ZIP-code-level subsidy ceilings in the Dallas metro area. They find that, relative to voucher holders in neighboring Fort Worth, Dallas movers are in neighborhoods about 0.23 standard deviations higher on a composite measure of quality three years after the policy change, at little net cost to the government. To what degree does the shift from project-based to tenant-based subsidies increase access to more advantaged neighborhoods? The MTO experiment found that, relative to control group families that did not receive help moving out of public housing, those who used a regular Section 8 housing voucher experienced declines in neighborhood poverty but more modest declines in percentage minority.
Moving out of public housing with a regular Section 8 voucher reduced average census tract poverty rates one year after the voucher offer by about 45 percent (about 22 percentage points, compared to the MTO control group average of 50 percent), and reduced the average tract poverty rate families experienced over the ten-to-fifteen-year period by about 25 percent (11 percentage points, compared to a control mean of 40 percent). The effect on average tract minority share was much smaller: treatment households ended up in neighborhoods with a nonwhite share about 3 percentage points lower than the 88 percent nonwhite neighborhoods where MTO control families resided. As for the LIHTC program, it includes a rule under which developments receive more tax credits if they are located in census tracts in which at least half of households are LIHTC eligible, which gives developers an incentive to build housing in such areas (Baum-Snow and Marion 2009). However, states adopt other siting priorities as well. Nonexperimental evidence shows that LIHTC tenants on average live in neighborhoods with nearly identical poverty rates, slightly higher minority concentrations, and higher average crime rates than those lived in by poor households as a whole (Lens, Ellen, and O’Regan 2011). But we do not know what the neighborhood conditions would otherwise have been for the types of families served by the program.

2.5.5 Indirect Effects of Housing Programs (Labor Supply, Health, and Child Outcomes)

As noted above, one of the rationales for providing assistance to low-income families in the form of in-kind housing benefits instead of cash is the possibility that housing consumption has positive externalities on labor supply. (That said, it bears repeating that about two-thirds of HUD subsidy recipients are either elderly or disabled.) While one of the major reviews of the empirical literature, written a dozen years ago, argued that “housing assistance is not persuasively associated with any effect on employment” (Shroder 2002, 381, 410), a growing body of evidence since that time provides a stronger basis for concluding that there is some decline in work effort, at least as a result of HUD programs. Perhaps the best available empirical evidence on the effects of public housing on labor supply is the study by Susin (2005), who uses data from the Survey of Income and Program Participation (SIPP) to compare public housing residents with unsubsidized SIPP respondents matched on observable characteristics. Susin (2005) finds that public housing is associated with about 19 percent lower earnings. Other nonexperimental evidence comes from Newman, Holupka, and Harkness (2009), who match households at public housing addresses in the PSID to a sample of observably similar controls and find reductions in self-reported earnings in the first couple of years after moving into public housing, which fade out after three years (see also Olsen et al. 2005). All of these studies rely on a version of the selection-on-observables assumption and thus may be susceptible to omitted-variables bias, but the available evidence suggests that public housing reduces the earnings of adult participants. The evidence on the work-effort effects of the voucher program is stronger. The study of the HUD welfare-to-work voucher experiment by Mills et al.
(2006) finds sizable reductions in quarterly employment rates (3 or 4 percentage points, or 6–8 percent of the control mean of 53 percent), but these were statistically significant only during the first year following random assignment. The Mills et al. study also found persistent increases in the TANF receipt rate, equal to 4 percentage points during the first year (about 7 percent of the control mean of 56 percent) and about 7 percentage points three years out (nearly 20 percent of the control mean). Jacob and Ludwig’s (2012) study of housing vouchers in Chicago finds that voucher receipt reduced quarterly employment rates by 4 percentage points (6 percent of the control mean), reduced quarterly earnings by $330 (a 10 percent decline), and increased TANF receipt by 2 percentage points (15 percent). All of these effects appear to persist through eight years after random assignment (so they are more persistent than in Mills et al.), although updated data for this sample suggest the effects eventually do fade out after fourteen years (Jacob, Kapustin, and Ludwig 2015). Carlson et al. (2012), who employ a


difference-in-difference design with a matched control sample of households that applied for or received some form of public benefit in Wisconsin, find that voucher receipt is associated with a drop in earnings but little change in labor force participation. The reductions in earnings appear to fade out after five years; it is unclear whether this reflects underlying unobserved differences between treatment and control groups or heterogeneity in effects across different populations. Concern that public housing and the voucher program reduce work effort has prompted some experimentation with policies aimed at encouraging economic self-sufficiency and removing work disincentives. One notable example is the Jobs-Plus demonstration, which randomly assigned, across public housing developments in several cities, a bundle of employment services (which could include vocational training, educational programs, and child care or transportation assistance) along with modified rent policies designed to reduce earnings disincentives. Residents at treatment developments experienced a 14–20 percent increase in earnings relative to their control peers four and five years later, with no detectable fade-out (Bloom, Riccio, and Verma 2005). The design of the Jobs-Plus study did not make it possible to directly test which element of the intervention was most effective. A research demonstration is currently underway to test whether reforming rent rules to reduce work disincentives in the current rent calculation could attenuate the negative labor supply effects of public housing assistance. In any case, the full bundle of services provided under Jobs-Plus seems to generate benefits in excess of costs. Riccio (2006) reports that the costs per person in Jobs-Plus (including the foregone government revenue from lowering required rent contributions) were on the order of $2,000 to $3,000 over the four-year study period.
In contrast, the earnings gains per person over this period were around $4,600. Little is currently known about the effects of the LIHTC on labor supply. Unlike HUD programs such as public housing and housing vouchers, the LIHTC uses a system of flat rents that should not generate a substitution effect on labor supply. However, to the degree that the LIHTC subsidizes low-income households, we would still expect an income effect that depresses work effort, potentially countervailed to some unknown degree by whatever effects improved and more stable housing conditions have on labor market success. Advocates also sometimes point to positive effects of housing on children’s outcomes as another type of externality and justification for in-kind housing programs. However, the literature provides little strong evidence of important externalities along these lines. For the public housing program, the best available evidence comes from the study by Currie and Yelowitz (2000), discussed above. Their study has


a strong research design for overcoming the possibility of selection bias and comparing public housing families to truly comparable nonparticipants. They find that public housing residence has no detectable effect on schooling outcomes for whites (as measured by grade retention in their census data), but reduces grade retention by 19 percentage points for blacks. One important potential limitation of this finding is its reliance on grade retention as a measure of schooling outcomes, since schools in relatively higher-poverty areas may (all else equal) be less likely to hold children back.48 For the housing voucher program, Jacob, Kapustin, and Ludwig (2015) use administrative data on a large sample of children in Chicago combined with a random lottery design and find no statistically significant effects on various measures of children’s schooling outcomes, criminal involvement (as measured by arrest records), or health (as measured by Medicaid claims data). With statistically insignificant findings, a key issue is always the precision of the estimates, since null findings can come with 95 percent confidence intervals so wide that they cannot rule out medium-size or even large effects. But in the Chicago voucher lottery the estimates can rule out effects of voucher receipt on children’s test scores any larger than about 0.06 to 0.09 standard deviations. Another concern is that if children’s outcomes are the result of accumulated exposure to developmentally productive environments, the effects of social programs may reveal themselves only over long periods of time, while most studies follow families over only short periods. Yet the Chicago voucher study follows families for fourteen years and finds little evidence that impacts grow over time. These findings are similar to those of the welfare-to-work experiment study by Mills et al. (2006), which relies on a smaller sample and parent reports of child outcomes.
A nonexperimental study by Andersson et al. (2015) links national administrative data on housing assistance to detailed earnings records and uses a household fixed effects design to estimate the effects of parents’ receipt of housing assistance during a child’s teenage years on that child’s earnings in early adulthood. The large national sample allows the authors to explore heterogeneity by race/ethnicity and gender. They find a mixed pattern of earnings effects—positive and negative—which vary by

48. For example, Jacob and Lefgren (2009) find that in the Chicago Public School system in the early 1990s, retention rates for students in grades 3, 6, and 8 were on the order of 1 or 2 percent. In 1996/97 CPS enacted a policy to end “social promotion” and tie promotion to performance on a standardized achievement test. The performance standard was set at about the 15th to 20th percentile of the national achievement distribution; after the policy, about 30–40 percent of students failed to meet the standard and about 10–20 percent each year were retained in grade. A different observational study (Newman and Harkness 2002) uses data from the 1997 National Survey of America’s Families to estimate that children who lived in public housing for more years between 1968 and 1982 had somewhat higher employment rates and labor earnings as young adults.


race/ethnicity and gender, with black teenage girls appearing to benefit the most. While the authors are careful to try to account for possible sources of endogeneity, their preferred design may still be susceptible to unobserved shocks that affect subsidy receipt as well as a child’s future earnings.49 An important limitation is the external validity of the authors’ household fixed effects design, which relies on very small sibling differences in exposure to housing assistance during ages thirteen to eighteen to identify the effects of public housing and vouchers on future earnings.50 The evidence thus suggests that children are not much affected when their families move into public housing or receive a housing voucher. Both of those interventions improve housing conditions, but they do not seem to do much to change the neighborhood environments in which children live. In contrast, housing interventions that do change neighborhood conditions—such as MTO-induced moves from high-poverty housing projects into voucher-subsidized units in lower-poverty areas—do seem to generate some benefits for children. In the interim (five-year) MTO follow-up, relative to the control group (which did not receive help moving out of public housing), girls experienced improvements in mental health (0.19 standard deviations), declines in risky behavior (0.13 SD), and a decline of about 40 percent in lifetime arrest prevalence (Kling, Ludwig, and Katz 2005; Kling, Liebman, and Katz 2007; Orr et al. 2003). However, voucher receipt relative to distressed public housing may, if anything, have led to worse outcomes for boys on outcomes like risky behavior (by 0.21 SD). We see a similar, though somewhat more muted, pattern with respect to risky or antisocial behaviors in the long-term MTO follow-up, which tracked families for ten to fifteen years after random assignment (Sanbonmatsu et al.
2011), but sizable effects on mental health outcomes that—as at the five-year follow-up—go in opposite directions for boys versus girls (Kessler et al. 2014).51 An even longer-term follow-up of these MTO children, which examined long-term earnings as measured by IRS tax records, suggests that those who were relatively young when their families moved (under age thirteen) experienced earnings gains in adulthood (when people are in their twenties) from moving to low-poverty neighborhoods of about $3,500 per year—about 31 percent of the control mean (Chetty, Hendren, and Katz 2016). What is remarkable about this finding is that we would not necessarily have expected it, in the sense that up to that point we had seen no detectable changes in

49. For example, youth criminal activity that results in criminal convictions can cause families to lose subsidies or eligibility while presumably also having a direct effect on future earnings.

50. The vast majority of observations in their estimation sample have only one- or two-year differences in the amount of teenage sibling exposure to housing assistance.

51. Compared to the control group, girls in families assigned to the traditional voucher group in MTO had lower rates of major depression (6.5 percent versus 10.9 percent) and conduct disorder (0.3 percent versus 2.9 percent), while boys had higher rates of post-traumatic stress disorder under the traditional voucher treatment (4.9 percent versus 1.9 percent).


achievement test scores and few changes in educational attainment.52 Rickford et al. (2015) provide one candidate (partial) explanation for these earnings gains: MTO reduced these youths’ use of African American vernacular English. A qualitatively similar pattern emerges from two studies of public housing demolitions in Chicago, which compare the outcomes of children whose families were moved earlier versus later in time due to idiosyncrasies in how the city prioritized which buildings to demolish when. Jacob (2004) shows that children who moved as a result of public housing demolition wound up in census tracts with poverty rates about 14 percentage points below those of comparable children whose projects were not demolished (the tract poverty rate for these children is an astonishing 68 percent). As with MTO, these demolition-induced moves had a much more modest, if any, impact on neighborhood racial segregation. Jacob showed that these moves had no detectable benefits on any academic outcomes in the short run—in the form of test scores, grades, or absences—and, if anything, might have increased dropout rates slightly. Yet a longer-term follow-up of these children by Chyn (2015) shows that in adulthood the children who moved to lower-poverty areas due to project demolitions have higher employment rates (by 4 percentage points, compared to a control mean of 42 percent) and higher annual earnings (by $602, compared to a control mean of $3,713). Another version of the externality argument is that investments in subsidized housing improve neighborhoods by removing blight, creating attractive new buildings, and repopulating neighborhoods. There is some evidence that LIHTC developments increase the value of surrounding properties, at least in low-income areas. For example, Baum-Snow and Marion (2009) find that the construction of LIHTC units increases the median value of nearby homes in low-income areas. Similarly, Schwartz et al.
(2006) examine the property value impact of city-assisted subsidized housing investments on distressed parcels in New York City, much of which used tax credits. Using geocoded government administrative data to estimate a difference-in-differences specification, they find that the value of properties surrounding the housing investment rose more after the completion of a new unit than the value of comparable properties in the same neighborhood but further away. The magnitudes of these effects are substantial, suggesting the city government could recoup its subsidies through resulting increases in property tax revenues. Of course, these results come from one city and focus on subsidized housing investments that were explicitly targeted to fix up the 100,000 blighted housing units and vacant lots that the city had taken over for tax foreclosure during the 1970s. Other recent studies have suggested that subsidized housing investments fail to deliver any significant effect on neighboring properties (Briggs, Darden, and Aidala 1999), while Ellen et al. (2007) suggest that effects differ across programs. And of course, place-based investments are extremely expensive. Thus, the neighborhood externality argument should be made cautiously, as it may apply only in limited circumstances, perhaps when investments are made deliberately to target neighborhood blight in a city with an otherwise strong economy.

52. Sanbonmatsu et al. (2011) show no changes in high school graduation or college attendance rates as measured by data from the National Student Clearinghouse (NSC). Chetty et al. (2016) use updated data on college attendance from IRS records that should be more accurate than the NSC data, and find some suggestive evidence of increased college attendance, but those results seem to be somewhat sensitive to the inclusion or exclusion of baseline covariates in the estimation model.

2.6 Summary and Conclusions

In this chapter we set out to answer the questions: How does the United States spend its means-tested housing assistance dollars? Why has it made those choices? What does this spending accomplish? Unfortunately, only the first of these questions lends itself to a good answer at this point. Federal housing programs began in the 1930s with public housing, which over time has been joined by a large number of other programs that subsidize privately built and operated housing developments, as well as subsidies for tenants to live in private units of their own choosing (housing vouchers). The intellectual (as opposed to political) justification for these programs continues to be contested and somewhat unclear—as was also the case forty years ago when Henry Aaron wrote his excellent book Shelter and Subsidies (1972). Much of the support for means-tested housing programs today seems to be motivated by concerns about housing affordability for low-income households. The rationale for providing in-kind housing support rather than cash transfers should hinge at least partly on the assumption that in-kind programs will lead to more housing consumption than would cash transfers of equal cost. Yet there is remarkably little evidence available to date on this first-order question. A different justification for housing programs (which in principle could also apply to cash transfers, since these would also stimulate housing consumption) is that housing consumption generates externalities. But there is surprisingly little good evidence about the effects of existing programs on the behavior and well-being of participating families.
We say “surprisingly” both because these programs consume significant amounts of government resources each year (and so are important), and because the excess demand for these program services (fewer than one out of four income-eligible families in the United States participates in such a program) would seem to offer numerous opportunities to carry out studies with truly comparable comparison groups. The best available evidence suggests that increasing housing consumption without improving neighborhood conditions may have little detectable impact on conventional measures of human capital accumulation of children and may reduce labor supply of working-age adults. But these questions are hardly settled; there is considerable room for further evidence on the effects of housing subsidy receipt on families and children. Indeed, there is virtually no evidence about the effects of what is currently our largest low-income housing subsidy program, the Low Income Housing Tax Credit. There is more robust evidence about the importance of neighborhoods, which suggests that exposure to less-distressed neighborhood conditions can improve health outcomes and overall well-being and, as more recent research suggests, can boost the long-term labor market success of young children once they reach adulthood. Another surprising feature of existing research on means-tested housing programs is our limited understanding of potential innovations to existing programs or of potential new ones. This is particularly surprising given the level of experimentation taking place at the local level by public housing authorities. For example, more than thirty housing authorities have been granted “Moving to Work” (MTW) status in the past decade, which has allowed them to waive a number of HUD regulations to tackle goals of enhancing economic self-sufficiency and improving resident opportunity. The housing policy landscape would seem to be rife with state and local variation in policy that is waiting to be studied. Some of the most important questions, then, for future research in this area include:

• What are the relative effects of in-kind housing programs versus equivalently costly cash transfer programs on housing consumption of poor households? Many local housing agencies use lotteries to allocate slots in oversubscribed housing programs, which can help identify the housing consumption effects of those programs. Some work like this has been done to date, but only in a few cities such as Chicago (Jacob and Ludwig 2012). For external validity purposes there would be value in having such estimates in more cities, although the primary challenge in that effort may be to assemble the government administrative records necessary to take advantage of these natural experiments. That would help identify the effects of housing programs on housing consumption, but what about the effects of cash transfers on housing consumption? Economists have developed a large parallel literature trying to understand various behavioral responses of people to cash transfers, exploiting, for instance, variation over time in the value of the federal Earned Income Tax Credit or variation in the generosity of state-specific EITC programs. This work could be extended to consider impacts on housing consumption as well. The challenge to this type of work may be that the social science data sets usually analyzed by economists often do not include a great deal of information about housing conditions or consumption.
• What are the externalities associated with both housing and means-tested housing programs? To date there is a small literature studying housing-voucher lotteries that suggests that providing subsidies to previously unsubsidized households has fairly modest effects on nonhousing outcomes (Mills et al. 2006; Jacob and Ludwig 2012; Jacob, Kapustin, and Ludwig 2015). This research stands in contrast to nonexperimental studies that find that at least some features of housing consumption are associated with important nonhousing outcomes (e.g., see Leventhal and Newman 2010). Whether this discrepancy is due to problems with internal validity in the nonexperimental studies or to limited external validity of the voucher lotteries remains unclear at present and would be valuable to learn more about in future research. Expanding the set of cities for which we have housing voucher lottery studies, as suggested in the previous paragraph, would begin to clarify the degree to which the difference between the experimental and nonexperimental literatures is due to external validity issues.
• Similarly, there is virtually no research on the impact of living in Low Income Housing Tax Credit developments, despite the fact that the LIHTC program has become the largest federal low-income housing production program. One important barrier to progress here is surely data: whatever administrative records the government has on LIHTC participants probably do not include much information on most of the outcomes of scientific and policy concern (and so would require figuring out a way to link to other government data), while most existing data sets that capture the outcomes of primary interest do not capture whether someone is living in a LIHTC-subsidized unit.
A different barrier here is probably research design, in the sense that some detailed investigation would be required into the workings of the LIHTC program to uncover useful natural experiments (sources of exogenous identifying variation).
• Questions also remain about the externalities associated with neighborhood conditions like racial and economic composition, or social conditions like cohesion or “collective efficacy” (Sampson, Earls, and Raudenbush 1997), which housing programs could in principle affect. Here again the experimental and nonexperimental literatures seem to somewhat conflict. For example, evidence from the Moving to Opportunity (MTO) experiment suggests important changes in some outcomes, such as physical and mental health (Ludwig et al. 2011, 2012), overall well-being, and (more recently) long-term earnings for children exposed to disadvantaged neighborhoods, but these impacts are less sweeping than what is generally suggested by nonexperimental studies. How do we reconcile these literatures? Identifying more natural experiments that move families across neighborhoods, as with the public housing demolitions in Chicago studied by Jacob (2004) and Chyn (2015), would for starters help us learn more about the role of external validity as an explanation for the MTO findings.
• What modifications to existing housing programs, if any, could make them more successful in enabling poor families to access less disadvantaged neighborhoods? For example, with the current housing voucher program, a sizable share of families offered vouchers are not able to use them to lease a new unit within the program’s limited search window. And those who do lease up wind up living in neighborhoods that are not so different from the ones they were living in prior to receiving a subsidy. To the extent that neighborhood conditions are important for at least some aspects of well-being, how can we modify our existing housing policy levers to do more to change the geographic concentration of disadvantage in the United States? Getting more housing agencies to agree to link their records to other government administrative data would allow researchers to begin studying some of the innovations that are already underway across cities.
• What are the advantages and disadvantages of relying on project-based versus tenant-based programs under different types of housing market or economic conditions? Studies like MTO tell us something about the effects of housing vouchers versus public housing, which is valuable but surely cannot be the only word on this subject given that MTO was carried out in just five cities with extremely distressed public housing developments. In principle the relative advantages and disadvantages of the two types of subsidy programs could vary according to the tightness of the local housing market, a type of potential contingency that we would need much more than five city-level data points to understand.
• What are the advantages and disadvantages of relying on flat versus income-based rents in housing programs?
Flat rents of the sort used in the LIHTC program have the advantage of minimizing the work disincentive effects associated with the income-tied rents used in most HUD programs, but have the downside of making units more expensive to low-income households. Almost nothing is known on this point right now, which is a question that could in principle be answered by supporting and studying local housing authority experimentation with the different types of rent approaches.
• What are the benefits and costs of mixed-income developments to low-income households? One issue is better understanding the degree to which nonpoor households wind up being implicitly subsidized to live in mixed-income developments, and what the other costs are of building mixed-income developments rather than those that would exclusively serve low-income families. A different but related question is how much more it costs to house poor households in higher-valued neighborhoods. How to view these potential trade-offs depends partly on the size of the spillover benefits to poor families from living near nonpoor households, and how the social value of those spillovers compares to the opportunity cost of directly subsidizing fewer poor households. Little is currently known about the relative magnitudes of the different effects at play in this trade-off.
• What are the benefits and costs of a system that provides either smaller or more time-limited subsidies to a larger share of income-eligible households than currently receive housing help from the government? There is an implicit disagreement right now between housing advocates, who believe that the benefits of housing subsidies are convex in the subsidy amount, and researchers who study other transfer programs, who believe the benefits of subsidies in general are more likely to be concave in the subsidy amount. Understanding more about this question is critical for housing policy given that at present fewer than one in four income-eligible households receives rent support from federal means-tested subsidy programs.

Finally, most of the research that has been carried out on low-income housing programs to date has focused on effects on the program participants themselves. But programs that seek to change the supply side of the private housing market or change how low-income families are distributed across neighborhoods have the potential to have impacts on nonparticipants as well, through channels other than just the tax burden associated with financing the programs. For example, housing policies and programs may change the distribution of rents or house prices in the private market overall, or change the nature of “peer effects” that people experience within their neighborhoods or school settings. Unfortunately, remarkably little is currently known about what economists would call the “general equilibrium” effects of most housing programs.
While studying general equilibrium effects is far more challenging than examining impacts on just the program participants, it nonetheless should be a high priority for future research in this area.
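The convex-versus-concave disagreement about subsidy benefits noted above can be made concrete with a simple budget-splitting calculation. The notation here (a household benefit function B(s) and a fixed budget T) is ours rather than the chapter's, and is meant only as an illustrative sketch:

```latex
% Fixed budget T divided equally among n recipient households,
% each receiving a subsidy s = T/n that yields benefit B(s):
W(n) \;=\; n\, B\!\left(\tfrac{T}{n}\right).
% If B is concave with B(0) = 0, then B(s)/s is nonincreasing in s,
% which implies W(n+1) \ge W(n): total benefits rise as the budget
% is spread over more households in smaller amounts. If instead B is
% convex with B(0) = 0, the inequality reverses, favoring larger
% subsidies concentrated on fewer households.
```

On this sketch, the advocates' convexity view rationalizes deep subsidies for a few households, while the concavity view common in research on other transfer programs rationalizes shallower subsidies spread across many more of the income-eligible.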

References

Aaron, Henry J. 1972. Shelter and Subsidies: Who Benefits from Federal Housing Policies? Washington, DC: Brookings Institution. Andersson, Fredrik, John Haltiwanger, Mark Kutzbach, Giordano Palloni, Henry Pollakowski, and Daniel H. Weinberg. 2015. “Childhood Housing and Adult Earnings: A Between-Siblings Analysis of Housing Vouchers and Public Housing.” US Census Bureau Center for Economic Studies Paper no. CES-WP-13-48, US Census Bureau, Washington, DC.

Angrist, Joshua D., Guido W. Imbens, and Donald B. Rubin. 1996. “Identification of Causal Effects Using Instrumental Variables.” Journal of the American Statistical Association 91 (434): 444– 55. Bauer, Catherine. 1957. “The Dreary Deadlock of Public Housing.” Architectural Forum 106 (5): 138– 42, 219– 21. Baum-Snow, Nathaniel, and Justin Marion. 2009. “The Effects of Low Income Housing Tax Credit Developments on Neighborhoods.” Journal of Public Economics 93 (5-6): 654– 66. Bayer, Patrick, Fernando Ferreira, and Robert McMillan. 2007. “A Unified Framework for Measuring Preferences for Schools and Neighborhoods.” Journal of Political Economy 115 (4): 588– 638. Bloom, Howard, James Riccio, and Nandita Verma. 2005. Promoting Work in Public Housing: The Effectiveness of Jobs Plus. New York: MDRC. Bonastia, Christopher. 2006. Knocking on the Door: The Federal Government’s Attempt to Desegregate the Suburbs. Princeton, NJ: Princeton University Press. Brewer, Mike, James Brown, and Wenchao Jin. 2012. “Universal Credit: A Preliminary Analysis.” Fiscal Studies 33 (1): 39– 71. Briggs, Xavier de Souza, Joe T. Darden, and Angela Aidala. 1999. “In the Wake of Desegregation: Early Impacts of Scattered-Site Public Housing on Neighborhoods in Yonkers, New York.” Journal of the American Planning Association 65 (1): 27– 49. Carlson, Deven, Robert Haveman, Thomas Kaplan, and Barbara Wolfe. 2012. “Long-Term Effects of Public Low-Income Housing Vouchers on Neighborhood Quality and Household Composition.” Journal of Housing Economics 21 (2): 101– 20. Chetty, Raj, Nathaniel Hendren, and Lawrence F. Katz. 2016. “The Effects of Exposure to Better Neighborhoods on Children: New Evidence from the Moving to Opportunity Experiment.” American Economic Review 106 (4): 855–902. Chyn, Eric. 2015. “Moved to Opportunity: The Long-Run Effect of Public Housing Demolition on Labor Market Outcomes of Children.” Working Paper, Department of Economics, University of Michigan. 
Collinson, Robert A., and Peter Ganong. 2015. “The Incidence of Housing Voucher Generosity.” Working Paper, New York University and Harvard University. http://ssrn.com/abstract=2255799. Cummings, Jean L., and Denise DiPasquale. 1999. “The Low-Income Housing Tax Credit: An Analysis of the First Ten Years.” Housing Policy Debate 10 (2): 251– 307. Currie, Janet, and Firouz Gahvari. 2008. “Transfers in Cash and In-Kind: Theory Meets the Data.” Journal of Economic Literature 46 (2): 333– 83. Currie, Janet M., and Aaron Yelowitz. 2000. “Are Public Housing Projects Good for Kids?” Journal of Public Economics 75 (1): 99– 124. Desai, Mihir A., Dhammika Dharmapala, and Monica Singhal. 2010. “Tax Incentives for Affordable Housing: The Low Income Housing Tax Credit.” In Tax Policy and the Economy, vol. 24, edited by Jeffrey R. Brown, 181– 205. Chicago: University of Chicago Press. Desmond, Matthew. 2012. “Eviction and the Reproduction of Urban Poverty.” American Journal of Sociology 118:88– 133. Dubin, Robin A. 1988. “Estimation of Regression Coefficients in the Presence of Spatially Autocorrelated Error Terms.” Review of Economics and Statistics 70 (3): 466– 74. Edin, Kathryn, and Laura Lein. 1997. Making Ends Meet: How Single Mothers Survive Welfare and Low-Wage Work. New York: Russell Sage Foundation Press.

Ellen, Ingrid Gould, Keren M. Horn, and Amy E. Schwartz. 2014. “Why Don’t Housing Choice Voucher Holders Live Near Better Schools?” Paper presented at 2014 APPAM Conference, Albuquerque, New Mexico, November 6– 8. Ellen, Ingrid Gould, Michael Schill, Amy E. Schwartz, and Ioan Voicu. 2002. “Revitalizing Inner-City Neighborhoods: New York City’s Ten Year Plan for Housing.” Housing Policy Debate 13 (3): 529– 66. Ellen, Ingrid Gould, Amy Ellen Schwartz, Ioan Voicu, and Michael Schill. 2007. “Does Federally-Subsidized Rental Housing Depress Property Values?” Journal of Policy Analysis and Management 26 (2): 257– 80. Evans, G. W. 2003. “The Built Environment and Mental Health.” Journal of Urban Health: Bulletin of the New York Academy of Medicine 80 (4): 536– 55. Falk, Gene. 2012. “Low-Income Assistance Programs: Trends in Federal Spending.” In Congressional Research Service (CRS), House Ways and Means Committee, US Congress. http://greenbook.waysandmeans.house.gov/sites/greenbook.waysandmeans.house.gov/files/2012/documents/RL41823_gb.pdf. Finkel, Meryl, and Larry Buron. 2001. Study on Section 8 Voucher Success Rates: Volume I: Quantitative Study of Success Rates in Metropolitan Areas. Cambridge, MA: Abt Associates. Finkel, Meryl, and Ken Lam. 2008. “Use of Flat Rents in the Public Housing Program.” Cityscape 10 (1): 91– 116. Fischer, Will, and Barbara Sard. 2013. Chart Book: Federal Housing Spending is Poorly Matched to Need. Center on Budget and Policy Priorities, Washington, DC. http://www.cbpp.org/cms/index.cfm?fa=view&id=4067. Fisk, William J., Quanhong Lei-Gomez, and Mark J. Mendell. 2007. “Meta-Analyses of the Associations of Respiratory Health Effects with Dampness and Mold in Homes.” Indoor Air 17 (4): 284– 96. Frieden, Bernard. 1980. “Housing Allowances: An Experiment that Worked.” Public Interest 59:15– 35. Friedman, Lawrence. 1968. Government and Slum Housing: A Century of Frustration. Chicago: Rand McNally. Galiani, Sebastian, Alvin Murphy, and Juan Pantano.
Forthcoming. “Estimating Neighborhood Choice Models: Lessons from a Housing Assistance Experiment.” American Economic Review. Glaeser, Edward L., and Joseph Gyourko. 2002. “The Impact of Building Restrictions on Housing Affordability.” Federal Reserve Bank of New York, Economic Policy Review June: 1– 19. Goux, Dominique, and Eric Maurin. 2005. “The Effect of Overcrowded Housing on Children’s Performance at School.” Journal of Public Economics 89:797– 819. Hansen, Kristen A. 1998. “Seasonality of Moves and Duration of Residence.” US Bureau of the Census: Current Population Reports, October. https://www.census.gov/prod/3/98pubs/p70-66.pdf. Hanushek, Eric. 1986. “Non-Labor-Supply Responses to the Income Maintenance Experiments.” In Lessons from the Income Maintenance Experiments, edited by Alicia Munnell, 106– 21. Boston: Federal Reserve Bank of Boston. Hanushek, Eric A., John F. Kain, and Steven G. Rivkin. 2004. “Disruption versus Tiebout Improvement: The Costs and Benefits of Switching Schools.” Journal of Public Economics 88 (9): 1722– 46. Hays, R. Allen. 1995. The Federal Government and Urban Housing: Ideology and Change in Public Policy (2nd ed.). Albany, NY: State University of New York Press. Hirsch, Arnold R. (1983) 1998. Making the Second Ghetto: Race and Housing in Chicago, 1940–1960. Chicago: University of Chicago Press. Horn, Keren M., Ingrid Gould Ellen, and Amy Ellen Schwartz. 2014. “Do Housing
Choice Voucher Holders Live Near Good Schools?” Journal of Housing Economics 24:109– 21. Hunt, D. Bradford. 2009. Blueprint for Disaster: The Unraveling of Chicago Public Housing. Chicago: University of Chicago Press. Jacob, Brian A. 2004. “Public Housing, Housing Vouchers and Student Achievement: Evidence from Public Housing Demolitions in Chicago.” American Economic Review 94 (1): 233– 58. Jacob, Brian A., Max Kapustin, and Jens Ludwig. 2015. “The Impact of Housing Assistance on Child Outcomes: Evidence from a Randomized Housing Lottery.” Quarterly Journal of Economics 130 (1): 465– 506. Jacob, Brian A., and Lars Lefgren. 2009. “The Effect of Grade Retention on High School Completion.” American Economic Journal: Applied Economics 1 (3): 33– 58. Jacob, Brian A., and Jens Ludwig. 2012. “The Effects of Housing Assistance on Labor Supply: Evidence from a Voucher Lottery.” American Economic Review 102 (1): 272– 304. Joseph, M. L. 2013. “Mixed-Income Symposium Summary and Response: Implications for Antipoverty Policy.” Cityscape 15 (2): 215– 21. Kessler, Ronald C., Greg J. Duncan, Lisa A. Gennetian, Lawrence F. Katz, Jeffrey R. Kling, Nancy A. Sampson, Lisa Sanbonmatsu, Alan M. Zaslavsky, and Jens Ludwig. 2014. “Associations of Housing Mobility Interventions for Children in High-Poverty Neighborhoods with Subsequent Mental Disorders during Adolescence.” Journal of the American Medical Association 311 (9): 937– 47. Kling, Jeffrey R., Jeffrey B. Liebman, and Lawrence F. Katz. 2007. “Experimental Analysis of Neighborhood Effects.” Econometrica 75 (1): 83– 119. Kling, Jeffrey R., Jens Ludwig, and Lawrence F. Katz. 2005. “Neighborhood Effects on Crime for Female and Male Youth: Evidence from a Randomized Housing Voucher Experiment.” Quarterly Journal of Economics 120 (1): 87– 130. Lawyers Committee for Better Housing. 2002. “Locked Out: Barriers to Choice for Housing Voucher Holders.” Chicago: Lawyers Committee for Better Housing. 
http://lcbh.org/reports/locked-out-barriers-choice-housing-voucher-holders. Lee, Kwan Ok, and Gary Painter. 2013. “What Happens to Household Formation in a Recession?” Journal of Urban Economics 76 (C): 93– 109. Lennen, Mary Clare, Lauren D. Applebaum, J. Lawrence Aber, and Katherine McCaskie. 2003. “Public Attitudes Towards Low-Income Families and Children: Research Report no. 1.” Columbia University, National Center for Children in Poverty. Lens, Michael C., Ingrid Gould Ellen, and Katherine O’Regan. 2011. “Do Vouchers Help Low-Income Households Live in Safer Neighborhoods? Evidence on the Housing Choice Voucher Program.” Cityscape: A Journal of Policy Development and Research 13 (3): 135– 59. Leventhal, Tama, and Sandra Newman. 2010. “Housing and Child Development.” Children and Youth Services Review 32 (9): 1165– 74. Løken, Katrine V., Magne Mogstad, and Matthew Wiswall. 2012. “What Linear Estimators Miss: The Effects of Family Income on Child Outcomes.” American Economic Journal: Applied Economics 4 (2): 1– 35. Lubell, Jeffrey M., Mark Shroder, and Barry Steffen. 2003. “Work Participation and Length of Stay in HUD-Assisted Housing.” Cityscape 6 (2): 207– 23. Mallach, Alan. 2007. “Landlords at the Margins: Exploring the Dynamics of the One to Four Unit Rental Housing Industry.” Working Paper no. RR07-15, Joint Center for Housing Studies, Harvard University. Malpezzi, Stephen, and Kerry Vandell. 2002. “Does the Low-Income Housing Tax Credit Increase the Supply of Housing?” Journal of Housing Economics 11 (4): 360– 80.

Mani, Anandi, Sendhil Mullainathan, Eldar Shafir, and Jiaying Zhao. 2013. “Poverty Impedes Cognitive Function.” Science 341 (6149): 976– 80. Mateyka, Peter, and Matthew Marlay. 2010. “Residential Duration by Tenure, Race, and Ethnicity.” Paper presented at the Annual Meetings of the American Sociological Association, Atlanta, GA, August 14– 17. Mayo, Stephen K. 1981. “Theory and Estimation in the Economics of Housing Demand.” Journal of Urban Economics 10:95– 116. Mayo, Stephen K., Shirley Mansfield, David Warner, and Richard Zwetchkenbaum. 1980. Housing Allowances and Other Rental Assistance Programs—A Comparison Based on the Housing Allowance Demand Experiment, Part 2: Costs and Efficiency. Cambridge, MA: Abt Associates Inc. Meehan, Eugene J. 1979. The Quality of Federal Policymaking: Programmed Failure in Public Housing. St Louis: University of Missouri Press. Mills, Gregory, Daniel Gubits, Larry Orr, David Long, Judie Feins, Bulbul Kaul, Michelle Wood, Amy Jones & Associates, Cloudburst Consulting, and the QED group. 2006. The Effects of Housing Vouchers on Welfare Families. Washington, DC: US Department of Housing and Urban Development, Office of Policy Development and Research. Mitchell, J. Paul, ed. 1985. Federal Housing Policy and Programs: Past and Present. New Brunswick, NJ: Center for Urban Policy Research. Newman, Sandra, and Joseph Harkness. 2002. “The Long-Term Effects of Public Housing on Self-Sufficiency.” Journal of Policy Analysis and Management 21 (1): 21– 43. Newman, S., C. S. Holupka, and J. Harkness. 2009. “The Long-Term Effects of Housing Assistance on Work and Welfare.” Journal of Policy Analysis and Management 28:81– 101. doi: 10.1002/pam.20403. Newman, Sandra J., and Ann B. Schnare. 1997. “ ‘ . . . And a Suitable Living Environment’: The Failure of Housing Programs to Deliver on Neighborhood Quality.” Housing Policy Debate 8:703– 41. Nichols, Albert, and Richard Zeckhauser. 1981. 
“Targeting Transfers through Restrictions on Recipients.” American Economic Review Papers and Proceedings 72:372– 77. O’Flaherty, Brendan. 2011. “Homelessness as Bad Luck: Implications for Research.” In How to House the Homeless, edited by Ingrid Gould Ellen and Brendan O’Flaherty, 143– 82. New York: Russell Sage Foundation. O’Flaherty, Brendan, and Ingrid Gould Ellen. 2007. “Do Government Programs Make Households Too Small? Evidence from New York City.” Population Research and Policy Review 26 (4): 387– 409. O’Flaherty, Brendan, and Ingrid G. Ellen, eds. 2010. How to House the Homeless. New York: Russell Sage Foundation. Olsen, Edgar O. 2003. “Housing Programs for Low-Income Households.” In Means-Tested Transfer Programs in the United States, edited by Robert A. Moffitt, 365– 442. Chicago: University of Chicago Press. ———. 2008. Getting More from Low-Income Housing Assistance. Washington, DC: The Hamilton Project. ———. 2009. “The Cost-Effectiveness of Alternative Methods of Delivering Housing Subsidies.” Paper presented at the Annual Research Conference of the Association of Public Policy Analysis and Management, Washington, DC, November 5– 7. Olsen, Edgar O., and David M. Barton. 1983. “The Benefits and Costs of Public Housing in New York City.” Journal of Public Economics 20 (3): 299– 332. Olsen, Edgar O., and Jens Ludwig. 2013. “Performance and Legacy of Housing Policies.” In Legacies of the War on Poverty, edited by Martha J. Bailey and Sheldon Danziger, 206– 34. New York: Russell Sage Foundation. Olsen, Edgar O., Catherine A. Tyler, Jonathan W. King, and Paul E. Carrillo. 2005. “The Effects of Different Types of Housing Assistance on Earnings and Employment.” Cityscape 8 (2): 163– 88. O’Regan, Katherine M., and Keren M. Horn. 2013. “What Can We Learn about the Low-Income Housing Tax Credit Program by Looking at the Tenants?” Housing Policy Debate 23 (3): 597– 613. Orr, Larry, Judith D. Feins, Robin Jacob, Erik Beecroft, Lisa Sanbonmatsu, Lawrence F. Katz, Jeffrey B. Liebman, and Jeffrey R. Kling. 2003. “Moving to Opportunity for Fair Housing Demonstration Program: Interim Impacts Evaluation.” Report prepared by Abt Associates Inc., and the National Bureau of Economic Research for the US Department of Housing and Urban Development, Office of Policy Development and Research. Washington, DC: US Department of Housing and Urban Development. Poverty and Race Research Action Council. 2005. Keeping the Promise: Preserving and Enhancing Housing Mobility in the Section 8 Housing Choice Voucher Program: Final Report of the Third National Conference on Housing Mobility. http://www.prrac.org/full_text.php?text_id=1048&item_id=9646&newsletter_id=0&header=Current%20Projects. Quigley, John M., and Steven Raphael. 2004. “Is Housing Unaffordable? Why Isn’t It More Affordable?” Journal of Economic Perspectives 18 (1): 191– 214. ———. 2005. “Regulation and the High Cost of Housing in California.” American Economic Review 95 (2): 323– 28. Riccio, James A. 2006. “Jobs-Plus: A Promising Strategy.” Testimony presented to the Subcommittee on Federalism and the Census, House Committee on Government Reform. http://www.mdrc.org/publication/jobs-plus-promising-strategy. Rickford, John R., Greg J. Duncan, Lisa Gennetian, Ray Yun Gou, Rebecca Greene, Lawrence F. Katz, Ronald C. Kessler, et al. 2015.
“Neighborhood Effects on Use of African-American Vernacular English.” Proceedings of the National Academy of Sciences 112 (38): 11817– 22. Riis, Jacob (1890) 1970. How the Other Half Lives: Studies among the Tenements of New York. Cambridge, MA: Belknap Press. Rosen, Eva. 2014. “Selection, Matching, and the Rules of the Game: Landlords and the Geographic Sorting of Low-Income Renters.” Working Paper no. W14-11, Joint Center for Housing Studies, Harvard University. Rosen, Harvey S. 1985. “Housing Subsidies: Effects on Housing Decisions, Efficiency, and Equity.” Handbook of Public Economics, vol. 1, edited by Alan J. Auerbach and Martin Feldstein, 375– 420. Amsterdam: Elsevier. Rubinowitz, Leonard S., and James E. Rosenbaum. 2000. Crossing the Class and Color Lines: From Public Housing to White Suburbia. Chicago: University of Chicago Press. Sampson, Robert J., Felton Earls, and Stephen W. Raudenbush. 1997. “Neighborhoods and Violent Crime: A Multilevel Study of Collective Efficacy.” Science 277 (5328): 918– 24. Sanbonmatsu, Lisa, Jens Ludwig, Lawrence F. Katz, Lisa A. Gennetian, Greg J. Duncan, Ronald C. Kessler, Emma Adam, Thomas W. McDade, and Stacy Tessler Lindau. 2011. Moving to Opportunity for Fair Housing Demonstration Program: Final Impacts Evaluation. Washington, DC: US Department of Housing and Urban Development, Office of Policy Development and Research. Schill, Michael H. 1993. “Distressed Public Housing: Where Do We Go from Here?” University of Chicago Law Review 60 (2): 497– 554.

126

Robert Collinson, Ingrid Gould Ellen, and Jens Ludwig

Schill, Michael H., and Susan M. Wachter. 1995. “The Spatial Bias of Federal Housing Law and Policy: Concentrated Poverty in Urban America.” University of Pennsylvania Law Review 143:1285– 342. Schwartz, Alex F. 2014. Housing Policy in the United States, 3rd ed. New York: Routledge. Schwartz, Amy, Ingrid Gould Ellen, Ioan Voicu, and Michael Schill. 2006. “The External Effects of Subsidized Housing.” Regional Science and Urban Economics 36:679– 707. Shroder, Mark D. 2002. “Does Housing Assistance Perversely Affect Self-Sufficiency? A Review Essay.” Journal of Housing Economics 11 (4): 381– 417. Shroder, Mark D., and Arthur J. Reiger. 2000. “Vouchers versus Production Revisited.” Working Paper, US Department of Housing and Urban Development. Sinai, Todd, and Joseph Gyourko. 2004. “The (Un)Changing Geographical Distribution of Housing Tax Benefits: 1980 to 2000.” Tax Policy and the Economy 18:175_208. Spence, Lewis H. 1993. “Rethinking the Social Role of Public Housing.” Housing Policy Debate 4 (3): 355– 68. Susin, Scott. 2005. “Longitudinal Outcomes of Subsidized Housing Recipients in Matched Survey and Administrative Data.” Cityscape 8 (2): 189– 218. US Department of Housing and Urban Development (HUD). 1974. Housing in the Seventies. Washington, DC: HUD. ———. 2012. “PHA Preferences Web Census Survey.” Washington, DC: HUD Office of Policy Development and Research. ———. 2013. “Worst Case Housing Needs 2011: Report to Congress.” Washington, DC: HUD Office of Policy Development and Research. US Government Accounting Office (GAO). 2001. “Federal Housing Programs: What They Cost and What They Provide.” Report to Congressional Committees no. GAO- 01-901R, Washington, DC. ———. 2002. “Federal Housing Assistance: Comparing the Characteristics and Costs of Housing Programs.” Report to Congressional Committees no. GAO02– 76, Washington, DC. Vale, Lawrence J. 2000. From the Puritans to the Projects: Public Housing and Public Neighbors. Cambridge, MA: Harvard University Press. 
———. 2013. “Public Housing in the United States: Neighborhood Renewal and the Poor.” In Policy, Planning and People Promoting Justice in Urban Development, edited by Naomi Carmon and Susan Fainstein. Philadelphia: University of Pennsylvania Press. Von Hoffman, Alex. 1996. “High Ambitions: The Past and Future of Low-Income Housing Policy.” Housing Policy Debate 7 (4): 423– 46. ———. 2012. “A Rambling Edifice: American Housing Policy in the Twentieth Century.” Working Paper no. W12-9, Joint Center for Housing Studies, Harvard University. Wallace, E. James, et al. 1981. Participation and Benefits in the Urban Section 8 Program: New Construction and Existing Housing. Cambridge, MA: Abt Associates. Wright, Gwendolyn. 1981. Building the Dream: A Social History of Housing in America. New York: Pantheon Books.

3

Employment and Training Programs

Burt S. Barnow and Jeffrey Smith

3.1 Introduction

The United States has used employment and training programs as a policy tool during two periods: first during the Great Depression, when work relief and other employment-related programs were used, and since 1961, when a broad array of employment and training programs were implemented. This chapter focuses on employment and training programs implemented in the latter period, with emphasis on current means-tested programs and developments since 2000.1

Burt S. Barnow is the Amsterdam Professor of Public Service and Economics at the Trachtenberg School of Public Policy and Public Administration at George Washington University. Jeffrey Smith is professor of economics and of public policy at the University of Michigan and a research associate of the National Bureau of Economic Research.
This is a revised version of a paper prepared for the National Bureau of Economic Research conference on means-tested transfer programs held in Cambridge, Massachusetts, on December 4–5, 2014. The authors would like to acknowledge participants at that conference and Dan Black and Robert Moffitt in particular for useful comments. We are also grateful to David Balducchi, Sandy Baum, David Greenberg, Anita Harvey, Carolyn Heinrich, Sheena McConnell, Peter Mueser, Austin Nichols, Ernie Stromsdorfer, and two anonymous reviewers for comments; Sandy Baum and Anita Harvey for helping us to find data; and Colenn Berracasa for outstanding research assistance. Any remaining errors are the authors’ responsibility.
Disclaimer on potential conflicts of interest: Both authors have undertaken contract research sponsored by the US Department of Labor as well as serving on numerous technical advisory panels and providing comments on draft reports for various evaluations. In particular, in regard to evaluations considered in detail in this chapter, both authors were part of the NORC subcontract for the nonexperimental component of the National JTPA study, both authors reviewed drafts and provided technical advice on the recent TAA evaluation, and both are members of the technical advisory panel for the WIA experiment. It should, but does not, go without saying that the views expressed represent those of the authors alone. For acknowledgments, sources of research support, and disclosure of the authors’ material financial relationships, if any, please see http://www.nber.org/chapters/c13490.ack.

1. For a discussion of programs during the Great Depression, see Kesselman (1978). Developments from 1961 through 2000 are discussed in more detail in LaLonde (2003).


Although people often think of employment and training programs as synonymous with vocational classroom training, workforce programs actually use a variety of approaches. Indeed, in an analysis of data from the Workforce Investment Act (WIA) from program years 2002 to 2005 (July 1, 2002 through June 30, 2005), Trutko and Barnow (2007) found that less than half (46.6 percent) of adults who exited the program received classroom training, with individual states ranging from 14 percent to 96 percent of participants receiving classroom training.

Both Butler and Hobbie (1976) and Perry et al. (1975) classify employment and training programs into four broad categories: (a) skill development programs, which increase vocational skills through classroom or on-the-job training; (b) job-development programs, which consist of public employment programs where jobs are specifically created for the participants; (c) employability development programs, which, according to Butler and Hobbie (1976), emphasize personal attitudes and attributes needed for employment (i.e., what we would now call “soft skills”); and (d) work experience programs, which provide employment experiences intended to help workers gain the same attitudes and attributes as employability development programs through paid or unpaid work. This classification system covers programs intended to increase human capital and create new jobs, although individual programs may have other goals as well. What is missing are categories that include programs intended to provide a better match between workers and jobs (sometimes called labor exchange programs) and programs that provide job seekers with more information about themselves (through counseling and assessment) and the jobs that are available (labor market information [LMI]).2

This chapter focuses on means-tested employment and training programs in the United States. As such, we generally exclude programs that do not do means testing.
This set includes vocational education programs (though we do briefly discuss Pell grants, which many students use to attend vocational education, later in the chapter).3 We also exclude (a) the Unemployment Insurance (UI) program; (b) the Employment Service (ES) funded by the Wagner-Peyser Act; (c) the Worker Profiling and Reemployment Services (WPRS) and Reemployment and Eligibility Assessment (REA) programs for UI claimants; (d) registered apprenticeship programs; and (e) vocational rehabilitation and the Ticket to Work program, both of which provide services aimed at returning people with disabilities to productive employment. To avoid duplication, we do not cover programs surveyed in other chapters, such as the earned income tax credit (chapter 2 of volume 1) and the welfare-to-work programs associated with Temporary Assistance for Needy Families or TANF (chapter 4 of volume 1). In addition, we exclude (a) programs that operate through mandates to employers, such as the minimum wage and civil rights legislation; (b) place-based programs, such as economic development programs, empowerment zones, and enterprise zones, because they do not directly serve individual workers; and (c) programs for in-school youth. Finally, among the programs that remain (there are a lot of programs!) we devote the bulk of our attention to large (in persons served or budget or both) programs operated by the federal government for which credible impact evaluations exist. In that sense, we look where the light is, but there are a lot of keys there too.

2. LaLonde (2003) includes labor exchange programs as well as counseling and assessment and LMI in the job-development category. Barnow and Nightingale (2007) take a broad view of policies that affect the labor force, and they include several other program categories: insurance and cash payments (e.g., unemployment insurance, disability insurance, and workers’ compensation); regulations and mandates (e.g., minimum wage, living wage, occupational safety and health, and discrimination statutes and executive orders); tax incentives and credits (e.g., lifelong earning credit, earned income tax credit, and economic development programs such as empowerment zones and enterprise zones that offer place-based tax credits); and social and support services and payments (e.g., need-based stipends, transportation assistance, and subsidized or paid child care). Alternatively, labor market programs are sometimes divided into active programs, which impose requirements on those who benefit such as job search requirements, and passive programs, which provide cash or in-kind assistance with no requirements for the recipients.
3. Vocational education can be classified as education rather than training, but many courses offered by community colleges as vocational education also enroll individuals in employment and training programs in the same course.

The remainder of the chapter is organized as follows: We begin in the next section by briefly laying out the case for government involvement in the area of employment and training programs. Existing theory provides a case for the standard interventions, though the case would benefit from a stronger empirical foundation. Following that, we provide a history of US workforce development programs and then describe the current array of means-tested programs. Some, but by no means all, of the wide diversity of existing programs could be justified based on the need to target specific interventions to specific populations.
We then describe the key issues involved in evaluating employment and training programs, and how the US literature has addressed them, as a prelude to our discussion of results from recent evaluations of major US employment and training programs. The US literature largely relies on occasional social experiments and more frequent analyses that attempt to solve the problem of nonrandom selection into programs or into particular services via conditioning on observed participant characteristics, particularly past labor market outcomes. The existing evidence makes it clear that some programs (in particular, the adult funding stream of the Workforce Investment Act program) have positive impacts on labor market outcomes in excess of their costs, while many others do not. Explaining the differences in impacts among programs (and between funding streams within programs) remains an important topic for future research. In addition, we argue that the literature should shift its focus somewhat from research that estimates the impacts of program participation to research on how to better operate existing programs. The final section summarizes and offers some suggestions for future work and institutional change. We emphasize the potential for generating additional policy-relevant knowledge via (a) improvements in the quality of administrative data as well as more intensive use of the administrative data already available and (b) “designing in” credible evaluation designs such as regression discontinuity following the (very) successful example provided by the education policy field.

3.2 Justifications for Government Involvement

This section considers what, if any, substantive economic justification underlies the types of means-tested employment and training programs currently operated by the US government and considered in this chapter. Employment and training programs, sometimes referred to (by us and by others) as workforce development programs, clearly do not meet the usual definition of public goods as they are both excludable and rivalrous, so other explanations must be invoked. One straightforward view sees employment and training programs as what Richard Musgrave (1959) termed “merit goods,” a good that, although not meeting the criteria of a public good, is so highly valued by society that it is provided publicly. Education is the most common example of a merit good, and as noted above, the line between occupational training and education is fuzzy. Musgrave and Musgrave (1989, 81) expand on the concept of merit goods, noting that merit goods might be provided either as the imposition of the preferences of the ruling elite on the poor or as a means of correcting market imperfections such as imperfect information, externalities, and interpersonal utility preferences.4

Another potential rationale for government intervention in the provision of employment and training services is market failure due to imperfect access to capital, especially among the poor. In some situations, training programs might be offered to achieve equity when some segment of the population is harmed by unforeseen market events or by specific government interventions. Programs targeted on workers who lose their jobs due to technical change, changes in consumer preferences, or changes in trade patterns exemplify the former, while programs targeted on workers who lose their jobs due to trade agreements exemplify the latter.
Finally, imperfect information on the current and future labor market, particularly on the wages associated with occupations requiring training, could lead workers to systematically underinvest in training or to invest in the wrong types of training. Government may have a comparative advantage in collecting both labor market information and information about training providers, as it can amortize the fixed costs of doing so over many individuals. These arguments rationalize both the collection of the information and its distribution via caseworkers and websites.

Some of the rationales for government intervention in the employment and training field call for means testing, but others do not. If, for example, it is the poor who mostly experience challenges with access to capital and information, then it makes sense to have means testing for such programs. However, to the extent the public (or ruling elite in Musgrave’s terminology) views employment and training programs as a merit good that should be provided to all, then means testing would not be required.

The existence of a rationale for government involvement in workforce development does not imply that most (or even very much) training should be financed by the government or that the government should directly provide some (or even any) training or other services. In regard to the first point, Mikelson and Nightingale (2004) conclude that private-sector spending on training may be ten times as great as public-sector contributions, and other researchers have developed larger estimates. In regard to the second point, we detail below how the major US federal programs contract out to other providers (some of them other units of government, such as community colleges) much of their service provision.

4. In the latter cases, the provision of such goods does correspond to the traditional definition of a market failure. For a discussion of the rationale for in-kind redistribution to increase social welfare, see Garfinkel (1973).

3.3 History of US Employment and Training Programs

Workforce programs in the United States began in the 1930s with several efforts to deal with the high unemployment associated with the Great Depression. During the Great Depression, eight major work relief and public works programs were initiated.5 Under President Hoover, the Reconstruction Finance Corporation provided loans to state and local governments for welfare and public employment. Although this function was in effect for less than one year, $300 million in nominal dollars was spent on work relief, and at its peak nearly two million people were employed through the program. When Franklin Roosevelt assumed the presidency, a number of public-employment programs were enacted. All these programs ended by 1943, when the unemployment rate dropped to 1.9 percent and there was no need for large-scale, employment-creation programs. Public-employment programs largely vanished until the 1970s.

After discussing the Wagner-Peyser Act, which began during the Great Depression, this section focuses on the flagship US Department of Labor programs beginning in the 1960s. We then describe other US Department of Labor programs as well as significant programs operated by other federal agencies.6

5. The discussion of programs during the Great Depression is based on Kesselman (1978).
6. For reviews of the employment and training programs between 1961 and 1973, see Clague and Kramer (1976), Perry et al. (1975), Barnow (1993), King (1999), and O’Leary, Straits, and Wandner (2004).


3.3.1 The Wagner-Peyser Act

The Wagner-Peyser Act of 1933 established the Employment Service (ES, but sometimes called the Job Service in some states), the longest continuously operating workforce program in the United States.7 The ES is open to all job seekers and employers, so it is not a means-tested program. The ES focuses on providing a variety of employment-related labor exchange services including, but not limited to, job search assistance, job referral, placement assistance for job seekers, reemployment services to unemployment insurance claimants, and recruitment services to employers with job openings. Services are delivered in one of three modes: self-service, facilitated self-help, and staff assisted. Depending on the budget available and needs of the labor market, other services such as assessment of skill levels, abilities, and aptitudes; career guidance; job-search workshops; and referral to training may be available.8 The services offered to employers, in addition to referral of job seekers to available job openings, include assistance in development of job order requirements; matching job seeker experience, skills, and other attributes with job requirements; assisting employers with special recruitment needs; arranging for job fairs; helping employers analyze hard-to-fill job orders; assisting with job restructuring; and helping employers deal with layoffs.9 With enactment of the Workforce Investment Act in 1998, the ES was named as a mandated partner in the One-Stop delivery system.

7. See Eberts and Holzer (2004) for a review of the ES history.
8. In many states, the ES administers other programs and services at the direction of the governor. For example, in many states the ES administers the Trade Adjustment Assistance (TAA) program and the Supplemental Nutrition Assistance Program (SNAP), formerly known as food stamps.
9. The description of the Employment Service is from the US Department of Labor (2010a). http://www.doleta.gov/programs/wagner_peyser.cfm, accessed on November 1, 2014.

3.3.2 The Area Redevelopment Act

No major federal employment and training programs emerged following the Great Depression until the 1960s.10 In 1961, the Area Redevelopment Act (ARA) was passed to stimulate growth in areas with high unemployment by providing loans, financial assistance, and technical assistance to firms developing new products, and training for workers who would be employed by firms that expanded or relocated. The ARA was never a large program, and training enrollments ranged from 8,600 in 1962 to a high of about 12,000 before the program ended in 1965.

10. This discussion is based largely on Barnow (1993).

3.3.3 The Manpower Development and Training Act

The Manpower Development and Training Act of 1962 (MDTA) was the first federal program to provide training on a larger scale. The original intent of MDTA was to retrain workers who lost their jobs due to “automation,” the term used at that time for technical change. From the beginning, the program also served disadvantaged workers, and services to the economically disadvantaged soon predominated, as job losses due to automation failed to materialize. A total of approximately 1.9 million workers enrolled in MDTA between 1963 and 1972, with about two-thirds of the participants enrolled in classroom training and one-third enrolled in on-the-job training (OJT), which is informal training by employers who receive subsidies of up to 50 percent of wages for up to six months.11

Administration of MDTA was complex. The original legislation called for states to eventually pay for half the program, but these requirements were postponed and diluted, and eventually states were only required to make a 10 percent match that could be an “in-kind” contribution. Administration of the OJT component of the program was eventually shifted from the US Department of Labor to the Job Opportunities in the Business Sector (JOBS) program, which was operated by the National Alliance of Business, a nonprofit business trade association. The institutional classroom training was largely administered by the US Department of Labor, with relatively minor roles played by state and local governments.

Although MDTA was by far the largest employment and training program in the 1960s and early 1970s, there were many other workforce development programs in operation. Table 3.1 describes the major programs that operated during this period. Barnow (1993) describes ten programs that operated during this period, and Perry et al. (1975) provide detailed information about most of the programs, as well as evidence on their effectiveness.
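The OJT subsidy terms described above (reimbursement of up to 50 percent of wages for up to six months) imply a simple upper bound on the federal cost per trainee. A minimal sketch of that arithmetic; the wage and work schedule below are hypothetical, not figures from the chapter:

```python
# Illustrative arithmetic only: the 50 percent reimbursement rate and
# six-month (roughly 26-week) limit come from the MDTA OJT description
# above; the hourly wage and weekly hours are hypothetical.

def max_ojt_subsidy(hourly_wage, hours_per_week=40, weeks=26, rate=0.50):
    """Upper bound on the reimbursement an employer could receive per trainee."""
    return hourly_wage * hours_per_week * weeks * rate

# A full-time trainee at a hypothetical $2.00/hour for the full six months:
subsidy = max_ojt_subsidy(2.00)  # 2.00 * 40 * 26 * 0.50 = 1040.0
```

The actual reimbursement would be lower for part-time trainees or shorter contracts, since both the rate and the duration in the statute are ceilings.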
Franklin and Ripley (1984, 7) note the consequences of having so many programs available with similar intent: “The need for coordination among manpower programs and agencies serving the poor became increasingly apparent during the 1960s. A fearsome degree of fragmentation and rococo complexity resulted from the large number of separate programs, each with its own target groups, application procedures, funding cycles, and delivery mechanisms.”12 The only US Department of Labor program providing training still in operation from the 1960s is the Job Corps, a primarily residential program for disadvantaged youth that we describe in more detail below.

11. See Mangum (1968).
12. It is not clear what the optimal number of employment and training programs is, and there is still debate about whether there are “too many” programs. We discuss this issue later in the chapter, but note here that to a large extent there are many programs because they serve different populations or provide a different mix of services. Interesting research opportunities exist on the issue of how many programs there should be and how they should be differentiated in mission and targeting.

Table 3.1  Major employment and training programs (1963–1973)

Manpower Development and Training Act
  Dates of operation (FY): 1963–1974
  General purpose: Vocational training in a classroom setting and subsidized training by employers on the job
  Administrative agencies: National: Department of Labor (DOL) and Department of Health, Education, and Welfare (HEW); Local: Employment Service (ES), school districts, skills centers
  Target groups: Economically disadvantaged and dislocated workers
  Average annual enrollment: Economically disadvantaged 126,200; dislocated workers 83,700

Vocational education (Smith-Hughes Act of 1917)
  Dates of operation (FY): 1964–1974
  General purpose: Occupational training in public schools
  Administrative agencies: National: HEW; Local: School districts
  Target groups: General, not seeking academic degree population
  Average annual enrollment: 6,674,000

Neighborhood Youth Corps in school, summer, out of school (Economic Opportunity Act of 1964)
  Dates of operation (FY): 1965–1974
  General purpose: Subsidized work experience in public and nonprofit agencies
  Administrative agencies: National: DOL; Local: Community action agencies (CAAs), local governments, schools, ES
  Target groups: Disadvantaged youth
  Average annual enrollment: In school 129,400; out of school 84,300; summer 362,500

Job Corps (EOA)
  Dates of operation (FY): 1965–1974
  General purpose: Vocational skills training and basic skills in a residential setting
  Administrative agencies: National: Office of Economic Opportunity (OEO); DOL after 1969
  Target groups: Disadvantaged youth
  Average annual enrollment: 42,100

Operation Mainstream (EOA)
  Dates of operation (FY): 1967–1974
  General purpose: Subsidized employment in paraprofessional positions in public and nonprofit agencies
  Administrative agencies: National: OEO, DOL after 1967; Local: CAAs and public agencies
  Target groups: Disadvantaged adults
  Average annual enrollment: 20,000

Job Opportunities in the Business Sector (Presidential initiative)
  Dates of operation (FY): 1968–1974
  General purpose: Subsidized on-the-job training in the private sector
  Administrative agencies: National: DOL and National Alliance of Business (NAB); Local: NAB offices
  Target groups: Disadvantaged adults
  Average annual enrollment: 49,300

Work Incentive Program (Social Security Act)
  Dates of operation (FY): 1967–1974
  General purpose: Vocational training, work experience, support services, placement
  Administrative agencies: National: DOL; Local: welfare offices, ES, and WIN offices
  Target groups: Aid to Families with Dependent Children (AFDC) recipients
  Average annual enrollment: 124,700

Concentrated Employment Program (EOA)
  Dates of operation (FY): 1968–1974
  General purpose: Coordinates employment and training services of other programs
  Administrative agencies: National: DOL; Local: CAAs, local government
  Target groups: Disadvantaged youth and adults
  Average annual enrollment: 92,900

Public Employment Program (Emergency Employment Act)
  Dates of operation (FY): 1972–1974
  General purpose: Subsidized public employment
  Administrative agencies: National: DOL; Local: Chief elected officials
  Target groups: Unemployed adults
  Average annual enrollment: 234,300

Source: Barnow (1993).

3.3.4 The Comprehensive Employment and Training Act of 1973

In 1973, MDTA was replaced by the Comprehensive Employment and Training Act (CETA). The new program included a change in the mix of activities offered and in the responsibilities of different levels of government. President Nixon was a strong advocate for the “New Federalism,” which sought to give state and local governments more control over who was served and how they were served. The CETA program represented a major departure from MDTA in several ways. First, decisions about who would be served and how they would be served were primarily made at the local level rather than at the federal or state level; in fact, CETA was the high point for local authority compared to the MDTA program that preceded it and the Job Training Partnership Act (JTPA), Workforce Investment Act (WIA), and Workforce Innovation and Opportunity Act (WIOA) programs that succeeded it.

Under CETA Title I, prior to the 1978 amendments, funds were distributed by formula to cities, counties, or consortia of local governments. Any jurisdiction with a population of 100,000 or more was entitled to be recognized as a “prime sponsor.” Areas that were ineligible for designation on their own and failed to join a consortium were included in a “balance of state” prime sponsor that was administered by the state. Prime sponsors were required to submit an annual plan to the US Department of Labor for approval, and they were also required to establish a planning council with representatives of various constituencies, including the private sector. Prime sponsors had significant latitude in determining their mix of activities and participants under Title I; activities available included classroom and on-the-job training, public service employment, and work experience.
A concern under MDTA that persists to the present day is that states and local programs would engage in “creaming” or “cream skimming” by selecting as participants those among the eligible applicants most likely to do well after participation whether or not the program helps them (Perry et al. 1975, 151; Mangum 1968, 169; Mirengoff and Rindler 1978, 176). Several features of CETA were designed to mitigate this issue. First, categorical programs were established for groups with severe barriers—Indians and Native Americans, and migrant and seasonal farmworkers—and these categorical programs remain part of the program mix today. Additionally, prime sponsors were required to make assurances in their annual plans that they would serve those “most in need,” including “low-income persons of limited English-speaking ability.”

The original CETA statute included a public service employment program in Title II, and 1974 legislation added a countercyclical public service employment program as Title VI. Over time, the public service employment components grew to be the largest part of CETA. In 1977, the Youth Employment and Demonstration Projects Act


(YEDPA) created two new categorical youth programs for the prime sponsors to administer, the Youth Employment and Training Program (YETP) and the Youth Community Conservation and Improvement Projects (YCCIP); YETP provided training and work experience, primarily for in-school disadvantaged youth, and YCCIP provided training and work experience primarily for out-of-school disadvantaged youth. The legislation also established a large demonstration program, the Youth Incentive Entitlement Program (YIEPP), to test the feasibility and impact of guaranteeing part-time school-year and full-time summer jobs to disadvantaged youth to encourage them to remain in school. A year later, the Young Adult Conservation Corps (YACC) was added to provide participating youth a conservation experience. Although over $1 billion was spent on the YEDPA programs, only a few of the programs were rigorously evaluated. A National Academy of Sciences review of the evidence on the YEDPA programs concluded that “despite the magnitude of resources devoted to the objectives of research and demonstration, there is little reliable information on the effectiveness of the programs in solving youth employment problems” (Betsey, Hollister, and Papageorgiou 1985, 22).13

Several other national programs were added to CETA during this period, including the Skill Training Improvement Program (STIP), which was one of the first US initiatives to offer long-term training to dislocated workers through competitively funded projects; Help through Industry Retraining and Employment (HIRE), which provided training to veterans through the National Alliance of Business initially and later through prime sponsors; and the Private Sector Initiative Program (PSIP), which provided training in conjunction with the newly established private industry councils affiliated with the CETA prime sponsors. Programs HIRE and PSIP were early efforts to try to more effectively involve the private sector in federally sponsored training programs, an effort whose goals have yet to be fully achieved.

By 1976, concern had increased that the public service employment (PSE) slots were allowing local governments to substitute federal funds for state and local funds to support positions, a phenomenon known as fiscal substitution.14 As a result, several modifications were made to PSE Title VI requirements. The PSE positions that became vacant could only be used in special projects that lasted for twelve or fewer months. In addition, individuals hired for new Title VI positions and half the Title VI positions that became vacant were required to be individuals unemployed for at least fifteen weeks and a member of a low-income family.

Amendments to CETA in 1978 were enacted to address concerns that the program was not creating jobs, but instead substituting federal funds for state and local funds. Among the changes that were instituted, PSE wages in most places were capped at $10,000 annually, but in high-wage areas salaries could be up to $12,000; average national wages were capped at $7,200, lowering the national average by $600;15 new PSE participants could not have their wages supplemented by the prime sponsor; and prime sponsors were required to establish independent monitoring units to investigate violations of laws and regulations. Cook, Adams, and Rawlins (1985, 13) refer to the 1978 amendments as “the beginning of the end for PSE.” All the restrictions on qualifications, salaries, and project characteristics made PSE unattractive to prime sponsors, so that when the Reagan administration proposed barring PSE in the new Job Training Partnership Act, there was little objection.

13. The director of the YEDPA program was more optimistic about what was learned from the experience (see Taggart 1981).
14. Butler and Hobbie (1976), in a report prepared by the Congressional Budget Office, summarized the research by Johnson and Tomola (1976), which concluded that fiscal substitution reached 100 percent within eighteen months of a PSE slot being funded. A reanalysis of the data by Borus and Hamermesh (1978) found that the amount of substitution was very sensitive to the assumptions of the statistical model used. A qualitative field analysis conducted at roughly the same time concluded that much of what appeared to be substitution was instead maintenance, where PSE workers filled slots that would have been abolished in the absence of the PSE funding (see Nathan et al. 1981).

3.3.5 The Job Training Partnership Act of 1982
The Job Training Partnership Act of 1982

Burt S. Barnow and Jeffrey Smith

The Comprehensive Employment and Training Act was due to expire in 1982, and the replacement program, the Job Training Partnership Act (JTPA), was a bipartisan effort sponsored by Senators Edward Kennedy and Dan Quayle. The new law reflected President Reagan's view of federalism, which included a larger role for state governments and smaller roles for the federal government and local governments. Public service employment, which had become increasingly unpopular with the public and less desirable for local governments as restrictions on participants and activities were added, was prohibited under JTPA. Key features of JTPA included the following: programs for economically disadvantaged youth and adults continued to be locally administered; states assumed a much greater role in monitoring the performance of local programs; the private sector was given the opportunity to play a major role in guiding and/or operating the local programs; and the system was to be "performance driven," with local programs rewarded or sanctioned based on their performance. As we describe below, both the role of the private sector and the performance measurement system remain important but unsettled issues.

The JTPA included three categorical funding streams that were distributed by formula to the states and then to local areas.16 The Title II-A program provided funding for economically disadvantaged adults and youth, the Title II-B program was for summer youth employment and training, and the Title III program served dislocated workers.17 National programs for Indians and Native Americans and for migrant and seasonal farmworkers were authorized by Title IV of the legislation. The Title II programs were conducted through local service delivery areas (SDAs), which were similar in nature to the prime sponsors under CETA. The minimum population size for automatic designation as an SDA was increased from 100,000 under CETA to 200,000 in an effort to reduce the number of local programs from the more than 450 prime sponsors under CETA; the failure to include a provision for balance-of-state units actually led to an increase in the number of local programs to more than 600. Major activities provided under Title II-A were occupational and basic skills training, OJT, job-search assistance, and work experience.18

The JTPA focused on the poor. All out-of-school youth and at least 90 percent of those served in the adult program had to meet income-based eligibility requirements. There were no income-related eligibility requirements for those served in the dislocated worker program.19 As noted above, JTPA attempted to increase the role of the private sector in guiding employment and training programs. In 1978, CETA was amended to authorize the creation of private industry councils (PICs), but the PICs gained much more authority under JTPA, where they served as boards of directors for the local programs and could operate the programs if they voted to do so. The PIC members were appointed by the chief local official(s) in the SDA, and a majority of the members were required to be from the private sector. The Title III program for dislocated workers was originally a state-level program, and states were required to match federal funding on a dollar-for-dollar basis.

15. All these figures are in nominal dollars.

16. Allocations for the adult and youth program to states and substate areas were based equally on the number unemployed in areas of substantial unemployment (local areas with at least a 6.5 percent unemployment rate), the number unemployed in excess of 4.5 percent of the labor force, and the number of economically disadvantaged adults; allocations for the dislocated worker program distributed by formula were based equally on the number unemployed, the number unemployed in excess of 4.5 percent, and the number unemployed for fifteen weeks or longer (see Johnston 1987).
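The equal-weights allocation described in note 16 can be made concrete with a small sketch. Each area receives the average of its shares of the three factor totals, multiplied by the appropriation. The function, area names, and counts below are our own illustration, not the statutory text, and the actual formula included hold-harmless provisions not shown here.

```python
def allocate(total_funds, factor_counts):
    """Split total_funds across areas with equal weight on each factor.

    factor_counts maps each area to a tuple of counts, one per formula
    factor (for Title II-A: unemployed in areas of substantial
    unemployment, unemployed in excess of 4.5 percent of the labor
    force, and economically disadvantaged adults). Each area's share
    is the simple average of its shares of the factor totals.
    """
    n = len(next(iter(factor_counts.values())))
    totals = [sum(counts[i] for counts in factor_counts.values()) for i in range(n)]
    return {
        area: total_funds * sum(counts[i] / totals[i] for i in range(n)) / n
        for area, counts in factor_counts.items()
    }

# Hypothetical counts for two areas (thousands of people):
shares = allocate(100.0, {"Area A": (30, 20, 50), "Area B": (10, 20, 50)})
# Area A gets 100 * (30/40 + 20/40 + 50/100) / 3, about 58.3;
# Area B gets the remaining 41.7 or so.
```

Because each factor's shares sum to one across areas, the allocations always exhaust the appropriation exactly, whatever the counts.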
Congressional concern about services to dislocated workers was high, and JTPA was modified in major ways in 1988 by the Economic Dislocation and Worker Adjustment Assistance Act (EDWAAA).20 This legislation required governors to distribute at least 60 percent of the Title III funds to substate areas. Major amendments to JTPA were enacted in August 1992.21 The amendments made the program more prescriptive in terms of who could be served and what activities could be undertaken. For example, the amendments required that at least 65 percent of the Title II-A participants possess at least one characteristic that classified them as "hard to serve."22

17. The 1992 JTPA amendments established a separate program, Title II-C, for services to youth.

18. Reviews of the JTPA literature are found in Johnston (1987) and Levitan and Gallo (1988).

19. Devine and Heckman (1996) analyze the eligibility requirements for JTPA from equity and efficiency perspectives.

20. In addition to modifying JTPA, Congress also passed the Worker Adjustment and Retraining Notification (WARN) Act in 1988, which required employers under certain circumstances to provide workers with notice sixty days in advance of plant closings and major layoffs.

21. For a summary of the amendments, see Barnow (1993), and for a thorough discussion of the amendments and their impacts, see Trutko and Barnow (1997).

3.3.6 The Workforce Investment Act of 1998

The Workforce Investment Act (WIA) was enacted August 7, 1998, to replace JTPA.23 States had the option of being "early implementers," but most states began implementing the new law July 1, 2000. The WIA program maintained the formulas used to distribute funds to states and substate areas. The WIA is based on seven guiding principles:24

1. Streamlined services: Integrating multiple employment and training programs at the "street level" through the One-Stop delivery system to simplify and expand access to services for job seekers and employers.

2. Individual empowerment: Empowering individuals to obtain the services and skills they need to enhance their employment opportunities through Individual Training Accounts (ITAs), voucher-like instruments that enable eligible participants to choose the qualified training program they prefer. Vendors were to meet performance criteria established by the states, and vendors that met the criteria were to be included in an eligible training-provider list.

3. Universal access: Granting access to all job seekers and others interested in learning about the labor market through the One-Stop delivery system. The concept was that anyone interested in what were termed core employment-related services could obtain job-search assistance as well as labor market information about job vacancies, the skills needed for occupations in demand, wages paid, and other relevant employment trends in the local, regional, and national economy.25

4. Increased accountability: Holding states, localities, and training providers accountable for their performance. The WIA was intended to improve the performance measurement system established under JTPA by holding states accountable for their performance and building continuous improvement into the system.

5. A strengthened role for local Workforce Investment Boards (WIBs) and the private sector: The framers of WIA envisioned that the local WIBs would have a stronger role in administering the system than the PICs under JTPA and that employers would participate more in administering the system than they had under JTPA.26

6. Enhanced state and local flexibility: Giving states and localities the flexibility to build on existing reforms to implement innovative and comprehensive workforce investment systems was a priority under WIA. Through such mechanisms as unified planning and waivers, states and their local partners were provided flexibility to tailor delivery systems to meet the particular needs of individual communities.

7. Improved youth programs: Linking youth programs more closely to local labor market needs and the community as a whole, and providing a strong connection between academic and occupational learning, were envisioned under WIA.

Many of the guiding principles do in fact reflect meaningful changes in the delivery system for workforce investment services in the nation's primary employment and training program.

22. These characteristics were basic skills deficient, high school dropout, welfare recipient, disabled, homeless, or an offender. Youth could also meet the requirement if they were pregnant or a parent, or below the appropriate grade level for their age.

23. The Workforce Investment Act reauthorized several programs in addition to the workforce program commonly referred to as WIA. Title I of WIA establishes the workforce program usually referred to as WIA; Title II authorizes the federal adult education and literacy program; Title III amends the Wagner-Peyser Act to better integrate Wagner-Peyser labor-exchange activities by requiring that Employment Service activities be integrated with the One-Stop Career Center system; Title IV amends the Rehabilitation Act of 1973, which authorizes the state vocational rehabilitation program for individuals with disabilities; and Title V includes general provisions dealing with matters such as state unified plans and state incentive grants (see Bradley 2013).

24. This is based on Barnow and King (2005), and the principles are described in the WIA White Paper available at www.doleta.gov/usworkforce/documents/misc/wpaper3.cfm.

25. Access to staff-assisted services varied among local service delivery areas, depending on state and local policies and funding availability.
The utilization of a One-Stop system began on a voluntary basis in local areas several years prior to enactment of WIA, but the 1998 statute required many workforce development programs to colocate and coordinate services in One-Stop Career Centers, which the US Department of Labor has recently rebranded as American Job Centers (AJCs).27 The One-Stop centers were intended to provide the "core" and "intensive" services mandated by WIA (described below); to provide access to workforce development programs and services offered by One-Stop partners; and to provide access to the labor market information, job-search, placement, recruitment, and labor-exchange services offered by the Employment Service (Bradley 2013). The One-Stops were required to include more than a dozen programs that provide services to job seekers: WIA adult, youth, and dislocated worker programs; federal Department of Labor programs authorized under WIA, including Job Corps, the Native American program, and the Migrant and Seasonal Farmworker program; Employment Service programs authorized by the Wagner-Peyser Act; adult education and literacy programs; vocational rehabilitation; welfare-to-work programs; the Senior Community Service Employment Program; postsecondary vocational education; Trade Adjustment Assistance (TAA); programs administered by the Veterans' Employment and Training Service; community services block grants; employment and training activities operated by the US Department of Housing and Urban Development; unemployment insurance; and registered apprenticeship programs.28 Depending on state and local policies, other relevant programs may be present at the One-Stops. Optional partners noted by the Department of Labor include Temporary Assistance for Needy Families (TANF), employment and training programs operated in conjunction with the food stamps program (now the Supplemental Nutrition Assistance Program [SNAP]), Department of Transportation employment and training programs, and programs operated under the National and Community Service Act of 1990.29

26. Although the Department of Labor envisioned a stronger role for the private sector under WIA, there is scant evidence of this occurring. There were no major changes in the WIA statute that would have mandated a stronger role for employers, and neither of the two studies of WIA implementation, D'Amico et al. (2004) and Barnow and King (2005), found growth in the role of the private sector under WIA.

27. Ironically, when local programs first formed One-Stop centers on their own, state and federal officials sometimes expressed concern that employees of one organization might provide services to customers who were supposed to be served by a different program, and the practice was often discouraged.

Although the Department of Labor's White Paper called for "streamlined services," this goal was hindered by another feature of WIA, namely the requirement that the program offer services in sequence from core to intensive to training. D'Amico and Salzman (2004, 102) note that "JTPA was faulted for authorizing expensive training services as a first, rather than as a last, resort." As a result, WIA established three levels of service that customers were required to access sequentially:30 (a) core services, including outreach, job-search and placement assistance, and labor market information, available to all job seekers; (b) intensive services, including more comprehensive assessments, development of individual employment plans, counseling, and career planning; and (c) training services, including both occupational training and training in basic skills.
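The sequencing requirement amounts to a simple gating rule: a participant becomes eligible for the next tier only after failing to obtain or retain employment at the current one, with no minimum time at any tier. A minimal sketch of that rule, with function and field names of our own invention rather than anything in the statute:

```python
# The three WIA service tiers, in the order they must be accessed.
TIERS = ("core", "intensive", "training")

def next_tier(received, employed):
    """Return the next tier a participant may access, or None.

    Under the sequencing reading of Section 134, each higher tier is
    reserved for people still unable to obtain or retain employment
    after the lower one; there is no minimum time spent at any tier.
    """
    if employed:
        return None  # obtained and retained employment; no higher tier needed
    for tier in TIERS:
        if tier not in received:
            return tier  # lowest tier not yet received
    return None  # all three tiers already received

# A job seeker who remains unemployed moves up one tier at a time:
assert next_tier(set(), employed=False) == "core"
assert next_tier({"core"}, employed=False) == "intensive"
assert next_tier({"core", "intensive"}, employed=False) == "training"
```

The rule contains no waiting periods, which is why the Department of Labor could later insist that WIA was not a "work first" program even though services remained formally sequential.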
Participants who reach the third step use an "individual training account" (ITA) to select an appropriate training program from a qualified training provider.31

28. US Department of Labor (2010b). Retrieved from http://www.doleta.gov/usworkforce/onestop/partners.cfm on November 15, 2014; also available from http://www.doleta.gov/programs/factsht/pdf/onestoppartners.pdf, retrieved November 15, 2014. The programs are described slightly differently at the two sites. Note that the welfare-to-work programs, which referred to special programs for TANF recipients that were operated through local WIA programs, are no longer in operation.

29. Ibid.

30. Several reviewers questioned whether WIA specifically required sequencing of services or whether the sequencing was imposed by the Department of Labor. Section 134 of the statute reserves intensive services for those unable to obtain or retain employment after receipt of core services, and training is reserved for individuals who are unable to obtain or retain employment after receipt of intensive services. As the Department of Labor noted when it later emphasized that WIA was not a "work first" program, there are no minimum time periods that a person must receive core or intensive services before receiving training.

31. US Department of Labor (2014m). (http://www.doleta.gov/programs/general_info.cfm; retrieved November 15, 2014.)


Individual empowerment was an important feature of WIA, implemented largely through the ITAs.32 Although vouchers were used by some local areas under JTPA, ITAs were the default approach to training under WIA.33 Local areas had a great deal of flexibility in administering the ITAs; some local areas tended to give customers wide latitude in using their ITA, while others restricted customers in terms of cost, past performance of the vendor, and the qualifications and aptitude of the customer for the course.34

Universal access was envisioned as an important feature of WIA to avoid the stigma of a program serving only poor and low-skilled customers. With the colocation of the Employment Service in most One-Stop centers, it was anticipated that all adult job seekers, not just the poor or unemployment insurance claimants who were required to search for work, would use the One-Stops to obtain labor market information and search for work. Access to intensive services and training was restricted, however, to public assistance recipients and other low-income individuals when the local workforce area had insufficient funds to serve all potential customers who might benefit from training.35

The goal of increased accountability was addressed in two ways: changes were made to the performance measurement system used under JTPA, and states and local areas were asked to establish an eligible training provider (ETP) list of vendors with strong performance. Only vendors on the list could accept ITAs from WIA participants.
The performance measures varied over the existence of JTPA, with a trend toward longer postprogram follow-up for earnings measures.36 Changes in the performance system between JTPA and WIA include the following:37 Under JTPA, only local areas were subject to performance measures, but under WIA the federal government sets standards for states, and the states establish standards for local areas. Under JTPA, after the initial few years, local standards were adjusted by a regression model intended to hold areas harmless for differences in customer characteristics and economic conditions, but under WIA standards were established through negotiations.38 Performance was initially measured under JTPA at the time of program exit and thirteen weeks after exit, whereas under WIA the employment and earnings measures used the second and third quarters after program exit. The JTPA statute did not specify a source for the data used to measure performance, but WIA specified the use of unemployment insurance wage records. During the WIA period, the Office of Management and Budget (OMB) sought to have all workforce-oriented federally sponsored programs use "common measures" so that programs could be compared, but only the Department of Labor complied. Originally, the WIA Adult program had four measures: entered employment rate, employment retention rate, earnings change, and the employment and credential rate; these measures are defined below when program outcomes are provided. For dislocated workers, an earnings replacement rate was used instead of an earnings change measure. Youth ages nineteen to twenty-one had the same measures as adults, and youth ages fourteen to eighteen had three core measures: the skill attainment rate, the diploma or equivalent attainment rate, and the retention rate (D'Amico et al. 2004). In addition, there were employer and participant customer satisfaction measures.

32. Although WIA required that ITAs be available to training customers in most circumstances, exceptions included when on-the-job training and customized training are provided, when the local board determines that there are too few providers available to meet the intent of vouchers, and when the local board determines that there is a local program of demonstrated effectiveness for meeting the needs of special low-income participant populations that face multiple barriers to employment (Patel and Savner 2001, 1).

33. For a review of the use of vouchers under JTPA, see Trutko and Barnow (1999).

34. See D'Amico et al. (2004), Barnow (2009), and King and Barnow (2011). We discuss the ITA experiment later in the chapter.

35. Later in the chapter we discuss the characteristics of WIA exiters. In table 3.3 we note that among adult exiters who left the program between April 2012 and March 2013, 60.9 percent received training.

36. See Barnow (2011) for a discussion of the WIA performance-measurement system and a comparison with the JTPA system. In the same volume, Borden (2011) discusses the problems associated with measuring performance.

37. This information is from Blank, Heald, and Fagnoni (2011), King and Barnow (2011), and Barnow (2011).

The performance measures were modified somewhat in 2006, as described in Training and Employment Guidance Letter 17-05 (TEGL 17-05). The TEGL indicated three common measures to be used for adults in the Adult and Dislocated Worker programs: entered employment rate, employment retention (the proportion of adults employed in the first quarter after exit who were employed in the second and third quarters after exit), and average earnings (total earnings in the second and third quarters after exit for adults employed in each of the first three quarters after exit).
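The three adult common measures lend themselves to direct computation from quarterly wage records. A minimal sketch under the definitions just given; the record layout is hypothetical, and the entered-employment measure is simplified (the official measure also conditions on employment status at the date of participation, which these records do not track):

```python
def common_measures(exiters):
    """Compute the three TEGL 17-05 adult measures from exit records.

    Each record gives earnings for the first three full quarters after
    program exit (q1, q2, q3); zero means not employed that quarter.
    """
    # Entered employment rate: share employed in the first quarter after exit.
    entered = [r for r in exiters if r["q1"] > 0]
    entered_rate = len(entered) / len(exiters)

    # Employment retention: of those employed in Q1 after exit, the share
    # also employed in both Q2 and Q3 after exit.
    retained = [r for r in entered if r["q2"] > 0 and r["q3"] > 0]
    retention_rate = len(retained) / len(entered)

    # Average earnings: total Q2 + Q3 earnings for those employed in each
    # of the first three quarters after exit.
    avg_earnings = sum(r["q2"] + r["q3"] for r in retained) / len(retained)
    return entered_rate, retention_rate, avg_earnings

# Four hypothetical exiters:
rates = common_measures([
    {"q1": 4000, "q2": 4200, "q3": 4300},  # employed all three quarters
    {"q1": 3000, "q2": 0, "q3": 3100},     # lost the Q1 job in Q2
    {"q1": 0, "q2": 0, "q3": 0},           # never entered employment
    {"q1": 5000, "q2": 5100, "q3": 5200},  # employed all three quarters
])
# rates == (0.75, 2/3, 9400.0)
```

Note how the denominators shrink at each step: retention conditions on Q1 employment, and average earnings conditions on employment in all three quarters, which is one source of the incentive problems discussed below.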
The three common youth measures (beginning in 2006) are placement in employment or education, attainment of a degree or certificate, and literacy and numeracy. The states using common measures stopped using customer satisfaction measures beginning in 2006, as they were not included in the common measures.

The second effort to increase accountability in WIA was the use of an eligible training provider (ETP) list. With customers having a greater role in selecting their field of training and vendors through the use of ITAs, there was a risk that customers might select vendors based on vendor claims rather than on the performance of the programs and the customer's suitability for the program selected. Governors were thus given the opportunity to establish an ETP list that included only vendors with a good track record for a given program. Evidence indicates that although some states were able to develop satisfactory ETP lists, there were severe challenges in meeting this requirement of WIA, and thirty-five states received waivers that permitted them to implement only a portion of the ETP requirements or to delay implementation (Van Horn and Fichtner 2011).39

38. In the last few years of WIA, the Department of Labor resurrected the idea of using statistical models to adjust performance standards. Results of these efforts are described in Eberts, Bartik, and Huang (2011).

The WIA included several changes to the eligibility requirements for youth participants, as well as changes in the programs themselves. There was separate funding for a summer youth program and a year-round program under JTPA, while WIA included only a year-round program; as D'Amico et al. (2004, viii-1) note, the summer program was a major DOL program for thirty-six years, so this was a substantial change. The specific eligibility requirements varied somewhat, but in both JTPA and WIA there was a heavy emphasis on serving poor youth; a study by the US General Accounting Office (GAO 2002, 7) suggested that the eligibility changes may have resulted in the youth served by WIA coming from poorer families than under JTPA, and D'Amico et al. (2004, viii-1) drew a similar conclusion. The new program also required that at least 30 percent of the funds be spent on out-of-school youth. The GAO (2002, 6) notes that WIA's intent was for longer-term and more comprehensive services than had been provided under JTPA, and the statute required that ten program elements be made available to youth enrolled in the program.40

There were two large-scale studies of the implementation of WIA in its early years: D'Amico et al. (2004) and Barnow and King (2005). D'Amico et al. (2004) conducted their study in twenty-one states and thirty-eight local areas between 1999 and 2004. Barnow and King (2005) based their analysis on eight states and sixteen local areas visited in 2002. Below we briefly summarize the major findings of the two studies, starting with D'Amico et al. (2004). D'Amico et al. (2004) organize their summary of accomplishments and challenges by a slightly modified list of the seven guiding principles listed above.
Regarding the principle of streamlining services through integration, D'Amico et al. (2004, I-4) conclude: "Despite numerous challenges that have been encountered along the way (and sometimes outright resistance), partnership formation represents a highly successful and, in the long term, potentially critically important accomplishment engendered by WIA." The authors note that the system encountered a number of challenges, and the greatest challenge appeared to be finding each partner's share of financing the One-Stop infrastructure. Other challenges they note include differing visions among partners of what service integration means, differences in program goals and customer needs across partners, varying cultures among One-Stop partners, logistical issues in arranging colocation, different management information systems for various programs, and separate performance and reporting requirements among programs.

39. Van Horn and Fichtner (2011, 155) note that "Education and training establishments and their trade organizations marshaled opposition to performance reporting and undermined or quashed implementation throughout the country." D'Amico et al. (2004, I-12) also note the failure of the ETP list to achieve its expected role in the system.

40. The ten required youth services are: (a) tutoring, study-skills training, and instruction leading to completion of secondary school; (b) alternative secondary school services; (c) summer employment linked to academic and occupational learning; (d) paid and unpaid work experience; (e) occupational-skills training; (f) leadership development; (g) supportive services; (h) adult mentoring during the program and at least twelve months afterward; (i) at least twelve-month follow-up after program completion; and (j) guidance and counseling (see US General Accounting Office 2002, 7).

D'Amico et al. (2004) found that states and local areas had made great progress in promoting universal access through the One-Stop system. As evidence, they note that states and local areas established nearly 2,000 One-Stop centers by 2003, and that 40 percent of the local areas had six or more access points to services. The authors observe that promoting universal access creates some important tensions in the system; for example, by broadening services to the entire population, fewer resources are available for the poor, and local areas must decide how to balance provision of lower-tier services with the desire to provide training to those who need more skills.41

The study found that the principle of empowering customers through choice was "enthusiastically embraced by One-Stop administrators and staff" (D'Amico et al. 2004, I-11). Specifically, they point to the widespread use of ITAs as evidence of the popularity of giving customers choice. The authors note that many local areas capped the ITAs at levels as low as $1,500 so that resources could be spread across many participants.

The goal of enhancing state and local flexibility is considered a major success by D'Amico et al. (2004, I-12). The authors state: our field researchers were struck by the enormous diversity in WIA service designs and delivery structures across the country.
Thus, within the broad constraints of the legislation, local areas vary markedly in their governance and administrative structures, the way local boards operate, the procedures for designating One-Stop operators and the responsibilities with which the operator is charged, the ways partners work together to staff various services, how adult and dislocated workers move through the service levels, how priority for target groups is established, whether or not training is emphasized, caps placed on ITA amounts, and so forth.

Although states and local areas appreciated the freedom, the researchers felt that the states and local areas would have benefited from technical assistance on promising practices.

41. As is shown in tables 3.5 and 3.6, only about one-half of those served by the WIA adult program meet the definition of low income, and fewer than 10 percent of the customers received training.

Although employment and training programs have had performance measurement systems since the 1970s, D'Amico et al. (2004) found the principle of promoting performance accountability the most challenging of the WIA principles to implement. One aspect of the accountability system is the requirement that states and local areas establish an ETP list of training vendors. Problems cited regarding the ETP list include that high standards limited the choice of vendors available to customers; that many vendors (including a number of low-cost, high-quality community colleges) disliked the ETP application procedures and might not seek to be listed; and that the data used to compile the list were often of questionable reliability.

States and local areas also expressed concern about the performance measurement system used for the WIA program. The concerns included that the measures were too numerous and complex; that the definitions used for some of the measures (such as credentialing) were vague and potentially unreliable; that the system promoted "cream skimming" of the potential customers most likely to look good on the performance measures; that states and local governments spent significant effort managing their numbers rather than focusing on providing appropriate services; that the system was not useful for program management because of the long lag between when customers were served and when the results were measured; and that the differing measures across programs hindered partnership development.

D'Amico et al. (2004) found that although WIA continued the requirement that business representatives make up a majority of state and local boards, and the Department of Labor encouraged states to make increased use of business in shaping their programs, "in practice [local workforce areas] are lagging in their ability to engage businesses seriously in strategic planning or serve them as customers with high-quality services" (I-17).

The final guiding principle for WIA was improving youth programs. D'Amico et al.
(2004) found that at the time of their site visits, states and local areas were “lagging badly behind in their implementation of youth programming, partly because of the time delays inherent in needing to appoint a Youth Council [one of the new requirements of WIA] and competitively select service providers” (I- 19). Other challenges for the WIA youth program included the abolition of the summer youth program, the requirement that individual eligibility be documented rather than being able to use presumptive measures such as participation in free and reduced- price school lunch programs, dealing with the statutory requirement that ten program elements be included in youth programs, and connecting WIA programs with the One-Stop system for older youth. Barnow and King (2005) organized their findings around five major topics: (a) leadership, (b) system administration and funding, (c) organization and operation of One-Stop Career Centers, (d) service orientation and mix, and (e) use of market mechanisms. The study states exhibited a range of leadership patterns in setting up, implementing, and operating their workforce development systems. In five of the eight states, the governor’s office played a strong leadership role, but in others the governor gave discretion to local workforce areas. The state

Employment and Training Programs

147

legislature had a leadership role in three states, resulting in bipartisan state workforce legislation. Business's role was strong at the state level in only a few of the states. At the local level, however, business engagement was found to be strong in half of the states. The WIA's administrative structure is complex, distinguishing between policy development, program administration, and service delivery more explicitly than earlier workforce legislation. It also requires states to balance state and local responsibilities and make decisions about how to administer WIA in conjunction with other state employment security, economic development, and related programs. The most common approach in the study states was for policy to be developed by the state and local WIBs, program administration to be undertaken by agencies at the state and local level, and service delivery to be carried out by vendors. Some study states adopted this separation of responsibilities several years prior to WIA. Some states and local areas found that they did not have sufficient funding to provide training to all who they believed would benefit from the service, and they limited training by rationing access and/or by capping the amount that would be paid for training programs. Barnow and King (2005) found wide variation in how states and local areas interpreted the requirement to operate programs through the One-Stop system. They found that challenges arose regarding how the mandatory and optional partners related to each other at the centers and how the centers were operated and funded. In some states, key programs such as WIA, the Employment Service, and TANF are highly integrated, but in others TANF, which is an optional partner, has no presence at One-Stop centers, and/or the Employment Service has a separate office.
Although Unemployment Insurance (UI) is a mandatory partner, the study found its role in the One-Stop Career Centers to be minimal; in the years prior to WIA, UI staff in most states had been moved to call centers and dealt with clients primarily through telephone and Internet contact. The study found that TANF, Vocational Rehabilitation, and the Veterans Employment and Training Service did not fit well in One-Stop Career Centers because of conflicting goals, cultures, or other differences. There was variation in how the infrastructure of One-Stop centers was financed, and the issue of funding the centers was a source of contention in most of the study sites. Barnow and King (2005) found that service orientation evolved significantly in the early years of WIA implementation in many states. Initially, states and local areas interpreted the statutory language to require a "work-first" or labor market attachment orientation, based on early guidance provided by the Department of Labor and the statutory requirement for sequencing of core, intensive, and training services. Later, the Department of Labor made it clear that a work-first orientation was not required and that states could place greater emphasis on training. After that, states

148

Burt S. Barnow and Jeffrey Smith

diverged in their orientation, with some still emphasizing finding work, others focusing more on human-capital development through training, and still others leaving orientation up to local areas. Market mechanisms were to play a major role under WIA, and Barnow and King (2005) analyzed the ETP list requirement and the use of performance measures to reward and sanction states and local areas based on how well they did in terms of customers' employment and earnings. They found that three states already had systems in place to monitor training provider performance, and these states had few problems with the ETP list concept. Of the remaining states in the study, three had problems initially with the ETP concept but were able to adapt, and two states found the system to be burdensome for training providers and reported that some vendors refused to participate in WIA because of the ETP requirements.42 A second market mechanism implemented in WIA is the use of ITAs. Barnow and King (2005) reached conclusions similar to D'Amico et al. (2004), finding that the ITAs were popular with customers and accepted by local programs as a useful feature.43 Barnow and King (2005) also reached conclusions similar to D'Amico et al. (2004) on the WIA performance standards system. State and local areas were critical of the elimination of a regression-based adjustment system to level the playing field and its replacement with negotiations between states and the Department of Labor, particularly because many states believed that the Department of Labor representatives often did not negotiate fairly. Barnow and King (2005) also found that a majority of states in their sample engaged in strategic behavior designed to make their measured performance look good.

3.3.7 The Workforce Innovation and Opportunity Act of 2014

Although WIA was originally authorized for five years, fifteen years passed before the two houses of Congress and the administration were able to agree on new legislation. In 2014, working largely behind the scenes, the House and Senate reached agreement on the Workforce Innovation and Opportunity Act (WIOA) as a replacement for WIA, again with broad bipartisan, bicameral support. The bill was introduced in May 2014 with sponsors from both parties in both houses of Congress, and it was signed July 22,

42. This is one of the few areas where Barnow and King (2005) reach different conclusions from D'Amico et al. (2004). We believe that the somewhat more positive conclusions regarding the ETP list for Barnow and King relate to the nature of their sample of states, which included a relatively high proportion of states with something resembling an ETP list in place prior to WIA. As noted earlier, Van Horn and Fichtner (2011) reported that a majority of states now have waivers to some or all of the ETP list requirements, indicating that this feature has not been widely implemented.
43. Perez-Johnson, Moore, and Santillano (2011) provide results from an experimental evaluation comparing three models for administering ITAs. We discuss this experiment in detail below.


2014. The WIOA makes some significant changes to the nation's workforce development system and is authorized through 2020. Highlights of the new law are described below. Proposed regulations were issued April 16, 2015, but are not discussed in this chapter, as they are subject to revision.44 Most of the new legislation became effective July 1, 2015. The new legislation maintains much of the structure of WIA, with states having a prominent administrative role and services delivered through local workforce areas designated by the states. The WIOA also maintains WIA's funding streams for adults, dislocated workers, and youth, and requires activities at the state and local levels to be overseen by a board with a majority of the members from the private sector. Funds are distributed to the state and substate levels using formulae similar to those used under WIA. States are required to establish unified strategic planning across core programs, defined as the WIOA Adult, Dislocated Worker, and Youth programs; Adult Education and Literacy programs; the Wagner-Peyser Employment Service; and state Vocational Rehabilitation programs. If taken seriously by the states, this could be important, but a unified plan could simply consist of separate plans attached to each other. The boards at the state and local levels have streamlined membership requirements, which are expected to reduce their size; boards under JTPA sometimes included fifty members or more. The desire to be inclusive of many interest groups is admirable, but such large bodies may not (and often did not) function well. The boards also have new responsibilities to develop strategies to meet worker and employer needs.
The Act adds flexibility at the local level by making incumbent worker training and transitional jobs allowable activities, and it promotes work-based training by increasing the maximum reimbursement rate for on-the-job training from 50 percent to 75 percent; the law also emphasizes training that leads to industry-recognized, postsecondary credentials. These changes are all efforts to make the program more attractive to employers and, it is hoped, increase their participation. The WIOA attempts to strengthen program accountability in several ways. The performance measures for core workforce development programs are aligned, and new performance indicators are added related to services to employers and postsecondary credential attainment.45 Data on training providers' outcomes must be made available, and programs are to be evaluated by third parties.

44. Interpretation of the WIOA statute is based on the US Department of Labor's (2014l) WIOA Fact Sheet, accessed at http://www.doleta.gov/wioa/pdf/WIOA-Factsheet.pdf on November 16, 2014, and National Skills Coalition (2014).
45. The statute is more prescriptive than previous laws. For example, the law requires that median postprogram earnings be used as a performance measure and that statistical-adjustment models be developed to adjust standards for variations in customer characteristics and economic conditions.


States are required to identify economic regions within the state, and local areas are required to coordinate planning and service delivery on a regional basis. Prior legislation also mentioned regional coordination. Although perhaps laudable in concept, these efforts are difficult to enforce. Also, these provisions cannot address issues of labor market areas that cross state borders. The statute seeks to provide better services to job seekers in a number of ways. First, WIOA promotes the use of career pathways programs and sectoral partnerships for training programs, two approaches that appear promising.46 Second, the statute allows states to transfer unlimited amounts of their grant between the adult and dislocated worker programs.47 Third, WIOA adds "basic skills deficient" as a priority category for participants, along with low income, for adult services. Fourth, WIOA requires that 75 percent of youth funds be used for out-of-school youth, a large increase over the 30 percent required under WIA. Fifth, WIOA combines the core and intensive service categories under WIA into a new category called career services, and it abolishes the requirement that customers pass through core and intensive services before receiving training. The WIOA also permits direct contracts with higher-education institutions (rather than placing participants on an individual basis or with ITAs), a practice that was commonly used prior to WIA and was permitted with funds provided under the American Recovery and Reinvestment Act (ARRA). Finally, WIOA changes the partners required to be in the American Job Centers. Under WIOA, the Wagner-Peyser Employment Service is required to be colocated in the AJCs, and the TANF program is made a mandatory partner instead of an optional partner.
The WIOA also authorizes the use of performance-based contracting for training providers.48 Although there is currently hope in the workforce development community that WIOA will improve the workforce development system, sometimes promising ideas, like the eligible training provider list, prove more beneficial in theory than in practice.

46. Career pathways are defined in Section 3 of the statute. Training and Employment Notice 39-11 (TEIN 39-11), issued by the Employment and Training Administration, states that "Career pathways programs offer a clear sequence of education coursework and/or training credentials aligned with employer-validated work readiness standards and competencies." TEIN 39-11 has links to information about career pathways programs. The approach has been adopted by the US Department of Labor, the US Department of Education, and the US Department of Health and Human Services. Sectoral programs are programs that provide training for an industry sector, presumably with significant input from sector employers.
47. Under WIA, states had to receive permission from DOL to transfer funds between the adult and dislocated worker programs. Although such transfers used to be routinely approved, in recent years DOL was more rigid (see Barnow and Hobbie 2013).
48. The Department of Labor has changed policies on the use of performance-based contracting several times. Although pay for performance has appeal, some abuses of performance-based contracting appear to have led to large profits for some vendors, so the policy was tightened (see Spaulding 2001).

3.3.8 Employment and Training Program Expenditures and Enrollments over Time

Table 3.2 shows estimated expenditures on Department of Labor employment and training programs, except the Wagner-Peyser Act, from 1965 to 2012; figure 3.1 shows the trend in total funding in real 2012 dollars graphically, and figure 3.2 shows the trend in funding for Department of Labor programs as a percentage of gross domestic product (GDP). The data from 1984 on were compiled by the Employment and Training Administration Budget Office and are believed to accurately reflect final budget authority for each year, including supplemental appropriations, rescissions, and transfers.49 Data from 1965 through 1983 were obtained primarily from unpublished data from the ETA Budget Office, and the data are believed to be accurate but may not reflect all rescissions, supplemental appropriations, and transfers.50 Data on dislocated workers are unavailable prior to 1984, as there was no separate program for dislocated workers before that, and from 1984 through 1992 JTPA youth and adult funding are not available separately. In addition to the MDTA, JTPA, and WIA programs, the table includes other ETA programs, such as the Senior Community Service Employment Program, the Indian and Native Americans program, the Migrant and Seasonal Farmworker program, and a number of other national activities.51 The table must be read with caution because of the ways that programs were funded at various times. For example, during the Great Recession, the American Recovery and Reinvestment Act (ARRA) added $4.279 billion for the covered programs. The ARRA funds were intended to be spent in a timely manner, and there were restrictions on how long the funds were available for spending. Eberts and Wandner (2013) find that the WIA Adult program spent 72 percent of the ARRA funds available in the first five quarters, and the Dislocated Worker program spent 60 percent. Thus, most of the ARRA funds were actually spent in PY 2009 and PY 2010. Other

49.
Budget authority is the amount of money available for spending, but actual expenditures in a year can reflect carryovers of funds from prior years and amounts available but unspent. Transfers reflected in the table refer to transfers among programs at the national level, but they do not reflect transfers of funds within states between the WIA Adult and Dislocated Worker programs. Finally, the Job Corps was removed from the ETA in FY 2008, and although it was later added back to the ETA budget, it was maintained in a separate account. We obtained Job Corps data for FY 2008 and after from OMB budget documents for the Department of Labor. The detailed Employment and Training Administration budget data were obtained from the US Department of Labor (2014a) (http://www.doleta.gov/budget/bahist.cfm; accessed on February 15, 2015).
50. We are grateful to Anita Harvey of the ETA Budget Office for providing the data, but she is not responsible for the analysis performed. Data on Job Corps were obtained from other budget documents, but we were not able to find data for all individual years. For years where Job Corps data are missing, Job Corps budget authority is included in the total column but was not available separately.
51. Programs are described on the Employment and Training Administration's web site, http://www.doleta.gov/#, accessed March 2, 2015.

Table 3.2    Historical budget authority, US Department of Labor employment and training programs (in thousands of nominal dollars)

Year                   Total employment   Dislocated    Adults        Youth except   Job Corps    E&T programs
                       and training       workers                     Job Corps                   as % of GDP

MDTA era (1962-1972)a
1965                       529,406        x               266,505        127,742        52,523       0.07
1966                       671,095        x               339,649        263,337       303,527       0.09
1967                       861,044        x               296,247        348,833       209,000       0.10
1968                       398,497        x               296,418        281,864       282,300       0.04
1969                       409,992        x               272,616        320,696       278,400       0.04
1970                     1,451,215        x               336,380        356,589       169,782       0.14
1971                     1,622,997        x               335,752        426,458       160,187       0.15
1972                     2,682,066        x               424,368        517,244       202,185       0.22

CETA era (1973-1982)b
1973                     1,549,416        n/a             n/a            n/a           n/a           0.11
1974                     2,275,584        n/a             n/a            n/a           n/a           0.15
1975                     3,739,450        n/a             n/a            n/a           n/a           0.23
1976                     5,827,720        n/a             n/a            n/a           n/a           0.33
Transition quarter         597,500        n/a             n/a            n/a           n/a           0.13
1977                    17,200,830        n/a             n/a            n/a           n/a           0.85
1978                     3,652,630        n/a             n/a            n/a           280,000       0.16
1979                    10,510,312        n/a             n/a            n/a           380,000       0.41
1980                     8,387,193        n/a             n/a            n/a           470,000       0.30
1981                     8,100,887        n/a             n/a            n/a           465,000       0.26
1982                     3,300,301        n/a             n/a            n/a           n/a           0.10

JTPA era (1983-2000)c
1983                     4,329,876        n/a             n/a            n/a           n/a           0.12
1984                     6,863,525        317,250       4,849,862c       n/a         1,014,100       0.17
1985                     4,100,662        222,500       2,710,700c       n/a           617,000       0.09
1986                     3,649,194         95,702       2,419,061c       n/a           612,480       0.08
1987                     4,041,913        200,000       2,590,000c       n/a           656,350       0.08
1988                     4,138,911        287,220       2,527,536c       n/a           716,135       0.08
1989                     4,140,485        283,773       2,497,205c       n/a           755,317       0.07
1990                     4,283,975        463,603       2,444,585c       n/a           789,122       0.07
1991                     4,968,253        526,979       2,961,364c       n/a           867,486       0.08
1992                     4,555,331        576,986       2,435,196c       n/a           919,533       0.07
1993                     4,843,266        651,246       1,015,021      1,535,056       966,075       0.07
1994                     5,410,010      1,151,000         988,021      1,496,964     1,040,469       0.07
1995                     4,352,602      1,228,550         996,813        311,460     1,089,222       0.06
1996                     4,513,678      1,091,900         850,000        751,672     1,093,942       0.06
1997                     5,178,903      1,286,200         895,000        997,672     1,153,509       0.06
1998                     6,837,464      1,345,510         955,000      1,000,965     1,246,217       0.08
1999                     7,018,662      1,403,510         954,000      1,250,965     1,307,947       0.07

WIA era (2000-2014)d
2000                     5,969,155      1,589,025         950,000      1,250,965     1,357,776       0.06
2001                     6,041,678      1,433,951         950,000      1,377,965     1,399,148       0.06
2002                     6,417,023      1,602,110         945,372      1,353,065     1,454,241       0.06
2003                     5,713,068      1,454,891         894,577      1,038,669     1,509,094       0.05
2004                     5,566,051      1,445,939         893,195        995,059     1,535,623       0.05
2005                     5,680,372      1,303,918         882,486        980,801     1,544,951       0.04
2006                     5,736,193      1,528,549         840,588        928,716     1,573,270       0.04
2007                     5,595,655      1,390,434         826,105        964,930     1,566,205       0.04
2008                     5,147,987      1,464,707         861,540        983,021       919,506       0.04
2009                     9,581,432      2,902,391       1,356,540      2,231,569     1,242,938       0.04
2010                     7,337,268      1,410,880         860,116      1,026,569     1,680,626       0.05
2011                     7,170,341      1,283,303         769,576        905,754     1,734,150       0.05
2012                     7,699,612      1,210,536         770,811        904,042     1,702,946       0.05

Sources: DOL Budget Authority from 1948-1989; DOL (2014) Budget Authority Tables; DOL (2015) Budget Authority Tables; 1967, 1968, 1969, 1970, and 1971 appendices of US Government Budgets; 1973 Manpower Report of the President; Betsey et al. (1985).
Notes: An "x" indicates not applicable; "n/a" indicates not available. Unless otherwise noted, these are budget authority figures. The table excludes funding for the Wagner-Peyser Act.
a MDTA era: The "Total employment and training" budget may seem unusually large in 1972 due to Emergency Employment Assistance, a temporary program. During the MDTA era, "Youth except Job Corps" is the Neighborhood Youth Corps. Budget data in the following categories for the following years are obligations: "Adults" (1965-1972); "Youth except Job Corps" (1965-1972); "Job Corps" (1970-1972). Budget data for "Job Corps" for 1965-1969 are appropriations.
b CETA era: The "Total employment and training" budget is large in some years due to the following programs: Community Service Employment for Older Americans (1974-1981); Temporary Employment Assistance (1975-1981); YEDPA (the 1977 total employment and training budget includes appropriations for YEDPA, which were disbursed over four years, 1978-1981). "Total employment and training" budget data for 1977 are a combination of budget authority and outlays. Budget data for "Job Corps" for 1978-1981 are outlays.
c JTPA era: From 1983-1992, JTPA IIA included both adult and youth activities, so the funds cannot be divided into separate categories; the combined adult and youth budget (shown here in the adults column) includes JTPA Summer Youth Employment and Training.
d WIA era: Budget figures for 2009 may seem unusually large because all categories in 2009 include appropriations for ARRA, which were disbursed over several years.
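The final column of table 3.2 is simple arithmetic: budget authority divided by nominal GDP. A minimal sketch of the calculation for 2012, using the total from the table; the GDP figure below is an approximation supplied here for illustration, not a value from the chapter:

```python
# Total E&T budget authority for 2012 from table 3.2, in thousands of nominal dollars.
budget_authority_2012 = 7_699_612

# Approximate US nominal GDP for 2012, also in thousands of dollars
# (roughly $16.2 trillion; an assumed figure, not from the chapter).
gdp_2012 = 16_200_000_000

share_of_gdp = 100 * budget_authority_2012 / gdp_2012
print(f"{share_of_gdp:.2f}")  # prints 0.05, matching the table's final column
```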

examples of appropriations expected to fund programs over several years include funding for public service employment (beyond what was provided for in the regular CETA program) in 1977 and the Youth Employment and Demonstration Projects Act of 1977.52 The table and graphs show that overall funding in nominal terms has increased over the period covered, but the share of GDP devoted to workforce development programs has followed an irregular course. In the 1960s, the programs constituted between 0.04 and 0.10 percent of GDP. Employment and training programs peaked as a share of GDP in the 1970s, due in

52. We were unable to locate original budget documents for the 1977 fiscal year, but we did find several documents that gave total YEDPA spending each year, so we have included YEDPA based on these spending figures.

Fig. 3.1 Funding for DOL employment and training programs, 1965-2012

Fig. 3.2 Funding as percentage of GDP, DOL employment and training programs, 1965-2012

large part to one-time efforts such as a large-scale public service employment appropriation of $6.8 billion in 1977 and the Youth Employment and Demonstration Projects Act, also in 1977. During the 1970s, employment and training programs were consistently over 0.10 percent of GDP, and they reached a peak of 0.64 percent in 1977. During the JTPA and WIA eras, from the 1980s on, funding as a share of GDP gradually declined, from about 0.07 percent of GDP in the earlier years to the 0.04-0.05 percent range toward the end of the WIA


era; as noted earlier, there was a temporary increase in expenditures due to the ARRA during the Great Recession. Although the focus of this chapter is on US programs, it is instructive to compare US expenditures on publicly funded training with those of other nations. The Organization for Economic Cooperation and Development (OECD) estimated the share of GDP devoted to training in a number of countries. The OECD estimates that the United States spent 0.04 percent of GDP on training in 2012, which is substantially less than in most of the countries tracked, including Austria (0.45 percent), Belgium (0.15 percent), Canada (0.08 percent), Denmark (0.74 percent), Estonia (0.17 percent), Finland (0.52 percent), France (0.34 percent), Germany (0.22 percent), Italy (0.15 percent), Japan (0.05 percent), Korea (0.07 percent), the Netherlands (0.11 percent), New Zealand (0.13 percent), Norway (0.15 percent), Portugal (0.27 percent), and Sweden (0.09 percent). Several countries spent the same percentage of GDP or less on training, including Chile, the Czech Republic, Mexico, and the Slovak Republic.53 The programs that have fared the best since the 1980s (in terms of funding, but not in terms of program effectiveness) are the dislocated worker programs under JTPA and WIA. Unlike the other programs, funding for dislocated workers has grown substantially since 1985.

3.4 Characteristics of Employment and Training-Program Participants

This section describes the characteristics of employment- and training-program participants, or "customers," as they are sometimes called.

3.4.1 Characteristics of Recent WIA Exiters

The most recent data available on the adult, dislocated worker, and youth WIA participants are shown in table 3.3. Unfortunately, data on WIA enrollments are not easy to interpret. As noted above, the WIA Adult and Dislocated Worker programs require participants to receive core and intensive services before they can receive the more expensive training services. Core services can be accessed with or without staff assistance (including, in the latter case, via the Internet), and states are asked to report only customers who receive staff assistance; it is likely that states and localities vary in how they interpret the definition of "staff assisted," particularly since customers who are recorded as staff assisted count in calculating performance, while those who are recorded as self-service do not. Moreover, WIA core services include the same types of services provided by the ES, whose staff members are colocated with WIA at the One-Stop centers, and states vary in their policies regarding coenrollment in the ES and WIA. In addition to varying

53. The OECD data on training as a share of GDP were obtained from http://stats.oecd.org/Index.aspx?DatasetCode=LMPEXP.

Table 3.3    Characteristics of WIA adult and dislocated worker exiters by training status, April 2012-March 2013

                                                 Adults                 Dislocated workers
                                            All       Training          All        Training

Age
  18-21                                     9.3        11.0             3.2         2.2
  22-54                                    76.9        81.7            77.2        83.6
  55 and over                              13.8         7.3            19.6        14.2
Gender
  Female                                   47.6        54.5            48.5        47.6
  Male                                     52.4        45.5            51.5        52.4
Individual with disability                  3.9         3.3             3.1         2.4
Race/ethnicity
  Hispanic                                 10.5        15.2            12.8        12.8
  Black, not Hispanic                      23.6        23.5            18.0        18.8
  White, not Hispanic                      59.2        54.0            62.8        63.1
  Other                                     6.7         7.3             6.4         5.3
Veteran                                     7.8         7.3             7.6         8.8
Average preprogram quarterly earnings    $6,006      $5,432          $8,566      $8,295
Low income                                 50.2        60.9            n/a         n/a
Limited English                             3.0         3.3            n/a         n/a
Single parent                              15.1        20.3            n/a         n/a
Public assistance                          27.4        32.0            n/a         n/a
Highest grade/education
  Less than 12                             10.8         8.1            n/a         n/a
  High school grad.                        37.8        40.7            n/a         n/a
  High school equiv.                        8.0         9.3            n/a         n/a
  Some postsecondary                       30.2        31.3            n/a         n/a
  College graduate (BA)                    13.2        10.5            n/a         n/a

Source: Social Policy Research Associates (2013).

by state, all of these policies also vary over time in some states, leaving both cross-sectional and longitudinal comparisons open to misinterpretation. For example, in PY 2006, New York adopted a policy of coenrolling all Wagner-Peyser customers in WIA, resulting in an increase in the number of adult WIA exiters entering employment from 20,963 in PY 2005 to 210,049 in PY 2007, more than a 900 percent increase.54 The general trend over time was for increased coenrollment of Wagner-Peyser participants in WIA, making comparisons of enrollments over time difficult to assess. Table 3.3 shows the characteristics of exiters from the WIA Adult and Dislocated Worker programs from April 2012 through March 2013.55 Nine

54. See Trutko and Barnow (2010) for more examples of the variation in how customers are classified across states and over time.
55. The WIA data system is designed to provide data for performance measurement. Because the performance measures track cohorts of exiters, data are provided on exit cohorts rather than all participants in a given period.

Employment and Training Programs

157

percent of the exiters from the Adult program were ages eighteen to twenty-one, and thus were also eligible for the Youth program.56 The program served a slightly higher percentage of men than women, 52 percent compared to 48 percent. Individuals with disabilities constitute nearly 4 percent of all adult exiters and about 3 percent of those who received training. A majority of the adult exiters, 59 percent, are white, with black non-Hispanics making up about 24 percent of the exiters and Hispanics constituting 10.5 percent. Preprogram quarterly earnings (among those with positive earnings) were about $6,000 for all exiters and about $5,400 for adult exiters who received training.57 Because core services under WIA are open to all, and access to training is restricted to low-income individuals only if there is not sufficient funding available for all customers the programs would like to enroll, the WIA customers are not as economically disadvantaged as one might expect. Only one-half of the exiters from the Adult program are classified as low income, and only 61 percent of adult exiters who received training are classified as low income, which is somewhat surprising given the focus on low-income families and the relatively broad definition of low income.58 About one-quarter of all adult exiters and one-third of adult exiters receiving training are public assistance recipients.59 The educational attainment of the adult exiters is fairly high. Fewer than 11 percent of the exiters had not completed high school or passed the GED, 30 percent had some postsecondary education, and 13 percent had at least a bachelor's degree. Adult exiters who received training had roughly equivalent levels of education. Characteristics of dislocated worker exiters are not markedly different from those of the adult exiters.
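The earnings figures reported here follow the averaging rule described in footnote 57. A minimal sketch of that rule (the function name and arguments are hypothetical; q2 and q3 stand for UI wage-record earnings in the second and third quarters before program entry):

```python
def preprogram_quarterly_earnings(q2, q3):
    """Sketch of the WIA earnings rule in footnote 57: average the second
    and third quarters before entry when both are positive; use the single
    positive quarter when only one is; exclude the individual (return None)
    when both quarters are zero."""
    positive = [q for q in (q2, q3) if q > 0]
    if not positive:
        return None  # excluded from the reported average
    return sum(positive) / len(positive)

print(preprogram_quarterly_earnings(5000, 7000))  # 6000.0 (both quarters positive)
print(preprogram_quarterly_earnings(0, 7000))     # 7000.0 (one positive quarter)
print(preprogram_quarterly_earnings(0, 0))        # None (excluded)
```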
Dislocated worker exiters are less likely to be under twenty-one (3 percent compared to 9 percent for adults), and they are more likely to be age fifty-five and above (20 percent compared to

56. Customers who are coenrolled in two programs are reported for both programs. Thus, some adults are also included in youth program data and dislocated worker program data.
57. Quarterly earnings are derived from state unemployment insurance wage records and thus do not include self-employment income or earnings from government, military, or informal employment. Earnings are the average for the second and third quarters prior to entry if earnings were positive for both quarters. If earnings were positive for only one of the second and third quarters, then the value used is earnings in that quarter. Individuals with zero earnings in both quarters are not included in the average. See appendix B in Social Policy Research Associates (2013) for definitions of terms used.
58. The WIA definition of "low income" is complex; see Social Policy Research Associates (2013, 299) for the full definition. It is broader than being in poverty, and includes all recipients of cash assistance (such as TANF), SNAP (food stamps), and individuals whose family income is less than 70 percent of the lower living standard income level. In 2014, the poverty level for a family of four in the forty-eight contiguous states was $23,850, and 70 percent of the lower living standard income level for a family of four ranged from $23,285 to $31,945, depending on the state of residence and whether the family lived in a specific metro area or a nonmetro area (see Federal Register 2014).
59. Public assistance recipient for WIA reporting is broadly defined and includes TANF, general assistance, SNAP, supplemental security income, and refugee cash assistance (see Social Policy Research Associates 2013).


14 percent). They are slightly more likely to be white (63 percent compared to 59 percent). Not surprisingly, their quarterly preprogram earnings are substantially higher than adult exiters ($8,566 compared to $6,006). Characteristics of WIA youth exiters are presented in table 3.4. In addition to presenting the data for all youth, data are available for two categories of in-school youth (high school or below and postsecondary) and two categories of out- of-school youth (high school dropouts and high school graduates). Roughly 40 percent of the youth exiters attended high school or a lower level of school, and nearly one- quarter of the exiters were in each of the out- of-school categories (high school dropouts and high school graduates); the balance, about 4 percent of the total, attended a postsecondary school. The WIA Youth program is much more income targeted than the Adult or Dislocated Worker programs, and 97 percent of the exiters were lowincome youth (not shown in table). The youth participants differ in several other ways from those served by the Adult and Dislocated Worker programs. Women made up a majority of the exiters in all categories, with the smallest proportion among dropouts, most likely because young women are more Table 3.4

Characteristics of WIA youth exiters by education status, April 2012–March 2013

                                              Attending school          Not attending school
                                      All     High school  Post-       High school  High school
                                              or below     secondary   dropout      graduate
Number of exiters                  112,386      52,954       4,630       28,087       26,706
Age (%)
  14-15                                6.6        13.7         0.1          0.4          0.0
  16-17                               36.6        60.5         4.4         26.4          5.8
  18                                  21.9        19.9        19.7         24.8         23.4
  19-21                               34.8         5.9        75.8         48.4         70.8
Gender (%)
  Female                              54.6        54.1        63.6         51.2         57.4
  Male                                45.4        45.9        36.4         48.8         42.6
Individual with disability (%)        13.2        19.0         5.9          7.7          8.8
Race/ethnicity (%)
  Hispanic                            32.5        35.1        43.5         27.2         31.1
  Black, not Hispanic                 32.5        32.3        23.4         33.1         33.8
  White, not Hispanic                 29.8        27.1        29.2         34.9         30.1
  Other                                5.2         5.5         3.9          4.8          5.0
Veteran (among 19-21) (%)              0.3         0.1         0.3          0.1          0.5
Homeless or runaway youth (%)          4.5         2.6         3.4          6.7          5.6
Offender (%)                           9.5         6.2         5.2         15.3          9.5
Pregnant or parenting youth (%)       24.0        19.5        40.5         26.0         28.2
Basic literacy skills deficient (%)   64.3        61.0        56.4         74.5         61.3
Ever in foster care (%)                3.7         4.7         3.1          3.0          2.6

Source: Social Policy Research Associates (2013).

Employment and Training Programs


likely to stay in school than young men. The racial/ethnic mix of youth exiters also differs from what we see for adults and dislocated workers. In contrast to the other two programs, whites comprised only about 30 percent of exiters, with similar numbers of Hispanic nonblacks and non-Hispanic blacks exiting the program. The WIA youth exiters had a high prevalence of conditions likely to serve as barriers to employment. The proportion of participants with a disability was considerably higher among youth than in the two other programs, running at 13 percent of all youth exiters compared to 4 percent for the Adult program and 3 percent for the Dislocated Worker program. Nearly 5 percent of the youth were classified as homeless or runaway youth, with a rate of nearly 7 percent for dropouts. Nearly 4 percent of the youth exiters had been in foster care, with a higher rate for those attending high school (4.7 percent) and a considerably lower rate for high school graduates (2.6 percent). Nearly two-thirds of the youth exiters were classified as being deficient in basic literacy skills, with nearly three out of four deficient among high school dropouts.

3.4.2

Services Received by WIA Exiters

Table 3.5 summarizes the services received by WIA adult and dislocated worker exiters for recent years (program years 2008 through 2012). Among the adult customers, only 10 to 13 percent received training. Training was somewhat more common for dislocated workers, ranging from 14 percent to 19 percent during this period. Some of the participants received specialized training. Among those receiving training, on-the-job training was received by between 7 percent and 13 percent of the adults and 6 percent and 12 percent of the dislocated workers. Skill upgrading and retraining was slightly more common, with 12 percent to 15 percent of the adults and 14 percent to 16 percent of the dislocated workers receiving this type of training. The incidence of entrepreneurial training, adult basic education (ABE) or English as a Second Language (ESL) in combination with training, and customized training were all less than 5 percent for both adult and dislocated worker exiters.60 About three-quarters of the adult and dislocated worker exiters received other types of training, presumably mostly classroom training.

3.4.3

Outcomes for WIA Exiters

The WIA statute requires that data on the satisfaction levels of employers and participants be collected as part of the performance measurement system. However, when the Department of Labor adopted the common

60. Customized training refers to vocational training developed with input from employers regarding eligibility, curriculum, and requirements for successful completion. Also sometimes referred to as "employer-based training," customized training often includes provisions for employers to hire or give preference in hiring to individuals who have successfully completed the training.

Table 3.5

Services received by WIA adult and dislocated worker exiters, PY 2008-PY 2012

Services received by adult exiters                   2008       2009       2010       2011       2012
Total number of exiters                         1,040,676  1,187,450  1,252,411  1,144,947  1,111,555
Did not receive training (%)                         89.1       86.8       86.7       89.3       89.6
Received training (%)                                10.9       13.2       13.3       10.7       10.4
Types of training
  On-the-job training (%)                             9.0        7.4        8.9       10.8       12.6
  Skill upgrading & retraining (%)                   12.4       14.5       13.1       13.1       13.0
  Entrepreneurial training (%)                        0.4        0.1        0.3        0.3        0.2
  ABE or ESL in combination with training (%)         2.5        2.9        4.3        3.4        3.1
  Customized training (%)                             6.5        7.5        6.8        5.7        5.7
  Other occupational skills training (%)             72.5       70.7       71.0       70.4       69.2

Services received by dislocated worker exiters       2008       2009       2010       2011       2012
Total number of exiters                           364,044    581,985    760,853    750,409    705,706
Did not receive training (%)                         83.8       80.8       81.8       84.5       86.0
Received training (%)                                16.2       19.2       18.2       15.5       14.0
Types of training
  On-the-job training (%)                             7.5        5.9        6.8       10.1       11.8
  Skill upgrading & retraining (%)                   13.6       16.3       14.6       15.2       14.7
  Entrepreneurial training (%)                        1.5        0.3        0.3        0.4        0.3
  ABE or ESL in combination with training (%)         2.1        2.3        1.8        1.7        1.4
  Customized training (%)                             1.5        1.4        1.3        1.0        0.7
  Other occupational skills training (%)             77.2       76.4       78.2       74.7       74.4

Source: Social Policy Research Associates (2013).
Notes: Years 2008 through 2011 are program years; for example, PY 2008 is July 1, 2008 through June 30, 2009; 2012 is April 1, 2012 through March 31, 2013. Types of training received may not sum to 100 percent due to enrollment in more than one type of training and rounding.

measures for performance, most states were given waivers from this requirement. Seven states still report satisfaction data, and collectively their satisfaction scores (among respondents) averaged 84 for participants and 77 for employers.61 Data on outcomes for recent exiters from WIA are shown in table 3.6. There are three common measures for adults, dislocated workers, and older

61. Satisfaction is measured by a three-question survey called the American Consumer Satisfaction Instrument (ACSI). Employers and participants respond to each question on a 1 to 10 scale, and the total score is a weighted average of the three responses, scaled to range from 0 to 100. The ACSI measure and its use are described in the Department of Labor (2014h) TEGL 36-10, accessed at http://wdr.doleta.gov/directives/corr_doc.cfm?DOCN=3052, retrieved November 22, 2014. Satisfaction data are only available for Arizona, Hawaii, Michigan, Minnesota, Puerto Rico, Rhode Island, and Vermont.

Table 3.6

Outcomes for PY 2012 WIA exiters for selected subgroups of adults, dislocated workers, and youth

                                      Entered          Employment       Six-month
                                      employment       retention        average
                                      rate (%)         rate (%)         earnings ($)
Adults                                   59.9             81.9            13,335
  Veterans                               56.5             81.0            15,726
  Public assistance                      62.7             80.4            10,447
  Individuals with disabilities          41.2             75.4            11,086
  Older individuals                      47.9             81.4            14,437
  With training                          74.5             87.3            15,986
  With only core and intensive           58.6             81.1            12,935
Dislocated workers                       60.0             84.3            15,930
  Veterans                               56.6             82.4            17,073
  Displaced homemakers                   54.8             80.0            11,049
  Individuals with disabilities          45.5             78.7            13,152
  Older individuals                      48.1             81.4            16,221
  With training                          81.2             90.0            16,965
  With only core and intensive           56.4             82.8            15,653
Older youth                              69.7             87.7               n/a
  Veterans                               54.8             60.0               n/a
  Public assistance                      64.8             85.0               n/a
  Individuals with disabilities          70.0             87.9               n/a
  Out-of-school                          69.9             87.5               n/a

                                      Placement in     Attainment of    Literacy and
                                      employment or    a degree or      numeracy
                                      education (%)    certificate (%)  gains (%)
All youth                                66.0             62.3              47.5

Source: US Department of Labor (2014d). Data at http://www.doleta.gov/performance/results/eta_default.cfm#wiastann, retrieved November 22, 2014. Note that the outcome data are for program year 2012, July 1, 2012, through June 30, 2013, while the participant data in the two prior tables cover a slightly different period. The data for older youth are based on seven jurisdictions: Arizona, Hawaii, Michigan, Minnesota, Puerto Rico, Rhode Island, and Vermont.

youth: (a) entered employment: of those who are not employed at the date of participation, (number of participants who are employed in the first quarter after the exit quarter)/(number of participants who exit during the quarter); (b) employment retention: of those who are employed in the first quarter after the exit quarter, (number of participants who are employed in both the second and third quarters after the exit quarter)/(number of participants who exit during the quarter); and (c) average earnings:62 of those who are employed in the first, second, and third quarters after the exit quarter, (total

62. Previously, instead of postprogram earnings, the measure for adults was change in earnings from preenrollment earnings, and the measure for dislocated workers was the earnings replacement rate.



earnings in the second quarter plus total earnings in the third quarter)/(number of participants who exit during the quarter). The entered employment rate for adult and dislocated worker exiters in PY 2012 was 60 percent, but there was a great deal of variation among subgroups. Subgroups with lower entered employment rates include individuals with disabilities (41 percent for adults and 46 percent for dislocated workers) and older individuals63 (48 percent for both programs). Based on data from Social Policy Research Associates (2013), which covers a slightly different period, there is little variation in the entered employment rate by race/ethnicity and gender. Perhaps surprisingly, adults who were receiving public assistance at entry had an above-average entered employment rate. Another surprising finding is that the entered employment rate for older youth (70 percent) is a full 10 percentage points higher than the rate for exiters from the adult and dislocated worker programs (60 percent).64 The entered employment rate differs by a fairly large amount for individuals who received training (75 percent for adults and 81 percent for dislocated workers) compared to those who only received core and/or intensive services (59 percent for adults and 56 percent for dislocated workers). As noted earlier, under WIA, states vary greatly in the proportion of participants that receive training. An analysis of 2002-2005 data found that the percentage ranged from 14 percent to 96 percent (Trutko and Barnow 2007).65 Subgroup outcomes varied considerably less for employment retention. Most subgroups fell within a range of 80 percent to 90 percent retention. The only exceptions were individuals with disabilities (75 percent) and older youth veterans (60 percent). The range of outcomes among subgroups was also not large for six-month average earnings.

63. The published data do not include a definition for "older individual," but based on data in Social Policy Research (2013) it is likely that this refers to participants age fifty-five and above.
64. The older youth outcomes are reported only for six states that have a waiver from using the common measures.
65. Training was much more prevalent in the period covered by Trutko and Barnow (2007), perhaps in part because coenrollment of all Employment Service customers became more common after 2005.
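The three common measures defined above reduce to simple ratio calculations over exiter-level wage records. The sketch below is illustrative only: the record layout and field names are invented for the example, and actual federal reporting applies additional edits and exclusions not shown here.

```python
def common_measures(exiters):
    """Compute the three WIA common measures for a cohort of exiters.

    Each exiter is a dict with 'employed_at_entry' (bool) and 'q1', 'q2',
    'q3' (earnings in the first, second, and third quarters after the
    exit quarter). Field names are hypothetical, not an official layout.
    """
    # Entered employment rate: among exiters NOT employed at the date of
    # participation, the share employed in the first quarter after exit.
    base = [e for e in exiters if not e["employed_at_entry"]]
    entered = sum(e["q1"] > 0 for e in base) / len(base)

    # Employment retention rate: among exiters employed in the first
    # quarter after exit, the share employed in BOTH the second and third
    # quarters after exit.
    q1_employed = [e for e in exiters if e["q1"] > 0]
    retained = sum(e["q2"] > 0 and e["q3"] > 0 for e in q1_employed) / len(q1_employed)

    # Six-month average earnings: among exiters employed in all three
    # postexit quarters, the mean of second- plus third-quarter earnings.
    all_three = [e for e in exiters if e["q1"] > 0 and e["q2"] > 0 and e["q3"] > 0]
    avg_earnings = sum(e["q2"] + e["q3"] for e in all_three) / len(all_three)

    return entered, retained, avg_earnings
```

Note how the retention and earnings measures condition on postexit employment, so each is computed over a progressively more successful subset of the cohort; this is the source of the compression across subgroups discussed in the text.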
In part, the compression in the range of outcome results across subgroups is likely an artifact of the way the measures are defined. The entered employment rate is based on data for all exiters, but the employment retention measure only includes individuals who were employed in the first quarter after exit, and the average earnings measure is based only on exiters who were employed in all three quarters after exit; because these measures are based on customers with initial postprogram success, it is not surprising that customers included in the calculations tend to do well on the measures. The three outcome measures used for youth are: (a) placement in employment or education: of those who are not in postsecondary education or employment at the date of participation, (number of youth participants



who are in employment or enrolled in postsecondary education and/or advanced training/occupational skills training in the first quarter after the exit quarter)/(number of youth participants who exit during the quarter); (b) attainment of a degree or certificate: of those enrolled in education at the date of participation or at any point during the program, (number of youth participants who attain a diploma, GED, or certificate by the end of the third quarter after the exit quarter)/(number of youth participants who exit during the quarter); and (c) literacy and numeracy gains: of those out-of-school youth who are basic skills deficient, (number of youth participants who increase one or more educational functioning levels)/(number of youth participants who have completed a year in the program plus the number of youth participants in the program who exit before completing a year in the youth program). Although we report the youth common measure results in table 3.6, we do not discuss them here because we do not consider them of particular interest. We would prefer to see youth measures based on school status at the time of entry, as employment and earnings measures are of interest for out-of-school youth, but educational measures are of more interest for in-school youth.66

3.5

Current Funding Levels

Table 3.7 lists major means-tested employment and training programs funded by the Department of Labor and other federal agencies. Only programs with at least $30 million in budget authority for 2014 are included. We also omit temporary programs, pilots, and demonstrations, regardless of their size. So, for example, we omit the Transition Assistance Program, which provides assistance to separating veterans so that they can reenter the civilian labor market, because the program is only funded at $14 million annually, and the Trade Adjustment Assistance Community College and Career Training (TAACCCT) grants, which included $464 million in funding for FY 2014, because it is a temporary program with no funding for future years. For each program, we provide a brief description of the funding level, program eligibility, and activities.

3.5.1

US Department of Labor Programs

Job Corps67

Job Corps is a largely residential education and vocational training program serving young people ages sixteen through twenty-four through

66. The Department of Labor does report the outcomes used for Adults and Dislocated Workers for subgroups as well as all older youth, but these results are only available for six states.
67. Material on Job Corps is from http://www.jobcorps.gov/AboutJobCorps.aspx, retrieved November 23, 2014.

Table 3.7

List of selected means-tested employment and training programs and budgets

Program title                                 Agency/office                                    Funding, FY 2014 (millions of dollars)a

Department of Labor programs
Job Corps                                     DOL/Employment and Training Administration         1,684
WIA Dislocated Workers                        DOL/Employment and Training Administration         1,219b
WIA Youth                                     DOL/Employment and Training Administration           818
WIA Adults                                    DOL/Employment and Training Administration           764
Wagner-Peyser-Funded Employment Service       DOL/Employment and Training Administration           664
Senior Community Service Employment Program   DOL/Employment and Training Administration           433
Trade Adjustment Assistance (TAA)             DOL/Employment and Training Administration           306c
Disabled Veterans Outreach Program (DVOP)
  and Local Veterans' Employment
  Representative Program (LVER)               DOL/Veterans' Employment and Training Service        175
H-1B Job Training Grants                      DOL/Employment and Training Administration           166d
National Farmworker Jobs Program (NFJP)       DOL/Employment and Training Administration            82
Reintegration of Ex-Offenders (RExO)          DOL/Employment and Training Administration            80
YouthBuild                                    DOL/Employment and Training Administration            78
Indian and Native American Employment
  and Training                                DOL/Employment and Training Administration            46
Homeless Veterans Reintegration Program
  (HVRP)                                      DOL/Veterans' Employment and Training Service         38

Programs of other federal agencies
Pell Grants                                   Ed./Office of Vocational and Adult Education       8,181
Temporary Assistance for Needy Families
  (TANF) grants                               HHS/Administration for Children & Families         1,517e
Adult Education—grants to states              Ed./Office of Vocational and Adult Education         564
SNAP Employment & Training                    USDA/Food and Nutrition Service                      416f

Sources: Food and Nutrition Service—2015 Explanatory Notes; DOL Budget in Brief FY2015; http://www2.ed.gov/programs/adultedbasic/funding.html; http://www.acf.hhs.gov/programs/ofa/resource/tanf-financial-data-fy-2013; US Department of Labor (2014c) http://www.doleta.gov/budget/docs/14_final_appropriation_action.pdf.
a Unless otherwise noted. Figures rounded to nearest million. Appropriations unless otherwise noted. All figures are in nominal dollars. The DOL ETA figures reflect budgets after the evaluations are set aside.
b Includes national emergency grants.
c Budget for training only (does not include cash payments).
d Actual collected through fees.
e Expenditures, FY 2013.
f Appropriations, FY 2013.



vocational and academic training. Through a nationwide network of campuses, Job Corps offers a comprehensive array of career development services to at-risk young women and men to prepare them for careers. Job Corps integrates the teaching of academic, vocational, and employability skills, and social competencies through a combination of classroom, practical, and work-based learning experiences. Enacted budget authority for Job Corps for PY 2014 is $1,684 million.

WIA Adult and Dislocated Worker Programs

The WIA Adult and Dislocated Worker programs are "designed to provide quality employment and training services to assist eligible individuals in finding and qualifying for meaningful employment, and to help employers find the skilled workers they need to compete and succeed in business."68 The program funds are allocated to states by formula based on their unemployment rates and on their number of economically disadvantaged individuals, and then distributed to local areas by the same formula. Participants can receive core, intensive, and training services at local American Job Centers, which were described above. Core services are available to all, but intensive and training services are reserved for low-income individuals if there are insufficient funds to serve everyone. The Dislocated Worker program is restricted to individuals who have lost their job or are about to lose their job; the definition also includes self-employed individuals (including farmers and ranchers) who have lost their potential livelihood and displaced homemakers who have lost their financial support from another family member. For PY 2014, the funds appropriated are $764 million for the Adult program and $1,219 million for the Dislocated Worker program.69

WIA Youth Program70

The WIA Youth program serves eligible low-income youth, ages fourteen to twenty-one, who face barriers to employment. Funds for youth services are allocated to state and local areas based on a formula distribution.
Service strategies, developed by workforce providers, prepare youth for employment and/or postsecondary education through strong linkages between academic and occupational learning. Local communities provide youth activities and services in partnership with the WIA American Job Center system and under the direction of local Workforce Investment Boards. To participate, youth must have low income and one or more prescribed barriers to employment. Budget authority for PY 2014 is $818 million.

68. http://www.doleta.gov/programs/general_info.cfm, retrieved November 23, 2014.
69. The dislocated worker appropriation includes formula grants to states and funds for the national reserve fund, which is distributed by ETA based on proposals from the states.
70. Material on the WIA Youth program is from US Department of Labor (2014k). (http://www.doleta.gov/youth_services/wiaformula.cfm; retrieved November 23, 2014.)
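The formula allocations mentioned for the Adult, Dislocated Worker, and Youth programs distribute a fixed appropriation across states in proportion to state-level factors such as unemployment and the number of economically disadvantaged individuals. A minimal sketch of such a relative-share formula appears below; the equal weights and the omission of the statute's hold-harmless and minimum-allotment provisions are simplifying assumptions, so this is not the statutory formula itself.

```python
def allot(total, factors, weights=(0.5, 0.5)):
    """Split `total` across states in proportion to weighted relative shares.

    `factors` maps state -> (unemployed, disadvantaged) counts. The equal
    weighting and the absence of hold-harmless or minimum-allotment rules
    are simplifying assumptions for illustration.
    """
    # National totals for each factor, used to form each state's relative share.
    sums = [sum(v[i] for v in factors.values()) for i in range(len(weights))]
    return {
        state: total * sum(w * v[i] / sums[i] for i, w in enumerate(weights))
        for state, v in factors.items()
    }
```

Because each factor enters as a share of the national total, the state allotments always exhaust the appropriation exactly.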



Wagner-Peyser Employment Service71

The Wagner-Peyser Act of 1933 established a nationwide system of public employment offices known as the Employment Service (ES). The Act was amended in 1998 to make the Employment Service part of the One-Stop services delivery system. The Employment Service focuses on providing a variety of employment-related labor-exchange services including, but not limited to, job-search assistance, job referral, and placement assistance for job seekers, and reemployment services to unemployment insurance claimants. As described earlier, the ES also provides a variety of services to employers. The Employment Service is open to all, but veterans receive priority of service and are eligible for other special services; although the Employment Service is not means tested, we include it here because many WIA customers are coenrolled in the Employment Service program. Budget authority for PY 2014 is $664 million.

Senior Community Service Employment Program72

The Senior Community Service Employment Program (SCSEP) is a community service and work-based job-training program for older Americans. Authorized by the Older Americans Act, the program provides training for low-income, unemployed seniors. Participants also have access to employment assistance through American Job Centers. The SCSEP participants gain work experience in a variety of community service activities at nonprofit and public facilities, including schools, hospitals, day-care centers, and senior centers. The program provides over 40 million community service hours to public and nonprofit agencies, allowing them to enhance and provide needed services. Participants work an average of twenty hours a week and are paid the highest of the applicable federal, state, or local minimum wages. This training is intended to serve as a bridge to unsubsidized employment opportunities for participants.
Participants must be at least fifty-five years old, be unemployed, and have a family income of no more than 125 percent of the federal poverty level. Funding for PY 2014 is $433 million.

Trade Adjustment Assistance (TAA)73

The Trade Adjustment Assistance (TAA) Program is a federal entitlement program that assists US workers who have lost or may lose their jobs as a result of foreign trade. This program seeks to provide adversely affected

71. Material on the Wagner-Peyser Act Employment Service is from http://www.doleta.gov/programs/Wagner_Peyser.cfm, retrieved November 23, 2014.
72. Material on the SCSEP is from the US Department of Labor (2014f). (http://www.doleta.gov/seniors/; retrieved November 24, 2014.)
73. Material on Trade Adjustment Assistance is from the US Department of Labor (2014g). (http://www.doleta.gov/tradeact/docs/program_brochure2014.pdf.)



workers with opportunities to obtain the skills, credentials, resources, and support necessary to become reemployed. Participants are eligible to receive employment and case management services, training, cash payments called trade readjustment allowances (TRA) when unemployment insurance is exhausted, and job search and relocation allowances. The program also includes a wage subsidy for up to two years that is available to reemployed older workers and covers a portion of the difference between a worker's new wage and their old wage (up to a specified maximum amount). Note that this program is not means tested. Enacted budget authority for PY 2014 is $306 million.

Employment Services for Veterans74

The Veterans' Employment and Training Service (VETS) offers employment and training services to eligible veterans through the Jobs for Veterans State Grants Program. Under this grant program, funds are allocated to state workforce agencies in direct proportion to the number of veterans seeking employment within their state. The grants support two principal state workforce agency staff positions: Disabled Veterans' Outreach Program Specialists (DVOPs) and Local Veterans' Employment Representatives (LVERs). The DVOP and LVER staff provide services to all eligible veterans, but their efforts are concentrated on outreach and the provision and facilitation of direct client services to those who have been identified as most in need of intensive employment and training assistance. Disabled Veterans Outreach Program (DVOP) specialists provide intensive services to meet the employment needs of disabled veterans and other eligible veterans, with the maximum emphasis directed toward serving those who are economically or educationally disadvantaged.
Local veterans' employment representatives conduct outreach to employers and engage in advocacy efforts with hiring executives to increase employment opportunities for veterans, encourage the hiring of disabled veterans, and generally assist veterans to gain and retain employment. The LVER staff conducts seminars for employers and job-search workshops for veterans seeking employment, and facilitates priority of service in regard to employment, training, and placement services furnished to veterans by all staff of the employment service delivery system. Combined enacted budget authority in PY 2014 for DVOP and LVER positions is $175 million.

H-1B Job-Training Grants75

The Job Training for Employment in High Growth Industries Grants are designed to provide training for workers according to need in different

74. Material on Employment Services for Veterans is from the US Department of Labor (2014i). (http://www.dol.gov/vets/programs/empserv/employment_services_fs.htm.)
75. Material on H-1B Job Training Grants is from the US Department of Labor (2014a).



sectors of the economy. The funding for this program is provided from H-1B visa fees. The Department's long-term goal for the program is to decrease the need for these visas by helping American workers develop the high-level skills needed by these employers. The Department intends to use this program to support training and education models that lead to highly skilled technical jobs. The fees collected for this program in FY 2014 totaled $166 million.

Migrant and Seasonal Farmworker Program76

The Employment and Training Administration's Migrant and Seasonal Farmworker Program, also sometimes called the National Farmworker Jobs Program, provides services to the American farmworker population to help combat the chronic underemployment experienced by workers who depend primarily on agricultural labor jobs (US Department of Labor 2014b). The National Farmworker Jobs Program provides funding to community-based organizations and public agencies to assist migrant and seasonal farmworkers and their families in attaining greater economic stability. Farmworkers also receive training and employment services through the nationwide network of American Job Centers. Funding for PY 2014 is $82 million, which is expected to serve approximately 19,000 participants.77

Reintegration of Ex-Offenders (RExO)78

The RExO program provides funding to pilots and demonstration projects designed to test the effectiveness of successful models and practices found in community and faith-based environments and other government systems that have not been tested for their adaptability to the public workforce system. The RExO program is designed to strengthen communities through projects that incorporate mentoring, job training, education, legal aid services, and other comprehensive transitional services.
Grants are awarded through a competitive process open to any nonprofit organization with 501(c)(3) status, unit of state or local government, or any Indian and Native American entity eligible for grants under Workforce Investment Act Section 166, operating in areas with high poverty and crime rates, that meets the requirements of the solicitations. Enacted budget authority for the program in PY 2014 is $80 million.

76. Material on the Migrant and Seasonal Farmworker Program is from the US Department of Labor (2014b). (http://www.doleta.gov/Farmworker/; retrieved November 23, 2014.)
77. Number of participants expected and funding level are from US Department of Labor (2014b).
78. Material on RExO is from the US Department of Labor (2014e). (http://www.doleta.gov/RExO/.)



YouthBuild79

YouthBuild is a community-based alternative education program that provides job training and educational opportunities for at-risk youth ages sixteen to twenty-four. The program was transferred from the Department of Housing and Urban Development to the Department of Labor in 2006. Youth learn construction skills while constructing or rehabilitating affordable housing for low-income or homeless families in their own neighborhoods. Youth split their time between the construction site and the classroom, where they earn their GED or high school diploma, learn to be community leaders, and prepare for college and other postsecondary training opportunities. YouthBuild includes significant support systems such as mentoring, follow-up education, employment, and personal counseling services, and participation in community service and civic engagement. There are over 250 DOL-funded YouthBuild programs in forty-five states serving over 10,000 youth per year.80 Funding for YouthBuild in PY 2014 is $78 million.

The Indian and Native American Program81

The Indian and Native American Program serves American Indians and Native Americans through a network of 178 grantees. To meet the employment and training needs of the Indian, Alaskan Native, and Native Hawaiian populations, the enacted budget authority for PY 2014 is $46 million. At this funding level, the program serves approximately 28,000 unemployed and underskilled Indian, Alaskan Native, and Native Hawaiian adults and youth.

Homeless Veterans Reintegration Program82

The purpose of the Homeless Veterans' Reintegration Program (HVRP) is to provide services to assist in reintegrating homeless veterans into meaningful employment within the labor force and to stimulate the development of effective service delivery systems that will address the complex problems facing homeless veterans. Funds are awarded on a competitive basis to eligible applicants.
Grantees provide an array of services utilizing a case management approach that directly assists homeless veterans as well

79. Material on YouthBuild is from the US Department of Labor (2014n). (http://www.doleta.gov/Youth_services/Youth_Build.cfm; retrieved November 23, 2014.)
80. The data on enrollment and the number of program sites are from YouthBuild's web site (https://youthbuild.org/); the numbers are higher than what is found on the Department of Labor web site.
81. Material on the Indian and Native American Program is from the US Department of Labor (2014a).
82. Material on the Homeless Veterans Reintegration Program is from the US Department of Labor (2014j). (http://www.dol.gov/vets/programs/fact/Homeless_veterans_fs04.html.)



as critical linkages for a variety of supportive services available in their local communities. The program is "employment focused," and veterans receive the employment and training services they need in order to reenter the labor force. Job placement, training, job development, career counseling, and resume preparation are among the services that are provided. Supportive services such as clothing; provision of or referral to temporary, transitional, and permanent housing; referral to medical and substance abuse treatment; and transportation assistance are also provided to meet the needs of this target group. Budget authority for the program in FY 2014 is $38 million.

3.5.2

Employment and Training Programs Operated by Other Agencies

Pell Grants (US Department of Education)83

The Pell grant program provides need-based grants to low-income undergraduate and certain postbaccalaureate students to promote access to postsecondary education.84 Students may use their grants at any one of approximately 5,400 participating postsecondary institutions. Grant amounts are determined by the student's expected family contribution, the cost of attendance (as determined by the institution), the student's enrollment status (full time or part time), and whether the student attends for a full academic year or less. The maximum Pell grant for the award year beginning July 1, 2015, is $5,775.85 Financial need is determined by the US Department of Education. Pell grants may be used at two-year and four-year institutions, and at public and private institutions. Students can receive Pell grants for study toward a degree or eligible certificate programs. The funds can be used to pay for tuition, fees, and other expenses, or to help pay for the living expenses of the student. Although the program is not an entitlement, sufficient funds are generally appropriated to meet the needs of all eligible applicants. For the 2013/14 school year, the latest period for which data are available, an estimated $33.7 billion was spent on Pell grants, according to Baum, Elliott, and Ma (2014). Analysis of National Postsecondary Student Aid Survey (NPSAS) data indicates that $5,060 million was used for occupational degrees and another $3,121 million was used for certificate programs in the 2011/12 school year.86 Thus, Pell-grant spending is the largest single

83. Information on Pell grants is from http://www2.ed.gov/programs/fpg/index.html.
84. Pell grants to students pursuing postbaccalaureate study are rare and restricted to students pursuing a teaching certificate.
85. From https://studentaid.ed.gov/types/grants-scholarships/pell, accessed February 1, 2015.
86.
We are extremely grateful to Sandy Baum for performing the analyses generating these estimates. The Department of Education’s programs that are classified as career and technical education can be found at: http://nces.ed.gov/surveys/ctes/tables/postsec_tax.asp (accessed March 1, 2015).

Employment and Training Programs

171

source of funding for means-tested employment and training programs, and it is roughly twice the spending for the WIA Adult, Dislocated Worker, and Youth programs combined.87

Temporary Assistance for Needy Families (TANF) (US Department of Health and Human Services)88

Temporary Assistance for Needy Families is the federal welfare program for families with children. Under TANF the federal government provides block grants to the states, which use these funds to operate their own programs. In order to receive federal funds, states must also spend some of their own dollars on programs for needy families. States can use federal TANF and state maintenance of effort (MOE) dollars to meet any of the four goals set out in the 1996 law: “(1) provide assistance to needy families so that children may be cared for in their own homes or in the homes of relatives; (2) end the dependence of needy parents on government benefits by promoting job preparation, work, and marriage; (3) prevent and reduce the incidence of out-of-wedlock pregnancies and establish annual numerical goals for preventing and reducing the incidence of these pregnancies; and (4) encourage the formation and maintenance of two-parent families.” The 1996 law sets forth twelve categories of work activities that can count toward the required work rates. Nine of these twelve categories are core categories that can count toward any hours of participation; participation in the three noncore categories can only count if the individual also participates in core activities for at least twenty hours per week (thirty hours for two-parent families).
The nine core activities are: unsubsidized employment, subsidized private-sector employment, subsidized public-sector employment, work experience, on-the-job training, job-search and job-readiness assistance, community service programs, vocational educational training (for up to twelve months), and providing child-care services to an individual who is participating in a community service program. The three noncore activities are: job-skills training directly related to employment, education directly related to employment, and satisfactory attendance at secondary school or in a course of study leading to a GED. Federal expenditures for work-related TANF activities for FY 2013 were $1,517 million.
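The core/noncore counting rule just described can be sketched as a small function. The function name and return convention below are our own illustration of the rule, not statutory language, and the sketch ignores the many exceptions and state options in the actual regulations.

```python
def countable_hours(core: float, noncore: float, two_parent: bool = False) -> float:
    """Weekly hours countable toward the TANF work participation rate.

    Noncore hours count only on top of the core-hours threshold:
    20 hours per week, or 30 for two-parent families (a simplified
    reading of the rule described in the text).
    """
    threshold = 30 if two_parent else 20
    if core >= threshold:
        return core + noncore   # noncore hours count in addition to core
    return core                 # below the threshold, noncore hours do not count
```

For example, under this sketch a single parent with 20 core and 10 noncore hours gets credit for 30 hours, while one with 19 core and 10 noncore hours gets credit for only 19.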

87. Pell grants can be used for WIA participants if the income and program requirements are met, and the WIA statute required that funding sources such as Pell grants be used when possible. However, training programs offered by WIA are often too short to qualify for Pell grants, and dislocated workers enrolled in WIA training programs may not meet the income requirements. For the period from April 1, 2012 through March 31, 2013, Social Policy Research Associates (2013) reports that out of 1.6 million exiters, 17,246 adult program participants, 10,836 dislocated worker participants, and 14,115 youth received Pell grants.
88. Material on TANF is from Center on Budget and Policy Priorities (2012).

172

Burt S. Barnow and Jeffrey Smith

Adult Education Basic Grants to States (US Department of Education)

This program provides grants to states to fund local programs of adult education and literacy services, including workplace literacy services, family literacy services, English literacy programs, and integrated English literacy-civics education programs. Participation in these programs is limited to adults and out-of-school youths age sixteen and older who are not enrolled or required to be enrolled in secondary school under state law. More than 2,500 programs deliver instruction through public schools, community colleges, libraries, community-based organizations, and other providers. The programs provide instruction in reading, numeracy, GED preparation, and English literacy. More than 1.8 million adults participated in programs in program year 2011/12. The appropriation for the program for FY 2014 is $564 million.

Supplemental Nutrition Assistance Program Employment and Training (SNAP E&T) (US Department of Agriculture)89

The SNAP E&T program is a funding source that allows states to provide employment and training and related supportive services to individuals receiving Supplemental Nutrition Assistance Program (SNAP) benefits. These services are intended to assist recipients in gaining skills, training, work, or experience that will increase their employment and earnings and reduce their need for SNAP. In an average month in FY 2013, more than 47 million individuals received SNAP benefits; however, in 2012, the most recent year for which data are available, only 15.3 percent of nonelderly adult SNAP recipients participated in SNAP E&T activities. The SNAP E&T program supports a range of employment and training activities for SNAP recipients. Such activities can include job search, job-search training, work experience or workfare, and education and training including basic-skills instruction.
Employability assessments and case management services can be part of a component, but cannot be stand-alone activities. The SNAP E&T program can also be used to provide job-retention services for up to ninety days after an individual who received other services under SNAP E&T gains employment. The 2013 appropriation for the program was $416 million.

3.6 Program Evaluation Issues

89. Material on SNAP E&T is from Lower-Basch (2014).

Employment and training programs in the United States have received more attention from evaluators than many programs far larger in budgetary terms, such as SNAP or Social Security Disability Insurance (SSDI). Relatively large participant populations as well as available administrative data


plus the absence of a constituency powerful enough to block serious evaluation conspire to make this so.90 We have (much) more to say about the substantive findings from that evaluative activity later on. In this section, we lay the foundation for our substantive discussion by describing the fundamental evaluation problem, along with the usual approaches to solving it, both in the broad sense of evaluation policy and in the narrow sense of applied econometrics. Our discussion presumes voluntary rather than mandatory participation because almost all participants in the major (and most minor) US employment and training programs volunteer for the privilege. To fix ideas, it helps to adopt some formal notation. We use the standard potential outcomes framework, in which Y1i denotes the outcome that individual “i” would experience if she received program services (and so is “treated” in the jargon of the literature), while Y0i denotes the outcome that same individual would receive if she did not receive program services (and so is not “treated”).91 The outcome remains generic at this point; it could be earnings, employment, an indicator for obtaining a job with employer-provided health insurance, and so forth. The term “potential” outcomes refers to the fact that each individual will actually experience only the outcome associated with their program participation status, while the other outcome remains an unrealized, and thus unobserved, potential. If we define Di as an indicator variable for program participation, then we can write the observed outcome Yi as a function of the potential outcomes: Yi = DiY1i + (1 − Di)Y0i. The impact of the program on individual “i” equals the difference between their treated and untreated potential outcomes, or δi = Y1i − Y0i. Interest in impact evaluations centers on various means of these individual impacts.
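The notation can be made concrete with a few lines of simulation; all magnitudes below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Potential outcomes: Y0 is the untreated outcome, Y1 adds a
# heterogeneous individual impact delta_i = Y1_i - Y0_i.
y0 = rng.normal(10_000, 3_000, n)
impact = rng.normal(1_000, 500, n)
y1 = y0 + impact

# Participation indicator D (unrelated to outcomes here, for simplicity).
d = rng.binomial(1, 0.5, n)

# Observed outcome: Y = D*Y1 + (1 - D)*Y0.
y = d * y1 + (1 - d) * y0

# These means are knowable only inside a simulation, where both
# potential outcomes can be "seen" for every individual.
atet = (y1 - y0)[d == 1].mean()   # E(Y1 - Y0 | D = 1)
ate = (y1 - y0).mean()            # E(Y1 - Y0)
```

In real data only Y, D, and covariates are observed — each individual reveals one potential outcome and hides the other — which is exactly what makes the evaluation problem described here hard in practice.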
The most common parameter of interest in practice is the average impact of the treatment on the treated, given by E(Y1 – Y0 | D = 1). In words, this parameter captures the mean difference between the outcome with treatment and the outcome without treatment for those who receive the treatment. If the policy question at hand concerns whether to keep or eliminate a program as it currently operates, versions of this parameter for different outcomes lie on the benefit side of the relevant cost-benefit calculation. A second parameter of interest is the average treatment effect (ATE) in the population. In notation, we have E(Y1 – Y0). Typically, interest lies in the ATE for some actual or potential eligible population. Versions of this

90. Perusing the last print version of the Digest of the Social Experiments, Greenberg and Shroder (2004), suggests a strong revealed preference for experimenting on disadvantaged people and criminals and a strong revealed preference against experimenting on the middle class. Greenberg, Shroder, and Onstott (1999) provide some quantitative confirmation of these patterns based on an earlier edition of the book.
91. The potential outcomes framework is variously attributed to Frost (1920), Neyman (1923), Fisher (1935), Roy (1951), Quandt (1972), and Rubin (1974). Disciplinary affiliation and academic genealogy strongly predict attributions.


parameter for various outcomes figure in a cost-benefit calculation designed to answer the question of whether or not a mandatory version of a program makes sense from an (economic) efficiency point of view. Less frequently, evaluations of job-training programs consider other parameters of interest, which in turn address other substantive questions of interest. Quantile treatment effects (QTEs) reveal how a treatment affects the entire distribution of outcomes. In practice, they consist of differences between the corresponding quantiles of the outcome distributions for the treated and (corrected for selection, if required) untreated units, that is, differences of quantiles of F(Y1 | D = 1) and F(Y0 | D = 1). Thus, for example, with experimental data, the QTE at the median is the difference between the median outcome in the treatment group and the median outcome in the control group. They also provide information about how the program affects inequality within the treated population and, as in Bitler, Gelbach, and Hoynes (2006), they can even play a role in testing theoretical models of participant behavior. We find them surprisingly underutilized in practice in evaluations of job-training programs.92 Another question of frequent policy interest concerns the impact of programs on participants at the margin of participation, where the margin may depend on the choices of would-be participants, of program staff, or both. The effect on marginal participants informs choices about whether to modestly expand or contract the program. Pinning down effects on marginal participants requires additional work at the design stage and/or additional measurement. That additional work enables estimation of impacts at the margin as in the discontinuity design exploited in Black et al.
(2003), or of a local average treatment effect in a randomized encouragement design, or via a subgroup analysis of impacts on likely marginal participants as identified by program staff or the participants themselves. In our view, such work gets done far too infrequently in this literature, given that the relevant policy question (at least implicitly) is almost always expansion or contraction rather than eliminating the program or making it mandatory. The final parameters of interest require the joint distribution of the treated and untreated outcomes rather than just their marginal distributions. Examples of such parameters include the variance of impacts, the fraction of impacts that are positive, and quantiles of the distribution of impacts (which are not the same thing as impacts on quantiles of the outcome distribution). Because (perhaps wrongly) the literature rarely analyzes these parameters in practice, we deem them beyond the scope of our chapter (see Heckman, Smith, and Clements [1997] and Djebbari and Smith [2008] for more). One path to avoid all of the conceptual and econometric complications

92. For more on QTEs see, for example, Koenker and Bassett (1978), Heckman, Smith, and Clements (1997), Abadie, Angrist, and Imbens (2002), Bitler, Gelbach, and Hoynes (2005), and Djebbari and Smith (2008).


associated with alternative treatment effect parameters leads to the common effect model. Much of the applied literature, especially the older applied literature, implicitly assumes a common effect world, in which “the effect” of training is the same for all participants. A more sophisticated version of this view allows that the effect of training varies, but assumes that neither potential participants nor program gatekeepers can predict the variation, with the result that it plays no role in program participation decisions. In our view, the available evidence militates strongly against the common effect view, particularly in the context of training programs as operated in the United States. One very compelling reason for thinking that there are heterogeneous treatment effects is that there are, almost always in the US program context, heterogeneous treatments. In this sense, the programs covered in this chapter differ from both the budgetary treatments (e.g., the Earned Income Tax Credit) and the cash and in-kind transfer programs (e.g., Temporary Assistance for Needy Families) considered in the other chapters of this volume. Coding up an indicator variable for receipt of training or, even more dramatically, receipt of any services from some employment and training program, implicitly disguises a substantial amount of heterogeneity in the program as experienced by participants. One trainee may take a community college course in cosmetology, while another takes a course from the Salvation Army in computer repair, and still another receives subsidized on-the-job training at Whataburger. More broadly, some participants may receive instruction designed to prepare them to obtain a GED, while others receive only job-search assistance.
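Heterogeneity of this kind is what the quantile treatment effects described above are designed to reveal. A small sketch, with made-up outcome distributions, shows the computation; when impacts vary across the distribution, the QTEs differ across quantiles.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Illustrative experimental data: the "treatment" shifts both the
# location and the spread of (log) outcomes, so impacts are larger
# in the upper tail. All parameters are invented.
control = rng.lognormal(mean=9.0, sigma=0.8, size=n)
treated = rng.lognormal(mean=9.1, sigma=0.9, size=n)

# A quantile treatment effect is the difference between corresponding
# quantiles of the treated and control outcome distributions.
quantiles = [0.25, 0.50, 0.75]
qte = {q: np.quantile(treated, q) - np.quantile(control, q) for q in quantiles}

# QTE at the median: the difference between the group medians.
qte_median = qte[0.50]
```

Under a common effect model, the QTEs at every quantile would coincide with the mean impact; spread in the QTEs across quantiles, as in this sketch, is evidence against that model.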
Some evaluations distinguish among broad categories of services, such as classroom training or job-search assistance,93 but as the examples just listed illustrate, most programs embody substantial heterogeneity even within broad service categories. In addition to heterogeneous services, programs operate in heterogeneous contexts in terms of aggregate labor market conditions, industry and occupation mix, and so on. Lechner and Wunsch (2009) and Heinrich and Mueser (2014), among others, provide evidence that the effect of training varies with local labor market conditions. The literature also offers a long history of estimated differences in impacts between different demographic groups and participants at different sites. For instance, LaLonde (2003) describes the durable finding (in the US literature) that adult women benefit the most from training, followed by adult men, followed by male and female youth. We have more to say about subgroup effects below; for our purposes here, differences by age, sex, and site suggest the presence of differences on other unmeasured dimensions as well. As another argument for heterogeneity in treatment effects, think about

93. For instance, Hotz, Imbens, and Klerman (2006) apply nonexperimental methods to the experimental data from the California Greater Avenues to Independence experiment in order to disentangle the effects of particular service types.


the relationship between impacts on earnings and employment over some period during and after training. If some participants have zero earnings, but the program has a nonzero mean impact, then some heterogeneity in impacts must exist, as the participants with zero earnings must have had a zero or negative impact (because earnings are bounded below by zero) while some other participants had positive impacts. Finally, and a bit more technically, as noted in Heckman, Smith, and Clements (1997) it is possible to place an empirical lower bound on the impact variance. This lower bound corresponds to the variance of the quantile treatment effects described above. Heckman, Smith, and Clements (1997) calculate this lower bound for the adult women in the Job Training Partnership Act experiment and find that it is statistically and (what is more important) substantively different from zero. Taken together, we find the case for heterogeneous treatment effects that substantively matter quite compelling, and assume them in all that follows. To motivate the problem of nonrandom selection into programs, it helps to think about a simple model of program participation. We draw here on the models in Heckman and Robb (1985) and Heckman, LaLonde, and Smith (1999). We begin with a simple model in which training is available in only one period, lasts exactly one period, and is not announced in advance. This allows us to view the participation choice as a static problem, though one with dynamic implications. Call the period of program availability period k. In periods prior to k, all individuals have the outcome function Y0it = XitβX + ηi + εit for t < k.
Here Xit denotes various determinants of outcomes unaffected by treatment, with an associated vector of coefficients βX, ηi denotes the time-invariant unobserved component of outcomes for individual “i,” and εit denotes the transitory component of outcomes for person “i” in period “t.” After period k, the same function persists, but with the addition of an additive treatment effect received only by participants. In notation, Yit = XitβX + DiβDi + ηi + εit for t ≥ k, with Y1it = Y0it + βDi. The “i” subscript on βDi captures heterogeneity in the treatment effect; we assume for simplicity that the heterogeneous effect persists indefinitely. Potential trainees may know their treatment effect, or not know it, or something in between.94 Extending the model to allow the treatment effect to vary with observed characteristics—to capture systematic heterogeneity in

94. See Carneiro, Hansen, and Heckman (2003) for an analysis that estimates the fraction of the variation in treatment effects known ex ante in an educational context.


treatment effects in the terminology of Djebbari and Smith (2008)—via interaction terms follows easily. As unobserved variables that affect outcomes may vary over time at lower frequency than the observed data, we allow for serial correlation in the “error” term; in particular, we assume for convenience an autoregressive form, with εit = ρεi,t−1 + νit, where νit is an independently and identically distributed (over time and people) shock and −1 < ρ < 1 to keep the process from diverging. The participation decision depends on a comparison of costs and benefits. The benefit comes in the form of the discounted present value of the stream of future treatment effects. The costs come in the form of direct costs Ci, which may include tuition, transportation, or books as well as negative direct costs in the form of subsidies to participation, and the opportunity cost of having Yik = 0 during the training period. Formally, the potential participant calculates

Di* = βDi/r − Ci − Y0ik + ui,

where r denotes the interest rate, ui denotes unobserved factors affecting the net utility of training, and we make the simplifying assumption that the potential trainee lives forever. We do not make costs a function of observed characteristics, but it would be easy and reasonable to do so. If Di* > 0 then the individual chooses to participate in period k, while if Di* ≤ 0 the individual forgoes the opportunity and continues to receive Y0it in all future periods. What do we learn from this model? First, opportunity costs play a key role. The fact that the untreated outcome in period k enters the decision problem along with the direct costs and the discounted impacts means that individuals who choose to train will have differentially low values of Y0k. Those low values in period k can result from a low value of the time-invariant unobserved component of earnings, a low value of the transitory component, and/or values of X associated with low earnings. Thus, to the extent that the time-invariant unobserved component accounts for a substantial amount of the variation in earnings for the relevant population, we would expect substantively important selection on it, with trainees having a lower average value than nontrainees. This selection will lead to persistent differences in the mean earnings of participants and nonparticipants both before and after period k. If this were the only source of selection into training, researchers would naturally gravitate toward difference-in-differences and related longitudinal estimators of program impact. In practice (more about this below) such estimators play only a minor role in this literature, because of empirically important selection on the transitory


unobserved component, selection on this component operating via selection on the opportunity cost of participation leads to the empirically ubiquitous “Ashenfelter dip,” first identified in this literature in Ashenfelter’s (1978) paper on MDTA, the programmatic great-great-grandparent of WIOA. This combination of selection on both transitory and more permanent components of the outcome process complicates credible estimation of the causal effects of training. In addition to opportunity costs, participation depends on direct costs and on the person-specific impact of training. Though some literatures make intensive use of direct cost measures as sources of exogenous variation in participation, the training literature has not seen much work along these lines. The training literature does pay close attention to heterogeneous treatment effects. As noted in Heckman, LaLonde, and Smith (1999), heterogeneous treatment effects uncorrelated with other factors affecting participation and outcomes act like (partial) random assignment and so reduce the difficulty of the selection problem; the empirical relevance of that observation remains largely unexplored. The simple model also illustrates why we would expect participants to have different impacts than nonparticipants. Indeed, with impacts known in advance, and conditional on particular values of direct costs and opportunity costs, everyone who participates has a higher impact than everyone who does not participate. Even impacts estimated without bias but with some uncertainty prior to the training choice imply that the ATET > ATE > ATNT, where the ATNT is the average treatment effect on the nontreated. The ATE is, of course, just a weighted average of the ATET and ATNT and so must lie between them. More ex ante uncertainty about individual impacts weakens this pattern.
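The ATET > ATE > ATNT ordering is easy to verify in a simulated version of the participation model; every parameter value below is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Toy version of the participation model: individuals know their own
# impact beta_i and participate when the discounted benefit exceeds
# direct plus opportunity costs (all parameters illustrative).
r = 0.05                                   # interest rate
beta = rng.normal(1_000, 800, n)           # heterogeneous impacts beta_Di
cost = rng.normal(2_000, 500, n)           # direct costs C_i
y0k = rng.normal(15_000, 4_000, n)         # opportunity cost: Y_0ik
u = rng.normal(0, 1_000, n)                # unobserved taste for training

d_star = beta / r - cost - y0k + u         # latent net utility D_i*
d = d_star > 0                             # participate if positive

atet = beta[d].mean()                      # average impact on the treated
atnt = beta[~d].mean()                     # average impact on the nontreated
ate = beta.mean()                          # population average impact
```

Because the latent index is increasing in the individual impact, those who select in have systematically higher impacts, so the simulated ATET exceeds the ATE, which in turn exceeds the ATNT.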
Also clear from the model is the fact that the ATET provides an upper bound, and possibly a distant upper bound, on the treatment effects that would be experienced by marginal participants enticed into the program by a small reduction in the direct costs C.95 Of course, as we intend the name “simple model” to signal, this model leaves a lot out, even as it also captures quite clearly the key factors that make compelling nonexperimental estimation of the impacts of training programs challenging to evaluators. One omission from the model concerns the underlying behavioral foundations of the Ashenfelter dip. In the model, the dip results from the fact that individuals select into training based in part on having a low opportunity cost; this results, via the serial correlation in ε, in many individual participants having a gentle decline in their mean earnings in the periods prior to period k. In reality, almost no one experiences this gentle decline. Instead, most individuals have a sharp drop in earnings at a discrete point in time due to job loss. The smooth dip observed in the

95. See the Monte Carlo study that builds on this model in section 8 of Heckman, LaLonde, and Smith (1999) for further analyses and intuition.


aggregate results from the averaging of these sharp falls, which become more common as period k approaches. Another omission from the model concerns selection on earnings trajectories, as opposed to just selection on earnings shocks (i.e., on ε) and on persistent differences in earnings levels (i.e., on η and/or X ). By way of illustration, consider two scenarios. In one scenario, some individuals select into training when they decide to get serious about life, or at least about the labor market. In another scenario, some individuals select into training because they have lost a job in which their pay well exceeded the value of their marginal product in the other jobs available to them. In many industries the number of such jobs has declined over time due to freer trade or deregulation; as a result those who lose such jobs often experience persistent earnings decreases, as in Jacobson, LaLonde, and Sullivan (1993).96 We will return to this scenario below when we discuss the evidence on the WIA dislocated worker funding stream.97 Our simple model also omits the justifications for government intervention in the training market discussed at the beginning of the chapter. One could easily have the direct costs C reflect a government subsidy, which would capture the justification from credit constraints, but the model still ignores any of the informational justifications for government intervention. First, the government may have better information on the state of the labor market (the “labor market information” aspect of active labor market programs) for particular occupations. This can help the would-be trainee choose between skill upgrading in their existing occupation (or just more effective search in that market) and investing in human capital associated with a new occupation. 
The government, via knowledgeable and experienced caseworkers armed with standardized tests, may have a better sense of how a given participant’s skills and interests match up to particular occupations. This allows more effective investments in training and more effective job search. Finally, the government, again via the caseworkers, may provide quality signals to firms looking for workers to hire into subsidized on-the-job training slots. Because programs recommend only a subset of their participants for these slots, and because they are engaged in a repeated game with individual employers in which reputation matters on both sides, this can make these signals credible in a way that the workers themselves cannot. The literature on government-sponsored job training lacks formal models capturing these aspects of the process, though the directed search model in Plesca’s (2010) equilibrium analysis of the Employment Service implicitly captures some of them by having different matching technologies for workers who search via the ES than for workers who search on their own.

96. See Krolikowski (2014) for a new look at displaced workers through the lens of the dynamic treatment effect literature.
97. The data from the JTPA experiment (described below) imply little selection on earnings trajectories for adults, but modest amounts for youth (see Heckman and Smith 1999).
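The selection mechanics behind the Ashenfelter dip in the simple model can be reproduced in a short simulation (all parameters invented): with an AR(1) transitory component and selection on low earnings in period k, the mean earnings of eventual trainees fall increasingly below the population mean as period k approaches.

```python
import numpy as np

rng = np.random.default_rng(3)
n, periods, k = 100_000, 10, 9             # training occurs in the last period

# Earnings: permanent component eta_i plus an AR(1) transitory shock,
# matching the error structure assumed in the text (parameters made up).
rho = 0.7
eta = rng.normal(20_000, 3_000, n)
eps = np.zeros((n, periods))
for t in range(1, periods):
    eps[:, t] = rho * eps[:, t - 1] + rng.normal(0, 2_000, n)
y = eta[:, None] + eps

# Selection on the opportunity cost: those with low earnings in
# period k (here, the bottom 30 percent) enter training.
d = y[:, k] < np.quantile(y[:, k], 0.3)

# Mean earnings of eventual trainees by period, relative to the
# overall mean; the gap widens as period k approaches — the dip.
trainee_path = y[d].mean(axis=0) - y.mean(axis=0)
```

The gap is negative even in early periods (selection on the permanent component η) and grows as t approaches k (selection on the serially correlated transitory component ε), which is the smooth version of the dip the model generates.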


The simple model above also completely ignores the supply side of the “market” for services provided by government employment and training programs. Instead, it focuses solely on the participation decision facing the potential trainee. We rectify this omission in the model later in our discussion of performance management, which plays a very important role in shaping the supply responses of the major US employment and training programs. Finally, the model discussed here ignores general equilibrium effects (i.e., effects on those not participating in the program); we discuss those later in the context of cost-benefit analysis. The standard theory, along with empirical evidence from both experimental and nonexperimental studies, strongly indicates selection into employment and training programs based on both transitory and relatively more permanent components of outcomes. The literature that evaluates employment and training programs in the United States has adopted a variety of data sources, identification strategies, and econometric estimators to deal with the problem of nonrandom selection into programs. Indeed, as we will have some occasion to note, this literature has played an important role in the evolution of applied econometric methods more broadly. We turn now to a limited methodological review, emphasizing those identification strategies and related estimators and data sets most commonly used in the US literature, namely random assignment and “selection on observed variables.” We focus almost exclusively on impact evaluations (we are economists after all) but note that, in our experience, well-designed and executed process and implementation evaluations are important complements to econometric impact evaluations.
Unlike every other country, at least until the last decade or so, the United States sometimes evaluates its employment and training programs using random assignment designs.98 In terms of our notation, random assignment as typically implemented involves taking a sample of would-be participants, that is, D = 1 individuals, and randomly forcing some of them to experience the untreated outcome Y0 (the experimental control group) while randomly allowing others to experience the treated outcome Y1 (the experimental treatment group). Randomly assigning treatment assures (statistically) equivalent distributions of all the relevant variables (i.e., X, η, ε, C, and u) in the two groups. As a result, a simple comparison of means provides a consistent (in the statistical sense) and compelling estimate of the ATET.

98. Experimental evaluations of labor market programs outside the United States include the Self-Sufficiency Project in Canada described in Ford et al. (2003), the UK Employment Retention and Advancement Demonstration documented in Hendra et al. (2011), caseworker experiments in Denmark evaluated in Pederson, Rosholm, and Svarer (2012), and a very impressive multilevel randomized evaluation in France recounted in Crépon et al. (2013). White and Lakey (1992), who evaluate the UK RESTART program, provide a rare exception to our general claim.

Running experimental evaluations and meaningfully interpreting the resulting data in the real world differs from the pleasant but oversimplified description in the preceding paragraph. We briefly note a subset of the issues here, focusing on those most important to the literature whose evidence we review later in this chapter, and starting with randomization bias.99 Randomization bias means bias induced by the presence of an experimental evaluation. It is bias relative to the population value of the impact parameter of interest in a world without randomization, that is, in the world of the program as it normally operates. Consider the following examples: First, the presence of random assignment may change the participant population because potential participants on the margin of participation may find it optimal to pay the fixed cost of attempting to participate in the absence of random assignment, but not in its presence, because the possibility of randomization into the control group reduces the expected value of the attempt. Second, as noted in Heckman and Smith (1995), the presence of randomization may lead individuals to change their behavior even if they do still choose to participate, as when participants reduce pre-random-assignment investments complementary to the treatment due to the uncertainty of receiving it. Third, the institutional trappings associated with randomized evaluations, but not generally with nonexperimental evaluations, may lead to differences between the participant population of interest and that in the evaluation, due, for example, to selective removal of those put off by signing consent forms. Sianesi (2014) documents the empirical importance of this behavior, and of the resulting bias, in the context of the evaluation of a program providing ongoing support for unemployed workers who find a job in the United Kingdom.
Burt S. Barnow and Jeffrey Smith

Fourth, randomization will affect the scale of program intake and thereby lead to differences between the population served by the program as it normally operates and during the randomized evaluation. For example, in the Job Training Partnership Act experiment (described in more detail in the results section), sites were instructed to keep the number of individuals they served the same during the evaluation, so as to avoid randomization bias due to a change in program scale. But this stricture, coupled with a 2:1 random assignment ratio, meant that sites had to recruit a substantially larger number of potential participants than they normally would. Indeed, this requirement played a role in the site selection difficulties we discuss in the next paragraph, because many sites worried about the quality of the marginal participants drawn in as part of the larger pool of potential participants.100

99. For more on social experiments see, for example, Ferber and Hirsch (1981), Heckman (1992), Burtless (1995), Heckman and Smith (1995), Orr (1998), and Heckman, LaLonde, and Smith (1999, section 5).
100. One member of the design team (Barnow) for the JTPA evaluation suggested having sites identify the marginal participants; this was not done in that study, but is being done in the WIA experimental evaluation described below.

In multisite programs (like JTPA, WIA, WIOA, and the Job Corps), random assignment can increase the per-site costs of the evaluation and can complicate external validity by making sites more reluctant to participate due to the disruptions in normal program operation necessitated by random assignment. For example, as detailed in Hotz (1992) and Doolittle and Traeger (1990), the JTPA experimental evaluation's attempt to recruit a random sample of sites that would allow compelling generalization to the population of sites failed miserably.101 In the end, and after nontrivial side payments plus some design compromises, a nonrandom sample of sixteen sites was obtained.102 The result was controversy regarding the external validity of the experimental findings (see, e.g., Heckman and Smith [2000] or Heckman and Krueger [2003]). While the literature offers various strategies for generalizing from nonrandom samples of sites, these strategies remain controversial and thus inferior to including all sites or evaluating at a random sample of sites.103

Finally, and the best documented (if not necessarily the most important) empirically, we may have treatment group dropout104 and control group substitution. In an ideal experiment, everyone randomized into the experimental treatment group would receive treatment, and no one in the experimental control group would receive treatment. In this pure case, the experimental contrast clearly represents the causal effect of receipt of treatment rather than no treatment for the relevant population. In practice, because of institutional factors, as well as evaluation design choices and the sometimes chaotic lives of the individuals who participate, or consider participating, in active labor market programs, real experiments rarely look this clean. Heckman et al. (2000) document the empirical relevance of dropout and substitution for a variety of experimental evaluations of active labor market programs, with a particular focus on the JTPA evaluation.
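To fix ideas, the mechanics of dropout and substitution can be illustrated with a small simulation. This is only a sketch under strong assumptions (a homogeneous treatment effect, and substitute services exactly as effective as the program); all numbers are hypothetical.

```python
import random

random.seed(0)

def experiment(n=20_000, impact=1_000.0, p_dropout=0.0, p_substitute=0.0):
    """Simulate a randomized evaluation with a homogeneous treatment
    effect. Treatment-group members fail to enroll ("drop out") with
    probability p_dropout; control-group members obtain equivalent
    services elsewhere ("substitute") with probability p_substitute."""
    sum_t = sum_c = 0.0
    n_t = n_c = 0
    for _ in range(n):
        y0 = random.gauss(10_000.0, 2_000.0)    # untreated outcome Y0
        if random.random() < 0.5:               # assigned to treatment
            receives = random.random() >= p_dropout
            sum_t += y0 + (impact if receives else 0.0)
            n_t += 1
        else:                                   # assigned to control
            receives = random.random() < p_substitute
            sum_c += y0 + (impact if receives else 0.0)
            n_c += 1
    return sum_t / n_t - sum_c / n_c            # difference in mean outcomes

clean = experiment()                            # ideal experiment
diluted = experiment(p_dropout=0.3, p_substitute=0.2)
print(round(clean), round(diluted))
```

In the ideal experiment the difference in means recovers the impact of 1,000 (up to sampling noise); with 30 percent dropout and 20 percent substitution the expected contrast shrinks to impact × (0.7 − 0.2) = 500, half the true effect of treatment receipt.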
Three factors appear particularly important in explaining the extent of dropout and substitution in evaluations of employment and training programs in the United States. The first factor is the extremely decentralized nature of the provision of employment and training programs. Many federal government programs have an employment and training component, as do some state programs and many nonprofit social service organizations. This means that would-be trainees who get randomized out of one training opportunity can easily find others. For instance, because community colleges provide much of the training in WIA (and now WIOA), control group members can easily enroll in the same or similar courses on their own. Second, high-intensity, expensive programs tend to have low rates of dropout, presumably because they appear valuable to potential participants, and low rates of substitution because the supply of substitutes is far smaller for expensive services than for inexpensive ones. For example, both the Supported Work Demonstration and the Job Corps evaluation have very low rates of dropout and substitution: 0.05 and 0.11 for the former and 0.28 and 0.02 for the latter.105 Both are quite expensive. In contrast, the more modest services on offer in the JTPA evaluation elicited substantial rates of both substitution and dropout. For example, according to Heckman et al. (2000, table II), among adult women recommended to receive classroom training in occupational skills, 48.8 percent of the treatment group actually did so, compared to 27.4 percent of the control group (who received it from other sources, or from the same sources with different funding). Finally, the experimental design itself can affect the amount of treatment group dropout via its interaction with the process of program participation.

101. Doolittle and Traeger (1990, section 5.11) makes the positive (but weak in our view) case for the representativeness of the sites in the JTPA evaluation.
102. The least attractive design compromise allowed the experimental sites to provide control group members with a list of alternative service providers in the community, thereby increasing substitution and muddying the interpretation of the counterfactual. At one site, this list ran to over ten pages!
103. See, for example, Hotz, Imbens, and Mortimer (2005), Gechter (2014), Muller (2015), or Vivalt (2015).
104. Our usage follows that of Heckman, Smith, and Taber (1998), who denote as "dropouts" those individuals randomly assigned to the treatment group in the JTPA experiment who never enroll in the program. This usage makes more sense in their context than it might seem at first blush because, as documented in Kemple, Doolittle, and Wallace (1993), many treatment-group members received (typically low-intensity) services without formal enrollment for reasons related to the JTPA performance-management system. More generally, the literature tries to capture variation in the extent of treatment among those with some contact with a program in a variety of ways, such as categorizing individuals who receive no substantive services as "no-shows" or by estimating a "dose-response" function that links outcomes to the amount of service received, or via the related notion of different impacts for different combinations or sequences of services.
For instance, random assignment in the JTPA evaluation took place at the JTPA office rather than at the service provider locations for cost reasons, but doing so introduced a temporal wedge between assignment and service receipt that allowed some treatment group members time to find a job or wind up in jail or just get dissatisfied with the services offered to them.

Researchers typically adopt one of two strategies in the presence of dropout and/or substitution. The first strategy redefines the parameter of interest to represent the average effect of the offer of treatment (sometimes called the "intention to treat" or ITT), relative to a (possibly complicated) counterfactual, rather than the ATET. For example, one can think of the experimental contrast in the JTPA study as between a treatment group with access to all of the various treatment options in the community including JTPA, and a control group with access to just the options other than JTPA. This represents a reasonable causal parameter, but also one quite different in substantive interpretation from treatment versus no treatment. The second strategy rescales the experimental difference in outcomes by the difference in the probability of treatment between the treatment group and the control group. The resulting estimand represents the mean impact of treatment on the treated when the experiment features dropout but not substitution, and a Local Average Treatment Effect (LATE) when both are present.106 The LATE gives the mean effect of treatment on those induced to participate as a result of ending up in the treatment group rather than the control group, whom the literature calls "compliers." It says nothing about the mean impact on those who would get treated in either state, the so-called "always takers" in the language of Angrist, Imbens, and Rubin (1996). The LATE is also a reasonable and often interesting causal estimand, but it differs from both the ATET and from the ITT. Again, comparisons with nonexperimental estimates of the ATET require care.107

The decade since LaLonde (2003) has seen a combination of triumphalism and humility among advocates for greater social experimentation. The triumphalism comes from the rapid movement of policy-relevant random assignment designs into development economics and into education, and the broader "credibility revolution" described by Angrist and Pischke (2010) and the related enthusiasm for "evidence-based policy."108 See, for example, Gueron and Ralston (2013) and Institute for Education Sciences (2008) for more of this view.

105. See Heckman et al. (2000, table 1) for NSW and Schochet, Burghardt, and McConnell (2008, table 2) for the Job Corps study. For the Job Corps, the substitution number includes only crossovers who actually received Job Corps despite randomization into the control group; the fraction of the control group that received some sort of educational treatment was about 72 percent.
106. The LATE is called the "Complier Average Causal Effect" or CACE in some literatures.
107. For more on these issues see, for example, Bloom (1984), Heckman, Smith, and Taber (1998), or Heckman et al. (2000).
108. One always wonders what it was they were doing before "evidence-based policy." It is probably best not to think too hard about that.
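A minimal sketch of the rescaling in the second strategy (sometimes called the Bloom or Wald estimator) follows; the ITT of $150 is hypothetical, with treatment-receipt rates patterned on the JTPA classroom-training figures cited above.

```python
def rescale(mean_t, mean_c, receipt_t, receipt_c):
    """Divide the ITT (the experimental difference in mean outcomes)
    by the experimental difference in treatment-receipt rates. With
    dropout only this recovers the ATET; with substitution as well it
    is a LATE for the compliers."""
    itt = mean_t - mean_c
    first_stage = receipt_t - receipt_c
    if first_stage <= 0:
        raise ValueError("treatment group must have the higher receipt rate")
    return itt, itt / first_stage

# Hypothetical ITT of $150 with receipt rates of 48.8% vs. 27.4%.
itt, late = rescale(5_150.0, 5_000.0, 0.488, 0.274)
print(round(itt), round(late))
```

Here the estimated impact per treated complier (about $700) is several times the ITT of $150, which is one reason the two estimands should never be casually compared.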
At the same time, the practice of randomized evaluations has become much more nuanced, with greater attention to the role of dropout and substitution, to the importance of careful definition and interpretation of the estimand in the context of heterogeneous treatment effects, and to the fact that random assignment does not magically overcome the general problems of either empirical research (e.g., outliers and so on; see Heckman and Smith [2000]) or of partial equilibrium evaluation (see our discussion of general equilibrium issues below). Also relevant in this sense is the pendulum in economics swinging back toward a balanced approach that emphasizes both the depth of the economics and the quality of the identification strategy, a view that naturally sees experiments and "structure" as complements as in, for example, Todd and Wolpin (2006) rather than as substitutes.

We aim here to walk down a middle road on random assignment, avoiding excessive cheerleading and excessive cynicism, both of which one can find in the current methodological debate in development economics. Instead, we view the deliberate creation of high-quality exogenous variation as an important complement to the other activities that economists and other evaluators of programs undertake. We think the literature needs more, and more thoughtful, use of randomization.

The continued flourishing of experimental evaluation has coincided with ongoing progress in nonexperimental evaluation, spurred on in part by improvements in the available data (particularly in Europe, but also to some extent in the United States) and in part by developments (and rediscoveries) in the realm of applied econometrics. Methods for solving the selection problem via conditioning on observed covariates consume most of our attention, as they do in most of this literature. We then briefly remark on developments and applications of other identification strategies, such as discontinuity designs.

Selection on observed variables identification strategies attempt to solve the problem of nonrandom selection into training (or a program more generally) via conditioning on a sufficiently rich set of observed covariates. Put differently, under this strategy the researcher tries to make the case that they have observed all the variables, or good proxies for all the variables, that affect both participation in training and outcomes in the absence of training. In formal notation, the researcher assumes either E(η + ε | X, D) = 0 in the case of parametric linear regression, or E(η + ε | X, D = 1) = E(η + ε | X, D = 0) for matching and weighting estimators. Depending on the researcher, this assumption might get called the "conditional independence assumption" (CIA), "exogeneity," or, to use the awkward term contributed by the statistics literature, "unconfoundedness." Making these assumptions is easy; making a compelling case for them is not.

The literature has responded to the task of learning what variables suffice for the conditional independence assumption in three ways.
First, some studies implicitly adopt the view that, perhaps because of some benevolent identification deity, the data at hand necessarily include some set of conditioning variables that suffice for the CIA. Indeed, some writers implicitly hold this faith with such fervor that they see no need even to attempt an explicit case for the CIA. More serious researchers make an explicit case for the CIA based on theory, institutions (sometimes helpfully embodied in high-quality process analyses), and existing empirical knowledge. Theory, like our simple model above, suggests the importance of transitory outcome shocks and of fixed characteristics that affect both outcomes and participation. The former signals the importance of conditioning on histories of labor market outcomes in the period prior to the decision to take training or not, and of doing so flexibly and at a relatively fine level of temporal detail. The latter signals the importance of conditioning on things like ability and motivation, or at least compelling proxies for them. Longer lags of labor market outcomes (i.e., before the "dip") often assume this proxy role.

Existing evidence relevant to the justification of conditional independence assumptions takes a number of forms. One very common form in this literature arises from "within-study" comparisons that use experiments as benchmarks to learn which conditioning variables lead to nonexperimental estimates based on the CIA that replicate (up to statistical variation) the experimental estimates and which do not. A long series of papers starting with LaLonde (1986) and Fraker and Maynard (1987) and continuing through Dehejia and Wahba (1999, 2002), Heckman et al. (1998), and Smith and Todd (2005a, 2005b) embodies this idea.109 These papers also highlight the importance of conditioning flexibly on labor market outcomes in the period prior to participation.

A second form of evidence on conditioning variables comes from studies that take a relatively compelling set of conditioning variables and add in an additional set of more novel conditioning variables. If the impact estimates move upon adding the new variables, they matter and future evaluations should include them. If estimates do not change much, then the new variables do not aid in solving the selection problem at the margin.110 That is very useful knowledge as well, as it helps avoid spending resources collecting data on variables not necessary to solve the selection problem and (not unrelated) increases the credibility of future CIA-based evaluations that do not include them. Lechner and Wunsch (2013) provide a thorough analysis along these lines using the (very) rich German administrative data. Andersson et al. (2013) use the US Longitudinal Employer Household Dynamics (LEHD) data to examine the value of conditioning on the characteristics of the firm at which WIA participants last worked in addition to the usual flexible form in earnings and employment. They find, to their and our surprise, that the firm characteristics do not matter.

A final way to think about justifying the CIA centers on the so-called support condition.
Semiparametric and nonparametric estimators based on the CIA require variation in training status conditional on observed characteristics. Put another way, for any given set of observed characteristics, the data must include nontrainees to compare to the trainees. Lurking in the background, some unobserved instruments generate this conditional variation in training. Thinking about the nature of these instruments (random information shocks, distance from the training provider, and so on) can aid in making the case for the CIA in a given context.

The econometric literature provides a wealth of semiparametric and nonparametric estimators that build on the CIA and complement the traditional parametric linear regression model. The most commonly used estimators in applied work in economics undertake nonparametric matching on the conditional probability of training—the so-called propensity score given by Pr(D = 1 | X). With a parametric (though ideally relatively flexible) propensity score model (typically a logit or probit) this general class of semiparametric estimators balances parametric assumptions with unidimensional nonparametric flexibility. The economics literature sometimes frames this class of estimators as nonparametric regression estimators. Essentially the matching implicitly estimates a nonparametric regression of Y0 on the estimated propensity score and uses predicted values from that estimated regression as estimates of the expected counterfactual for each trainee. A lively Monte Carlo literature that includes Frölich (2004), Huber, Lechner, and Wunsch (2013), and Busso, DiNardo, and McCrary (2014) guides the applied researcher in choosing among the many available estimators.111

A variety of other identification strategies allow the evaluation of active labor market programs but have not attracted wide use in the recent empirical literature on training programs. These include the bivariate normal selection model, instrumental variables, regression discontinuity, and the bias stability assumption that justifies the difference-in-differences estimator.

109. These papers do not always frame their analysis as we do here. Instead, some studies frame the question as "does matching work?" which in our view represents a very silly question indeed. Matching "works" when you match on variables that suffice for the CIA and it does not work when you do not. What matters is the conditioning set.
110. Note that having the estimates not move (much) when adding new variables does not imply that the old variables suffice for the CIA, though it suggests they do. The key is the absence of an additional unobserved factor, uncorrelated with all of the included covariates, that affects both participation and outcomes. See, for example, the discussion in Heckman and Navarro (2004).
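As a concrete (and deliberately stylized) illustration of the propensity score matching approach described above, the sketch below simulates data in which selection into training depends only on an observed index, so the CIA holds by construction, and estimates the ATET by single-nearest-neighbor matching on the true score. In practice the score would be estimated with a logit or probit, and many alternative matching and weighting estimators exist; everything here, including the true ATET of 200, is hypothetical.

```python
import bisect
import random

random.seed(1)

# Simulated data: selection into training depends only on the observed
# index x, so the CIA holds by construction. True ATET = 200.
data = []
for _ in range(5_000):
    x = random.random()
    p = 0.2 + 0.6 * x                       # propensity score Pr(D = 1 | X)
    d = random.random() < p
    y = 1_000.0 * x + random.gauss(0.0, 50.0) + (200.0 if d else 0.0)
    data.append((p, d, y))

trainees = [(p, y) for p, d, y in data if d]
controls = sorted((p, y) for p, d, y in data if not d)

def matched_counterfactual(score):
    """Outcome of the single nearest nontrainee on the propensity score."""
    i = bisect.bisect(controls, (score,))
    candidates = controls[max(0, i - 1): i + 1]
    return min(candidates, key=lambda c: abs(c[0] - score))[1]

atet = sum(y - matched_counterfactual(p) for p, y in trainees) / len(trainees)
print(round(atet))
```

The matched comparison recovers an estimate close to the true ATET of 200; matching directly on x would do equally well here because the score is monotone in x.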
The bivariate normal model has fallen out of favor with labor economists in recent years for several reasons, including a growing aversion to difficult-to-justify functional form assumptions and the realization that sensible application of the model requires a hard-to-find exclusion restriction (i.e., a variable affecting outcomes only via its effect on training participation).112 Similarly, instrumental variables methods have seen little use in the training literature due to a paucity of plausible instruments.113

Discontinuity-based methods have run rampant in many quarters of applied economics in the past decade (see Cook [2008] for their history), but they play almost no role in the training literature. We know of only two examples. One is the Urban Institute's evaluation of the High Growth Job Training Initiative (HGJTI) in Eyster et al. (2010), which lacks compelling results due to the deadly combination of a modest sample size and a high-variance outcome variable. The other is the analysis of the WPRS in Black, Galdo, and Smith (2007). This astounding lack of discontinuity designs, relative to, say, the evaluation literature in K-12 education, results in our view from two factors. First, even before researchers started thinking along these lines, educational institutions had a lot of policy discontinuities built in. We do not have a compelling technological explanation for this difference across substantive domains, but it meant that researchers had lots of low-hanging fruit to pick when the design became salient in economics. Second, the employment and training world has seen little in the way of attempts to "design in" discontinuities that can serve as the foundation for causal research, even though they seem quite natural for courses that require some level of academic preparation as measured by a test score.

Another design common in other literatures and much less common in the evaluation of employment and training programs builds "difference-in-differences" or other panel estimators atop assumptions about bias stability. Bias stability holds that, possibly conditional on observed characteristics, the difference in untreated outcomes between participants and nonparticipants is constant over time. As a result of the temporal stability, differencing takes care of bias due to selection on unobserved variables. At the individual level, the bias stability assumption runs afoul of Ashenfelter's dip, which clearly indicates selection on both time-varying and time-invariant unobserved variables. Heckman and Smith (1999) show the trouble this causes; in particular, they emphasize that the choice of the "pre" period matters tremendously in the presence of the dip. Another style of difference-in-differences study operates at the jurisdictional level rather than the individual level, and exploits policy changes that occur in some but not all jurisdictions.

111. An odd history of applied econometrics aside: some of the CETA evaluations summarized in Barnow (1987) and reconciled in Dickinson, Johnson, and West (1987) anticipate the "Coarsened Exact Matching" (CEM) of Iacus, King, and Porro (2012) that has gained some traction in literatures outside of economics.
112. See Puhani (2000) for a survey of the literature on the bivariate normal model and Bushway, Johnson, and Slocum (2007) for examples of how things can go wrong in practice.
113. Frölich and Lechner (2010), which studies Swiss active labor market programs, is the only example of which we are aware in the recent literature. Back in the dinosaur days, Mallar et al. (1982) used distance to the Job Corps center as an instrument. The marginal treatment effect approach, a semiparametric generalization of the classic bivariate normal selection model well described and illustrated in Carneiro, Heckman, and Vytlacil (2011), also awaits application in the training literature.
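In its simplest two-period, two-group form, the jurisdictional version of the estimator reduces to one line of arithmetic (all numbers hypothetical):

```python
# Hypothetical mean earnings in jurisdictions that adopted a policy
# change and in those that did not, before and after the change.
adopt_pre, adopt_post = 21_000.0, 23_500.0
other_pre, other_post = 22_000.0, 23_000.0

# Bias stability: absent the policy, adopters would have followed the
# comparison jurisdictions' trend, so differencing the differences
# removes both the level gap and the common trend.
impact = (adopt_post - adopt_pre) - (other_post - other_pre)
print(impact)
```

Here the estimate is 1,500: the adopters' 2,500 gain less the 1,000 gain experienced everywhere.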
This type of study faces two difficulties in the training context. First, very little of the policy action in the training world occurs at jurisdictional levels that correspond to standard data sets, such as the state-level analyses often used to study things like the minimum wage or (in the old days) the minimum legal drinking age. Second, data detailing the variation in policies across jurisdictions requires a lot of digging because no one collects it and disseminates it in easy-to-use form, as Huber, Kassabian, and Cohen (2014) do for the TANF program. With new programs, a staged roll-out, ideally with the jurisdictional timing randomized, allows the application of this design. We know of one attempt along these lines, namely the Social Security Administration's evaluation of the Ticket-to-Work voucher program for disability recipients (see Stapleton et al. [2008] for details).

3.7 Data and Measurement Issues

Though it might come as a surprise to some, evaluations of training programs have many dimensions besides the quality of their causal identification. These attributes affect the quality of the impact estimates, in the sense that they affect the amount of error the estimates contain relative to the parameter of interest from sources other than sampling variation and selection bias. They also affect the interpretation of the obtained impact estimates and the value of the cost-benefit analysis built around those estimates. This section discusses issues related to the measurement of training and of labor market outcomes, while the following section discusses various issues that arise in the context of cost-benefit analysis.

Data on service receipt, service type, and service timing play an important role in evaluating training programs, yet we know very little about how best to measure these variables or about the quality of existing measures. Typically, data on services come from one of two sources: surveys or administrative data. These sources have differing strengths and weaknesses that vary somewhat across contexts. Administrative data avoid issues with a failure to recall receipt of services, particularly low-intensity services, by survey respondents. In some cases, the use of administrative measures of enrollment in performance-management schemes (we say more about these below) may increase the quality of these data, which caseworkers might otherwise have little incentive to enter with care. But use in performance management can cut both ways. In the JTPA evaluation, as documented in Kemple, Doolittle, and Wallace (1993, table 4.5), many treatment-group members not formally enrolled in JTPA nevertheless received JTPA services: caseworkers avoided enrolling potential participants as long as possible because only those actually enrolled counted for performance-management purposes.
On the other hand, survey data may catch services received at programs other than the one under study, as well as services explicitly not recorded in administrative data systems, such as public core services (e.g., free computers to use to search for jobs) in some WIA programs and the Employment Service. And in contexts where control or comparison group members may receive training or other services from (often quite numerous) other programs and providers, surveys of the individuals in the evaluation may represent the only cost-effective way to characterize the counterfactual.

Smith and Whalley (2015) compare data from surveys and administrative records for treatment-group members in the JTPA experiment. They find a substantial amount of underreporting of services received in the survey data relative to the administrative data. Looking at particular services, they find that respondents appear to do a better job of reporting services that happen in classrooms, such as formal training in occupational skills or adult basic education, than they do at reporting services such as job-search assistance or subsidized on-the-job training at private firms. Reported start and stop dates of training also often differ substantially between the two sources, though here it is less clear which source should be viewed a priori as containing less measurement error. More broadly, their paper suggests the value of both additional research on training measurement and of paying more attention to the quality of administrative data.

The same choice between survey data and administrative data arises when considering how best to measure labor market outcomes such as earnings and employment. As with the measurement of the timing and incidence of training, neither source strictly dominates the other. This is particularly true in the context of the disadvantaged populations served by means-tested government training programs. For example, administrative data typically miss sources of labor income outside the formal labor market and thus not reported to the authorities. These sources may include illegal activities like drug dealing or prostitution, as well as legal but informal activities such as child care, hair care, automotive repairs, and so on. To the extent that training programs move their trainees from such informal work into formal-sector jobs, reliance on administrative data on earnings and employment overstates program impacts due to undercounting informal earnings and employment among nontrainees. At the same time, administrative data likely measure earnings and employment in the formal sector with less error than do survey data, particularly as the recall period lengthens. Kornfeld and Bloom (1999) show that measurement differences between survey and administrative data (from state Unemployment Insurance records) matter for the impact estimates obtained for the male youth subgroup in the JTPA experiment. More recently, Barnow and Greenberg (2015) show that measurement differences between survey and administrative data (usually from Unemployment Insurance wage records) often have large effects on estimated earnings impacts in the eight randomized controlled trials they examine. Earnings and employment measures within the broad categories of survey and administrative data differ as well.
For example, among administrative data sources, state Unemployment Insurance (UI) wage record data do not include the earnings of many government employees or of the self-employed, while IRS earnings data do. Neither includes the value of fringe benefits, which Hollenbeck and Huang (2014) estimate at about 20 percent of earnings for this population. Smith (1997, table 11) shows nontrivial differences in self-reported annual earnings from a simple summary question versus earnings built up from more detailed information about wages, hours, and weeks worked on individual jobs.114 Whether or not measurement error matters for impact estimates depends on its correlation with treatment status, as with the example above where training moves trainees away from informal work. See, for example, Angrist and Krueger (1999), Bound, Brown, and Mathiowetz (2001), and Hotz and Scholz (2002) for more on measuring labor market outcomes in general, and Kornfeld and Bloom (1999), Wallace and Haveman (2007), Schochet, Burghardt, and McConnell (2008), and Barnow and Greenberg (2015) for discussions specific to an evaluation context.

114. The exact wording of the question from the background information form is "In the past year (twelve months), how much did you earn (before taxes and deductions)?"

3.8 Issues for Cost-Benefit Analysis

Cost-benefit analysis provides a framework for combining impacts on a variety of outcomes, expressing them in common (i.e., dollar) units, and comparing their discounted present values to the present costs of training. Such analyses add substantial value to impact estimates for that subset of programs that produce positive impacts on at least some outcomes of interest. In our view, the cost-benefit analysis produced as part of the National Job Corps Study and documented in McConnell and Glazerman (2001) represents the best among the evaluations we survey here; we also draw inspiration from the analyses in Heckman, LaLonde, and Smith (1999, section 10.1) and in Andersson et al. (2013, section 12). All of these exercises compare average costs to average impacts of treatment on the treated; the literature would also benefit from attempts to compare marginal costs to benefits for marginal participants.

In many cost-benefit analyses, the magnitude, and sometimes even the sign, of the net present value will depend on a number of important choices about which the researcher may have only limited knowledge. Like Heckman, LaLonde, and Smith (1999), we favor thoughtful sensitivity analysis in such cases, so that the consumer of the cost-benefit analysis comes away with a clear understanding of the amount and sources of sensitivity in the calculations.

One common limitation concerns the duration of follow-up data. Because training programs typically exhibit "lock-in" effects—negative impacts during the training period due to labor market withdrawal—any hope of finding enough positive impacts to pass a cost-benefit test depends on having follow-up for a reasonably long period after training. At the same time, evaluation delayed may mean policy influence denied, which argues for not waiting around too long for more follow-up.
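One way to make such sensitivity concrete is to recompute the net present value under alternative discount rates and projection horizons. The sketch below uses an entirely hypothetical impact stream with a first-year lock-in loss and an up-front program cost:

```python
def npv(impacts_by_year, cost, rate):
    """Discounted net present value of a stream of annual earnings
    impacts (year 0 = the training year), net of up-front cost."""
    return sum(imp / (1.0 + rate) ** t
               for t, imp in enumerate(impacts_by_year)) - cost

observed = [-1_500.0, 800.0, 900.0]       # three years of follow-up
projected = observed + [900.0] * 7        # assume persistence through year 9

for rate in (0.03, 0.10):
    print(rate,
          round(npv(observed, cost=2_000.0, rate=rate)),
          round(npv(projected, cost=2_000.0, rate=rate)))
```

With only the observed follow-up this hypothetical program fails the cost-benefit test at either discount rate, while projecting persistence of the year-2 impact reverses the verdict, which is why evidence on impact persistence matters so much.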
If the evidence suggested that positive impacts always persisted once they started, this issue would become less important, as researchers could feel relatively confident when projecting impacts out beyond the data. Sadly, what evidence we have provides a mixed picture about impact persistence. For instance, Couch (1992) shows that impacts on earnings from the National Supported Work Demonstration remain rock solid for many years; similarly, the US General Accounting Office (1996) finds that earnings impacts from the JTPA experiment also appear relatively consistent over five years. In contrast, Schochet, Burghardt, and McConnell (2008) show that earnings impacts from the Job Corps experiment fade out over time. Greenberg, Michalopoulos, and Robins (2004) find some evidence of differences in impact persistence between men and women in their meta-analysis but lament, as do we, the

192

Burt S. Barnow and Jeffrey Smith

absence of many evaluations with more than three years of follow-up data. Longer-term follow-up using administrative data for both past and future experiments would add to our knowledge base on this dimension at relatively low cost.

Another common limitation concerns impacts on outcomes other than employment and earnings and/or on household members other than the trainee. We might, for example, expect job training to affect criminal activity, both by consuming the time of the trainee (idle hands . . .) and, in the event that training leads to employment, by increasing the opportunity cost of getting caught. We might also expect job training to affect health. For female participants, training might affect the timing or incidence of fertility. Finally, we might expect training to have effects on other household members' choices regarding schooling and work, and possibly regarding divorce or coresidence as well. Though often hypothesized, we know of only two US studies of general training programs (as opposed to programs specifically for ex-convicts, for example) that have actually attempted to measure such effects, namely the National Job Corps Study experimental evaluation (discussed in more detail below) and the earlier nonexperimental evaluation of the same program. Both of these studies devote a fair amount of effort to estimating the impact of the Job Corps on the criminal activities of participants, monetizing the resulting impacts, and then incorporating them into their cost-benefit analyses (see Mallar et al. [1982] and McConnell and Glazerman [2001]). In both cases, they find that a substantively important component of program benefits, particularly in the first year, comes from reductions in criminal behavior, reductions that presumably result from the residential nature of the program, which separates participants from both dubious friends and opportunities for profitable misbehavior.
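The accounting step just described, monetizing an impact on a nonearnings outcome and folding it into the benefit stream, amounts to very little code; the sketch below uses entirely made-up numbers (an assumed number of crimes averted per participant and an assumed social cost per crime).

```python
# Adding monetized nonearnings benefits (here, averted crimes) to earnings
# impacts in a cost-benefit tally. All numbers are hypothetical.

def total_benefits(earnings_impact, crimes_averted, cost_per_crime):
    """Earnings impact plus the monetized value of averted crimes, per person."""
    return earnings_impact + crimes_averted * cost_per_crime

# A modest first-year earnings impact can be swamped by the crime channel:
print(total_benefits(earnings_impact=250.0,
                     crimes_averted=0.5,
                     cost_per_crime=3000.0))  # -> 1750.0
```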
Elsewhere in the world, Lechner and Wiehler (2011) find some effect of Austrian training programs on the fertility of female participants. In our view, further work along these lines, whether via survey data or matched administrative data, would provide a richer view of the overall effects of training programs.115

A dollar of government revenue to spend on training costs society more than a dollar in lost output due to costs associated with collecting the revenue. These include the direct costs of operating the tax collection system (e.g., the Internal Revenue Service and all the tax preparers and accountants) as well as the indirect costs due to the use of distortionary taxes. For example, income taxes distort choices between labor and leisure in ways that reduce welfare relative to a world with (nondistortionary) lump-sum taxes. Not surprisingly, calculating the direct cost per dollar of government revenue proves relatively uncontroversial, while estimating the indirect costs proves quite complex and controversial, enough to generate a large literature and even a book, namely Dahlby (2008). The public finance literature calls one plus the sum of the direct and indirect costs of the marginal tax dollar the marginal social cost of public funds (MSCPF). We do not take a stand here on the correct value for the MSCPF other than that it exceeds one. Rather, we note that even otherwise very nice cost-benefit analyses such as that in the National Job Corps Study err by leaving it out, and we recommend the sort of sensitivity analysis using a range of MSCPF values drawn from the literature that appears in Andersson et al. (2013).

Another puzzling lacuna in many cost-benefit analyses concerns the value of the "leisure" of the participants. Consider an individual who would receive training for six months and then work for eighteen months in the first two years after random assignment to the treatment group in an evaluation of a training program, but who would stay at home and care for their children for two years if assigned to the control group. The standard analysis values the employment based on the earnings received and implicitly assigns a value of zero to caring for the children at home. The latter appears in the cost-benefit analysis only indirectly if the child care used in the treated state receives a government subsidy (and the analysis sweats such details). As discussed in Greenberg and Robins (2008), the standard analysis gets the economics wrong by omitting the value of the participants' counterfactual nonmarket time.

115. In addition to their value in a thorough cost-benefit analysis, examination of outcomes beyond just earnings and employment levels also informs our understanding of the mechanisms by which programs bring about any impacts on earnings and employment. For instance, studies that examine the effects of employment and training programs on the durations of subsequent employment and unemployment spells, such as Ham and LaLonde (1996) and Eberwein, Ham, and LaLonde (1997), both illuminate causal mechanisms and provide guidance on the likely persistence of impacts on employment and earnings beyond the available data.
This omission leads to a systematic overstatement of programs' cost-benefit worthiness, as illustrated by Greenberg and Robins (2008) for the case of the Canadian Self-Sufficiency Project earnings supplement. Finally, doing a good cost-benefit analysis requires good data on costs. In the case of evaluations estimating the ATET of the program as a whole, this means data on average per-participant costs. In the case of evaluations comparing different services, it requires data broken down by service type. For evaluations that focus on marginal participants, it requires data on marginal costs. As discussed, for example, in Andersson et al. (2013) for WIA, most US programs lack any usable data on marginal costs as well as lacking data on average costs broken down by service type, client type, or geographic location.116

3.9 General Equilibrium Effects

None of the evaluations we consider in this chapter accounts for general equilibrium effects. In the context of training programs, equilibrium effects typically take two forms: displacement and changes in relative skill prices.117 Displacement, the focus of most papers in this literature that attend to equilibrium effects at all, occurs when program participants take jobs that others would have taken in the absence of the program. This could result from their leaping ahead in the queue due to enhanced qualifications or due to changes in optimal search effort. In either case, the control or comparison group used in a partial equilibrium evaluation will likely contain only a very small fraction of those displaced, meaning that such an evaluation will overstate the social impacts of the program. Changes in skill prices result when training programs increase or decrease the supply of particular types of skills in local labor markets. For example, if a training program trains many nail technicians in a particular locality, we expect the relative wages of nail technicians to fall due to increased supply (and doubters of this sort of scenario should read Boo [2004]). Again, such effects will lead a partial equilibrium evaluation, whether experimental or nonexperimental, to overstate the overall economic benefits of the training program.

General equilibrium evaluations typically take one of two approaches. The first makes use of spatially distinct local labor markets that have plausibly exogenous variation in program scale. Different outcomes for the nontreated in localities with a large program presence relative to those with a small program presence indicate equilibrium effects. Examples of such studies outside the United States include Forslund and Krueger (1997) in Sweden and the astounding two-level random assignment study in France by Crépon et al. (2013). We know of no such studies for US programs. The second strategy writes down a complete equilibrium model and estimates or calibrates the model to obtain estimates of the size and nature of any equilibrium effects.

116. Heinberg et al. (2005) and Barnow and Trutko (2015) document the conceptual and empirical challenges associated with cost measurement in the context of employment and training programs.
Though we know of no US training programs evaluated using this strategy,118 Davidson and Woodbury (1993) uses a calibrated search model to estimate the equilibrium effects of UI bonuses (lump-sum payments to UI claimants who end their claim early) on the search effort of unemployed workers not eligible for the bonuses. Along similar lines, Lise, Seitz, and Smith (2004) calibrate a search model to examine the equilibrium effects of the Canadian Self-Sufficiency Project. In contrast, Heckman, Lochner, and Taber (1998) estimate a dynamic, stochastic, general equilibrium model in their study of the equilibrium effects of a $500 subsidy to university tuition. In their model, the equilibrium effects work through changes in the relative skill prices of high school-educated and university-educated labor. All three studies find substantively important equilibrium effects; in the Lise, Seitz, and Smith (2004) paper they suffice to overturn the positive verdict of a partial-equilibrium cost-benefit calculation. More work along these lines, including greater emphasis on the potential for equilibrium effects and some thinking about the contexts where equilibrium effects will and will not matter very much, would improve our understanding of the effects of training programs and of their fiscal worthiness.

117. Deterrent effects may matter for mandatory programs; see, for example, Black et al. (2003) and the broader European literature surveyed in McCall, Smith, and Wunsch (2016). We follow the Office of Management and Budget (1992) in passing on "magic" multiplier effects.
118. Johnson (1979) considers displacement effects in an early (i.e., pre-search) equilibrium framework.

3.10 Systematic Evaluation and Aggregation of Evaluations

Another important development since the publication of LaLonde's (2003) survey centers on the systematic evaluation and aggregation of evidence across evaluation studies. The meta-analyses of evaluations of active labor market programs from many developed countries summarized in Card, Kluve, and Weber (2010, 2015) provide a fine example of this. Greenberg, Michalopoulos, and Robins (2003) undertake a similar meta-analysis restricted to US evaluations. Meta-analysis in this context means estimating so-called "meta-regressions" in which impact estimates from various evaluation studies (often for particular subgroups) form the dependent variable, and various characteristics of the evaluation (e.g., was it experimental or not), of the program (e.g., classroom training or job-search assistance, program duration), of the participants (e.g., men or women, youth or adults), and of the context (e.g., the unemployment rate) comprise the independent variables. This differs from the original use of meta-analytic techniques in the medical literature to combine multiple underpowered studies of the same treatment applied to the same population. Here the (quite different) goal consists in accounting for the variation across studies. One perhaps surprising result from the Card, Kluve, and Weber (2010) meta-analysis is that, conditional on controlling for other features of the evaluation, the estimates provided by experimental and nonexperimental methods do not differ very much on average. The Clearinghouse for Labor Evaluation and Research (CLEAR) website represents another flavor of evaluative aggregation.
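Before turning to CLEAR, the meta-regression setup described above can be sketched as follows; the data here are fabricated, with impact estimates regressed on study characteristics using inverse-variance (precision) weights.

```python
import numpy as np

# Sketch of a meta-regression: impact estimates from many (fabricated)
# evaluations regressed on study characteristics, weighting each estimate
# by its precision (inverse sampling variance).
rng = np.random.default_rng(seed=0)
n_studies = 40
experimental = rng.integers(0, 2, n_studies)   # 1 = experimental design
classroom = rng.integers(0, 2, n_studies)      # 1 = classroom training
se = rng.uniform(50.0, 200.0, n_studies)       # standard error of each estimate
# Data-generating process: no experimental/nonexperimental gap by construction.
impact = 300.0 + 50.0 * classroom + rng.normal(0.0, se)

X = np.column_stack([np.ones(n_studies), experimental, classroom])
w = np.sqrt(1.0 / se**2)                       # sqrt of inverse-variance weights
beta, *_ = np.linalg.lstsq(X * w[:, None], impact * w, rcond=None)
print(beta)  # intercept, "experimental" coefficient, "classroom" coefficient
```

With real estimates in hand, the coefficient on the design indicator speaks directly to the experimental-versus-nonexperimental comparison highlighted above.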
Inspired by the US Department of Education's What Works Clearinghouse (colloquially known, with some justification, as the "Nothing Works Clearinghouse"), and thus indirectly by the Cochrane Collaboration in health and the Campbell Collaboration in social policy, CLEAR grades evaluations of labor market programs relative to fixed standards of quality, and also provides summaries of evidence.119 The latter take two main forms: one comprises quality-weighted reviews of the literature related to specific programs or classes of programs, and the other represents, essentially, summary translations (from research speak into regular English) of evaluations for practitioner and policymaker audiences. The CLEAR differs from the WWC along several dimensions, some of which result from the much smaller size of the relevant literature, some from the generally lower financial stakes facing researchers who write the evaluations on the site, and some from the fact that the quality of the literature on employment and training programs has historically far exceeded that on educational interventions, with the implication that CLEAR does not, unlike WWC, explicitly see part of its mission as raising an entire field out of the research muck. While we acknowledge the difficulties in coming up with generally applicable and reasonably objective standards for evaluations, we think that CLEAR plays a very useful role in publicly grading studies relative to a good shot at such standards. Importantly, both CLEAR and WWC include mechanisms for updating the grading standards as applied econometrics moves forward over time, though neither site has successfully dealt with the problem of studies that exceeded the methodological standards of their day but fall short of the standards of the present.

119. See clear.dol.gov for CLEAR, ies.ed.gov/ncee/wwc for the WWC, www.cochrane.org for the Cochrane Collaboration, and www.campbellcollaboration.org for the Campbell Collaboration.

3.11 Review of Research on Program Impacts

Rather than repeat earlier summaries in the literature of pre-2000 evaluations such as those in LaLonde (2003) and Heckman, LaLonde, and Smith (1999), we focus our energies here primarily on recent, high-quality evaluations of major federal programs, namely WIA, the Job Corps, and TAA.

3.11.1 Workforce Investment Act

We consider the four closely related papers that examine the Workforce Investment Act (WIA): Hollenbeck (2009), Heinrich et al. (2013), Andersson et al. (2013), and Heinrich and Mueser (2014). These evaluations share a common basic design, in part because they share a common foundation of administrative data sources. Each of these papers combines administrative data from the WIA program—formally the WIA Standard Record Database (WIASRD) data—with data on earnings by calendar quarter from state Unemployment Insurance records. The WIASRD data, in addition to program-related information on enrollment and termination dates and services received, also include basic demographic information as well as limited information on schooling.120 All four papers focus their estimation energies on one or both of two parameters: the mean effect of receiving any WIA services relative to not receiving any WIA services on those who receive them (hereinafter the "W-ATET") and the mean effect of receiving WIA training, and possibly other WIA services, compared to receiving one or more WIA core or intensive services, but not training, on those who receive the training (hereinafter the "T-ATET"). Both parameters answer interesting policy questions, though we note the absence (necessarily, given the data) of any attempts to estimate impacts on marginal participants, those most relevant to thinking about the effects of small expansions or contractions in the WIA budget.

In the WIA context, the two parameters present somewhat different challenges to the researcher. Andersson et al. (2013) argue that the T-ATET estimand embodies an easier selection problem than the W-ATET. We can think of two versions of this argument. First, due to the interplay of the economics and the institutions, WIA participants may differ from WIA nonparticipants more strongly in terms of observed and unobserved characteristics than do WIA trainees and WIA nontrainees. Second, we may just know more about how the WIA trainees differ from the WIA nontrainees via institutional knowledge about the service assignment process. Andersson et al. (2013) present evidence for the first claim by showing that pre-program mean earnings patterns differ only very modestly between the WIA trainees and nontrainees in their data relative to the differences found in other papers for WIA participants versus WIA nonparticipants. Bell et al. (1995) advance a closely related view in making the case for program dropouts as a comparison group for program participants (see also Heckman, Ichimura, and Todd [1997, section 15]). Another difference between the T-ATET and W-ATET estimands concerns comparison-group selection and the related problem of temporal alignment: that is, what time period to use as a baseline when coding up time-varying conditioning variables. The comparison group for the T-ATET is clear: it is the WIA participants who do not receive training.

120. See Decker and Berk (2011) and Van Horn, Krepcio, and Wandner (2015) for broader surveys of recent research on WIA.
The temporal alignment problem for the T-ATET has a similarly straightforward solution: the natural choice aligns the WIA trainees and the WIA nontrainees based on their dates of WIA enrollment. All of the papers that estimate the T-ATET follow this course. In contrast to the T-ATET, the choice of comparison group for the W-ATET requires some thought and some trade-offs. Due to their reliance on administrative data, the WIA papers lack a version of the "ideal" comparison group of eligible nonparticipants collected as part of the JTPA experiment. Instead, data limitations require choosing among various candidate comparison groups based on their participation in other programs, as administrative data become available only via such participation. Rhetorically, the choice gets presented either as a practical alternative to the desired but too-expensive-to-obtain (because of the large number of screening interviews required) sample of eligible nonparticipants, with a case then made about the nature and size of the resulting bias, or as a particular way of defining the counterfactual of interest, so that the treatment contrast becomes WIA versus another program rather than WIA versus no WIA. Neither contrast necessarily dominates in terms of policy interest, but they


do differ in terms of the mix of related services received by comparison group members, a difference that affects interpretation and comparisons with other studies. In practice, the choice for researchers seeking to estimate the W-ATET boils down to either Employment Service (ES) participants or Unemployment Insurance (UI) claimants. Consider UI claimants first. This comparison group has the disadvantage that many WIA participants lack UI eligibility because they lack sufficient work experience to qualify for UI. This problem can be (and is, in these papers) "solved" by comparing UI claimant WIA participants to UI claimant nonparticipants. This is not an uninteresting comparison, but it does leave aside many important components of the WIA participant population, including welfare recipients and low-skill workers with spotty employment histories. Using UI claimants has the advantage that it simplifies the problem of temporal alignment, as WIA participant and WIA nonparticipant UI claimants can be aligned based on their UI claim's start date. As described in more detail earlier in the chapter, the ES dates back to the Wagner-Peyser Act of 1933 and provides labor-exchange services. The UI program requires virtually all claimants (other than those awaiting recall) to register with the ES.121 The ES also serves many other job seekers, including some currently employed but looking for a better match. The extent of ES integration with WIA varies substantially across states. Relative to using UI claimants as a comparison group, using ES registrants has the advantage of capturing a broader population, one that overlaps with more of the WIA participant population. The costs are twofold. First, the process that leads some job seekers who are not UI claimants to register for the ES and others not to do so is not well understood, but has implications for the interpretation of the ES comparison group counterfactual.
Second, and not unrelated, while UI claimants typically register for the ES shortly after becoming unemployed, other job seekers may wait until initial job-search efforts fail before seeking help from the ES. This process complicates temporal alignment, as aligning WIA nonparticipant ES registrants with WIA participants using the ES registration date may do a bad job of implicitly conditioning on the duration of job search, something the literature suggests matters because it proxies for otherwise unobserved characteristics.122

Employment and Training Programs

199

assignment that depend on observed, difficult- to-manipulate running variables like test scores. Hence, an RD analysis would require purposive institutional changes. One could imagine using variation in services received due to exogenous variation in caseworker assignment (whether explicitly random or just “first available”), but the data typically available lack information on caseworkers and on the process that matches clients to caseworkers.123 Similarly, one can imagine an analysis that attempts to use distance to the One-Stop as an instrument in an analysis of WIA versus no WIA, but the available data lack residential addresses for comparison group members. No other credible instruments suggest themselves. At the same time, as discussed above, the literature provides some support for the idea that the available conditioning variables, particularly the lagged labor market outcomes provided by the UI data, may suffice to make identification of causal effects based on the Conditional Independence Assumption (CIA) or the Bias Stability Assumption (BSA) plausible. That is, these papers can, and sometimes do, make a positive case for a causal interpretation of impact estimates based on the CIA or BSA. Consider the case for the CIA first. As mentioned above, this case rests on claims about having a sufficiently rich set of exogenous conditioning variables to make it plausible that participation (i.e., in WIA or in training within WIA) is conditionally unrelated to the untreated outcome. To make this case, all four papers start out by forcing exact matches on particularly important covariates. Hollenbeck (2009) employs exact matching by sex and by region within Indiana. Heinrich et al. (2013) match exactly on sex and on state. Andersson et al. (2013) match exactly on state, but find similar estimates for men and women and so pool them in their preferred specifications. Heinrich and Mueser (2014) match exactly on sex and on calendar time. 
Exact matching identifies particular conditioning variables thought to have such a strong effect on both treatment choice and outcomes that allowing the inexact matches implicit in the application of propensity score methods in finite samples could lead to nontrivial bias. As discussed in LaLonde (2003), the earlier literature found consistent differences in the mean impacts of employment and training programs on men and women; combined with the broader evidence that men and women experience the labor market differently, this motivates exact matching by sex. The clear finding that local labor markets matter in Heckman et al. (1998) motivates exact matching on geography.124 For the reasons just noted, all the studies we consider include sex, calendar time, and geography at the substate level as conditioning variables, even if they do not match on them exactly. 123. Such a strategy would mimic the literature in criminology and the economics of crime that relies on randomly assigned judges as instruments for aspects of punishment severity (e.g., see Mueller-Smith 2015). 124. In some cases, such as sex, a desire to present subgroup estimates also motivates the exact matching.
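A stripped-down version of the matching protocol described above, exact cells for sex and geography with nearest-neighbor matching (with replacement) inside each cell, looks like this; the records are fabricated, and the scalar `score` stands in for an estimated propensity score.

```python
# Nearest-neighbor matching with replacement inside exact-match cells.
# Each record: (sex, state) define the cell; `score` proxies an estimated
# propensity score; `outcome` is post-program earnings. Data are fabricated.

def matched_atet(units):
    diffs = []
    for u in (x for x in units if x["treated"]):
        pool = [c for c in units if not c["treated"]
                and c["sex"] == u["sex"] and c["state"] == u["state"]]
        if not pool:
            continue  # treated unit lacks common support in its cell
        match = min(pool, key=lambda c: abs(c["score"] - u["score"]))
        diffs.append(u["outcome"] - match["outcome"])
    return sum(diffs) / len(diffs)

units = [
    {"sex": "F", "state": "IN", "treated": True,  "score": 5.0, "outcome": 9.0},
    {"sex": "F", "state": "IN", "treated": False, "score": 5.2, "outcome": 8.0},
    {"sex": "F", "state": "IN", "treated": False, "score": 1.0, "outcome": 4.0},
    {"sex": "M", "state": "IN", "treated": True,  "score": 3.0, "outcome": 7.5},
    {"sex": "M", "state": "IN", "treated": False, "score": 2.9, "outcome": 7.0},
]
print(matched_atet(units))  # (9.0 - 8.0 + 7.5 - 7.0) / 2 = 0.75
```

Matching "with replacement" appears here because the same comparison unit may serve as the nearest neighbor for several treated units.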


All the studies also include education (categories for years of schooling), veteran status, and disability status. Education has an extremely well-documented correlation with labor market outcomes, and also should affect participation via the opportunity cost. It should also matter for whether or not enrollees train, as it signals the ability to successfully absorb complicated material presented in a classroom format as well as proxying for the participant's taste, or distaste, for such activities. The two remaining major categories of conditioning variables represent recent histories of labor market outcomes (earnings and employment) in all four papers and recent histories of participation in various programs, including some or all of the ES, WIA, UI, and TANF, in the Heinrich et al. (2013) and Andersson et al. (2013) papers. Heinrich et al. (2013) have the richest specification of recent program participation. Hollenbeck (2009) has a somewhat less flexible specification in terms of earnings and employment than the other two papers.125 The flexibility in all the papers builds on the notions that, first, zero earnings is different, so that indicators for zero earnings in a quarter should be included; second, that dynamics matter, so that strings of zeros and/or job loss just prior to participation matter; and third, that variability in earnings likely matters, which motivates inclusion of the earnings variance directly or of measures of particular types of changes in employment and earnings. Andersson et al. (2013) compare conditioning sets that include eight and twelve quarters of pre-program earnings information and find little difference in their T-ATET estimates, though given the modest differences in pre-program mean earnings they find for WIA trainees and WIA nontrainees we would hesitate to generalize this finding to the W-ATET.
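The flexible earnings-history specification described above can be made concrete with a small sketch; the feature names are ours, and the quarterly earnings are hypothetical.

```python
# Conditioning variables built from a quarterly pre-program earnings history:
# zero-earnings indicators, a "string of zeros" measure capturing job loss
# just prior to participation, and earnings variability.

def history_features(quarterly_earnings):
    q = quarterly_earnings
    mean = sum(q) / len(q)
    zeros_at_end = 0
    for e in reversed(q):           # consecutive zero quarters before entry
        if e != 0:
            break
        zeros_at_end += 1
    return {
        "any_zero": any(e == 0 for e in q),
        "share_zero": sum(e == 0 for e in q) / len(q),
        "zeros_at_end": zeros_at_end,
        "earnings_var": sum((e - mean) ** 2 for e in q) / len(q),
    }

# Eight quarters of hypothetical earnings with job loss three quarters
# before program entry:
print(history_features([4000, 4200, 3900, 4100, 3800, 0, 0, 0]))
```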
The UI administrative data do not allow these researchers to distinguish between zero earnings due to unemployment and zero earnings due to absence from the labor force, which Heckman and Smith (1999) find important. They also do not allow the finer level of temporal detail—namely, monthly rather than calendar quarter labor market outcomes—in the "pre" period emphasized in that paper. The empirical importance of these (relative) weaknesses in the data remains unknown. Andersson et al. (2013) do examine the value of conditioning on a set of variables related to the firm at which WIA participants most recently worked in their estimation of the T-ATET and find, to their and our surprise, that they add essentially nothing in terms of reducing selection bias (as indicated by the fact that the estimates hardly budge).

Relative to the CIA, the BSA allows for the existence of selection into WIA, or into WIA training, based on time-invariant unobserved variables. The simple model we presented above comports with the BSA, but a more general model of selection on outcome trends would not. The JTPA experimental data suggest selection on trends for some demographic groups. Coincidence between estimates based on the CIA and estimates based on the BSA suggests that the available conditioning variables suffice to solve the problem of selection on time-invariant characteristics.

The four WIA evaluation papers apply somewhat different econometric estimators. Heinrich et al. (2013) apply many-to-one caliper matching followed by a linear regression bias-correction step. Andersson et al. (2013) use inverse propensity weighting (IPW) and single-nearest-neighbor matching with replacement. Hollenbeck (2009) uses single-nearest-neighbor matching with replacement and a caliper. Heinrich and Mueser (2014) also use IPW. The papers that assume both the CIA and the BSA simply replace the outcome level as the dependent variable under the CIA with the before-after outcome difference as the dependent variable under the BSA. The methodological literature provides reasons to prefer some estimators over others. For example, Hirano, Imbens, and Ridder (2003) show conditions under which IPW attains the semiparametric efficiency bound. Inverse propensity weighting also avoids the troublesome bandwidth choices associated with nearest-neighbor and kernel-matching estimators. The Monte Carlo literature (e.g., Frölich 2004; Huber, Lechner, and Wunsch 2013; Busso, DiNardo, and McCrary 2014) reveals that single-nearest-neighbor matching with replacement typically has very low bias but a high enough variance that it typically performs poorly in mean-squared-error horse races. Bias correction via ex post linear regression using the IPW or matching weights can, but need not, improve finite sample performance.

125. The full list of conditioning variables appears in table A-1 for both Hollenbeck (2009) and Andersson et al. (2013). The conditioning variables for Heinrich et al. (2013) appear in table A-1 of the report that underlies their published paper, Heinrich, Mueser, and Troske (2008).
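A minimal sketch of the IPW estimator of the ATET, with the propensity scores simply assumed known (in the papers above they would be estimated, typically via something like a logit): passing outcome levels implements the CIA version, while passing before-after differences implements the BSA version.

```python
# IPW estimator of the ATET. Comparison units are reweighted by p/(1 - p)
# so that their covariate distribution resembles the treated group's.
# Propensity scores here are assumed, not estimated; data are fabricated.

def ipw_atet(treated, p, y):
    """treated: 0/1 indicators; p: propensity scores; y: outcome levels
    (CIA) or before-after outcome differences (BSA)."""
    treated_mean = sum(d * yi for d, yi in zip(treated, y)) / sum(treated)
    w = [(1 - d) * pi / (1 - pi) for d, pi in zip(treated, p)]
    comparison_mean = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    return treated_mean - comparison_mean

d = [1, 1, 0, 0, 0]
p = [0.8, 0.6, 0.5, 0.3, 0.2]
levels = [10.0, 9.0, 8.0, 6.0, 5.0]   # CIA: post-program outcome levels
changes = [4.0, 3.5, 2.0, 1.5, 1.0]   # BSA: before-after differences
print(ipw_atet(d, p, levels), ipw_atet(d, p, changes))
```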
At the same time, in actual applications, variation in estimates due to different econometric estimators typically pales in comparison to variation due to, for example, changes in the conditioning set (see Plesca and Smith 2007, table 7). The four nonexperimental WIA evaluations do vary on two important dimensions: the states included in their data and the calendar time period during which the WIA participants they study participated in the program. Heinrich et al. (2013) attempted, with assistance from the US Department of Labor, to recruit all fifty states. They ended up with a nonrandom sample of twelve. Andersson et al. (2013) attempted to recruit nine states (selected based on size and ex ante likelihood of cooperation) and ended up with just two. In both studies, the states declined to have their names attached to state-specific impact estimates, which of course makes it difficult to even casually link those impacts to features of state programs and economic contexts. The unwillingness of many states to provide data for high-quality evaluations provided at very low cost, or to have their state-specific impacts identified when they do, provides stark evidence of the importance of issues of monitoring and control between taxpayers as principals and state program administrators as their misbehaving agents. It also limits what studies such

202

Burt S. Barnow and Jeffrey Smith

as these can add to our store of policy- relevant knowledge. In contrast, Hollenbeck (2009) and Heinrich and Mueser (2014) examine single, identified states, namely Indiana and Missouri, respectively. The Andersson et al. (2013) paper has the earliest sample, which includes WIA registrants from calendar years 1999– 2005, inclusive, with the bulk in 2000– 2004. Their study thus includes the “dot com” recession of the early twenty- first century. Heinrich et al. (2013) study WIA registrants from July 2003 to June 2005, and Hollenbeck studies program exiters from July 2003 to June 2005; both papers thus focus exclusively on program performance in good economic times. Finally, Heinrich and Mueser (2014) focus by design on the Great Recession period by studying WIA registrants from June 2007 to June 2010. There is some European evidence from Lechner and Wunsch (2009) indicating that training programs have larger impacts in slack labor markets (due to worse comparison group outcomes), while the meta- analysis of US programs in Greenberg, Michalopoulos, and Robins (2003) suggests the reverse. Either way, the time period may matter in comparing estimates among the WIA studies. We now summarize the estimated earnings impacts from three of the four WIA nonexperimental studies.126 Given the focus of this chapter on training, the T-ATET impacts occupy most of our attention. We begin with those. Heinrich et al. (2013) present separate estimates for men and women and, within those groups, for the adult and dislocated worker funding streams. They produce separate estimates by state and quarter; within each quarter they produce an overall impact estimate by weighting the state- specific estimates by each state’s overall contribution to the trainee sample. 
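The aggregation step just described amounts to a trainee-weighted average of state-specific estimates. The impacts and shares below are hypothetical numbers for illustration, not values from the paper.

```python
import numpy as np

# Hypothetical state-specific impact estimates for one quarter ($/quarter)
# and each state's share of the trainee sample (shares sum to one).
impacts = np.array([650.0, 820.0, 910.0])
shares = np.array([0.5, 0.3, 0.2])

# Overall impact: weight each state's estimate by its trainee share.
overall = float(shares @ impacts)
print(overall)  # 753.0
```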
As shown in their figure 5, for women in the adult stream they find a modest lock-in effect that lasts for three quarters, followed by impacts that increase to around $800 per quarter and persist until the end of their sixteen quarters of post-enrollment data. For men in the adult stream they find essentially no lock-in effect, perhaps because men who receive subsidized on-the-job training at private firms have positive impacts in early quarters that cancel out the lock-in effect (on average) of the men receiving classroom training. In later quarters, positive impacts stabilize at around $500 per quarter; this lower absolute impact represents a much lower impact in percentage terms due to the higher average earnings of men in this population. Their figure 8 shows that both men and women in the dislocated worker stream have large and long-lasting lock-in effects and no clear positive impacts even at the end of the sample period. All estimates of any magnitude attain conventional levels of statistical significance. Guided by a specification test, Heinrich et al. (2013) report cross-sectional matching estimates of the T-ATET; the difference-in-differences estimates of the T-ATET in their report (Heinrich et al. 2008) tell the same story.

126. We do not present numerical estimates from Heinrich and Mueser (2014) as it has not yet been published or appeared in a formal working paper series.

Employment and Training Programs


The findings from Andersson et al. (2013) turn out to be similar in the large but differ in important ways in the small. Unlike Heinrich et al. (2013), they pool men and women but report separate estimates for their two states. Like them, they also separate out the adult and dislocated worker funding streams within states. The relevant estimates appear in their tables 4A to 4D. In their state A, adults experience a three-quarter lock-in effect and then see impacts that gradually rise, stabilizing at around $300 per quarter by the time the data end at twelve calendar quarters after enrollment. In contrast, displaced workers in state A (a medium-sized state on the Atlantic seaboard) experience earnings losses of around $900 per quarter initially, trailing off to "only" about $125 per quarter. In their state B (a large Midwestern state), the adults experience a quite similar pattern of impacts, but stabilizing at around $400 per quarter, while the displaced workers do much better: following a very long lock-in period, their impacts rise to about $300 per quarter at the very end of the data. In addition to not finding clear differences in impacts between men and women, Andersson et al. (2013) also report looking for differential impacts by race/ethnicity and by years of schooling and not finding much difference on those dimensions either. They find that quite similar estimates emerge from their cross-sectional and difference-in-differences estimators; like them, we highlight the cross-sectional estimates.127
We can compare, in a very broad sense, the estimates of the T-ATET from these two studies to the estimates of the effect of training obtained by Heckman et al.
(2000) by applying various nonexperimental estimators to the experimental data from the JTPA experiment on individuals recommended, prior to random assignment, to receive classroom training in occupational skills (and possibly other services, not including subsidized on-the-job training), the so-called "classroom-training treatment stream." The JTPA experiment randomized adult participants and not dislocated workers (JTPA having the same distinction between these as WIA). Their table IV presents instrumental variables estimates, while their table V presents cross-sectional and before-after estimates. In a very broad sense, and one should not push farther than that given the differences in programs, geographic locations, and identification strategies, they tell the same story here of substantively important but not completely implausible impacts of training on earnings following a lock-in effect.
The W-ATET estimates in Heinrich et al. (2013) for the adult stream show positive impacts for women that start around $500 per quarter and rise to about $600 per quarter, and impacts for men that start around $800 per quarter and then sink fairly rapidly to around $500 per quarter. In stark contrast, the results for the dislocated worker stream reveal large and persistent lock-in effects that last about two years, followed by approximately zero impacts for men and approximately $100 per quarter impacts for women. All of the estimates not approximately zero attain conventional statistical significance. Based on specification tests looking at differences in pre-period earnings, the authors present cross-sectional matching estimates for the adults and difference-in-differences matching estimates for the dislocated workers, though the cross-sectional results in Heinrich, Mueser, and Troske (2008) exhibit the same basic patterns.
Hollenbeck (2009) presents estimates of the W-ATET from Indiana using ES registrants as the comparison group. Besides being from a different state, these estimates differ in their construction from those in Heinrich et al. (2013) because Hollenbeck (2009) measures outcomes from program exit (whether WIA or ES) rather than from program start. This not only omits an important part of the lock-in period for the WIA participants—one would not expect lock-in from the employment-focused ES—but also changes relative timing, as ES tends to have shorter enrollment spells than WIA. Hollenbeck's (2009) analysis shows similar post-lock-in W-ATET impacts for adults as found in Heinrich et al. (2013), with relatively precise point estimates of $549 in the third quarter after exit and $463 in the seventh quarter after exit. In contrast, dramatic differences emerge in regard to the W-ATET for participants served under the dislocated worker funding stream. Here Hollenbeck (2009) finds relatively precise (the reader, unfortunately, receives stars rather than standard errors) estimates of $410 in the third quarter after exit and $310 in the seventh quarter after exit (the last quarter available for the full sample). Hollenbeck reports that in his analysis, as in Andersson et al. (2013), the conditional difference-in-differences estimates closely resembled those from cross-sectional matching; it is the latter that he anoints as his preferred estimates and that we highlight here.

127. Most of the impact estimates of more than $300 in absolute value in Andersson et al. (2013) easily attain conventional levels of statistical significance, but with imperfect (and likely somewhat too small) standard errors. See their note 11 for additional details.
Where do the earnings impacts estimated in these studies come from? Do they result from increases in wages, from "intensive margin" increases in hours worked, or from "extensive margin" increases in employment? What about increases in the duration of employment spells via higher match quality and/or matches to "better" firms? The administrative outcome data used in the WIA studies allow only modest insights into the mechanisms underlying realized earnings impacts. Basically, they allow only the construction of impacts on employment, defined as nonzero earnings, and then only at the level of the calendar quarter. In each of the studies considered here, the employment estimates parallel the earnings estimates in the sense that positive earnings impacts coincide with positive employment impacts. The magnitudes relative to the earnings impacts do vary somewhat, with particularly large employment impacts relative to earnings impacts for the displaced worker W-ATET in Heinrich et al. (2013) and for both funding streams' W-ATET in Hollenbeck (2009). Linking the usual administrative data to the Census Bureau's Longitudinal Employer-Household Dynamics (LEHD) data allows Andersson et al.
(2013) to estimate impacts of WIA training on the characteristics of the firms at which participants end up. They consider standard characteristics from the literature, including the LEHD firm “fixed effect” (bigger is better), firm turnover (less is better), and firm size (bigger is again better). They find (see their table 6) impacts of modest size that parallel the earnings impacts discussed above. Thus, for state by funding stream combinations with positive earnings impacts, trainees have a net improvement in employer quality in the twelfth quarter after WIA registration. The nonexperimental literature on WIA offers the reader methodological insights, useful findings for policy, and (at least) two puzzles. For adults, both W-ATET and T-ATET turn out positive and of reasonable magnitude in every study that presents them. Those findings justify continuing to provide similar services to a similar clientele under WIOA. In contrast, the literature offers heterogeneous findings for displaced workers. This leads to the first of the two puzzles: in the Heinrich et al. (2013) and Andersson et al. (2013) papers, why do the adult and dislocated worker programs have such astoundingly different impact estimates? The puzzle only becomes more complicated upon noting that almost all dislocated worker participants could have received services under the adult stream, while many adult participants could have received services under the dislocated worker stream. Second, whence the positive impacts for dislocated workers in Indiana in Hollenbeck (2009)? A Hoosier might argue that Indiana is just special, or perhaps especially well run, but the fact that Hollenbeck (2009) obtains similar results in two other state analyses not discussed in detail here (see his table 5) suggests some feature of his methodology as the culprit. 
Aligning participants and comparison group members relative to the timing of exit rather than the timing of enrollment represents an obvious candidate, but table 6 of Hollenbeck (2011) yields no smoking gun. Satisfactory resolution of both puzzles awaits future research.
In addition to the four nonexperimental evaluations, the US Department of Labor presently has an experimental evaluation of WIA, called the "WIA Adult and Dislocated Worker Programs Gold Standard Experiment," in the field. This evaluation compares three treatment arms for participants in the adult and dislocated worker funding streams: eligible just for core services, eligible for core and intensive services, and eligible for all services, including training. The comparison between the second and third arms will provide a benchmark of sorts for the nonexperimental evaluations that estimate the T-ATET, once adjusted for whatever level of treatment group dropout (from WIA) arises in the experiment. The WIA experiment will also provide the first experimental impact estimates for dislocated workers, who were omitted from the JTPA evaluation. As a result, it should shed some light on the puzzling difference in impacts between participants in the two funding streams in the nonexperimental studies.
In sharp contrast to the site recruitment difficulties in the JTPA experiment that led to serious concerns regarding external validity, the WIA experiment has done quite well on this dimension, apparently because it imposes a lower burden on sites by randomly assigning a smaller fraction of the intake to the control group. Its twenty-eight sites include twenty-six from an initial random sample of thirty, plus two additional randomly chosen replacement sites. Taken together, the sites will provide a sample size of around 35,000, substantially more than the 20,601 in the JTPA experimental sample. Results from the experiment, for which follow-up data collection is in progress as we write, should become public in 2016. When they do, they will contribute to both our substantive and methodological knowledge in important ways.128

3.11.2 Job Corps

We have very good evidence on the labor market effects of the Job Corps program thanks to an extensive experimental evaluation conducted in the mid-1990s. In particular, the experiment randomly assigned eligible applicants at (almost) all Job Corps centers around the United States either to a treatment group eligible to receive Job Corps or to a control group excluded from Job Corps for three years. Random assignment took place from November 1994 through December 1995. The design of the (formally titled) National Job Corps Study (NJCS) overcomes two of the main issues that raised concerns about external validity in the JTPA experiment. First, by conducting random assignment at (almost) every Job Corps center, it removed concerns about nonrandom site selection; the fact that the Job Corps, unlike JTPA or WIA, is run directly at the federal level enabled this strategy. Second, on average, the experiment assigned only about 7 percent of applicants to the control group. As a result, sites did not have to recruit many additional potential participants in order to maintain the size of their operation while still filling in the control group. This reduces site burden and also reduces concerns about external validity; put differently, the NJCS can make a credible claim that the experimental impact estimates apply to the program as it normally operates. The research sample in the NJCS includes about 6,000 in the control group and about 9,400 in the treatment group; for cost reasons the evaluation collected data on only a random subset of those randomly assigned to the treatment group. The NJCS presents an interesting treatment contrast and, in so doing, highlights issues that arise in dealing with control group substitution. Around 73 percent of the treatment group enrolled in the Job Corps, with an average enrollment duration of about eight months.
Only 1.4 percent of the control group defeated the experimental protocol by enrolling in the program during the embargo period. At the same time, and not at all surprisingly given the age of the applicants and their expressed interest in programs to improve their human capital, 71.7 percent of the control group enrolled in some sort of education or training program during the forty-eight months after random assignment. Some treatment group members also enrolled in programs other than the Job Corps, so that in total 92.5 percent received some sort of education and training. Thus, focusing strictly on incidence, the treatment increases receipt of some education and training by about 21 percentage points. At the same time, incidence misses much of the story here due to the substantial difference in intensity. The options facing control-group members do not include long-duration residential programs like the Job Corps. As a result, the difference in mean hours of education and training between the treatment and control groups (including all the zeros) equals 710, or about eighteen weeks of full-time activity.
We focus here on the "intent to treat" (ITT) impacts estimated using matched earnings records from the Social Security Administration. Schochet, Burghardt, and McConnell (2008) document nontrivial differences between these estimates and those obtained using survey data and using administrative data from state UI systems. The ITT estimates require careful interpretation in light of the nature of the treatment contrast presented by the experiment, as described above. As expected given the timing of random assignment, estimated annual impacts for calendar years 1995 and 1996 equal –$270 and –$179, respectively, reflecting a lock-in effect due to reduced job search, and thus reduced employment, while treatment group members engage with the Job Corps. The estimated annual impacts turn positive in 1997 and 1998, equaling $173 and $218, respectively.129 All four estimates achieve conventional levels of statistical significance. Consistent with the earnings impacts, the evaluation finds positive impacts on measures of job quality as of the sixteenth quarter after random assignment.

128. See http://www.mathematica-mpr.com/our-capabilities/case-studies/evaluating-the-effectiveness-of-employment-and-training-services for more.
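One common way to translate an ITT estimate into an impact per Job Corps enrollee is a Bloom-style rescaling by the difference in enrollment rates. The adjustment below is our illustration using the figures just cited; it is not a calculation the evaluation itself reports.

```python
# Bloom-style rescaling of an ITT impact into an impact per enrollee,
# using the enrollment rates from the treatment contrast described above.
itt_1998 = 218.0         # estimated annual ITT impact for 1998, dollars
enroll_treated = 0.73    # share of the treatment group enrolling in Job Corps
enroll_control = 0.014   # share of the control group defeating the embargo

impact_per_enrollee = itt_1998 / (enroll_treated - enroll_control)
print(round(impact_per_enrollee, 2))  # 304.47
```

The rescaling assumes that the embargo-defying control enrollees receive the same treatment as treatment-group enrollees, which is why it serves only as a rough translation.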
Finally, the Job Corps also affected criminal behavior, measured as arrest and conviction rates.130 The headline: the Job Corps, nearly alone among employment and training programs for youth, has positive and substantial impacts on labor market outcomes. Comparison with the JOBSTART program found ineffective by Cave et al. (1993), which provided (more or less) a nonresidential version of the Job Corps, suggests the importance of the residential aspect of the program.
McConnell and Glazerman (2001) present a careful and comprehensive cost-benefit analysis. Job Corps costs a lot: about $16,500 per participant in 1995 dollars. As a result, because the earnings impacts fade out over time as control group earnings catch up to treatment group earnings, it fails to pass a social cost-benefit test despite having positive impacts on both labor market and criminal justice outcomes. It does (easily) pass a cost-benefit test from the perspective of participants. Thus, the Job Corps presents a glass half full, but in a desert of dismal evaluation results for youth, that means something, and at the least suggests directions for future innovations in program design.

129. See Schochet, Burghardt, and Glazerman (2001) for discussion of the finding of larger impacts for older participants and Flores-Lagunes, Gonzales, and Neumann (2008) for discussion of the lack of strong impacts among Hispanic participants.
130. The big picture findings from the NJCS echo those of the earlier nonexperimental evaluation documented in Long, Mallar, and Thornton (1981) and Mallar et al. (1982).

3.11.3 Trade Adjustment Assistance (TAA)

The TAA recently received a thorough nonexperimental evaluation using a "selection on observed variables" identification strategy building on a combination of survey and administrative data. The survey data allow a (somewhat) richer, and thus more compelling, set of conditioning variables than those in the WIA evaluations. On the other hand, the complicated structure of the TAA program makes the nonexperimental evaluation task substantially more challenging than for WIA. In the end, Schochet et al. (2012) have produced valuable evidence by optimizing within the design constraints, but substantial uncertainty remains. The evaluation focuses primarily on the impact of receiving "significant TAA services" for a sample of workers certified under TAA between November 1, 2005, and October 31, 2006, from twenty-six states and with UI claims starting in a wider window around that year as allowed in the law in effect at that time.131 The UI claimants from the same time periods and the same local labor markets not certified under TAA constitute the comparison group. Significant TAA services means more than just "light-touch TAA services or One-Stop core services provided through WIA or ES"; the evaluation measures service receipt using both administrative data and survey reports. Not surprisingly, given that TAA provides UI benefits and trade readjustment allowances (TRA) over a longer time period than for the comparison group and encourages longer-term training, TAA participants experience relatively long-lasting lock-in effects. In particular, in the first four quarters, Schochet et al. (2012, table 1) shows that the matched comparison group averaged 19.4 weeks of employment and $12,674 in earnings more than the participants. The negative impacts fade out over time, but never entirely disappear during the four-year follow-up period.
For example, in quarters thirteen to sixteen, the matched comparison group averages 2.0 more weeks of work and $3,273 more in earnings than the participants. Subgroup analyses reveal less negative effects for younger TAA participants, and no substantive difference between men and women. While the evaluation includes a (truly) extensive collection of sensitivity analyses on many dimensions, the question of whether the job loss that leads the participants into TAA might have more persistent consequences than the job losses among the comparison group lingers, though it would require an implausibly large difference to save the TAA program in a cost-benefit sense.

131. The evaluation calls this the "certified-worker participant sample." Analyses using alternative definitions of the TAA treatment (and thus alternative samples of treated individuals) reach similar substantive conclusions.

3.11.4 Other Programs

A variety of other programs, some large and most small, exist and have received some evaluative attention.132 We have chosen to focus on larger programs with relatively high-quality evaluations and on programs operated via the Department of Labor. Our focus leaves out the many welfare-to-work programs discussed in Ziliak (chapter 4 in volume I of this project) and in Greenberg and Robins (2011), as well as the food stamp/SNAP employment and training programs evaluated in Puma and Burstein (1994). It also omits "sectoral training" programs under which taxpayers provide training for particular firms or small groups of firms, as in Maguire et al. (2010), as well as studies of vocational training provided by the community college system not financed by WIA or TAA, as in Jacobson, LaLonde, and Sullivan (2005). Finally, we also omit many evaluations with methodological, data, or sample size issues, such as the Eyster et al. (2010) evaluation of the High Growth Job Training Initiative (HGJTI).133

3.12 Program Operation Issues

As we have noted along the way, in our view the literature spends relatively too much effort on estimating the ATET for programs that will, for various political reasons, never go away no matter what their ATET looks like, and relatively too little effort providing compelling evidence on ways to operate the programs so that they will have larger ATETs than they presently do. In this section, we discuss some of what we do and do not know about program operation issues under three broad headings: performance management, program participation (i.e., how potential participants find their way to programs), and how participants get matched to particular services within programs and to jobs after they finish programs.

3.12.1 Performance Management

The Department of Labor's flagship employment and training programs have played an important role in the intellectual and institutional development of federal performance management, starting with initial efforts under the CETA program. The JTPA and WIA programs featured quantitative performance-management systems operating at both the state and local levels that included financial incentives for good performance, as well as potential penalties for poor performance; WIOA retains the WIA system with some modest modifications.134 Courty and Marschke (2011a) provide a detailed description of the JTPA system, and Heinrich (2004) does the same for WIA.
One can think about the performance-management systems for US government-sponsored training programs as trying to accomplish two things: (a) provide quick and inexpensive proxies for impact estimates that would otherwise take a long time and cost a lot of money, and (b) motivate program staff to work harder (i.e., apply more effort) and to work smarter (i.e., to figure out how to make a given amount of effort yield a higher payoff via changes in how the program operates). Success on the second task requires success on the first, for if the performance measures do not proxy effectively for impacts (i.e., changes in labor market outcomes relative to a counterfactual), then pressing programs to do well on them may reduce, rather than increase, their economic efficiency.
Concerns about performance measures in the economics literature center on three issues. The first is the correlation between the performance measures and program impacts. Here, the available evidence suggests concern, if not alarm, as the literature provides essentially no evidence of such a correlation; see in particular Barnow (2000) and Heckman, Heinrich, and Smith (2002) for studies that make use of the data from the JTPA experiment and Schochet and Burghardt (2008) for evidence from the Job Corps experiment.
The second concern springs from the literature on principal-agent models when agents have multiple tasks; see, for example, Dixit (2002) for an overview in a public-sector context. This literature teaches that what gets rewarded gets done. If the government, acting on behalf of the taxpayer, wants training program staff to do five tasks, but the performance-management system rewards only two of them, then we would expect to see training centers do a lot of those two and not much of the other three.

132. See http://wdr.doleta.gov/research/keyword.cfm for a partial list as well as the discussion around table 7.
133. Of course, the authors of these evaluations typically have a very clear sense of these issues, which often arise from institutional, political, and data limitations beyond their control.
Thus, for example, performance measures based on labor market outcomes in the relatively short run (e.g., any time in the first year after participation) should lead programs away from services that have long-run impacts at the cost of short-run reductions in outcomes, such as training in a new occupation, and toward services that improve short-run outcomes, such as job-search assistance, regardless of their effect on long-run impacts.
The third concern centers on strategic responses to performance management. These include cream skimming, the literature's term for selecting participants based on their expected outcome with training (i.e., Y1) rather than based on expected impacts from training (i.e., Y1 – Y0), where the latter maximizes the (economic) efficiency of the program. This concern follows immediately from the fact that, as described earlier, existing performance measures consist entirely of variants of Y1. Other potential strategic responses include manipulating the timing and incidence of formal enrollment, as well as the timing of formal termination from the program, in response to performance measures that include only those formally enrolled and that measure outcomes over defined program years. Under the nonlinear reward functions common in job-training programs, it can make sense to reallocate weak trainees over time to particular periods by manipulating the timing of enrollment and termination. Suppose, for example, that a training center gets rewarded for an entered employment rate that exceeds 0.80 by any amount in a given program year, but that, absent a strategic response, it has a rate of 0.78 in every program year. If it can manipulate the program year in which marginal trainees count toward the performance measure so as to alternate its entered employment rate between 0.76 and 0.80, it becomes better off under the performance-management system, but without actually improving labor market outcomes in any way (and perhaps with an expenditure of real resources on the strategic response). The literature provides a wealth of compelling empirical evidence on both crude and also remarkably subtle responses to the incentives implicit in the performance-management systems of US job-training programs: see Courty and Marschke (2011b) for an overview, as well as Barnow and King (2005). See Barnow and Smith (2004) and Heckman et al. (2011) for more extensive summaries of the literature on performance management in US employment and training programs, Radin (2006) for a critique from outside economics that emphasizes different concerns than we do here, and Wilson (1989) for a thoughtful presentation of the underlying problems of public management that motivate performance management.

134. The most notable change concerns the reinstatement of regression adjustment of the performance measures based on participant characteristics. The JTPA used such adjustments but WIA did not. Intuitively, regression adjustment aims to present local training centers with a level playing field, though one might argue that conditioning on the characteristics of the eligible population, rather than of the chosen participants, would do this better. See the discussion in Eberts, Bartik, and Huang (2011).
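The entered-employment-rate example can be made concrete with a stylized simulation. The specific counts here are hypothetical, and the reward rule is a simplification; the mechanism, shifting the program year in which weak trainees count, is the one described above.

```python
def entered_employment_rate(employed, counted):
    """Share of trainees counted in a program year who entered employment."""
    return employed / counted

def meets_standard(rate, threshold=0.80):
    # Stylized reward rule: a bonus whenever the rate reaches the threshold.
    return rate >= threshold

# Honest center: 100 trainees counted per program year, 78 employed,
# so the rate is 0.78 every year and the center is never rewarded.
honest = [entered_employment_rate(78, 100) for _ in range(2)]

# Strategic center: delay the formal termination of 5 never-employed
# trainees so they count in year 1 instead of year 2. Labor market
# outcomes are unchanged, but the rates now straddle the threshold.
strategic = [entered_employment_rate(78, 105), entered_employment_rate(78, 95)]

print(sum(meets_standard(r) for r in honest))     # 0 bonuses
print(sum(meets_standard(r) for r in strategic))  # 1 bonus
```

The strategic center earns a bonus in alternating years without a single additional trainee entering employment, which is exactly why nonlinear reward functions invite this kind of timing manipulation.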

3.12.2 Program Participation

Studies of program participation consider how individuals come to participate in social programs. Such studies have interest for several reasons. First, program participation represents a choice, and economists (and other social scientists) like to understand the choices individuals make. Second, understanding how individuals choose to participate in programs aids in program design and targeting. Third, an understanding of the participation process provides the foundation for credible nonexperimental evaluation. Fourth, it also informs discussions of external validity to the set of eligible nonparticipants. Fifth, program operators (and voters) may care about the equity with which program services get distributed to particular identifiable groups within the eligible population. Currie (2006) reviews the literature on program participation.
In an institutional sense, participants find government-sponsored training programs in a variety of ways. They may get a referral from a friend or
neighbor or from a social worker or caseworker in another program. They may, as the government hoped when it mandated colocation, head to the One-Stop center for some other purpose and, once there, find the lure of the current employment and training program impossible to resist. They may get referred by service providers, as when individuals seeking vocational training at their local community college get sent to the WIOA office to try to obtain funding for that endeavor. In contrast, some participants participate due to a requirement rather than a choice. For example, 9.5 percent of those randomized in the JTPA experiment report that a welfare program required them to participate and 0.5 percent report a court doing so. The Worker Profiling and Reemployment Services (WPRS) program and the Reemployment and Eligibility Assessment (REA) program require some UI claimants to participate in reemployment services (sometimes, but not often, including training) or risk disqualification for benefits. Finally, among those who make it into a program, actual enrollment depends in part on caseworker behavior. They may, perhaps out of goodwill and perhaps out of a desire to improve their measured performance, discourage some potential participants from enrolling by requiring additional visits to the One-Stop center or by referring them to alternative services, while encouraging others. Heckman, Smith, and Taber (1996) find that caseworkers at the JTPA center in Corpus Christi, Texas, appear to emphasize equity concerns rather than performance concerns in the process by which applicants became enrollees, a process that also includes applicant self- selection. Standard economic models of participation tend not to emphasize these institutional features. Instead, as with the simple model we discussed earlier, they focus on more abstract notions of opportunity costs and expected benefits. 
Individuals participate when they face low costs of doing so, due to either ongoing skill deficits or transitory labor market shocks such as job loss, and when they expect large impacts from doing so.135 They may also view participation as a form of assisted job search, either literally, as when receiving job search assistance or subsidized on-the-job training at a firm, or figuratively, as when new skills learned in classroom training improve the frequency or quality of job offers. The literature also includes some informal discussion of the possible importance of credit constraints due to the absence of stipends or other payments for training participants in most current programs—the Job Corps is an exception in providing room and board—and the resulting value of alternative sources of financial support such as transfers or family support during training. The potentially crucial role of information, both in making the possibility of participation salient enough to induce explicit choice and in the sense of forming ideas about

135. Ashenfelter (1983) emphasizes that for particularly attractive means-tested programs, potential participants may choose to reduce their opportunity cost of participation (e.g., by quitting a job) in order to qualify, while Moffitt (1983) adds stigma to the participation cost-benefit calculation. We suspect that neither factor plays much role in the training context.

Employment and Training Programs


potential benefits, has played little role in the theoretical literature on training participation and only a very modest role in the empirical literature, as we describe next.

The empirical literature consists primarily of multivariate studies of the observed determinants of participation, with the determinants including demographics, human-capital variables, past labor market outcomes, and so on. The estimated reduced-form effects of these variables then get interpreted in light of the sorts of theories just described. For example, a negative coefficient on age would suggest that younger workers perceive a higher benefit to participation due to more time over which to realize any earnings gain the training provides. In some cases, the participation model functions mainly as an input into estimation of treatment effects via some estimator based on the propensity score, rather than as the primary object of interest in the study.

Several such studies look at the JTPA program. Anderson, Burkhauser, and Raymond (1993) examine participation in JTPA in Tennessee by comparing program records on enrollees with a sample of eligibles constructed from the Current Population Survey—a very imperfect enterprise for reasons outlined in Devine and Heckman's (1996) study of the JTPA-eligible population. Their multivariate analysis reveals blacks, high school dropouts, and individuals with disabilities as underrepresented among participants, which they interpret as evidence of cream skimming resulting from the JTPA performance standards. Heckman and Smith (1999) study the JTPA participation process using rich data on experimental control group members and eligible nonparticipants at four of the sites in the JTPA experiment. Their headline findings concern the importance of labor force status transitions in the months leading up to the participation decision, especially transitions into unemployment.
These transitions need not entail a simultaneous change in earnings, as when an individual goes from "out of the labor force" to "unemployed" by initiating job search. This finding in turn suggests that analyses that rely solely on earnings and employment may miss an important part of the participation picture (and so may end up with biased impact estimates as well). Their analysis also highlights the importance of family factors, including marital status and family income, in determining participation, along with the usual suspects identified in other studies, such as age (declining) and education (hill-shaped). Finally, Heckman and Smith (2004) combine the data from the National JTPA Study with data from the Survey of Income and Program Participation (SIPP) to decompose the process that leads from JTPA eligibility to JTPA enrollment into a series of stages: eligibility, awareness, application, acceptance (defined to mean reaching random assignment), and enrollment. Though descriptive in nature, the analysis reveals a number of important findings. First, decomposing the steps from eligibility to enrollment reveals that for some groups the key stage is program awareness, rather than enrollment conditional on application or acceptance. This adds nuance to the findings in the Anderson, Burkhauser, and Raymond (1993) paper and signals that substantively important differences in participation conditional on eligibility among groups arise from factors other than the incentives implicit in the performance management system. Second, looking at the stage from acceptance to enrollment—the stage over which program staff has the most influence—does suggest some role for the performance management system, as individuals with characteristics that predict relatively weak labor market outcomes have lower probabilities of enrollment. Finally, simply making a particular group eligible for a program does not mean that they will take it up.

We know of only one such study for WIA: Andersson et al. (2013), who present both univariate and multivariate analyses (see their table 3) of the characteristics of WIA enrollees that predict receipt of training. In particular, they find that younger enrollees have a greater chance of receiving training, which makes sense in terms of the basic prediction of the life-cycle human-capital model. They also find a hill-shaped conditional pattern by years of schooling, with those in the middle of the distribution, that is, those with a high school diploma or some college, having the highest probability of training. This makes sense as well. Many training courses require high school completion and, even if they do not, they may require mastery of relatively technical written material. At the upper end of the distribution, college graduates likely have little need for further training in general (or may have other issues that training will not fix). Finally, while Andersson et al. (2013) find differences in univariate training chances between whites and nonwhites, these largely disappear in the multivariate analyses.
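Findings like these come from fitting participation (or training-receipt) models of the kind described above. As a purely illustrative sketch—the data-generating process, coefficient values, and variable names below are our own inventions, not estimates from any of the studies cited—a logit with schooling entered as a quadratic can capture both the declining-age and hill-shaped-schooling patterns:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Simulated eligible population (all numbers illustrative).
age = rng.uniform(18, 60, n)
school = rng.integers(8, 19, n).astype(float)  # years of schooling

# "True" participation index: declining in age, hill-shaped in schooling,
# peaking near a high school diploma / some college.
index = 0.5 - 0.04 * (age - 18) - 0.08 * (school - 13.0) ** 2
d = rng.binomial(1, 1.0 / (1.0 + np.exp(-index)))  # observed participation

# Fit a logit by Newton's method; schooling enters as a quadratic so the
# fitted profile can reproduce the hill shape.
X = np.column_stack([np.ones(n), age, school, school ** 2])
beta = np.zeros(X.shape[1])
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30, 30)))
    gradient = X.T @ (d - p)
    hessian = (X * (p * (1 - p))[:, None]).T @ X
    beta += np.linalg.solve(hessian, gradient)

def p_hat(a, s):
    """Fitted participation probability at age a and schooling s."""
    x = np.array([1.0, a, s, s ** 2])
    return 1.0 / (1.0 + np.exp(-x @ beta))

# Qualitative patterns recovered by the fitted model:
print(p_hat(25, 12) > p_hat(50, 12))  # participation declines with age
print(p_hat(25, 13) > p_hat(25, 8))   # hill shape: middle beats bottom
print(p_hat(25, 13) > p_hat(25, 18))  # ... and middle beats top
```

The same fitted probabilities, evaluated at each observation, are the propensity scores that serve as inputs to the matching and weighting estimators mentioned above.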
In our view, participation in both employment and training programs in general, and in the training components of those programs in particular, remains fertile ground for additional research. Specifically, the role of information in leading to program awareness and then to participation, the formation of ex ante beliefs about likely program impacts, the determinants of the timing of training within spells of unemployment or nonemployment, and the role of other family members merit further researcher attention.

3.12.3 Matching Participants to Services

Large general employment and training programs such as JTPA, WIA, and WIOA face the complicated problem of matching particular participants to particular services. Even within broad service types, such as classroom training or subsidized on-the-job training, this represents a nontrivial problem. A given program office may have several different classroom training providers offering programs of varying lengths and varying skill prerequisites that aim to prepare trainees for a variety of different occupations, as well as an array of heterogeneous employers willing to consider program


participants for subsidized on-the-job training slots. This section briefly reviews the (remarkably) small extant literature in economics that considers different ways to match participants with services.

Caseworkers play a pivotal role in matching participants to services in the major US employment and training programs (as they do elsewhere in the developed world). Typical motivations for this practice revolve around information asymmetries between the caseworker and the participant due to the caseworker's superior knowledge of local service providers, of local labor market conditions (e.g., occupations in demand), and (more speculatively) of the best matches, in terms of earnings and employment impacts, between participant characteristics and preferences and particular services and occupations.136

We have only very limited evidence in the United States (and not much more elsewhere) regarding how and how well caseworkers assign participants to services. The JTPA experiment and Andersson et al.'s (2013) WIA observational study both provide some information regarding what caseworkers believe about optimal service assignment rules. For example, Kemple, Doolittle, and Wallace (1993) find a number of ex ante reasonable patterns in univariate analyses for adults using the JTPA experimental data: (a) participants without a high school diploma or GED have a higher probability of assignment to adult basic education and a lower probability of assignment to classroom training in occupational skills; (b) participants receiving cash assistance, who thus have a source of income during training other than work, are more likely to receive classroom-based services; and (c) participants with limited work experience have a lower probability of assignment to job-search assistance and subsidized on-the-job training (the latter of which requires a willing employer). Smith (1992) and Plesca and Smith (2007) provide further analyses using the JTPA data, while the Andersson et al.
(2013) findings described in detail in our discussion of the determinants of participation in training provide evidence for the WIA program. Taken together, the analyses from the JTPA and WIA programs suggest that caseworkers have some reasonable ideas about service assignment as a function of participant characteristics, with the caveat that in both programs caseworkers take client interests and preferences into account, so that the observed patterns reflect the views of both groups.

A different line of research estimates heterogeneous treatment effects as a function of observed participant characteristics using experimental or observational variation and then uses those estimates to examine how well, or how poorly, existing caseworker service assignment patterns do relative to

136. Caseworkers also perform a number of other functions, including referring participants to other services such as substance abuse programs, transfer programs, and so on, helping participants clarify their interests and abilities, providing informal instruction in job search, monitoring eligibility and search intensity, and so on. Bloom, Hill, and Riccio (2003) investigate some of these other aspects of the caseworker role.


the minimum and maximum impacts possible given the estimates. Plesca and Smith (2005) undertake this exercise using the JTPA experimental data and consider assignment to the three experimental "treatment streams" based on services recommended prior to random assignment. They find gains from assigning treatment streams using a statistical treatment rule based on estimated impacts, relative to caseworker assignment. Lechner and Smith (2007) perform a similar exercise using observational data (with larger samples) from Switzerland and find that caseworkers do about as well as random assignment to treatment, and thus leave substantial potential gains on the table. Their paper emphasizes the importance of respecting capacity constraints under alternative allocation schemes. McCall, Smith, and Wunsch (2016) summarize the broader European literature, which reaches an overall conclusion similar to that of Lechner and Smith (2007).

A pair of experiments provides further evidence on caseworker performance at the service assignment task. Bell and Orr (2002) analyze the AFDC Homemaker-Home Health Aide Demonstrations. In that study, caseworkers predicted both the untreated outcome and the impact for each experimental sample member prior to random assignment. Interacting the treatment indicator with the impact prediction in the impact estimation reveals that caseworkers in this context have no idea who will benefit from training as a homemaker/home health aide. They do a much better job at predicting untreated outcome levels. This experiment shows what caseworkers know about the impact of one particular treatment, which is related to, but not the same as, picking the service with the highest expected impact. We think more experiments should undertake exercises like this one.
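The logic of such a statistical treatment rule can be conveyed in a stylized sketch. Everything below (sample sizes, the simulated impact matrix, and the greedy heuristic) is our own illustration rather than the procedure used in any of the papers cited; the point is simply that assigning each participant to the open service with the largest estimated impact, while respecting capacity constraints, beats random assignment when the impact estimates carry signal:

```python
import numpy as np

rng = np.random.default_rng(1)
n_part, n_serv, seats = 300, 3, 100  # 3 services with 100 slots each

# Hypothetical estimated impacts (participant x service), e.g., subgroup
# estimates from an earlier experiment; purely simulated here.
impacts = rng.normal(0.0, 1.0, (n_part, n_serv))

def rule_assign(impacts, capacity):
    """Greedy statistical treatment rule that respects capacity: handle
    participants in order of how much they lose if denied their best
    service, giving each the best service with an open slot.
    A heuristic, not the optimal assignment."""
    cap = list(capacity)
    ranked = np.sort(impacts, axis=1)
    order = np.argsort(-(ranked[:, -1] - ranked[:, -2]))  # best minus second best
    assignment = np.empty(impacts.shape[0], dtype=int)
    for i in order:
        open_servs = [j for j, c in enumerate(cap) if c > 0]
        best = max(open_servs, key=lambda j: impacts[i, j])
        assignment[i] = best
        cap[best] -= 1
    return assignment

rule = rule_assign(impacts, [seats] * n_serv)
random_rule = rng.permutation(np.repeat(np.arange(n_serv), seats))

mean_rule = impacts[np.arange(n_part), rule].mean()
mean_random = impacts[np.arange(n_part), random_rule].mean()
print(mean_rule > mean_random)  # the rule beats random assignment
```

In practice the gain shrinks with the noise in the impact estimates, and, as Lechner and Smith (2007) stress, the capacity constraints are what turn this into an assignment problem rather than a simple per-person maximization.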
The second experiment, reported in Perez-Johnson, Moore, and Santillano (2011), compares alternative administrative models for delivering ITAs using a sample of WIA enrollees determined eligible for ITAs in eight sites in six states.137 The experiment included three treatment arms: structured choice, guided choice, and maximum choice, which differed primarily along three dimensions. First, under structured choice, but not the other two arms, the caseworker had veto power over training choices. Second, under structured choice, but not the other arms, the caseworker had discretion over the dollar value of the ITA. Third, the amount of counseling regarding the training choice varied from mandatory and substantial under structured choice, to mandatory and less intensive under guided choice, to optional under maximum choice. In all treatment arms, the eligible training provider list and any local rules about in-demand occupations constrained the training choices. Operationally, the caseworkers were reluctant to be as directive regarding client training choices as envisioned in the original design for the structured

137. See also the earlier reports by McConnell et al. (2006) and McConnell, Decker, and Perez-Johnson (2006).


choice treatment arm. Instead, according to Perez-Johnson, Moore, and Santillano (2011, xxvii), caseworkers "tended to award Structured Choice customers ITAs that enabled them to attend their preferred training programs." For this reason, program costs for the structured choice arm proved higher than for the other two arms. Potential trainees in the maximum choice arm largely opted out of counseling, providing a revealed preference evaluation of that service at the margin. A larger fraction of those in the maximum choice arm used ITAs, but overall training rates (including both ITA-funded and other training) and the occupational mix of training differed little across the three treatment arms. Enrollees in the structured choice and maximum choice arms had substantively and statistically larger probabilities of completing a training course and of earning a credential.

Earnings and employment outcomes differ somewhat between the survey data and the administrative data from state UI records. The report gives (somewhat unusually, relative to the literature) greater weight to the survey data, while we lean toward giving them equal weight. In the survey data, the structured choice arm shows the highest earnings over all post-program periods, with a difference of about $500 per quarter in the final two years of follow-up (roughly 2008–2009) relative to the guided choice arm and about $250 per quarter relative to the maximum choice arm, though the latter difference fails to attain traditional levels of statistical significance. In contrast, the administrative data reveal only small differences in labor market outcomes: for example, in the final two years of the follow-up period (calendar years 2008–2009), average quarterly earnings equal $4,818, $4,713, and $4,734 for the structured choice, guided choice, and maximum choice arms, respectively, with none of the differences statistically significant.
Overall, Perez-Johnson, Moore, and Santillano (2011) conclude that the stronger impact performance of the structured choice arm has more to do with the larger dollar value of the ITAs in that arm than with caseworker value added. At the same time, the marginally better performance of the maximum choice arm relative to the guided choice arm, a contrast that highlights the value added of the caseworkers as these arms both included the same relatively low cap on ITA value, suggests that caseworkers add little if any value in their informational role.138

The leading alternative to having caseworkers assign participants to services consists of allowing participants to assign themselves to services, typically via some form of voucher, such as the ITAs under WIA. The literature refers to this as demand-driven assignment. Arguments in favor of demand-driven assignment include (a) participants likely have private information about their tastes and abilities that allow them to make better matches than

138. An additional and less direct way to evaluate the match between trainees and training measures the extent to which trainees end up in jobs directly related to their training, as in Park (2012). The key issue in this approach relates to the benchmark: How much mismatch is too much, given that the optimum is not 100 percent?


caseworkers; (b) participants may work harder and be more likely to complete programs and courses they choose for themselves; and (c) participant choice may put more competitive pressure on providers to do a good job. As noted in our discussion of the ITA experiment just above, ITAs under WIA typically embody a combination of caseworker input and participant choice, within the constraints of the eligible provider list.

The literature offers only limited evidence regarding vouchers in the training context. The ITA experiment described earlier represents the best evidence we have. Reframed from the voucher perspective, it shows that more flexible vouchers (i.e., vouchers less constrained by caseworkers and program rules) increase training incidence somewhat, do not change the mix of training very much, and marginally improve outcomes relative to the status quo of guided choice. Barnow (2009) provides a survey of the older US literature that emphasizes thinking about a continuum of options with varying degrees of customer control and program guidance and limitation. McCall, Smith, and Wunsch (2016) include the somewhat larger European literature in their survey. Based on our reading, the literature suggests surprisingly modest effects of additional customer choice on impacts but some impact on customer satisfaction. Additional research on how participants use information in making choices, and on the effects of additional types of information on choices and outcomes, represents a logical next step.

In addition to participants and caseworkers, institutional factors also play an important role in determining service assignments. First, the law typically encourages programs to offer training in occupations actually in demand in the local labor market; under WIA, local programs vary in how, and how enthusiastically, they implement this aspect of the law.
Second, the availability of local service providers constrains the set of available options; as a result, for example, WIA programs in urban areas typically offer a broader array of training options than those in rural areas. The reluctance of some providers to jump through the hoops required to get on the eligible provider list, described earlier in our discussion of WIA implementation, further limits the available options in some areas. Finally, broader institutional enthusiasm for particular services or service sequences, as with the "core then intensive then training" sequence in the WIA program, has an influence on service patterns.139

The literature suggests that caseworkers do not add much value in directing participants into particular services or trainees into particular training courses. This does not mean that they could not do better, and it could just mean that they seek to maximize something else, such as equity or measured performance, instead of value added. It also does not mean that they do

139. A small literature considers, with a combination of theory and calibration, the optimal mix of broad service categories and their interaction with the design of social insurance and transfer programs. See, for example, Wunsch (2013) and the references therein.


not add value in their other roles—see, for example, Rosholm (2014) and the broader discussion in McCall, Smith, and Wunsch (2016). We still have much to learn regarding this dimension of the training provision process.

3.13 Summary and Conclusions

The United States continues to spend less on employment and training programs in general, and on government-sponsored training more narrowly, than most other developed countries. It remains unclear which countries (if any) have found the optimum.

The years since LaLonde (2003) have seen some valuable research on employment and training programs in the United States, but the quantity of high-quality work remains low. We conjecture that this lack results both from the relatively small budgetary footprint of this program category and from data and data-access limitations.

Taken together, the recent evidence presents a mixed but somewhat disheartening picture. Both WIA training and WIA overall have fairly robust positive earnings effects for both men and women served under the adult funding stream, effects that tend to pass cost-benefit tests under reasonable assumptions. In contrast, WIA training and WIA overall appear to have a negative effect on individuals served under the dislocated worker funding stream. We find the available nonexperimental evidence a bit more compelling for WIA training versus WIA without training than for WIA versus no-WIA, and the findings for adults appear more robust to mildly different design decisions and/or to the set of states studied than the findings for dislocated workers. More attention to explaining the differences across states and streams would have great value; perhaps the ongoing WIA experimental evaluation will shed some light. The TAA analysis reveals that we should perhaps seek a more efficient way to compensate workers who suffer individually while the public benefits from reduced trade barriers.
The Job Corps experiment highlights the potential value of immersive, residential treatments in changing the outcomes of youth, while at the same time the fact that any positive impacts, even ones that end up not passing a cost-benefit test, elicit cheers from the audience reinforces the difficulty of the underlying task. Given the demonstrated inability of the US political system to kill even programs with dismal evaluation track records stretching over decades, future evaluation research should focus relatively more on impacts on marginal participants, which would inform decisions to increase or decrease program budgets at the margin, and on ways to improve program design, implementation, and performance management, as with the WIA ITA experiment.

The last two decades have seen a major "data gap" emerge between the United States and various central and northern European countries. The administrative data available for research on government-subsidized training programs in the United States pales in comparison to that available, for example, in Germany, Sweden, or Denmark in its quality (i.e., richness of individual characteristics, temporal fineness of outcome variables, lack of measurement error in the timing and incidence of service receipt and enrollment, and so forth), the ease with which serious researchers can gain access to it, and the ease with which they can use it if they do gain access. These limitations associated with administrative data in the United States mean that much policy-relevant research that would improve our understanding of training programs does not get done. This research would often cost the government little or nothing, as graduate students and professors would do it in order to generate publishable papers for which they receive indirect compensation. At the same time, it remains essentially impossible to undertake evaluations of job training programs using standard social science data sets in the United States due to sample-size issues in the major panel data sets (e.g., the Panel Study of Income Dynamics) and due to measurement issues (especially poor measurement of program participation) in both the cross-sectional data sets and the panel data sets. Matching of administrative data on participants to one or more of the major surveys—we suggest the SIPP, which combines relatively large sample sizes with a short panel and detailed information on earnings and program participation—could address the measurement issues at relatively low cost, and allow the generation of important new knowledge about how the citizenry interacts with these programs. Other areas where data remain weak in the US context could be addressed with less controversy.
While the Department of Labor provides some information about variation in state UI programs over time, similar (and, ideally, more comprehensive) information on many other programs such as WIA (and now WIOA), the Worker Profiling and Reemployment Services System (WPRS), and the Reemployment and Eligibility Assessment (REA) program does not exist to our knowledge.140 Providing it would facilitate research on these specific programs and on the system of active and passive labor market programs as a whole. Also valuable, as noted earlier, would be improved information on program costs, on average and at the margin, for different types of services, for different types of clients, and in different locations. The intersection between community colleges and employment and training programs would also benefit from improved data; at present, community college data do not indicate which students have their courses paid for by programs such as WIA, and neither the aggregated WIA data available to the public nor the WIA administrative records typically provided to researchers indicate the identity of individual service providers. The intersection between workforce development programs and the community

140. See http://www.unemploymentinsurance.doleta.gov/unemploy/statelaws.asp for the DOL information on state UI laws.


college system has great substantive importance; having the data required for serious research would allow evidence-based policy to improve it.

On the methods side, the United States continues to lead the world in the evaluation of government-sponsored training programs via large-scale social experiments. Both the Job Corps experiment and the WIA experiment solve important problems regarding site selection and external validity that arose in the earlier JTPA experiment. The nonexperimental evaluations of WIA and TAA reflect, to the extent allowed by the data, recent advances in the literature on nonparametric and semiparametric estimation of treatment effects. European studies of the value of particular conditioning variables have served to make these US studies more credible by showing that some of the variables absent in the United States do not add that much in terms of bias reduction. On the negative side, the tidal wave of compelling studies of educational interventions using regression discontinuity designs over the past decade has no analogue in the job-training literature due to the ongoing failure to "design in" usable discontinuities in this policy domain. Similarly, the federal government often misses opportunities for staged rollouts of programs, which would allow the application of standard panel-data estimation methods.
Finally, we note the potential for institutional reform in the broad sense, designed to embody an alternative vision of what Smith (2011) calls "evaluation policy." The success of the Department of Education's Institute of Education Sciences (IES) at generating truly remarkable improvements in the quality of official evaluations of educational interventions (and, indeed, in the entire academic literature that evaluates educational interventions) suggests consideration of a similar institution in the world of active labor market programs.141 Similarly, the success of the requirement that tied rigorous evaluation to the granting of waivers under the old AFDC program in the 1980s and 1990s suggests a similar scheme for allowing states to innovate in their workforce systems in exchange for providing the public good of high-quality evidence.

141. See Institute of Education Sciences (2008) for more on the IES success story.

References

Abadie, Alberto, Joshua Angrist, and Guido Imbens. 2002. "Instrumental Variables Estimates of the Effects of Subsidized Training on the Quantiles of Trainee Earnings." Econometrica 70 (1): 91–117.
Anderson, Kathryn, Richard Burkhauser, and Jennie Raymond. 1993. "The Effect of Creaming on Placement Rates under the Job Training Partnership Act." Industrial and Labor Relations Review 46:613–24.
Andersson, Fredrik, Harry Holzer, Julia Lane, David Rosenblum, and Jeffrey Smith. 2013. "Does Federally-Funded Job Training Work? Nonexperimental Estimates of WIA Training Impacts Using Longitudinal Data on Workers and Firms." NBER Working Paper no. 19446, Cambridge, MA.
Angrist, Joshua, Guido Imbens, and Donald Rubin. 1996. "Identification of Causal Effects Using Instrumental Variables." Journal of the American Statistical Association 91 (434): 444–55.
Angrist, Joshua, and Alan Krueger. 1999. "Empirical Strategies in Labor Economics." In Handbook of Labor Economics, vol. 3A, edited by Orley Ashenfelter and David Card, 1277–366. Amsterdam: North-Holland.
Angrist, Joshua, and Jorn-Steffen Pischke. 2010. "The Credibility Revolution in Empirical Economics: How Better Research Design is Taking the Con out of Econometrics." Journal of Economic Perspectives 24 (2): 3–30.
Ashenfelter, Orley. 1978. "Estimating the Effect of Training Programs on Earnings." Review of Economics and Statistics 60 (1): 47–57.
———. 1983. "Determining Participation in Income-Tested Social Programs." Journal of the American Statistical Association 78:517–25.
Balducchi, David, Terry Johnson, and Mark Gritz. 1997. "The Role of the Employment Service." In Unemployment Insurance in the United States: Analysis of Policy Issues, edited by Christopher O'Leary and Stephen Wandner, 457–504. Kalamazoo, MI: W. E. Upjohn Institute for Employment Research.
Barnow, Burt. 1987. "The Impact of CETA Programs on Earnings: A Review of the Literature." Journal of Human Resources 22 (2): 157–93.
———. 1993. "Thirty Years of Changing Federal, State, and Local Relationships in Employment and Training Programs." Publius: The Journal of Federalism 23 (3): 75–94.
———. 2000. "Exploring the Relationship between Performance Management and Program Impact: A Case Study of the Job Training Partnership Act." Journal of Policy Analysis and Management 19 (1): 118–41.
———. 2009. "Vouchers in US Vocational Training Programs: An Overview of What We Have Learned." Journal for Labor Market Research 42:71–84.
———. 2011. "Lessons from the WIA Performance Measures." In The Workforce Investment Act: Implementation Experiences and Evaluation Findings, edited by Douglas Besharov and Phoebe Cottingham, 209–31. Kalamazoo, MI: W. E. Upjohn Institute for Employment Research.
Barnow, Burt, and David Greenberg. 2015. "Do Estimated Impacts on Earnings Depend on the Source of the Data Used to Measure Them? Evidence from Previous Social Experiments." Evaluation Review 39 (2): 179–228.
Barnow, Burt, and Richard Hobbie, eds. 2013. The American Recovery and Reinvestment Act: The Role of Workforce Programs. Kalamazoo, MI: W. E. Upjohn Institute for Employment Research.
Barnow, Burt, and Christopher King. 2005. "The Workforce Investment Act in Eight States." Report prepared for US Department of Labor, Employment and Training Administration, Washington, DC.
Barnow, Burt, and Demetra Nightingale. 2007. "An Overview of US Workforce Development Policy in 2005." In Reshaping the American Workforce in a Changing Economy, edited by Harry Holzer and Demetra Nightingale. Washington, DC: Urban Institute Press.
Barnow, Burt, and Jeffrey Smith. 2004. "Performance Management of US Job Training Programs." In Job Training Policy in the United States, edited by Christopher O'Leary, Robert Straits, and Stephen Wandner, 21–56. Kalamazoo, MI: W. E. Upjohn Institute for Employment Research.

Employment and Training Programs

Barnow, Burt, and John Trutko. 2015. “The Value of Efficiency Measures: Lessons from Workforce Development Programs.” Public Performance and Management Review 38:487–513.
Baum, Sandy, Diane Cardenas Elliott, and Jennifer Ma. 2014. Trends in Student Aid 2014. New York: College Board.
Bell, Stephen, and Larry Orr. 2002. “Screening (and Creaming?) Applicants to Job Training Programs: The AFDC Homemaker-Home Health Aide Demonstrations.” Labour Economics 9:279–301.
Bell, Stephen, Larry Orr, John Blomquist, and Glen Cain. 1995. Program Applicants as a Comparison Group in Evaluating Training Programs. Kalamazoo, MI: W. E. Upjohn Institute for Employment Research.
Betsey, Charles, Robinson Hollister, and Mary Papageorgiou, eds. 1985. Youth Employment and Training Programs: The YEDPA Years. Washington, DC: National Academies Press.
Bitler, Marianne, Jonah Gelbach, and Hilary Hoynes. 2005. “Distributional Impacts of the Self-Sufficiency Project.” NBER Working Paper no. 11626, Cambridge, MA.
———. 2006. “What Mean Impacts Miss: Distributional Effects of Welfare Reform Experiments.” American Economic Review 96:988–1012.
Black, Dan, Jose Galdo, and Jeffrey Smith. 2007. “Evaluating the Worker Profiling and Reemployment Services System Using a Regression Discontinuity Design.” American Economic Review Papers and Proceedings 97 (2): 104–07.
Black, Dan, Jeffrey Smith, Mark Berger, and Brett Noel. 2003. “Is the Threat of Reemployment Services More Effective than the Services Themselves? Evidence from Random Assignment in the UI System.” American Economic Review 93 (4): 1313–27.
Blank, Diane, Laura Heald, and Cynthia Fagnoni. 2011. “An Overview of WIA.” In The Workforce Investment Act: Implementation Experiences and Evaluation Findings, edited by Douglas Besharov and Phoebe Cottingham, 49–78. Kalamazoo, MI: W. E. Upjohn Institute for Employment Research.
Bloom, Howard. 1984. “Accounting for No-Shows in Experimental Evaluation Designs.” Evaluation Review 8 (2): 225–46.
Bloom, Howard, Carolyn Hill, and James Riccio. 2003. “Linking Program Implementation and Effectiveness: Lessons from a Pooled Sample of Welfare-to-Work Experiments.” Journal of Policy Analysis and Management 22 (4): 551–75.
Boo, Katherine. 2004. “Letter from South Texas: The Churn.” New Yorker, March 29.
Borden, William. 2011. “The Challenges of Measuring Performance.” In The Workforce Investment Act: Implementation Experiences and Evaluation Findings, edited by Douglas Besharov and Phoebe Cottingham, 177–208. Kalamazoo, MI: W. E. Upjohn Institute for Employment Research.
Borus, Michael, and Daniel Hamermesh. 1978. “Estimating Fiscal Substitution by Public Service Employment Programs.” Journal of Human Resources 13 (4): 561–65.
Bound, John, Charles Brown, and Nancy Mathiowetz. 2001. “Measurement Error in Survey Data.” In Handbook of Econometrics, vol. 5, edited by James Heckman and Edward Leamer, 3705–843. Amsterdam: North-Holland.
Bradley, David. 2013. “The Workforce Investment Act and the One-Stop Delivery System.” CRS Report for Congress, Congressional Research Service, Washington, DC.
Burtless, Gary. 1995. “The Case for Randomized Field Trials in Economic and Policy Research.” Journal of Economic Perspectives 9 (2): 63–84.

Burt S. Barnow and Jeffrey Smith

Bushway, Shawn, Brian Johnson, and Lee Ann Slocum. 2007. “Is the Magic Still There? The Use of the Heckman Two-Step Correction for Selection Bias in Criminology.” Journal of Quantitative Criminology 23:151–78.
Busso, Matias, John DiNardo, and Justin McCrary. 2014. “New Evidence on the Finite Sample Properties of Propensity Score Reweighting and Matching Estimators.” Review of Economics and Statistics 96 (5): 885–97.
Butler, Wendell, and Richard Hobbie. 1976. Employment and Training Programs. Washington, DC: Congress of the United States, Congressional Budget Office.
Card, David, Jochen Kluve, and Andrea Weber. 2010. “Active Labour Market Policy Evaluations: A Meta-Analysis.” Economic Journal 120:F452–77.
———. 2015. “What Works? A Meta Analysis of Recent Active Labor Market Program Evaluations.” NBER Working Paper no. 21431, Cambridge, MA.
Carneiro, Pedro, Karsten Hansen, and James Heckman. 2003. “Estimating Distributions of Treatment Effects with an Application to the Returns to Schooling and Measurement of the Effects of Uncertainty on College Choice.” International Economic Review 44 (2): 361–422.
Carneiro, Pedro, James Heckman, and Edward Vytlacil. 2011. “Estimating Marginal Returns to Education.” American Economic Review 101 (6): 2754–81.
Cave, George, Hans Bos, Fred Doolittle, and Cyril Toussaint. 1993. “JOBSTART: Final Report on a Program for School Dropouts.” MDRC Report, Manpower Demonstration Research Corporation, New York. http://www.mdrc.org/sites/default/files/full_416.pdf.
Center on Budget and Policy Priorities. 2012. Policy Basics: An Introduction to TANF. Washington, DC: Center on Budget and Policy Priorities.
Clague, Ewan, and Leo Kramer. 1976. Manpower Policies and Programs: A Review. Kalamazoo, MI: W. E. Upjohn Institute for Employment Research.
Cook, Robert, Charles Adams, and Lane Rawlins. 1985. Public Service Employment: The Experience of a Decade. Kalamazoo, MI: W. E. Upjohn Institute for Employment Research.
Cook, Thomas. 2008. “‘Waiting for Life to Arrive’: A History of the Regression-Discontinuity Design in Psychology, Statistics and Economics.” Journal of Econometrics 142 (2): 636–54.
Couch, Kenneth. 1992. “New Evidence on the Long-Term Effects of Employment Training Programs.” Journal of Labor Economics 10 (4): 380–88.
Courty, Pascal, and Gerald Marschke. 2011a. “The JTPA Incentive System.” In The Performance of Performance Standards, edited by James Heckman, Carolyn Heinrich, Pascal Courty, Gerald Marschke, and Jeffrey Smith, 65–93. Kalamazoo, MI: W. E. Upjohn Institute for Employment Research.
———. 2011b. “Measuring Government Performance: An Overview of Dysfunctional Responses.” In The Performance of Performance Standards, edited by James Heckman, Carolyn Heinrich, Pascal Courty, Gerald Marschke, and Jeffrey Smith, 203–29. Kalamazoo, MI: W. E. Upjohn Institute for Employment Research.
Crépon, Bruno, Esther Duflo, Marc Gurgand, Roland Rathelot, and Philippe Zamora. 2013. “Do Labor Market Policies Have Displacement Effects? Evidence from a Clustered Randomized Experiment.” Quarterly Journal of Economics 128 (2): 531–80.
Currie, Janet. 2006. “The Take-Up of Social Benefits.” In Poverty, the Distribution of Income, and Public Policy, edited by Alan Auerbach, David Card, and John Quigley, 80–148. New York: Russell Sage.
Dahlby, Bev. 2008. The Marginal Social Cost of Public Funds. Cambridge, MA: MIT Press.
D’Amico, Ronald, Kate Dunham, Jennifer Henderson-Frakes, Deborah Kogan,
Vinz Koller, Melissa Mack, Micheline Magnotta, Jeffrey Salzman, Andrew Wiegand, Gardner Carrick, and Dan Weissbein. 2004. The Workforce Investment Act after Five Years: Results from the National Evaluation of the Implementation of WIA. Oakland, CA: Social Policy Research Associates.
D’Amico, Ronald, and Jeffrey Salzman. 2004. “Implementation Issues in Delivering Training Services to Adults under WIA.” In Job Training Policy in the United States, edited by Christopher O’Leary, Robert Straits, and Stephen Wandner, 101–34. Kalamazoo, MI: W. E. Upjohn Institute for Employment Research.
Davidson, Carl, and Stephen Woodbury. 1993. “The Displacement Effect of Reemployment Bonus Programs.” Journal of Labor Economics 11 (4): 575–605.
Decker, Paul, and Gillian Berk. 2011. “Ten Years of the Workforce Investment Act (WIA): Interpreting the Research on WIA and Related Programs.” Journal of Policy Analysis and Management 30 (4): 906–26.
Dehejia, Rajeev, and Sadek Wahba. 1999. “Causal Effects in Nonexperimental Studies: Reevaluating the Evaluation of Training Programs.” Journal of the American Statistical Association 94 (448): 1053–62.
———. 2002. “Propensity Score Matching Methods for Non-Experimental Causal Studies.” Review of Economics and Statistics 84 (1): 151–61.
Devine, Theresa, and James Heckman. 1996. “The Structure and Consequences of Eligibility Rules for a Social Program.” In Research in Labor Economics, vol. 15, edited by Solomon Polachek, 111–70. Greenwich, CT: JAI Press.
Dickinson, Katherine, Terry Johnson, and Richard West. 1987. “An Analysis of the Sensitivity of Quasi-Experimental Net Impact Estimates of CETA Programs.” Evaluation Review 11 (4): 452–72.
Dixit, Avinash. 2002. “Incentives and Organizations in the Public Sector: An Interpretative Review.” Journal of Human Resources 37 (4): 696–727.
Djebbari, Habiba, and Jeffrey Smith. 2008. “Heterogeneous Program Impacts: Experimental Evidence from the PROGRESA Program.” Journal of Econometrics 145 (1–2): 64–80.
Doolittle, Fred, and Linda Traeger. 1990. Implementing the National JTPA Study. New York: MDRC.
Eberts, Randall, Timothy Bartik, and Wei-Jang Huang. 2011. “Recent Advances in Performance Measurement of Federal Workforce Development Programs.” In The Workforce Investment Act: Implementation Experiences and Evaluation Findings, edited by Douglas Besharov and Phoebe Cottingham, 233–75. Kalamazoo, MI: W. E. Upjohn Institute for Employment Research.
Eberts, Randall, and Harry Holzer. 2004. “Overview of Labor Exchange Policies and Services.” In Labor Exchange Policy in the United States, edited by David Balducchi, Randall Eberts, and Christopher O’Leary, 1–31. Kalamazoo, MI: W. E. Upjohn Institute for Employment Research.
Eberts, Randall, and Stephen Wandner. 2013. “Data Analysis of the Implementation of the Recovery Act Workforce Development and Unemployment Insurance Provisions.” In The American Recovery and Reinvestment Act: The Role of Workforce Programs, edited by Burt Barnow and Richard Hobbie, 267–307. Kalamazoo, MI: W. E. Upjohn Institute for Employment Research.
Eberwein, Curtis, John Ham, and Robert LaLonde. 1997. “The Impact of Classroom Training on the Employment Histories of Disadvantaged Women: Evidence from Experimental Data.” Review of Economic Studies 64 (4): 655–82.
Eyster, Lauren, Demetra Smith Nightingale, Burt Barnow, Carolyn O’Brien, John Trutko, and Daniel Kuehn. 2010. Implementation and Early Training Outcomes of the High Growth Job Training Initiative. Washington, DC: The Urban Institute.
Federal Register. 2014. March 27. 79 (59): 17184–17188.

Ferber, Robert, and Werner Hirsch. 1981. Social Experimentation and Economic Policy. New York: Cambridge University Press.
Fisher, Ronald. 1935. The Design of Experiments. Edinburgh: Oliver and Boyd.
Flores-Lagunes, Alfonso, Arturo Gonzalez, and Todd Neumann. 2008. “Learning But Not Earning? The Impact of Job Corps Training on Hispanic Youth.” Economic Inquiry 48 (3): 651–67.
Ford, Reuben, David Gyarmati, Kelly Foley, Doug Tattrie, and Liza Jimenez. 2003. “Can Work Incentives Pay for Themselves? Final Report on the Self-Sufficiency Project for Welfare Applicants.” Ottawa, Ontario: Social Research and Demonstration Corporation.
Forslund, Anders, and Alan Krueger. 1997. “An Evaluation of Swedish Active Labor Market Policy: New and Received Wisdom.” In The Welfare State in Transition: Reforming the Swedish Model, edited by Richard Freeman, Robert Topel, and Birgitta Swedenborg, 267–98. Chicago: University of Chicago Press.
Fraker, Thomas, and Rebecca Maynard. 1987. “The Adequacy of Comparison Group Designs for Evaluations of Employment Related Programs.” Journal of Human Resources 22 (2): 194–227.
Franklin, Grace, and Randall Ripley. 1984. CETA: Politics and Policy, 1973–1982. Knoxville: University of Tennessee Press.
Frölich, Markus. 2004. “Finite-Sample Properties of Propensity-Score Matching and Weighting Estimators.” Review of Economics and Statistics 86:77–90.
Frölich, Markus, and Michael Lechner. 2010. “Exploiting Regional Treatment Intensity for the Evaluation of Labor Market Policies.” Journal of the American Statistical Association 105 (491): 1014–29.
Frost, Robert. 1920. “The Road Not Taken.” In Mountain Interval. New York: Henry Holt.
Garfinkel, Irwin. 1973. “Is In-Kind Redistribution Efficient?” Quarterly Journal of Economics 87 (2): 320–30.
Gechter, Michael. 2014. “Generalizing the Results from Social Experiments.” Unpublished Manuscript, Boston University.
Greenberg, David, Charles Michalopoulos, and Philip Robins. 2003. “A Meta-Analysis of Government-Sponsored Training Programs.” Industrial and Labor Relations Review 57 (1): 31–53.
———. 2004. “What Happens to the Effects of Government-Funded Training Programs over Time?” Journal of Human Resources 39 (1): 277–93.
Greenberg, David, and Philip Robins. 2008. “Incorporating Nonmarket Time into Benefit-Cost Analyses of Social Programs: An Application to the Self-Sufficiency Project.” Journal of Public Economics 92:766–94.
———. 2011. “Have Welfare-to-Work Programs Improved over Time in Putting Welfare Recipients to Work?” Industrial and Labor Relations Review 64 (5): 920–30.
Greenberg, David, and Mark Shroder. 2004. The Digest of Social Experiments, 3rd ed. Washington, DC: Urban Institute Press.
Greenberg, David, Mark Shroder, and Matthew Onstott. 1999. “The Social Experiment Market.” Journal of Economic Perspectives 13 (3): 157–72.
Gueron, Judith, and Howard Rolston. 2013. Fighting for Reliable Evidence. New York: Russell Sage Foundation.
Ham, John, and Robert LaLonde. 1996. “The Effect of Sample Selection and Initial Conditions in Duration Models: Evidence from Experimental Data.” Econometrica 64 (1): 175–205.
Heckman, James. 1992. “Randomization and Social Policy Evaluation.” In Evaluating Welfare and Training Programs, edited by Charles Manski and Irwin Garfinkel, 201–30. Cambridge, MA: Harvard University Press.
Heckman, James, Carolyn Heinrich, Pascal Courty, Gerald Marschke, and Jeffrey Smith. 2011. The Performance of Performance Standards. Kalamazoo, MI: W. E. Upjohn Institute for Employment Research.
Heckman, James, Carolyn Heinrich, and Jeffrey Smith. 2002. “The Performance of Performance Standards.” Journal of Human Resources 37 (4): 778–811.
Heckman, James, Neil Hohmann, Jeffrey Smith, and Michael Khoo. 2000. “Substitution and Dropout Bias in Social Experiments: A Study of an Influential Social Experiment.” Quarterly Journal of Economics 115 (2): 651–94.
Heckman, James, Hidehiko Ichimura, Jeffrey Smith, and Petra Todd. 1998. “Characterizing Selection Bias Using Experimental Data.” Econometrica 66 (5): 1017–98.
Heckman, James, Hidehiko Ichimura, and Petra Todd. 1997. “Matching as an Econometric Evaluation Estimator: Evidence from Evaluating a Job Training Programme.” Review of Economic Studies 64 (4): 605–54.
Heckman, James, and Alan Krueger. 2003. Inequality in America. Cambridge, MA: MIT Press.
Heckman, James, Robert LaLonde, and Jeffrey Smith. 1999. “The Economics and Econometrics of Active Labor Market Programs.” In Handbook of Labor Economics, vol. 3A, edited by Orley Ashenfelter and David Card, 1865–2097. Amsterdam: North-Holland.
Heckman, James, Lance Lochner, and Christopher Taber. 1998. “General-Equilibrium Treatment Effects: A Study of Tuition Policy.” American Economic Review 88 (2): 381–86.
Heckman, James, and Salvador Navarro. 2004. “Using Matching, Instrumental Variables, and Control Functions to Estimate Economic Choice Models.” Review of Economics and Statistics 86 (1): 30–57.
Heckman, James, and Richard Robb. 1985. “Alternative Methods for Evaluating the Impacts of Interventions: An Overview.” Journal of Econometrics 30 (1–2): 239–67.
Heckman, James, and Jeffrey Smith. 1995. “Assessing the Case for Social Experiments.” Journal of Economic Perspectives 9 (2): 85–110.
———. 1999. “The Pre-Program Earnings Dip and the Determinants of Participation in a Social Program: Implications for Simple Program Evaluation Strategies.” Economic Journal 109 (457): 313–48.
———. 2000. “The Sensitivity of Experimental Impact Estimates: Evidence from the National JTPA Study.” In Youth Employment and Joblessness in Advanced Countries, edited by David Blanchflower and Richard Freeman, 331–56. Chicago: University of Chicago Press.
———. 2004. “The Determinants of Participation in a Social Program: Evidence from the Job Training Partnership Act.” Journal of Labor Economics 22 (2): 243–98.
Heckman, James, Jeffrey Smith, and Nancy Clements. 1997. “Making the Most Out of Programme Evaluations and Social Experiments: Accounting for Heterogeneity in Programme Impacts.” Review of Economic Studies 64 (4): 487–535.
Heckman, James, Jeffrey Smith, and Christopher Taber. 1996. “What Do Bureaucrats Do? The Effects of Performance Standards and Bureaucratic Preferences on Acceptance into the JTPA Program.” In Advances in the Study of Entrepreneurship, Innovation and Economic Growth: Reinventing Government and the Problem of Bureaucracy, vol. 7, edited by Gary Libecap, 191–217. Greenwich, CT: JAI Press.
———. 1998. “Accounting for Dropouts in Evaluations of Social Programs.” Review of Economics and Statistics 80 (1): 1–14.

Heinberg, John, John Trutko, Burt Barnow, Mary Farrell, and Asaph Glosser. 2005. “Unit Costs of Intensive and Training Services for WIA Adults and Dislocated Workers: An Exploratory Study of Methodologies and Estimates in Selected States and Localities: Final Report.” Report prepared for US Department of Labor, Employment and Training Administration, Washington, DC.
Heinrich, Carolyn. 2004. “Improving Public-Sector Performance Management: One Step Forward, Two Steps Back?” Public Finance and Management 4 (3): 317–51.
Heinrich, Carolyn, and Peter Mueser. 2014. “Training Program Impacts and the Onset of the Great Recession.” Unpublished Manuscript, University of Missouri.
Heinrich, Carolyn, Peter Mueser, and Kenneth Troske. 2008. Workforce Investment Act Non-Experimental Net Impact Evaluation: Final Report. Washington, DC: IMPAQ International.
Heinrich, Carolyn, Peter Mueser, Kenneth Troske, Kyung-Seong Jeon, and Daver Kahvecioglu. 2013. “Do Public Employment and Training Programs Work?” IZA Journal of Labor Economics 2:6.
Hendra, Richard, James Riccio, Richard Dorsett, David Greenberg, Genevieve Knight, Joan Phillips, Philip Robins, Sandra Vegeris, and Johanna Walter. 2011. “Breaking the Low-Pay, No-Pay Cycle: Final Evidence from the UK Employment Retention and Advancement (ERA) Demonstration.” Report no. 765, UK Department for Work and Pensions Research.
Herman, Alexis. 1998. Implementing the Workforce Investment Act of 1998. US Department of Labor, Employment and Training Administration. Accessed November 9, 2014. www.doleta.gov/usworkforce/documents/misc/wpaper3.cfm.
Hirano, Keisuke, Guido Imbens, and Geert Ridder. 2003. “Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score.” Econometrica 71 (4): 1161–89.
Hollenbeck, Kevin. 2009. “Return on Investment Analysis of a Selected Set of Workforce System Programs in Indiana.” Report submitted to the Indiana Chamber of Commerce Foundation, Indianapolis, IN. http://research.upjohn.org/reports/15.
———. 2011. “Short-Term Net Impact Estimates and Rates of Return.” In The Workforce Investment Act: Implementation Experiences and Evaluation Findings, edited by Douglas Besharov and Phoebe Cottingham, 347–70. Kalamazoo, MI: W. E. Upjohn Institute for Employment Research.
Hollenbeck, Kevin, and Wei-Jang Huang. 2014. “Net Impact and Benefit-Cost Estimates of the Workforce Development System in Washington State.” Technical Report no. 13-029, Upjohn Institute.
Hotz, V. Joseph. 1992. “Designing an Evaluation of the Job Training Partnership Act.” In Evaluating Welfare and Training Programs, edited by Charles Manski and Irwin Garfinkel, 76–114. Cambridge, MA: Harvard University Press.
Hotz, V. Joseph, Guido Imbens, and Jacob Klerman. 2006. “Evaluating the Differential Effects of Alternative Welfare-to-Work Training Components: A Reanalysis of the California GAIN Program.” Journal of Labor Economics 24 (3): 521–66.
Hotz, V. Joseph, Guido Imbens, and Julie Mortimer. 2005. “Predicting the Efficacy of Future Training Programs Using Past Experiences at Other Locations.” Journal of Econometrics 125:241–70.
Hotz, V. Joseph, and Karl Scholz. 2002. “Measuring Employment and Income Outcomes for Low-Income Populations with Administrative and Survey Data.” In Studies of Welfare Populations: Data Collection and Research Issues, 275–315. Washington, DC: National Research Council, National Academy Press.
Huber, Erika, David Kassabian, and Elissa Cohen. 2014. Welfare Rules Databook: State TANF Policies as of July 2013. OPRE Report 2014-52. Washington, DC: Office of Planning, Research and Evaluation, Administration for Children and Families, US Department of Health and Human Services.

Huber, Martin, Michael Lechner, and Conny Wunsch. 2013. “The Performance of Estimators Based on the Propensity Score.” Journal of Econometrics 175:1–21.
Iacus, Stefano, Gary King, and Giuseppe Porro. 2012. “Causal Inference without Balance Checking: Coarsened Exact Matching.” Political Analysis 20 (1): 1–24.
Institute of Education Sciences (IES), US Department of Education. 2008. Rigor and Relevance Redux: Director’s Biennial Report to Congress (IES 2009-6010). Washington, DC: IES.
Jacobson, Louis, Robert LaLonde, and Daniel Sullivan. 1993. “Earnings Losses of Displaced Workers.” American Economic Review 83 (4): 685–709.
———. 2005. “Estimating the Returns to Community College Schooling for Displaced Workers.” Journal of Econometrics 125:271–304.
Johnson, George. 1979. “The Labor Market Displacement Effect in the Analysis of the Net Impact of Manpower Training Programs.” In Evaluating Manpower Training Programs: Research in Labor Economics (supplement 1), edited by F. E. Bloch, 227–54. Greenwich, CT: JAI Press.
Johnson, George, and James Tomola. 1977. “The Fiscal Substitution Effect of Alternative Approaches to Public Service Employment Policy.” Journal of Human Resources 12 (1): 3–26.
Johnston, Janet. 1987. The Job Training Partnership Act: A Report by the National Commission for Employment Policy. Washington, DC: Government Printing Office.
Kemple, James, Fred Doolittle, and John Wallace. 1993. The National JTPA Study: Site Characteristics and Participation Patterns. New York: MDRC.
Kesselman, Jonathon. 1978. “Work Relief Programs in the Great Depression.” In Creating Jobs: Public Employment Programs and Wage Subsidies, edited by John Palmer. Washington, DC: The Brookings Institution.
King, Christopher. 1999. “Federalism and Workforce Policy Reform.” Publius: The Journal of Federalism 29 (2): 53–71.
King, Christopher T., and Burt S. Barnow. 2011. “The Use of Market Mechanisms.” In The Workforce Investment Act: Implementation Experiences and Evaluation Findings, edited by Douglas Besharov and Phoebe Cottingham, 81–111. Kalamazoo, MI: W. E. Upjohn Institute for Employment Research.
Koenker, Roger, and Gilbert Bassett. 1978. “Regression Quantiles.” Econometrica 46:33–50.
Kornfeld, Robert, and Howard Bloom. 1999. “Measuring Program Impacts on Earnings and Employment: Do Unemployment Insurance Wage Reports of Employers Agree with Surveys of Individuals?” Journal of Labor Economics 17 (1): 168–97.
Krolikowski, Pawel. 2014. “Reassessing the Experience of Displaced Workers.” Unpublished Manuscript, University of Michigan.
LaLonde, Robert. 1986. “Evaluating the Econometric Evaluations of Training Programs with Experimental Data.” American Economic Review 76 (4): 604–20.
———. 2003. “Employment and Training Programs.” In Means-Tested Transfer Programs in the United States, edited by Robert Moffitt, 517–85. Chicago: University of Chicago Press.
Lechner, Michael, and Jeffrey Smith. 2007. “What is the Value Added by Caseworkers?” Labour Economics 14 (2): 135–51.
Lechner, Michael, and Stephan Wiehler. 2011. “Kids or Courses? Gender Differences in the Effects of Active Labor Market Policies.” Journal of Population Economics 24 (3): 783–812.
Lechner, Michael, and Conny Wunsch. 2009. “Are Training Programs More Effective When Unemployment is High?” Journal of Labor Economics 27 (4): 653–92.
———. 2013. “Sensitivity of Matching-Based Program Evaluations to the Availability of Control Variables.” Labour Economics 21:111–21.

Levitan, Sar, and Frank Gallo. 1988. A Second Chance: Training for Jobs. Kalamazoo, MI: W. E. Upjohn Institute for Employment Research.
Lise, Jeremy, Shannon Seitz, and Jeffrey Smith. 2004. “Equilibrium Policy Experiments and the Evaluation of Social Programs.” NBER Working Paper no. 10283, Cambridge, MA.
Long, David, Charles Mallar, and Craig Thornton. 1981. “Evaluating the Benefits and Costs of the Job Corps.” Journal of Policy Analysis and Management 1 (1): 55–76.
Lower-Basch, Elizabeth. 2014. SNAP E&T. Washington, DC: Center for Law and Social Policy.
Maguire, Sheila, Joshua Freely, Carol Clymer, Maureen Conway, and Deena Schwartz. 2010. Tuning In to Local Labor Markets: Findings from the Sectoral Employment Impact Study. Philadelphia: Public/Private Ventures.
Mallar, Charles, Stuart Kerachsky, Craig Thornton, and David Long. 1982. “Evaluation of the Economic Impact of the Job Corps Program: Third Follow-Up Report.” Mathematica Policy Research, Princeton, NJ.
Mangum, Garth. 1968. MDTA: The Foundation of Federal Manpower Policy. Baltimore: Johns Hopkins University Press.
McCall, Brian, Jeffrey Smith, and Conny Wunsch. 2016. “Government-Sponsored Vocational Training.” In Handbook of the Economics of Education, vol. 5, edited by Eric Hanushek, Stephen Machin, and Ludger Woessman, 479–652. Amsterdam: North-Holland.
McConnell, Sheena, Paul Decker, and Irma Perez-Johnson. 2006. “The Role of Counseling in Voucher Programs: Findings from the Individual Training Account Experiment.” Unpublished Manuscript, Mathematica Policy Research.
McConnell, Sheena, and Steven Glazerman. 2001. National Job Corps Study: The Benefits and Costs of Job Corps. Princeton, NJ: Mathematica Policy Research.
McConnell, Sheena, Elizabeth Stuart, Kenneth Fortson, Paul Decker, Irma Perez-Johnson, Barbara Harris, and Jeffrey Salzman. 2006. “Managing Customers’ Training Choices: Findings from the Individual Training Account Experiment: Final Report.” Mathematica Policy Research, Princeton, NJ.
Mikelson, Kelly, and Demetra Nightingale. 2004. Estimating Public and Private Expenditures on Occupational Training in the United States. Washington, DC: The Urban Institute.
Mirengoff, William, and Lester Rindler. 1978. CETA: Manpower Programs under Local Control. Washington, DC: National Academy of Sciences.
Moffitt, Robert. 1983. “An Economic Model of Welfare Stigma.” American Economic Review 73 (5): 1023–35.
Mueller-Smith, Michael. 2015. “The Criminal and Labor Market Impacts of Incarceration.” Unpublished Manuscript, University of Michigan.
Muller, Seán. 2015. “Interaction and External Validity: Obstacles to the Policy Relevance of Randomized Evaluations.” World Bank Economic Review 29 (Supp. 1): S226–37.
Musgrave, Richard. 1959. The Theory of Public Finance: A Study in Public Economy. New York: McGraw-Hill.
Musgrave, Richard, and Peggy Musgrave. 1989. Public Finance in Theory and Practice, 5th ed. New York: McGraw-Hill.
Nathan, Richard, Robert Cook, Lane Rawlins, et al. 1981. Public Service Employment: A Field Evaluation. Washington, DC: The Brookings Institution.
National Skills Coalition. 2014. Side-by-Side Comparison of Occupational Training and Adult Education & Family Literacy Provisions in the Workforce Investment Act and the Workforce Innovation and Opportunity Act. Washington, DC: National Skills Coalition.

Neyman, Jerzy. 1923. “Statistical Problems in Agricultural Experiments.” Journal of the Royal Statistical Society 2:107–80.
O’Leary, Christopher, Robert Straits, and Stephen Wandner. 2004. “US Job Training: Types, Participants, and History.” In Job Training Policy in the United States, edited by Christopher O’Leary, Robert Straits, and Stephen Wandner, 1–20. Kalamazoo, MI: W. E. Upjohn Institute for Employment Research.
Orr, Larry. 1998. Social Experiments: Evaluating Public Programs with Experimental Methods. New York: Sage Publications.
Park, Jooyoun. 2012. “Does Occupational Training by the Trade Adjustment Assistance Program Really Help Reemployment? Success Measured as Matching.” Review of International Economics 20 (5): 999–1016.
Patel, Nisha, and Steve Savner. 2001. Implementation of Individual Training Account Policies under the Workforce Investment Act: Early Information from Local Areas. Washington, DC: Center for Law and Social Policy.
Pederson, Jonas, Michael Rosholm, and Michael Svarer. 2012. “Experimental Evidence on the Effects of Early Meetings and Activation.” IZA Discussion Paper no. 6970, Institute for the Study of Labor.
Perez-Johnson, Irma, Quinn Moore, and Robert Santillano. 2011. Improving the Effectiveness of Individual Training Accounts: Long-Term Findings from an Experimental Evaluation of Three Service Delivery Models: Final Report. Princeton, NJ: Mathematica Policy Research.
Perry, Charles, Bernard Anderson, Richard Rowan, and Herbert Northrup. 1975. The Impact of Government Manpower Programs in General, and on Minorities and Women. Philadelphia: Industrial Research Unit, the Wharton School, University of Pennsylvania.
Plesca, Miana. 2010. “A General Equilibrium Analysis of the Employment Service.” Journal of Human Capital 4 (3): 274–329.
Plesca, Miana, and Jeffrey Smith. 2005. “Rules Versus Discretion in Social Programs: Empirical Evidence on Profiling in Employment and Training Programs.” Unpublished Manuscript, University of Maryland.
———. 2007. “Evaluating Multi-Treatment Programs: Theory and Evidence from the US Job Training Partnership Act Experiment.” Empirical Economics 32 (2–3): 491–528.
Puhani, Patrick. 2000. “The Heckman Correction for Sample Selection and its Critique.” Journal of Economic Surveys 14 (1): 53–68.
Puma, Michael, and Nancy Burstein. 1994. “The National Evaluation of the Food Stamp Employment and Training Program.” Journal of Policy Analysis and Management 13 (2): 311–30.
Quandt, Richard. 1972. “Methods of Estimating Switching Regressions.” Journal of the American Statistical Association 67:306–10.
Radin, Beryl. 2006. Challenging the Performance Movement: Accountability, Complexity and Democratic Values. Washington, DC: Georgetown University Press.
Rosholm, Michael. 2014. “Do Caseworkers Help the Unemployed? Evidence for Making a Cheap and Effective Twist to Labor Market Policies for Unemployed Workers.” IZA World of Labor 2014:72. doi:10.15185. http://wol.iza.org/articles/do-case-workers-help-the-unemployed.pdf.
Roy, A. D. 1951. “Some Thoughts on the Distribution of Earnings.” Oxford Economic Papers 3:135–46.
Rubin, Donald. 1974. “Estimating Causal Effects of Treatments in Randomized and Non-Randomized Studies.” Journal of Educational Psychology 66:688–701.
Schochet, Peter, Ronald D’Amico, Jillian Berk, Sarah Dolfin, and Nathan Wozny. 2012. Estimated Impacts for Participants in the Trade Adjustment Assistance (TAA) Program under the 2002 Amendments. Princeton, NJ: Mathematica Policy Research.
Schochet, Peter, and John Burghardt. 2008. “Do Job Corps Performance Measures Track Program Impacts?” Journal of Policy Analysis and Management 27 (3): 556–76.
Schochet, Peter, John Burghardt, and Steven Glazerman. 2001. National Job Corps Study: The Impacts of Job Corps on Participants’ Employment and Related Outcomes. Princeton, NJ: Mathematica Policy Research.
Schochet, Peter, John Burghardt, and Sheena McConnell. 2008. “Does Job Corps Work? Impact Findings from the National Job Corps Study.” American Economic Review 98 (5): 1864–86.
Sianesi, Barbara. 2014. “Dealing with Randomization Bias in a Social Experiment: The Case of ERA.” IFS Working Paper no. W14/10, Institute for Fiscal Studies.
Smith, Jeffrey. 1992. The JTPA Selection Process: A Descriptive Analysis. Report submitted to the US Department of Labor as part of the National JTPA Study.
———. 1997. “Measuring Earnings Levels among the Poor: Evidence from Two Samples of JTPA Eligibles.” Unpublished Manuscript, University of Western Ontario.
———. 2011. “Improving Impact Evaluation in Europe.” In The Workforce Investment Act: Implementation Experiences and Evaluation Findings, edited by Douglas Besharov and Phoebe Cottingham, 473–94. Kalamazoo, MI: W. E. Upjohn Institute for Employment Research.
Smith, Jeffrey, and Petra Todd. 2005a. “Does Matching Overcome LaLonde’s Critique of Nonexperimental Methods?” Journal of Econometrics 125 (1–2): 305–53.
———. 2005b. “Rejoinder.” Journal of Econometrics 125 (1–2): 365–75.
Smith, Jeffrey, and Alexander Whalley. 2015. “How Well Do We Measure Public Job Training?” Unpublished Manuscript, University of Michigan.
Social Policy Research Associates. 2013. PY 2012 WIASRD Data Book. Oakland, CA: Social Policy Research Associates.
Spaulding, Shayne. 2001. “Performance-Based Contracting under the Job Training Partnership Act.” Master’s thesis, Johns Hopkins University.
Stapleton, David, Gina Livermore, Craig Thornton, Bonnie O’Day, Robert Weathers, Krista Harrison, So O’Neil, Emily Sama Martin, David Wittenburg, and Debra Wrig. 2008. Ticket to Work at the Crossroads: A Solid Foundation with an Uncertain Future. Princeton, NJ: Mathematica Policy Research. Taggart, Robert. 1981. A Fisherman’s Guide: An Assessment of Training and Remediation Strategies. Kalamazoo, MI: W. E. Upjohn Institute for Employment Research. Todd, Petra, and Kenneth Wolpin. 2006. “Assessing the Impact of a School Subsidy Program in Mexico Using a Social Experiment to Validate a Dynamic Behavioral Model of Child Schooling and Fertility.” American Economic Review 96 (5): 1384– 417. Trutko, John, and Burt Barnow. 1997. Implementation of the 1992 Job Training Partnership Act (JTPA) Amendments: Report to Congress. Washington, DC: US Department of Labor, Employment and Training Administration. ———. 1999. Vouchers under JTPA: Lessons for Implementation of the Workforce Investment Act. Arlington, VA: James Bell Associates. ———. 2007. Variation in Training Rates across States and Local Workforce Investment Boards: Final Report. Arlington, VA: Capital Research Corporation. ———. 2010. Implementing Efficiency Measures for Employment and Training Programs: Final Report. Arlington, VA: Capital Research Corporation. US Bureau of the Budget. 1966. Appendix of the Budget of the United States Government for Fiscal Year 1967. Washington, DC: US Government Printing Office.

Employment and Training Programs

233

———. 1967. Appendix of the Budget of the United States Government for Fiscal Year 1968. Washington, DC: US Government Printing Office. ———. 1968. Appendix of the Budget of the United States Government for Fiscal Year 1969. Washington, DC: US Government Printing Office. ———. 1969. Appendix of the Budget of the United States Government for Fiscal Year 1970. Washington, DC: US Government Printing Office. ———. 1970. Appendix of the Budget of the United States Government for Fiscal Year 1971. Washington, DC: US Government Printing Office. US Department of Agriculture. 2014. Food and Nutrition Service: 2015 Explanatory Notes. Accessed March 29, 2015. http://www.obpa.usda.gov/32fns2015notes.pdf. US Department of Education. 2014. Programs: Adult Education: Basic Grants to States. Accessed January 28, 2015. http://www2.ed.gov/programs/adultedbasic /funding.html. ———. 2014.1 Programs: Federal Pell Grant Program. Accessed February 1, 2015. http://www2.ed.gov/programs/fpg/index.html. ———. 2015. Federal Student Aid: Federal Pell Grants. Accessed February 1, 2015. https://studentaid.ed.gov/types/grants- scholarships/pell. US Department of Health and Human Services. 2014. TANF Financial Data—FY 2013. Accessed January 28, 2015. http://www.acf.hhs.gov/programs/ofa/resource /tanf- financial- data- fy- 2013. US Department of Labor. No date. Budget Authority from 1948–1989. Unpublished internal document. ———. 1973. Manpower Report of the President: A Report on Manpower Requirements, Resources, Utilization, and Training. Washington, DC: US Government Printing Office. ———. 2000. One-Stop Partners. Accessed November 15, 2014. http://www.doleta .gov/programs/factsht/pdf/onestoppartners.pdf. ——— 2010a. Wagner-Peyser/Labor Exchange. Accessed November 1, 2014. http:// www.doleta.gov/programs/wagner_peyser.cfm. ———. 2010b. Workforce Investment Act One-Stop Partners. Accessed November 15, 2014. http://www.doleta.gov/usworkforce/onestop/partners.cfm. ———. 2013. About Job Corps. 
Accessed November 23, 2014. http://www.jobcorps .gov/AboutJobCorps.aspx. ———. 2014a. Budget Authority Tables: Training and Employment Programs. Access February 15, 2015. http://www.doleta.gov/budget/bahist.cfm. ———. 2014b. ETA Programs for Migrant and Seasonal Farmworkers. Accessed November 23, 2014. http://www.doleta.gov/Farmworker/. ———. 2014c. FY 2015 Department of Labor Budget in Brief. Washington, DC: US Department of Labor. ———. 2014d. Quarterly Workforce System Results. Accessed November 22, 2014. http://www.doleta.gov/performance/results/eta_default.cfm#wiastann. ———. 2014e. Reintegration of Ex-Offenders (RExO). Accessed November 23, 2014. http://www.doleta.gov/RExO/. ———. 2014f. Senior Community Service Employment Program. Accessed November 24, 2014. http://www.doleta.gov/seniors/. ———. 2014g. The Trade Adjustment Assistance Program Brochure. Accessed November 24, 2014. http://www.doleta.gov/tradeact/docs/program_brochure2014 .pdf. ———. 2014h. Training and Employment Guidance Letter No. 36–10 (TEGL 36– 10). Accessed November 22, 2014. http:// wdr.doleta.gov/directives/corr_doc .cfm?DOCN=3052. ———. 2014i. VETS Employment Services Fact Sheet 1. Accessed November 23, 2014. http://www.dol.gov/vets/programs/empserv/employment_services_fs.htm.

234

Burt S. Barnow and Jeffrey Smith

———. 2014j. VETS HVRP Fact Sheet. Accessed November 23, 2014. http://www .dol.gov/vets/programs/fact/Homeless_veterans_fs04.htm. ———. 2014k. WIA Youth Formula Funded Program. Accessed November 23, 2014. http://www.doleta.gov/youth_services/wiaformula.cfm. ———. 2014l. Workforce Innovation and Opportunity Act (WIOA) Factsheet. Accessed November 16, 2014. http://www.doleta.gov/wioa/pdf/WIOA-Factsheet .pdf ———. 2014m. Workforce Investment Act: Adults and Dislocated Workers Program. Accessed November 15, 2014. http://www.doleta.gov/programs/general_info.cfm. ———. 2014n. YouthBuild. Accessed November 24, 2014. http://www.doleta.gov /Youth_services/Youth_Build.cfm ———. 2015. Budget Authority Tables: Training and Employment Programs. Accessed May 15, 2015. http://www.doleta.gov/budget/bahist.cfm. ———. 2015. Summary of Appropriation Budget Authority, Fiscal Year 2014. Accessed April 18, 2015. http://www.doleta.gov/budget/docs/14_final_appropria tion_action.pdf. US General Accounting Office (GAO). 1996. Job Training Partnership Act: Long Term Earnings and Employment Outcomes. Report HEHS- 96– 40. Washington, DC: US General Accounting Office. ———. 2002. Workforce Investment Act: Youth Provisions Promote New Service Strategies, but Additional Guidance Would Enhance Program Development. Report GAO- 02– 413. Washington, DC: US General Accounting Office. US Office of Management and Budget. 1992. Guidelines and Discount Rates for Benefit-Cost Analysis of Federal Programs. Circular no. A- 94 Revised. https:// www.whitehouse.gov/omb/circulars_a094/. Van Horn, Carl, and Aaron Fichter. 2011. “Eligible Training Provider Lists and Consumer Report Cards.” In The Workforce Investment Act: Implementation Experiences and Evaluation Findings, edited by Douglas Besharov and Phoebe Cottingham, 153– 72. Kalamazoo, MI: W. E. Upjohn Institute for Employment Research. Van Horn, Carl, Kathy Krepcio, and Stephen Wandner. 2015. 
Identifying Gaps and Setting Strategic Priorities for Employment and Training Research (2014–2019). Report prepared for US Department of Labor, Employment and Training Administration. Vivalt, Eva. 2015. “How Much Can We Generalize from Impact Evaluations?” Unpublished Manuscript, Stanford University. Wallace, Geoffrey, and Robert Haveman. 2007. “The Implications of Differences between Employer and Worker Employment/Earnings Reports for Policy Evaluation.” Journal of Policy Analysis and Management 26 (4): 737– 54. White, Michael, and Jane Lakey. 1992. The Restart Effect: Does Active Labour Market Policy Reduce Unemployment? London: Policy Studies Institute. Wilson, James. 1989. Bureaucracy: What Government Agencies Do and Why They Do It. New York: Basic Books. Wunsch, Conny. 2013. “Optimal Use of Labor Market Policies: The Role of Job Search Assistance.” Review of Economics and Statistics 95 (3): 1030– 45. YouthBuild. https://youthbuild.org/.

4  Early Childhood Education

Sneha Elango, Jorge Luis García, James J. Heckman, and Andrés Hojman

4.1  Introduction

Recent research demonstrates that the effects of adverse early childhood environments persist over a lifetime (Knudsen et al. 2006). Substantial gaps between the environments of advantaged children and those of disadvantaged children raise serious concerns about the life prospects of disadvantaged children and the state of social mobility in America.1 The proliferation of single-parent households—especially households where children have never had a father present—is a major contributor to the growth in inequality in childhood environments.2 In the United States, single parenthood is strongly correlated with child poverty. As a group, the children of single parents are less likely to succeed in life than children from stable two-parent households.3 This evidence, together with the evidence that gaps in advantage are growing across generations,4 has prompted growing interest in improving the early-life opportunities of disadvantaged children.5

Concerns about the quality of childhood environments are fueled by growth in the labor force participation of women with children.6 This growth raises concerns about the supply of child care and its quality. Disadvantaged parents often lack access to high-quality child care, and single-parent families are especially vulnerable.7 The percentage of children who grow up in poverty increased from 16 percent in 2000 to 21 percent in 2013.8 These dual concerns have stimulated interest in public provision of early childhood education programs to ease the burden of child care for working mothers and to enhance the opportunities available to disadvantaged children.

High-quality early childhood education programs enrich the learning and nurturing environments of disadvantaged children. An accumulating body of evidence shows the beneficial effects of these programs. They are the topic of many discussions among academics, mainstream media, and policymakers. The Obama administration has promoted programs like Head Start as vehicles of opportunity and social mobility and has called for increased federal investment in high-quality programs developed and administered by states (The White House 2014a).

This chapter organizes and synthesizes the evidence on a variety of early childhood programs. We consider the evidence on means-tested programs.9 Eligibility for these programs is determined by a measure of childhood poverty (either family income or close surrogates for it). We also consider the evidence on universal preschool programs.10 We gather the evidence on the programs with the most rigorous evaluations for which the reported results can be replicated. We also devote some attention to the evidence from programs with flawed or limited evaluations, but do not place much weight on it. We compare the treatments, treated populations, and treatment effects across a broad range of programs.

We go beyond the standard, often very limited, discussions of the benefits of early childhood education. We consider a richer collection of outcome measures, in addition to the scores on IQ or achievement tests that receive so much attention in the literature. We consider multiple outcomes across the life cycle, for example, physical and mental health, criminal activity, earnings, and social engagement. We assess the economic and social rates of return for programs that have the necessary data. We do not rely exclusively on evidence from randomized control trials. We use credible causal evidence from a broad range of studies using different methodologies. The evidence we assemble shows agreement across studies: there is a strong case for high-quality early childhood education for disadvantaged children. It improves the early-life environments of disadvantaged children, which in turn boost a variety of early-life skills and later-life achievements.

We address two distinct questions that are frequently conflated. The first is whether or not early childhood programs are effective. The second is whether or not these programs should be subsidized by governments.

The answer to the first question depends on the quality of the program being offered, the alternatives available, and their costs. Any measure of effectiveness is a relative statement. The proper question is: Effective relative to what? Affluent families have better alternatives and generally do not benefit from the public provision of early childhood education aimed at median or disadvantaged populations. In contrast, high-quality versions of such programs are consistently found to benefit disadvantaged children and have substantial economic and social rates of return.11 Failure to account for the quality of child-care alternatives and the quality of home environments leads analysts to make misleading statements about program effectiveness. A recent example is the Head Start Impact Study (HSIS).12 Analyses that fail to account for the child-care alternatives available to control participants understate the effects of Head Start. Analyses that account for these alternatives show that Head Start actually has moderate to strong effects on measures of cognitive and noncognitive skills13 compared to home care, but not necessarily when compared with other quality center-based child care.

The answer to the second question is that the evidence in hand supports public subsidy of high-quality programs targeted to disadvantaged populations. At current quality levels and costs, their social benefits exceed their social costs. There is little direct evidence on the effectiveness of the programs we study for the children of affluent families. This chapter does not address the general question of what the optimal provision of child care should be for persons in different economic strata. The answer to this question would take us too far afield.

The economic case for universal early childhood programs is weak.14 The case often made for them is political in nature. Universality is sometimes sought to avoid stigma and to promote inclusion. The costs of offering such programs are diminished because, at the levels of quality usually proposed, the affluent are much less likely to use them.15 The programs discussed in this chapter are less attractive to them because they have better alternatives.

Table 4.1 summarizes the programs we discuss and their basic features. We present detailed descriptions of these programs in sections 4.3–4.5 and appendices A and B. Section 4.3 discusses the evidence from four experimental evaluations of demonstration programs: (a) the Perry Preschool Project (PPP); (b) the Carolina Abecedarian Project (ABC); (c) the Infant Health and Development Program (IHDP); and (d) the Early Training Project (ETP). Instead of just reporting estimates from the literature or doing a meta-analysis, we conduct a primary analysis of each program using a standardized format. We could not discuss the Chicago Parent-Child Program (Reynolds et al. 2011) in our analysis because we do not have access to the most updated and complete data for this program, on which claims about its effectiveness are based. The PI has not cooperated to help us replicate its reported findings. Our access to data for the Nurse Family Partnership (NFP; Olds 2006) is similarly restricted.

We consider the evidence on Head Start in section 4.4. Eligibility for it is means tested primarily on the basis of family income. Centers are free to pick their curricula, and there is a lot of variety in the programs offered. We also discuss the evidence from a recently evaluated means-tested statewide program that shares some features in common with Head Start.16

Sneha Elango is a research professional at the University of Chicago. Jorge Luis García is a PhD candidate in the Department of Economics at the University of Chicago. James J. Heckman is the Henry Schultz Distinguished Service Professor of Economics and Director of the Center for the Economics of Human Development at the University of Chicago, a senior research fellow at the American Bar Foundation, and a research associate of the National Bureau of Economic Research. Andrés Hojman is a PhD candidate in the Department of Economics at the University of Chicago. This research was supported in part by the American Bar Foundation; the Pritzker Children's Initiative; the Buffett Early Childhood Fund; NIH grants NICHD R37HD065072, NICHD R01HD054702, and NIA R24AG048081; an anonymous funder; and Successful Pathways from School to Work, an initiative of the University of Chicago's Committee on Education funded by the Hymen Milgrom Supporting Organization. We are very grateful to Joshua Ka Chun Shea, Matthew C. Tauzer, and Anna Ziff for research assistance and useful comments. We thank Robert Moffitt, David Blau, the other authors of this volume, and Raquel Bernal, Avi Feller, Marianne Haramoto, Fernando Hoces, Michael Keane, Patrick Kline, Sylvi Kuperman, and Rich Neimand for valuable comments. The views expressed in this chapter are those of the authors and not necessarily those of the funders or persons named here or the official views of the National Institutes of Health. A web appendix for this chapter can be found at https://cehd.uchicago.edu/ECE-US. For acknowledgments, sources of research support, and disclosure of the authors' material financial relationships, if any, please see http://www.nber.org/chapters/c13489.ack.

1. McLanahan (2004); Duncan and Murnane (2011).
2. McLanahan (2004); Heckman (2008).
3. McLanahan and Percheski (2008).
4. Putnam (2015).
5. Office of the Mayor, New York City (2014).
6. Calculations using the Current Population Survey indicate that, between 1960 and 2010, maternal labor market attachment increased from 41 percent to 65 percent for single mothers (with children) and from 20 percent to 60 percent for married mothers. Most of these single mothers had children residing with them—in 1960, 91 percent of children in single-parent families lived with their mothers; this fell slightly to 87 percent in 2010.
7. Blau (2003).
8. Rates of child poverty are calculated using the Current Population Survey. Poverty is defined as growing up in a household below the federal poverty line.
9. "Means tested" in this chapter refers to programs with eligibility criteria based on income, socioeconomic status, or other measures of disadvantage.
10. Universal programs have age requirements for children but are not means tested. However, many advocated universal programs have sliding-fee schedules based on family income, which effectively makes them means tested.
11. This conclusion is consistent with previous studies that argue that disadvantaged children greatly benefit from early childhood education. See, for example, Blau and Currie (2006), Duncan and Magnuson (2013), and Yoshikawa et al. (2013). We differ from these studies because we consider evidence from a broader range of studies using diverse but competent evaluation methodologies.
12. Puma et al. (2012).
13. Feller et al. (2014); Kline and Walters (2014); Zhai, Brooks-Gunn, and Waldfogel (2014).
14. Universal programs are defined as programs available to all children in a geographical area with only age as an eligibility criterion. Because they are voluntary, participation in universal programs is far from universal. For example, take-up of the two major universal state programs, in Georgia and Oklahoma, for the years they are studied is 59 percent and 74 percent, respectively (Cascio and Schanzenbach 2013). Within these programs, 65 percent and 66 percent of participating children were low income as measured by eligibility for free or reduced-price lunch, which is offered to children whose families are at or below 185 percent of the federal poverty line. We discuss preschool take-up by socioeconomic status further in section 4.5.
15. Program costs would be diminished further if the affluent who used them were charged user fees, as some have proposed (Heckman 2008).
16. The Tennessee Pre-Kindergarten Program (Lipsey, Farran, and Hofer 2015).

The evidence on the benefits of universal programs discussed in section 4.5 comes from: (a) national programs in Canada and Norway, (b) state programs in Oklahoma and Georgia, and (c) a recent universal program in

Table 4.1  Comparing demonstration programs, Head Start, and universal preschool programs

[The body of table 4.1, a grid of checkmarks, did not survive text extraction. It compares the programs listed below along four groups of dimensions: sample characteristics (randomized control trial, small sample, control contamination); eligibility (means-tested, high disadvantage, low income, criteria narrowly defined); content (homogeneous treatment, medical services, home visiting, parent involvement, parenting skills); and measures available (IQ, achievement, noncognitive, age of follow-ups, educational attainment, subject employment, use of public transfers, crime, health). The rows cover demonstration programs (ABC, PPP, ETP, IHDP); Head Start (HSIS and NLSY79/CNLSY); universal programs (state pre-K in Oklahoma and Georgia, local pre-K in Boston, and a reform in Norway); and other programs (TN-VPK).]

Notes: This table compares the programs from which we draw evidence: ABC = Carolina Abecedarian Project; PPP = Perry Preschool Project; ETP = Early Training Project; IHDP = Infant Health and Development Program; HSIS = Head Start Impact Study; TN-VPK = Tennessee Voluntary Prekindergarten Program; and Boston = Boston Public School Prekindergarten Program. "High disadvantage" refers to inclusion of home environment and other family characteristics in the eligibility criteria. "Criteria narrowly defined" indicates that the program serves a population that is narrowly defined in terms of eligibility on the basis of socioeconomic status or race. While Head Start serves predominately low-income children, the populations served vary greatly across sites in other important characteristics. "Homogeneous treatment" refers to approximately equivalent quality across sites or cohorts. "Control contamination" refers to the use by control children of other programs. There is some information on the nature of control contamination for almost all of the programs. "Sample characteristics" describe the features of the study design and data that impact evaluation. "Measures available" describes the data available from our cited studies.
a. IHDP limited participation to low-birth-weight, premature children (≤ 2,500 grams, ≤ 37 weeks) who lived at most forty-five minutes away from treatment centers.
b. Although there are curricular guidelines and performance standards for Head Start, individual centers have flexibility in curriculum implementation and offer different services that are intended to meet the needs of the local population. Thus, we consider Head Start to have heterogeneous treatment, though there are similarities in treatment. Own calculations with HSIS data indicate that 30 percent of HSIS centers use a version of the HighScope curriculum, which was developed in the Perry Preschool Project.
c. These programs are not randomized control trials. There is evidence that a substantive part of the comparison groups in Boston and Oklahoma had access to center-based care. We assume that this can be extrapolated to the case of Georgia, where the information is less clear.
d. Not much is known about control contamination in TN-VPK; however, control children were not prohibited from enrolling in other programs.

Boston. Section 4.6 discusses nonexperimental evidence on the importance of quality environments in promoting child development. We summarize our findings in section 4.7.

The goal of this chapter is to distill general lessons from the literature that can guide policy, not to endorse or attack any particular program. The literature is often marred by a "treatment effect" mentality that sees evaluation research as an up-or-down statement about whether a particular program "works," rather than why it works or does not work. Our approach is to understand the mechanisms underlying successful early childhood education programs with an eye toward designing future approaches that improve on current practice. With this goal in mind, we next present a framework for interpreting the evidence within a general model of human development.

4.2  A Framework for Interpreting the Evidence

Before turning to our review of the literature, we present the guiding principles of this chapter. We first discuss a dynamic model of skill formation based on Cunha and Heckman (2007, 2009). It provides a framework for understanding the effectiveness of early interventions for disadvantaged children. We next consider arguments for public provision of interventions. We then discuss how the availability of alternative child-care options affects the interpretation of the evidence from interventions.

4.2.1  The Formation of Skills over the Life Cycle

Cunha and Heckman (2007, 2009) develop a model of the evolution of skills over the life cycle. The central ingredient of this model is the technology of skill formation, graphically represented in figure 4.1. At life-cycle stage t, parental skills (θ_t^P), investment (I_t), and child skills (θ_t) determine the skills in the next period, t + 1 (θ_{t+1}).17 Parents affect their children in multiple ways. Parents with greater parenting skills (θ_t^P) create warm, supportive, nourishing environments independent of their financial resources, the volume of time spent with children in direct instruction, or child development. Parents with greater financial and time resources can invest more in goods (e.g., tuition for pre-K) and time (e.g., taking a child to the zoo), captured by the vector I_t. Whether they choose to do so depends in part on their preferences.18

Fig. 4.1  Graphical representation of the technology of skill formation
Note: This figure illustrates the technology of skill formation, where links in the technology are represented by arrows. Dots represent periods that are not depicted in the diagram.

Income is often used as a measure of child poverty, but it is a very crude one. An affluent but indifferent parent can provide an impoverished early childhood environment. Financially strapped families can nonetheless provide strong family environments through their attachment, warmth, and investment in time and caring. Public programs attempt to bolster both I_t and θ_t^P and also to provide information to parents. While this chapter focuses on "means-tested" programs, readers should recognize the inadequacy of equating childhood poverty with poverty in money income.19

The process of skill formation is dynamic and builds on itself. In the technology of skill formation, current stocks of skills help create future stocks of skills over the life cycle, and future skills have intergenerational impacts. These dynamic relationships make early life an important period because it lays the foundation for building skills later in life. The following points are established in the recent literature.

1. Skills are multiple. Individuals have many life-relevant skills beyond the cognitive skills measured by IQ and achievement tests. These additional skills are variously referred to as noncognitive skills or character skills. They also include health and mental health. They are important predictors of successful lives. These skills are important to different degrees in different life tasks. Early education programs promote these skills. In assessing the success or failure of any intervention, a full inventory of the skills affected is an essential part of any reliable evaluation.20

2. Skills are self-productive and complement each other. Between any two periods in the life of a child, t and t + 1, a child's stock of skills builds on itself ("skills beget skills"). Skills are not only self-productive, but also promote the production of other skills. Skills are said to complement each other in period t when together they promote skills in period t + 1 more than each skill alone. Cognitive skills, noncognitive skills, and health in period t complement each other and produce cognitive skills, noncognitive skills, and health in period t + 1.21

3. Skills complement investment. By fostering early-life skills, early childhood education establishes a foundation that facilitates the accumulation of skills later in life.22 Early childhood education promotes life-cycle skill development by increasing the stock of future skills that promote the productivity of future investment. This feature of life-cycle investment is called dynamic complementarity. Under conditions confirmed empirically in Cunha, Heckman, and Schennach (2010), it is more productive to invest in disadvantaged children early in life than to remediate disadvantage later in life. This arises from the complementarity between later-life skills (acquired by early-life investment) and later-life investments. Enriched early-life investment helps disadvantaged children capture many of the same benefits of later-life investment that are experienced by their more advantaged peers. The flip side of dynamic complementarity is that it is harder to remediate early disadvantage at older ages. Investment at later ages in adolescents lacking a strong early skill base is often much less productive than investment at early ages.23

These features of the technology of skill formation help to explain why supplementing parenting skills and the quality of investment offered to disadvantaged young children are socially fair and economically efficient strategies.24

17. t = –1 corresponds to the prenatal years.
18. See, for example, the review of the literature on parental preferences for child outcomes in Heckman and Mosso (2014).
19. See Mayer (1997) and Heckman and Mosso (2014).
20. Heckman and Kautz (2012, 2014).
21. See, for example, Heckman and Mosso (2014).
22. Cunha and Heckman (2008); Cunha, Heckman, and Schennach (2010).
23. See Heckman and Kautz (2014).
24. Heckman and Mosso (2014).

4.2.2  Arguments for Subsidizing Early Childhood Education Programs

Many arguments have been made for subsidizing early childhood programs for disadvantaged families. Heckman and Mosso (2014) summarize the literature.
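These arguments lean on the dynamics of the technology of skill formation discussed in section 4.2.1. As a compact restatement, the recursion can be sketched as follows (the notation follows Cunha and Heckman (2007); the functional form shown here is generic and illustrative, not the chapter's own specification):

```latex
% Technology of skill formation (sketch; generic form after Cunha-Heckman).
%   \theta_t   : child's skill vector at stage t (cognitive, noncognitive, health)
%   \theta_t^P : parental skills
%   I_t        : investments of goods and time
\theta_{t+1} = f_t\bigl(\theta_t,\, \theta_t^{P},\, I_t\bigr), \qquad t = -1, 0, 1, \ldots
% Self-productivity ("skills beget skills"): current skills raise future skills.
\frac{\partial f_t}{\partial \theta_t} > 0
% Dynamic complementarity: a larger early skill stock raises the productivity
% of later investment.
\frac{\partial^2 f_t}{\partial \theta_t\, \partial I_t} > 0
```

Read this way, subsidizing early I_t for disadvantaged children raises θ_{t+1} directly and, through dynamic complementarity, raises the return to all subsequent investment; the same subsidy applied late, to adolescents with a weak early skill base, is less productive.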


All of the arguments build on the evidence that early childhood environments have profound consequences on the lives of children, and affect the entire society through reduced crime, enhanced health, greater educational attainment, and greater social engagement. Adverse early childhood environments create externalities—effects on society as a whole—that parents (for whatever reason) do not act on or internalize. The exact reasons for deficits in early investment are debated. There are three classes of arguments. Some point to borrowing constraints facing disadvantaged families that have become more pronounced in recent decades with declining real wages for less educated workers and that are exacerbated by rising tuition costs (see Caucutt and Lochner 2012; Duncan and Murnane 2014). Under this argument, parents underinvest in children because their cost of investing is greater than the social cost of funds. With the growth in single- parent families and the need for women to work to support their families, time constraints on parents have also increased. The evidence on the importance of borrowing constraints is hotly debated (see, e.g., Mayer 1997; Heckman and Mosso 2014). As previously noted, more than money is involved in creating nourishing, productive child environments. The evidence that cash transfers to disadvantaged families have important effects on child development is weak. Other information-based arguments have been advanced that note the importance of family knowledge of best- practice child rearing.25 There is considerable evidence that disadvantaged parents lack the information required to be effective parents. Many programs (ETP, IHDP, PPP) are based on this premise and it is one reason for home- visiting programs. It is a justification for using in-kind transfers of information and direct supplements to parenting, rather than simple cash transfers. More controversial is the argument that parents lack sufficient altruism/ concern for their children. 
This paternalistic argument has evident merit in the case of abusive parents, or parents who deny children access to opportunities that would give them options the parents do not wish them to exercise (e.g., high school education for Amish children). This chapter does not evaluate the merits of these separate arguments. But the evidence shows that in contemporary American society, disadvantaged children face adverse child-rearing environments, and high-quality, targeted, in-kind policies that have been implemented are effective.

4.2.3 Two Policy Evaluation Questions

Sneha Elango, Jorge Luis García, James J. Heckman, and Andrés Hojman

In evaluating program impacts on skill development, researchers must be careful in interpreting the evidence. Families differ in terms of the quality of the early environments offered to their children. Researchers need to distinguish between two questions when evaluating program effectiveness. The first question is: “What is the causal effect of an early childhood education program relative to a particular child-care alternative, where one of these alternatives might be no treatment at all?” The second question is: “What is the causal effect of adding a program to the available choice set?”26 The first question addresses the effectiveness of a policy that offers a particular early education program compared to a particular alternative, for example, home care. The second question addresses the effectiveness of expanding the choice set available to parents, that is, adding one more alternative. Most of the evaluations we consider answer the second question, even though answers to it are often treated as answers to the first.27 These questions are often confused. In particular, estimating the causal effect of expanding the availability of choices—making a new program available—and interpreting such estimates as statements about the effectiveness of that program compared to no program at all, might suggest that a program is ineffective. If the control group of a study has access to alternatives that are good substitutes for the program being studied, and if the researcher erroneously assumes that the relevant alternative to the program being evaluated is home child care and not some higher-quality alternative, then there would appear to be no causal effect of the program’s availability—even though the program may be highly effective compared to home child care.28 This type of error is made in many evaluations of Head Start—particularly, in evaluations that use data from the Head Start Impact Study (HSIS). The control group in HSIS had access to treatment substitutes, which sometimes included other Head Start centers.

25. See Cunha, Elo, and Culhane (2013) and Cunha (2015) for recent evidence on this question.
Studies that ignore the availability of program substitutes find weak effects.29 Studies that account for the available substitutes find moderate-to-strong effects of Head Start, compared to no program at all, on measures of cognitive and noncognitive skills.30 We discuss this evidence in detail in section 4.4 after discussing the evidence from demonstration programs. A discussion of these programs is relevant to our analysis of Head Start. The curricula of these programs are embedded in versions of the curricula used in Head Start centers, although Head Start is funded at lower levels than the original programs were. Our evidence on demonstration programs offers indirect evidence on the possibilities for success of an enriched Head Start program.

26. See Heckman and Vytlacil (2007).
27. Heckman et al. (2000) discuss these problems under the rubric of “substitution bias.” See also Heckman (1992).
28. See Heckman et al. (2000).
29. Puma et al. (2010, 2012).
30. Feller et al. (2014), Kline and Walters (2014), and Zhai, Brooks-Gunn, and Waldfogel (2014).
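A stylized calculation, with hypothetical numbers of our own choosing, illustrates why the answer to the second question can understate a program's effectiveness relative to home care when control-group families substitute into alternative programs:

```python
# Stylized, hypothetical numbers (not estimates from any study) showing
# why the effect of *offering* a program (question 2) can understate its
# effect *relative to home care* (question 1) when control-group families
# substitute into alternative programs.

def availability_effect(y_home, y_program, y_substitute, share_substituting):
    """Treatment-control outcome gap when a fraction of controls enrolls
    in a substitute program instead of staying in home care."""
    control_mean = (share_substituting * y_substitute
                    + (1 - share_substituting) * y_home)
    return y_program - control_mean

# Suppose the program raises an outcome index from 50 (home care) to 60,
# and a substitute of similar quality yields 58.
effect_vs_home = 60 - 50                                # question 1
effect_of_offer = availability_effect(50, 60, 58, 0.5)  # question 2

print(effect_vs_home, effect_of_offer)  # 10 6.0
```

With half of the control group in a good substitute, the measured effect of availability (6.0) is well below the effect relative to home care (10), even though the program itself is unchanged.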

Early Childhood Education

4.3 Evidence from Demonstration Programs

This section analyzes the evidence from the demonstration programs listed in table 4.1. We conduct a new primary analysis of the four programs listed there rather than just a meta-analysis of existing studies. We first present the common features of the demonstration programs we analyze and our criteria for selecting them. We then describe them in subsection 4.3.2. We discuss common methodological issues that arise when analyzing these programs in subsection 4.3.3. In subsection 4.3.4 we present evidence on the short-term effects of these programs. We present evidence on long-term effects in subsection 4.3.5. Subsection 4.3.6 relates the short-term findings to the long-term findings. Subsection 4.3.7 discusses cost-benefit analyses for two major demonstration programs, PPP and ABC. Subsection 4.3.8 summarizes the discussion.

4.3.1 The Characteristics of the Demonstration Early Childhood Programs

The early childhood demonstration programs we consider are targeted social experiments designed to bolster various aspects of the early lives of disadvantaged children. Assignment to treatment is randomized, although noncompliance and attrition can compromise the inference from any randomization. These programs are all means tested, though they have different eligibility criteria. The evidence on demonstration programs is not always comparable across programs because they differ in terms of data availability, eligibility, quality, duration of treatment, length of follow-up, and other characteristics. Careful analysis is required to make valid cross-program comparisons of program effects. We discuss program differences and identify common components. The demonstration programs considered here have the following common features:

1. They are center based. This section focuses on four center-based programs: (a) the Perry Preschool Project (PPP); (b) the Carolina Abecedarian Project (ABC); (c) the Infant Health and Development Program (IHDP); and (d) the Early Training Project (ETP).31

31. We do not consider three important programs outside of the United States: the Mauritius Study, due to its excessive attrition by age forty (58 percent) (Raine et al. 2010); the Turkey Early Enrichment Program, also due to its excessive attrition by age twenty-six (49 percent) (Kagitcibasi et al. 2009); and the Jamaica Study (Gertler et al. 2014), which focused primarily on nutrition and home visits. We do not consider the Nurse Family Partnership program because it focused mainly on prenatal care (Olds et al. 1986; Olds, Henderson, and Kitzman 1994; Eckenrode et al. 2010; Heckman et al. 2014). Other programs in the United States that we do not consider include the Milwaukee Project, because data are unavailable (Page 1972; Sommer and Sommer 1983; Garber 1988; Gilhousen et al. 1990); the Even Start Program (Ricciuti et al. 2004); and the Comprehensive Child Development Program (St. Pierre et al. 1997, 1999), because of a lack of information on child-care alternatives. (We describe the program in appendix C.3.)

2. They are means tested. The programs we consider are all means tested, although they use different eligibility criteria. The evidence on universal programs discussed in section 4.5 shows that early childhood education is particularly effective for disadvantaged children.

3. The programs considered collect measurements on multiple skills and outcomes over long periods of the life cycle. It is a common but mistaken practice to evaluate programs based only on outcomes measured at early ages. Uninformed analysts sometimes conclude that programs are ineffective because of the fadeout of IQ gains in short-term evaluations that ignore multiple capacities. We evaluate programs using a diverse set of long-term outcomes that matter for success in life, such as health, education, earnings, and participation in crime.

4. We discuss, where necessary, the consequences of compromised randomization, attrition of participants from programs or from study samples, the availability of good substitutes for the control group, and other challenges in conducting evaluations. Compromises of the initial randomization protocols occur when subjects assigned to treatment or control status in an experimental protocol switch their initially assigned status or leave the program or the follow-up surveys. Despite these challenges in analyzing the data, we show that valid, policy-relevant information can be derived from these studies.

4.3.2 Overview of Programs Discussed in This Section

Table 4.2 presents an overview of the programs we study. We discuss their most prominent characteristics in the next few paragraphs and present a more detailed discussion in appendix A. The oldest programs we study are ETP and PPP. They began in 1962 and continued until 1964 and 1967, respectively. ABC is also relatively old, beginning in 1972 and continuing until 1982. The most recent program is IHDP, implemented from 1985 to 1988. PPP and ABC have high-quality data with long-term follow-ups. IHDP and ETP only have follow-ups into young adulthood. ETP, PPP, and ABC shared a common goal of preventing “mental retardation” and promoting school readiness (Weikart 1967; Gray, Ramsey, and Klaus 1982; Ramey et al. 1982; Zigler and Muenchow 1994).32 The researchers who implemented ETP, PPP, and ABC also created the curricula for these programs. The staff adapted and improved the curricula while the programs were being conducted (Heckman, Kuperman, and Cheng 2015). All three curricula have elements in common: promotion of play-based and child-directed learning, emphasis on language development, and emphasis on developing noncognitive and problem-solving skills. The curriculum in IHDP was adapted from the curricula of both ABC and a spin-off program,

32. Note that the clinical understanding of mental retardation was once associated with disadvantages that hindered early-life development (Noll and Trent 2004).

Table 4.2  Summary table of demonstration programs

PPP (Perry Preschool Project)
  Program overview: 1962–1967; Ypsilanti, Michigan; 5 cohorts; N = 123 (58 treatment : 65 control); age of entry 3–4; duration 1–2 years
  Treatment: home visits(b), 4 per month; center care 30 weeks per year, 12–15 hours per week; parent involvement; parenting instruction
  Control(c): none of the treatment components
  Randomization protocol: 1. rank by initial IQ of child; 2. group evens and odds; 3. balance gender, SES, etc.; 4. randomize whole group
  Compromises: enrolled siblings received the same assignment; working moms switched to control
  Counterfactual: stay at home or with friends or relatives (few substitutes)
  Curriculum: language development; cognitive development; noncognitive development; school readiness. Adult-child ratio: 1:5–1:6
  Program eligibility(e): cultural deprivation scale < 11; low IQ (< 85); African American; no physical handicap
  Staff and certifications: teachers with BAs(g); special ed. teachers as specialists(g)

ABC (Carolina Abecedarian Project)
  Program overview: 1972–1982; Chapel Hill, North Carolina (UNC); 4 cohorts; N = 111 (57 : 54); age of entry 0; duration 5 years
  Treatment: no home visits; center care 50 weeks per year, 45 hours per week; nutrition; diapers/child-care goods; well-child and ill-child health care; counseling
  Control(c): formula (up to 15 mo.); diapers (up to 15 mo.); well-child health care (cohort 1, up to age 1)
  Randomization protocol: 1. match on HRI(d); 2. adjust by gender, maternal IQ, siblings; 3. randomize pairs
  Compromises: 2 extremely needy switched to treatment; 4 refused random assignment; 4 abandoned treatment; 2 considered ineligible after randomization
  Counterfactual: stay at home or child care; alternative programs available
  Curriculum: language, motor, cognitive, and noncognitive development; task orientation; high-risk behavior; school readiness. Adult-child ratio: 1:3 (age 0–1); 1:4–5 (age 1–4); 1:5–6 (age 4–5)
  Program eligibility: HRI(d) ≥ 11; biologically healthy; no signs of mental retardation
  Staff and certifications: HS grads with mixed experience as teachers(f); physician and nurse as specialists

IHDP (Infant Health and Development Program)
  Program overview: 1985–1988; eight sites selected after competitive review; 1 cohort; N = 985 (377 : 608)(a); age of entry 0; duration 3 years
  Treatment: home visits(b), 4 per month (up to age 1), 1–2 per month (after age 1); center care 50 weeks per year, 20+ hours per week; parent involvement; nutrition; diapers/child-care goods; well-child and ill-child health care; counseling; parenting instruction
  Control(c): well-child health care
  Randomization protocol: 1. stratify on birth weight and site; 2. randomize
  Compromises: 17 families refused to participate in the study after assignment
  Counterfactual: stay at home or child care; alternative programs available
  Curriculum: language, motor, cognitive, and noncognitive development; school readiness. Adult-child ratio: 1:3–1:4
  Program eligibility: live within 45 min. from center; birth weight < 2,500 g; gestational age < 37 weeks; no severe illnesses or neurological defects
  Staff and certifications: college grads as teachers(f); clinical staff as specialists

ETP (Early Training Project)
  Program overview: 1962–1964; segregated black schools in Abbotsfield, Tennessee; 2 cohorts; N = 88 (43 : 45); age of entry 4–5; duration 2–3 years
  Treatment: home visits(b), 4 per month; center care 10 weeks per year, 20 hours per week; parenting instruction
  Control(c): none of the treatment components
  Randomization protocol: simple randomization into 2 treatment and 1 control groups
  Compromises: n/a
  Counterfactual: stay at home or with friends or relatives (few substitutes)
  Curriculum: language, cognitive, and noncognitive development; task orientation; school readiness. Adult-child ratio: 1:4–1:6
  Program eligibility: home environment (education of parents; parent occupation semi- or unskilled); African American; parent education ≤ high school
  Staff and certifications: teachers with MAs(f); teaching assistants (college and PhD students) and home visitors(f)(g) as specialists

Source: All details and sources are extensively discussed in appendix A.
a. In IHDP, an additional 105 twins were also followed in the study but are not analyzed in the literature. These twins were assigned to the same treatment group as their siblings. For each site, the program lasted until the youngest child turned thirty-six months old, correcting for prematurity.
b. In PPP, home visits were intended to involve the mother in educating the child, increase her understanding of the educational process, and extend the curriculum beyond the classes and into the homes. Monthly group meetings for parents were also available but are not well documented. During IHDP home visits, families in treatment groups were given toys with instructions on how to play with their child with the toys. This was to extend the curriculum beyond the classroom. Home visits also sought to improve the parents’ ability to problem solve, cope with personal issues, and function as parents. In addition, parent groups were offered as a chance for parents to share information and concerns with each other, and to provide them with the opportunity to learn about child education and community resources. Surveys were conducted by college graduates. ETP had two treatment groups. In one group, parents received two nine-month training sessions; in the other, parents received one nine-month training session. During these training sessions, the objective of the intervention was made clear to mothers during visits to schools. Mothers were encouraged to engage in their children’s learning, as well as to expand the experiential environment of the child (e.g., trips to the library).
c. Treatment group individuals received all these items as well. The control group of the first cohort of ABC received health check-ups for the first year, after which this practice was discontinued.
d. In ABC, the High Risk Index (HRI) was composed of: “absence of maternal relatives in the area”; “siblings of school age one or more grades behind age-appropriate level or with equivalently low scores on school-administered achievement test”; “payments received from welfare agencies within past three years”; “record of father’s work indicates unstable or unskilled and semi-skilled labor”; “record of mother’s or father’s IQ indicates scores of 90 or below”; “record of sibling’s IQ indicates scores of 90 or below”; “relevant social agencies in the community indicate the family is in need of assistance”; “one or more members of the family has sought counseling or professional help in the past three years”; maternal and paternal educational levels; family income; and father’s presence.
e. In PPP, criteria for home environment included education of parents, occupational level of father, maternal employment, and household density.
f. Signifies that staff were specially trained for the program.
g. Signifies that staff were state certified.

the Carolina Approach to Responsive Education (CARE) (Gross, Spiker, and Haynes 1997).33 Of these studies, PPP and ABC presently have the longest follow-ups, with data up to ages forty and thirty-four, respectively. A follow-up of Perry through age fifty is being collected at the time of this writing. Both PPP and ETP served preschool-age children and had home visits with their parents. ABC served children from birth through preschool age. IHDP served children and had home visits from birth to age three. ABC had two treatment phases, birth to age five and five to eight, and correspondingly two rounds of randomization. ABC was the most intensive program (eight hours per day starting from one to three months of age and continuing to age eight). There were no home visits in the first phase, but parents were encouraged to visit the center. There were home visits in the second phase. We focus on the first phase (birth to age five) because there is little evidence of treatment effects from the second phase.34 While ETP, PPP, and ABC served relatively narrowly targeted populations, IHDP was more inclusive and served a population that was far more heterogeneous in terms of race and socioeconomic status, although all children served had low birth weight.35 All four programs had relatively educated staffs with some experience in education and high teacher-to-child ratios. They varied in the amount of time children spent in the center: PPP had two years of center-based treatment for three hours a day and weekly home visits; ETP had intensive summer school and weekly home visits for up to three years, but no year-round center care; and ABC included center-based care during all of early childhood, from birth to school entry, for up to eight hours per day. Like ABC, IHDP also began at birth. During the first year, the program provided weekly home visits. These visits became bimonthly in the second and third years of treatment.

IHDP provided center-based treatment for up to nine hours a day for fifty weeks a year in the second and third years of the program. Both ABC and IHDP included medical components—most prominently, regular physical check-ups for the treated children. PPP, ABC, and ETP are not strictly means-tested programs. They use varying measures of disadvantage roughly correlated with income, such as the quality of home environments as characterized by single parenthood, parental education, and housing density. Additionally, PPP and ETP were explicitly designed to serve African American children. IHDP differs from the other programs in its eligibility criteria. All participants were premature births (≤ 37 weeks), low birth weight (≤ 2,500 grams), and resided, at most, forty-five minutes away from the location of the program. While the other demonstration programs served fairly narrowly defined disadvantaged populations (although the criteria used differ), IHDP served a population that was more heterogeneous in socioeconomic status and race and homogeneous only in child birth weight. However, because perinatal health is related to the socioeconomic characteristics of the parents, IHDP subjects were disadvantaged compared to the general United States population (García 2015). Table 4.3 describes the baseline characteristics of the populations served by the four demonstration programs we study.36

33. Appendix C provides further details about CARE.
34. See Conti, Heckman, and Pinto (forthcoming) and Campbell et al. (2014).
35. García (2015) compares the IHDP sample with the cohort born in the same year (1985) in the United States. The author finds that IHDP individuals are, on average, relatively disadvantaged. The author suggests that this is a consequence of the correlation between measures of disadvantage: maternal labor supply, household income, a father’s presence at home, premature birth status, and low birth weight.
36. We describe only the control groups.

Table 4.3  Control group background characteristics at baseline, all programs (mean outcomes)

                                   PPP             ABC             IHDP             ETP
                               Mean     SD     Mean     SD     Mean      SD     Mean     SD
Black (%)                       100      0       97     16       53      50      100      0
IQ, ages 2–4                  79.02   6.44    90.42  11.46    88.00   20.16    87.29  11.88
Mother’s age                  29.10   6.57    19.89   4.82    24.87    6.00    30.11   8.84
Mother’s years of education    9.42   2.20    10.23   1.84    12.40    2.42     8.96   2.62
Mother works (%)                 20     40       73     45       34      47       40     49
Father at home (%)               53     50       29     46       56      50       87     34
Father’s age                  32.81   6.88    23.21   5.91    27.64    6.67    32.82  10.10
Father’s years of education    8.60   2.40    10.95   1.76    13.16    2.89     9.59   2.75
Father works (%)                 86     35       87     34       51      50       97     17
Household income (2014 USD)     n/a    n/a    7,653  10,049  41,868  32,623      n/a    n/a
Siblings                       4.28   2.59     0.64   1.10     1.02    1.17     3.59   2.21
Treatment (%)                    47     50       52     50       39      49       48     50

Source: Own calculations.
Note: This table displays baseline characteristics of the control group of the demonstration programs we study. Mother’s and father’s years of education are counted as the number of years of schooling completed by the mother and father, respectively, at the time of program entry. The number of siblings is reported at program entry. For PPP, the child’s IQ at age three is measured using the Stanford-Binet Intelligence Scale; for ABC, the child’s IQ at age two is measured using the Stanford-Binet Intelligence Scale. Mother’s age is reported at the time of program entry. For IHDP, the child’s IQ at age three is measured using the Stanford-Binet Intelligence Scale; for ETP, the child’s IQ at age four is measured using the Stanford-Binet Intelligence Scale. Test scores are constructed to have a national mean of 100 and a standard deviation of 15. We only report characteristics of the control group because for programs that started at birth (ABC and IHDP), we do not observe treatment baseline characteristics. Household income was not an eligibility criterion in any of the programs in this table. n/a indicates data not available.


4.3.3 Possible Limitations in the Evidence from Demonstration Programs

Age of Programs

The programs we study are valuable for analyzing the effectiveness of early childhood education because long-term follow-ups of their participants are available. Though it is natural to question the relevance of older programs to current policy, we argue that the lessons from them remain highly relevant. The basic principles of enhancing the investments in, and the environments of, disadvantaged children that were laid down fifty years ago remain intact. Objections to relying on evidence from early high-quality programs are made by analysts who think that the outcome of an evaluation study should be an up-or-down assessment of that program, rather than a contribution to understanding the general principles from multiple programs that can guide the construction of future programs. The effectiveness of any particular program is presumably a lower bound on the effectiveness of new programs that build on and improve it. Evidence for the success of a program should not be a call for slavish application of that program. We make four additional points on the relevance of the evidence from older programs. First, all of the demonstration programs we analyze have school readiness as a main goal. This goal is shared with most contemporary early education programs. Second, the success of some of these demonstration programs influenced the creation and design of the most important current early childhood education programs. ETP and PPP influenced the creation of Head Start (Zigler and Muenchow 1994), and ABC motivated policymakers to consider programs targeting even younger children and inspired the creation of Early Head Start (Schneider and McDonald 2006). Third, and most important, as documented in section 4.4.1, although the demonstration programs were very high quality for their time, they bear a strong resemblance to current high-quality early childhood education programs in terms of their structure, staffing, and curricula.
For example, a version of HighScope is the second most commonly used curriculum in Head Start, utilized by roughly 30 percent of Head Start centers.37 Contemporary programs share other features with the programs we study, such as teacher-to-child ratios (Heckman, Humphries, and Kautz 2014). Finally, some of the programs studied have long-term follow-ups. Understanding the impacts of early childhood education on skill formation requires analysis of effects on adult outcomes. This research requirement necessitates analysis of older programs. Positive long-term outcomes are a strong indication of a well-designed program.

37. Our own calculations using HSIS data.


Small Sample Sizes

Samples are often small. Several recent studies use exact small-sample inference to estimate multiple treatment effects with precision, even when dividing samples by gender and accounting for the biases that arise in testing multiple hypotheses (“cherry picking”).38 Application of small-sample inference methods produces results that are often not substantively different from the results using bootstrap or standard asymptotic inference procedures (Heckman et al. 2010a; Campbell et al. 2014). The methodologies employed to analyze IHDP, PPP, and ABC are conservative.

Control Contamination

The extent to which the control group received center-based care varies across ETP, PPP, ABC, and IHDP. There was no control contamination in ETP or PPP because of a lack of center-based substitutes, whereas there was control contamination in ABC and IHDP, which were launched after Head Start was founded. In ABC, the control group had access to non-center-based and center-based child care, especially during ages birth to five years (Elango et al. 2015). This included high-quality care provided in churches and care at one Head Start center. In IHDP, 39 percent of the children attended substitute programs, though their quality is unknown (García, Hojman, and Shea 2014). None of the studies we discuss address the issue of control contamination, even though most of the control groups had access to high-quality alternatives. This practice makes the reported estimates of the effects of the programs (compared to the home alternative) conservative.

Attrition and Nonresponse

PPP and ABC data are used for assessing long-term benefits because they have high-quality follow-ups, available through age forty in PPP and through age thirty-four in ABC. Attrition and nonresponse complicate the interpretation of the evidence. Reliable analyses adjust for these features of the data.

4.3.4 Effects on IQ, Achievement Test Scores, and Conscientiousness

Table 4.4 presents estimated treatment effects on early IQ, early and late achievement test scores, and early conscientiousness pooled over genders. Tables 4.5 and 4.6 display the same information by gender. We adjust all test statistics for the effects of multiple hypothesis testing using procedures

38. See Romano, Shaikh, and Wolf (forthcoming). If a 10 percent significance level is used in a sample with 100 outcomes, and thus 100 null hypotheses of no treatment effects, roughly ten would be “statistically significant” even if all null hypotheses are true, that is, treatment had no effect on any outcome. Heckman et al. (2010a), Gertler et al. (2014), Campbell et al. (2014), and García et al. (2015) use methods to correct for this multiplicity of hypotheses.
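The arithmetic in footnote 38 can be checked with a short simulation (an illustrative sketch of our own; the trial count and seed are arbitrary and are not part of the chapter's methodology):

```python
# Simulating the arithmetic in footnote 38: with 100 outcomes whose true
# treatment effects are all zero, a 10 percent significance level flags
# roughly ten "significant" effects by chance alone.
import random

random.seed(0)
n_outcomes, n_trials, alpha = 100, 2000, 0.10

false_positive_counts = []
for _ in range(n_trials):
    # Under a true null hypothesis, a valid p-value is uniform on [0, 1].
    pvals = [random.random() for _ in range(n_outcomes)]
    false_positive_counts.append(sum(p < alpha for p in pvals))

average_false_positives = sum(false_positive_counts) / n_trials
print(average_false_positives)  # close to 10, even though no effect exists
```

Stepdown procedures of the kind cited in the footnote control this multiplicity by testing the ordered hypotheses jointly rather than one at a time.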

Table 4.4  Treatment effects on early-life skills for samples pooled across gender

                                              Treatment   Permutation,  Permutation,  Stepdown,   Stepdown,
                                              effect      one sided     two sided     one sided   two sided
Perry   IQ, age 5                             11.422      0.000         0.000         0.000       0.000
        IQ, age 8                              1.254      0.080         0.430         0.080       0.430
        Achievement test score, ages 5–10      0.394      0.000         0.000         0.010       0.010
        Conscientiousness, ages 4–7            0.273      0.040         0.060         0.050       0.070
        Achievement test score, age 27         1.795      0.020         0.070         0.080       0.060
ABC     IQ, age 5                              6.398      0.030         0.030         0.030       0.030
        IQ, age 8                              4.500      0.080         0.080         0.180       0.180
        Achievement test score, ages 5–10      0.544      0.010         0.010         0.020       0.020
        Conscientiousness, ages 4–7            0.047      0.400         0.680         0.860       0.890
        Achievement test score, age 21         0.422      0.010         0.010         0.120       0.120
IHDP    IQ, age 3                              8.475      0.000         0.000         0.000       0.000
        IQ, age 8                             –0.671      0.680         0.420         0.910       0.430
        Achievement test score, ages 5–10     –0.012      0.570         0.840         0.830       0.870
        Conscientiousness, ages 4–7            0.075      0.060         0.140         0.180       0.190
        Achievement test score, age 18         0.108      0.470         0.950         0.730       0.930
ETP     IQ, age 7                              6.343      0.020         0.080         0.050       0.050
        IQ, age 8                              5.743      0.100         0.240         0.150       0.200
        Achievement test score, ages 5–10      0.534      0.380         0.820         0.510       0.800

Source: Own calculations.
Note: Initial sample sizes are: PPP: 123; ABC: 122; IHDP: 985; ETP: 91. Nonparametric permutation p-values account for compromised randomization, small sample size, and item nonresponse. See Heckman et al. (2010a) and Campbell et al. (2014, appendix) for details. Stepdown p-values account for the same and for multiple hypothesis testing. All school-age and adult achievement and conscientiousness measures have mean 0 and standard deviation 1. All IQ measures have mean 100 and standard deviation 15 and are standardized using the national population mean and standard deviation. For PPP, IHDP, and ETP at ages five, three, and seven we use the Stanford-Binet IQ test. For ABC at age five we use the Wechsler Preschool and Primary Scale of Intelligence. For PPP and ETP at age eight we use the Stanford-Binet IQ test. At this same age, we use the Wechsler Intelligence Scale for Children for ABC and IHDP. School-age achievement is a factor constructed from test items at ages five, six, and seven. The items analyzed come from the California Achievement Test (ABC, PPP); Metropolitan Achievement Test (ETP); Peabody Individual Achievement Test (ABC); and Woodcock-Johnson Test of Achievement (ABC, IHDP). School-age conscientiousness is a factor constructed from a battery of items from various questionnaires: Achenbach Child Behavior Checklist (ABC); Classroom Behavior Inventory (ABC); Walker Problem Behavior Identification Checklist (ABC); teacher rating (PPP, IHDP); and reputation test (PPP, IHDP). Adult achievement is measured by the Adult Performance Level (PPP); Woodcock-Johnson Test (ABC); and Wechsler Adult Intelligence Scale (IHDP). Adult achievement and conscientiousness measures are not available in ETP.
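As an illustration of the style of inference behind the permutation p-values in tables 4.4–4.6, a minimal one-sided permutation test can be sketched as follows (the outcome scores are invented, and this toy version omits the corrections for compromised randomization, attrition, and multiple hypotheses applied in the chapter):

```python
# A minimal one-sided permutation test for a mean treatment effect.
# Data are illustrative, not from any program studied in this chapter.
import random

random.seed(1)

treated = [102, 95, 110, 99, 105]   # hypothetical outcome scores
control = [94, 97, 90, 101, 92]

def mean(xs):
    return sum(xs) / len(xs)

observed = mean(treated) - mean(control)   # 7.4

pooled = treated + control
n_treated = len(treated)
n_permutations = 10_000

extreme = 0
for _ in range(n_permutations):
    random.shuffle(pooled)                 # re-randomize group labels
    diff = mean(pooled[:n_treated]) - mean(pooled[n_treated:])
    if diff >= observed:                   # one sided: effect at least as large
        extreme += 1

p_value = extreme / n_permutations
print(round(observed, 1), round(p_value, 3))
```

The p-value is the share of relabelings of the pooled sample that produce a treatment-control gap at least as large as the observed one; no distributional assumption is needed, which is why this approach suits the small samples discussed above.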

applied in Heckman et al. (2010a). We base our interpretation on nonparametric, permutation-based, one-sided p-values to test whether the programs had positive effects on the outcomes described. However, we also report results using two-sided tests. Effects are shown for two measures of cognition: IQ and achievement test scores. All effects are presented in units of standard deviations. In the case of IQ, we follow the convention of using standardized scores normalized to a population mean of 100 and a standard deviation of 15. Also shown are effects on conscientiousness, a noncognitive skill that is of interest due to its low correlation with cognition and high correlation with important later-life outcomes (Borghans, Meijers, and ter Weel 2008; Heckman, Humphries, and Kautz 2014). All programs have positive effects on early measures of IQ. For both

Table 4.5  Treatment effects on early-life skills for females

                                              Treatment   Permutation,  Permutation,  Stepdown,   Stepdown,
                                              effect      one sided     two sided     one sided   two sided
Perry   IQ, age 5                             12.666      0.000         0.000         0.000       0.000
        IQ, age 8                              4.240      0.410         0.900         0.700       0.940
        Achievement test score, ages 5–10      0.564      0.180         0.400         0.300       0.390
        Conscientiousness, ages 4–7            0.515      0.380         0.850         0.610       0.860
        Achievement test score, age 27         0.407      0.110         0.390         0.330       0.430
ABC     IQ, age 5                              3.051      0.050         0.050         0.060       0.060
        IQ, age 8                              4.573      0.110         0.150         0.360       0.360
        Achievement test score, ages 5–10      0.822      0.260         0.280         0.410       0.410
        Conscientiousness, ages 4–7            0.110      0.600         0.960         0.910       0.960
        Achievement test score, age 21         0.737      0.240         0.600         0.790       0.840
IHDP    IQ, age 3                              9.877      0.000         0.000         0.000       0.000
        IQ, age 8                             –0.158      0.780         0.490         0.940       0.600
        Achievement test score, ages 5–10     –0.034      0.500         0.920         0.790       0.970
        Conscientiousness, ages 4–7            0.089      0.240         0.440         0.500       0.530
        Achievement test score, age 18         0.517      0.650         0.790         0.840       0.910
ETP     IQ, age 7                              8.611      0.120         0.140         0.180       0.180
        IQ, age 8                              9.056      0.290         0.540         0.440       0.550
        Achievement test score, ages 5–10      0.448      0.810         0.350         0.980       0.270

Source: Own calculations. See notes in table 4.4.

Table 4.6  Treatment effects on early-life skills for males

Program  Outcome                             Treatment   Permutation,  Permutation,  Stepdown,   Stepdown,
                                             effect      one sided     two sided     one sided   two sided
Perry    IQ, age 5                           10.607      0.000         0.000         0.010       0.010
         IQ, age 8                           –0.721      0.060         0.250         0.150       0.190
         Achievement test score, ages 5–10    0.269      0.000         0.020         0.050       0.050
         Conscientiousness, ages 4–7          0.087      0.030         0.040         0.040       0.040
         Achievement test score, age 27       0.214      0.110         0.230         0.160       0.200
ABC      IQ, age 5                            9.962      0.530         0.540         0.890       0.890
         IQ, age 8                            4.174      0.410         0.410         0.760       0.760
         Achievement test score, ages 5–10    0.277      0.010         0.010         0.030       0.030
         Conscientiousness, ages 4–7          0.009      0.590         0.690         0.980       0.980
         Achievement test score, age 21       0.095      0.070         0.070         0.120       0.120
IHDP     IQ, age 3                            6.988      0.000         0.000         0.000       0.000
         IQ, age 8                           –1.206      0.450         0.930         0.810       0.950
         Achievement test score, ages 5–10    0.012      0.720         0.650         0.900       0.740
         Conscientiousness, ages 4–7          0.065      0.090         0.170         0.250       0.270
         Achievement test score, age 18      –0.456      0.500         0.820         0.710       0.840
ETP      IQ, age 7                            4.111      0.100         0.200         0.160       0.170
         IQ, age 8                            2.333      0.140         0.210         0.260       0.280
         Achievement test score, ages 5–10   –0.795      0.180         0.280         0.260       0.280

Source: Own calculations. See notes in table 4.4.
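The step-down p-values in tables 4.5 and 4.6 adjust for testing many outcomes at once. As a rough illustration of what a step-down adjustment does, the sketch below implements Holm's procedure, the simplest step-down rule. The chapter's reported values come from resampling-based procedures that correct for the same problems as the permutation p-values and account for dependence across outcomes, which Holm's rule ignores.

```python
def holm_stepdown(pvalues):
    """Holm step-down adjusted p-values for multiple hypothesis testing.

    Illustration only: the step-down values in the tables come from
    resampling procedures that exploit the dependence across outcomes;
    Holm's rule is the simplest step-down adjustment and ignores it.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Scale by the number of hypotheses not yet rejected at this step,
        # then enforce monotonicity in the sorted order.
        running_max = max(running_max, min((m - rank) * pvalues[i], 1.0))
        adjusted[i] = running_max
    return adjusted

# Three raw p-values for related outcomes
adjusted = holm_stepdown([0.01, 0.04, 0.03])  # ≈ [0.03, 0.06, 0.06]
```

The adjustment inflates each raw p-value to reflect the number of chances one had to find a significant result, which is why the step-down columns in the tables are weakly larger than the corresponding single-hypothesis columns.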

256

Sneha Elango, Jorge Luis García, James J. Heckman, and Andrés Hojman

females and males in PPP, this effect is approximately three-fourths of a population standard deviation. The effects are also sizable for ABC and IHDP. For ETP, the effects are weaker, at less than one-half of a standard deviation. Nevertheless, these effects are substantial compared to the short-term effects reported for Head Start and for the universal programs discussed in sections 4.4 and 4.5, respectively. In contrast to the IQ tests, the achievement measures used weight cognitive and noncognitive skill components more equally.39 Achievement outcomes for ABC and PPP are strong. There is evidence of program effects on noncognitive skills, but the different programs do not report strictly comparable measures. Furthermore, defining and measuring noncognitive skills accurately is an open challenge that creates difficulties in detecting effects even when they are present.

Fadeout of Effects for Cognitive Skills

A general pattern for IQ and achievement test scores is that they tend to surge while children are in pre-K and then fade. In some cases, they completely dissipate. In two documented cases, IQ effects persist long after school entry: for the whole ABC sample (see appendix D) and for some subgroups of IHDP (Duncan and Sojourner 2013). Even in those cases, the impacts during the program were stronger than the long-term impacts. All other studies in this chapter that report the dynamics of impacts on test scores find that IQ or achievement gains dissipate. This is true for other demonstration programs (Weikart 1970; Gray, Ramsey, and Klaus 1982), Head Start (see Deming 2009; Zhai, Brooks-Gunn, and Waldfogel 2014), and state programs (see Lipsey et al. 2013). The first graph in figure 4.2 illustrates the fadeout phenomenon using evidence from PPP. IQ tests are usually scaled to show the level of a child relative to that of the overall population of the same age.
The decrease in standardized IQ for children in the treatment group after entering elementary school indicates that the gap between them and an average US child increases. The figure does not reveal whether skills gained by the treatment group depreciate or those gained by the control group catch up. The second graph of figure 4.2 presents the raw scores in terms of total questions answered. They increase uniformly during childhood (Hojman 2015). Additional figures illustrating the evolution of IQ and achievement scores over the life cycle are presented for all programs in appendix D. Hojman (2015) analyzes the causes of fadeout in cognition measured by IQ for PPP and ETP. He finds that the gains experienced by the treatment group occur rapidly during the first months of treatment and are followed by small or zero gains in the subsequent years of treatment. He also finds that almost all of the fadeout happens during the first year of elementary 39. See Heckman and Kautz (2012).


Fig. 4.2  Dynamics of IQ in PPP

Source: Reproduced from Hojman (2015). Note: The solid line represents the trajectory of the treated group, and the dotted line represents the trajectory of the control group. Thin lines surrounding the trajectories are asymptotic standard errors. The first graph shows standardized IQ as measured by the Stanford-Binet test in each year; IQ is age standardized based on a national sample to have a US national mean of 100 points and standard deviation of 15 points. In the second graph, the scores are not standardized: they are the raw scores, that is, the number of questions answered correctly in each year.


school. The gap between treatment and control groups narrows because the control group gains more from schooling. Measured IQ improves as a direct consequence of the initial formal educational experiences, and the increase is roughly independent of the age at which entry into preschool or formal education begins. The laggard growth of IQ for all disadvantaged children may be a consequence of the low quality of the schools they attend, the lack of stimulation in their home environments, or some combination of those factors. The precise causes are not known.

Differences by Gender

A consistent finding across all four programs is the difference in treatment effects for males and females. This difference is substantial enough to create important gender differences in both benefit-cost ratios and internal rates of return for PPP and ABC. This pattern is consistent with the literature on differences in development between girls and boys.40 Girls develop earlier. Uniform curricula across genders appear to benefit the laggard boys on many dimensions, but girls benefit as well, as we document in our discussion of the long-term treatment effects of ABC and PPP. In addition, all programs (except IHDP) target ages three to four, when aggressive behavior that predicts adult aggression and participation in crime begins to manifest itself (White et al. 1994). Gender-specific curricula in preschool may be an appropriate strategy.

Treatment Effect Heterogeneity by Socioeconomic Status

The IHDP served a more heterogeneous population than the other demonstration programs. A consistent policy-relevant finding for this program is the heterogeneity in treatment effects across socioeconomic groups.
The literature finds much larger treatment effects for the low-low-birth-weight children (≤ 2,000 grams) than for the high-low-birth-weight children (> 2,000 grams, ≤ 2,500 grams).41 For example, the effects on IQ at age eighteen are negative but not statistically significant for the latter and are significantly positive for the former. Treatment effects are also heterogeneous by socioeconomic status. Brooks-Gunn et al. (1992) discuss the effects of the programs on IQ at age three and find that children whose mothers had a college degree or higher experienced no treatment effects on IQ, while children with relatively uneducated mothers had sizable effects. A recent study shows that program effects on IQ exhibit a gradient corresponding to household income, suggesting that poorer children experience the greatest benefits. Duncan and Sojourner (2013) find that at age two, the treatment effect for cognition accounts for 40. Lavigueur, Tremblay, and Saucier (1995), Kerr et al. (1997), Mâsse and Tremblay (1997), Nagin and Tremblay (2001), and Bertrand and Pan (2011). 41. Brooks-Gunn et al. (1994) and McCormick et al. (2006).

Early Childhood Education

259

.82 standard deviations for children of families with relatively low income, with a standard error of .30, while the estimated effect is .46 for children of families with relatively high income, with a standard error of .23.

4.3.5  Long-Term Outcomes

PPP and ABC are the only demonstration programs with follow-up into adulthood. A summary of their most important effects is given in table 4.7, which is based on results from Heckman et al. (2010a), Heckman, Pinto, and Savelyev (2013), Campbell et al. (2014), and Elango et al. (2015). The results reported in the table are statistically significant after accounting for multiple hypothesis testing across relevant, related outcomes. PPP caused a 56 percent increase in high school graduation for females and a 29 percent increase in employment at age forty for males. Other beneficial effects include reductions in criminal activity and welfare take-up and improvements in employment and health behaviors. In general, the table shows that PPP and ABC had statistically significant positive outcomes that persist into adulthood. Noncognitive outcomes are notably absent due to lack of data: in PPP and ABC, and for early education programs in general, noncognitive skills are not typically followed in the long term.

4.3.6  Connecting Short-Term and Long-Term Effects

Dissipation of initial IQ gains is a common finding across programs. In some cases, IQ gains completely dissipate by the teenage years. Analysts focusing solely on IQ as a measure of program effectiveness confront a puzzle: Why do early childhood education programs have long-term effects if the effects on IQ dissipate? Heckman, Pinto, and Savelyev (2013) present a solution to this puzzle by considering the process through which skills form and develop. They find that program effects on noncognitive skills are important determinants of later-life outcomes.42 This conclusion highlights the importance of skill formation as a multiskill dynamic process in which different skills complement each other. Heckman, Pinto, and Savelyev (2013) decompose the effects of PPP on later-life outcomes using a mediation analysis. The results are reported in figures 4.3 and 4.4.43 They find that boosts in noncognitive skills are substantial determinants of long-term effects. For females, academic motivation mediates 30 percent and 40 percent of the effects on achievement and employment, respectively. Further, reductions in externalizing behavior explain 65 percent of the reduction in lifetime violent crimes and 40 percent and 20 percent of the reductions in lifetime arrests and unemployment, respectively. 42. We use the term mediation analysis to refer to the exercise of decomposing the effects of policies or programs on an outcome into distinct components. The outcome is usually thought of as an output and the components are the inputs generating this output. For a formal definition and analysis, see Heckman and Pinto (2015). 43. See Heckman, Pinto, and Savelyev (2013).
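The mediation shares quoted above can be illustrated, in highly simplified linear form, by a product-of-coefficients decomposition. This sketch is not the authors' method: Heckman, Pinto, and Savelyev (2013) estimate a factor model that corrects for measurement error in the mediators, and the data below are hypothetical.

```python
from statistics import mean

def mediated_share(d, m, y):
    """Fraction of the treatment effect on y that flows through mediator m,
    via a linear product-of-coefficients decomposition.

    Sketch only: the mediation analyses cited above use a factor model
    with corrections for measurement error in the mediators.
    d: 0/1 treatment indicators; m: mediator values; y: outcomes.
    """
    t_m = [mi for di, mi in zip(d, m) if di]
    c_m = [mi for di, mi in zip(d, m) if not di]
    t_y = [yi for di, yi in zip(d, y) if di]
    c_y = [yi for di, yi in zip(d, y) if not di]
    a = mean(t_m) - mean(c_m)    # effect of treatment on the mediator
    tau = mean(t_y) - mean(c_y)  # total treatment effect on the outcome
    # Slope of the outcome on the mediator within the control group
    mbar, ybar = mean(c_m), mean(c_y)
    b = sum((mi - mbar) * (yi - ybar) for mi, yi in zip(c_m, c_y)) / \
        sum((mi - mbar) ** 2 for mi in c_m)
    return a * b / tau           # share of tau mediated by m

# Hypothetical data in which the mediator carries the entire effect
d = [1] * 50 + [0] * 50
m = [di + (i % 5) * 0.1 for i, di in enumerate(d)]  # skill boosted by treatment
y = [2 * mi for mi in m]                            # outcome driven only by the skill
share = mediated_share(d, m, y)
```

In this constructed example the outcome depends on treatment only through the mediator, so the computed share is 1; in the chapter's estimates the shares are well below 1, with the remainder attributed to other channels.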

Table 4.7  Life-cycle outcomes, PPP and ABC
(Treatment effects, with p-values in parentheses)

                                           PPP                                       ABC
                                 Age     Female           Male              Age      Female            Male
Cognition and education
  Adult IQ                       —       —                —                 21c      10.275 (0.005)    2.588 (0.130)
  High school graduation         19a     0.56 (0.000)     0.02 (0.416)      30c      0.238 (0.090)     0.176 (0.100)
Economic
  Employed                       40a     –0.01 (0.615)    .29 (0.011)       30c      0.147 (0.135)     0.302 (0.005)
  Yearly labor income, 2014 USD  40a     $6,166 (0.224)   $8,213 (0.150)    30c      $3,578 (0.000)    $17,214 (0.110)
  HI by employer                 40a     0.129 (0.055)    0.206 (0.103)     31b      0.043 (0.512)     0.296 (0.035)
  Ever on welfare                18–27a  –0.27 (0.049)    0.03 (0.590)      30c      0.006 (0.517)     –0.062 (0.000)
Crime
  No. of arrestsd                ≤40a    –2.77 (0.041)    –4.88 (0.036)     ≤34c     –5.061 (0.051)    –6.834 (0.187)
  No. of non-juv. arrests        ≤40a    –2.45 (0.051)    –4.85 (0.025)     ≤34c     –4.531 (0.061)    –6.031 (0.181)
Lifestyle
  Self-reported drug user        —       —                —                 21c      0.031 (0.590)     –0.438 (0.030)
  Not a daily smoker             27a     0.111 (0.110)    0.119 (0.089)     —        —                 —
  Not a daily smoker             40a     0.067 (0.206)    0.194 (0.010)     —        —                 —
  Not a daily smoker             —       —                —                 21b      0.249 (0.004)     0.084 (0.866)
  Physical activity              40a     0.330 (0.002)    0.090 (0.545)     —        —                 —
Health
  Obesity (BMI > 30)             —       —                —                 30–34c   0.221 (0.920)     –0.292 (0.060)
  Hypertension I                 —       —                —                 30–34c   0.096 (0.380)     0.339 (0.010)

Sources: Heckman et al. (2010a), Campbell et al. (2014), and Elango et al. (2015). Note: This table displays statistics for the treatment effects of PPP and ABC on important life-cycle outcome variables. Hypertension I is the first stage of high blood pressure (systolic blood pressure between 140 and 159 and diastolic pressure between 90 and 99). “HI by employer” refers to health insurance provided by the employer and is conditional on being employed. For the further definitions of the outcomes, see the respective web appendices of the cited papers. Outcomes from Heckman et al. (2010a) are reported with one-sided p-value, which is based on Freedman-Lane procedure, using the linear covariates of maternal employment, paternal presence, and SB (Stanford-Binet) IQ, and restricting permutation orbits within strata formed by a socioeconomic status index being above or below the sample median and permuting siblings as a block. P-values for the outcomes from Campbell et al. (2014) are one-sided single hypothesis constrained permutation p-values based on the IPW (inverse probability weighting) t-statistic associated with the difference in means between treatment groups; probabilities of IPW are estimated using the variables gender, presence of father in home at entry, cultural deprivation scale, child IQ at entry (SB), number of siblings, and maternal employment status. P-values for the outcomes from Elango et al. (2015) are bootstrapped with 1,000 resamples, corrected for attrition with inverse probability weights, with treatment effects conditioned on treatment status, cohort, number of siblings, mother’s IQ, and the ABC high-risk index. a Heckman et al. (2010a). b Campbell et al. (2014). c Elango et al. (2015). d “No. of arrests” includes offenses in the case of ABC, even where more than one offense was charged per arrest.

Fig. 4.3  Decompositions of treatment effects of PPP on male adult outcomes

Source: Reproduced from Heckman et al. (2013). Note: The total treatment effects are shown in parentheses. Each bar represents the total treatment effect normalized to 100 percent. One-sided p-values are shown above each component of the decomposition. See the web appendix of Heckman et al. (2013) for detailed information about the simplifications made to produce the figure. The “CAT total” denotes California Achievement Test total score normalized to control mean 0 and variance of 1. Monthly income is adjusted to thousands of 2006 dollars using annual national CPI. *** Significant at the 1 percent level. **Significant at the 5 percent level. *Significant at the 10 percent level.

There are persistent effects of boosts in noncognitive skills even though cognitive effects fade out in the short run. Conti, Heckman, and Pinto (forthcoming) conduct a similar analysis for both PPP and ABC, but focus on health outcomes. According to their findings, externalizing behavior is the primary mediator of the outcomes found in PPP, which is consistent with the findings in Heckman, Pinto, and Savelyev (2013). For ABC, they find that task orientation and childhood BMI mediate approximately half of the improvements in blood pressure and hypertension found for males in the treatment group. Figure 4.5, panels (a) and (b), illustrates the results of their mediation exercises. García (2014) decomposes the ABC treatment effects pooling males and females. He analyzes three outcomes at age thirty: high school graduation, ever being enrolled in a four-year college, and employment (see figure 4.6). He shows that the more relevant the outcome is for economic success, the

Fig. 4.4  Decompositions of treatment effects of PPP on female adult outcomes

Source: Reproduced from Heckman et al. (2013). See note in figure 4.3.

less it is mediated through cognition and the more it is mediated through noncognitive skills.

4.3.7  Cost-Benefit and Rate of Return Analyses

Cost-benefit and rate of return analyses produce concise, policy-relevant statistics for assessing the social benefits of programs. While there is a vast literature evaluating treatment effects for demonstration programs, cost-benefit analyses are scarce (Currie 2001). This scarcity arises from the difficulty of securing the relevant data: cost-benefit analyses require comprehensive data in order to account for impacts over the life cycle. Very few programs have been evaluated rigorously using cost-benefit analysis. In fact, only PPP and ABC have the data required to conduct such exercises, accounting for a variety of outcomes including criminal activity, income, and health. Heckman et al. (2010b) substantially improve on an earlier cost-benefit analysis of PPP by Belfield et al. (2006) that does not report standard errors, does not disaggregate by gender, and uses an ad hoc method for forecasting out-of-sample earnings gains. Heckman et al. (2010b) use a broader base of data and substantially refine the estimates in Belfield et al. (2006). Both papers incorporate costs of education and estimates of benefits. Heckman et al. (2010b) additionally account for the deadweight loss created by collecting public funds and calculate standard errors for their estimates. They invoke standard assumptions about the deadweight losses associated with


Fig. 4.5 Decompositions of treatment effects of PPP and ABC on male adult outcomes Source: Reproduced from Conti, Heckman, and Pinto (forthcoming). Note: This graph provides a simplified representation of the results of the dynamic mediation analysis of the statistically significant outcomes for PPP and ABC. Each bar represents the total treatment effect normalized to 100 percent. One-sided p-values that test if the share is statistically significantly different from 0 are shown above each component of the decomposition. The mediators displayed are: externalizing behavior, as in Heckman et al. (2013) among the early childhood inputs; and income as in Heckman et al. (2010a) among the adult inputs. The complete mediation results and the definition of each outcome is reported in the web appendix of Conti, Heckman, and Pinto (forthcoming). The sample the outcomes refer to (M = males; F = females) and the age at which they have been measured (y. o. = years old) are shown in parentheses to the left of each bar, after the description of the variable of interest. ***Significant at the 1 percent level. **Significant at the 5 percent level. *Significant at the 10 percent level.

264

Sneha Elango, Jorge Luis García, James J. Heckman, and Andrés Hojman

Fig. 4.6 Decompositions of treatment effects of ABC on male and female (pooled) adult outcomes Source: Own calculation. Note: This plot decomposes the total treatment effect of ABC on graduating high school, ever enrolling in a four-year college, and employment at age thirty. The figure presents the components of a Laspeyres decomposition of the relevant outcome into a measure of cognition and a factor summarizing character skills. Cognition is measured at age twenty-one using the Woodcock-Johnson Test of Achievement. Character is measured at age fifteen by a factor created using measures of conscientiousness. The numbers inside the bars represent the proportion explained by each component. They do not sum to 1 because the decompositions condition on sociodemographic variables, which are not displayed above. See García (2014) for more details.

collecting tax revenue to support programs, the social costs of crime, and the procedures used to extrapolate future benefits. The range of estimates for the annual rate of return pooled across genders is 7–10 percent per annum. The corresponding range for the benefit-cost ratio is 3.9–6.8. Disaggregating by gender produces higher estimates. All of these estimates are statistically significant. Their preferred estimates are presented in the columns under "PPP" in table 4.8. Elango et al. (2015) present a benefit-cost analysis of ABC through age thirty-five.44 Their study demonstrates the social efficiency of this program. The benefit-cost estimates are lower than those for PPP, in part because the costs of the program are higher. It is the first study to account for life-cycle gains in health, using age thirty-four biomarkers to project future 44. This paper extends the methodology in Heckman et al. (2010b).

Table 4.8  Costs and benefits of PPP and ABC, 2014 USD

                                           PPP                                     ABC
Net present value                Female       Male         Pooled        Female        Male          Pooled
Parent incomea                   —            —            —             $88,358       $88,358       $88,358
Control group preschoolb         —            —            —             $1,832        $1,292        $1,469
Program cost per recipientc      $31,168      $31,168      $31,168       $91,519       $91,519       $91,519
Education costsd                 $9,626       $(19,678)    $(7,528)      $28,715       $5,083        $12,586
Subject labor incomee            $149,157     $50,269      $91,272       $36,270       $89,417       $70,798
Subject transfer incomef         $9,656       $4,248       $6,490        $2,614        $1,729        $2,256
Savings in medical expendituresg —            —            —             $9,920        $22,236       $19,604
Savings in crimeh                $26,400      $131,330     $87,823       $9,924        $219,911      $101,726
Quality of life (QALY) benefitsi —            —            —             $2,997        $21,845       $19,985
Net benefit                      $144,420     $174,358     $161,944      $31,671       $358,352      $200,009
Benefit-cost ratio (S.E.)        7.3:1 (3.2)  5.4:1 (3.0)  6.6:1 (2.7)   1.4:1 (0.98)  4.9:1 (3.19)  3.2:1 (1.53)
Internal rate of return, % (S.E.)  9.5 (2.7)  9.7 (3.0)    7.7 (2.6)     4.1 (0.10)    12.7 (0.06)   11 (0.05)

Source: The PPP estimates are from Heckman et al. (2010b), and the ABC estimates are from Elango et al. (2015). Note: The PPP results use a 3 percent discount rate, and ABC results use a 4 percent discount rate. All results take into account deadweight loss of public spending of 50 percent. Cost-benefit ratios in PPP do not exactly reflect the net benefits and costs because the ratios and the internal rates of return are adjusted for compromised randomization. a Parental income: annual labor income during children’s ages birth to fifteen. b Costs incurred by parents of the control group children for sending them to preschool. c Cost per recipient of either PPP or ABC. d Education costs from elementary school up to latest education over the life cycle. e Labor income from ages twenty-one to sixty-five. f Total income transferred from the government to the individual. Given this is a transfer from one agent of society (government) to another (individual), this number only accounts for the deadweight loss generated by the transfer. g Total medical expenditures from age thirty-four up to expected death. Treatment group individuals spend more, on average, because they live longer due to positive treatment effects on multiple health measures. h Savings due to crime reduction, accounting both for costs to victims and prison costs. i QALY stands for quality-adjusted life years. Quality of life is measured by an index of activities of daily life and takes values between 0 and 1, where 0 represents death and 1 represents full health. Each year of life is valued at $150,000 and weighted by the quality of life. Standard errors are obtained using bootstrapping.

health. Other important sources of benefits from the program are gains in parental income while participants are young, gains in later-life income, and decreases in criminal activity. The study finds an overall benefit-cost ratio of 3.2:1 and an internal rate of return of 11 percent.45 When decomposed by gender, the results are much stronger for males because the main benefits are reduced criminal activity and improved health, both of which show stronger effects for males.46 45. The estimates are statistically significant at the 10 percent level. 46. Barnett and Masse (2007) provide an estimate of the benefit-cost ratio for ABC of 2.5:1, but give no standard error for their estimate, do not disaggregate by gender, and use an ad hoc


Table 4.8 displays the main components of the cost-benefit analyses of PPP and ABC. Lifetime earnings and health benefits are crucial components of the benefits of ABC, as are reductions in criminal activity corresponding to serious crimes for males (Elango et al. 2015).47 Gains in parental income are an important component of the returns to ABC because the program provided care for up to nine hours a day, thus enabling mothers to increase their labor supply. Early childhood education has effects not only on the children, but also on the economic lives of their families. It is a form of enriched child care that enables mothers to work and to provide additional resources for disadvantaged families. There are likely intergenerational effects on the children of participants in both programs as well. Data being collected on PPP will enable analysts to compute the gains to the children of participants (Heckman 2015). Our evidence on the social benefits of ABC and PPP does not suggest that these programs should be slavishly imitated. It suggests guiding principles for future policy, which can only benefit from the knowledge acquired since the time these programs were implemented. It shows the promise of such programs and provides a lower bound on what is possible. 47. Health data were not collected for PPP.

4.3.8  Summary of the Evidence from Demonstration Programs

The evidence on demonstration programs supports several general conclusions. High-quality early childhood education programs targeted to disadvantaged children have long-term positive effects on important social and economic outcomes. Although the short-term effects on IQ tend to fade, a careful examination of program effects on multiple skills and dynamic skill formation demonstrates how improvements in noncognitive skills generate lasting effects on many later-life outcomes. The strong estimated effects and the evidence on social efficiency supported by cost-benefit analyses provide a strong case for the public provision of high-quality targeted programs. These programs also provide child care and facilitate work by the mothers of disadvantaged children.

4.4  Evidence from Head Start

Head Start is the largest and oldest public early childhood education program in the United States.48 Evidence on it is important for understanding
46. (cont.) method to forecast future benefits of treatment. Their calculation does not account for the most recent follow-up of ABC, including the substantial boost in health of participant males. Its main components are gains in parental income when the children are young and individual income up to age twenty-one, but their estimates of earnings impacts are not credible. 48. Other large-scale, targeted early childhood education programs in the United States include the Chicago Parent-Child Centers and Early Head Start. Reynolds and Temple (1998, 2006), Reynolds et al. (2011), and Love et al. (2005), respectively, evaluate them.


the benefits of early education. There are multiple evaluations of Head Start based on different methodologies and data sources. Studies use evidence from both nationally representative data sets and a randomized controlled trial designed to evaluate Head Start.49 The evaluations of Head Start report contradictory claims, in part because they fail to articulate the different policy questions that they implicitly answer. Project Head Start (1969) and McKey, Aitken, and Smith (1985) are two highly cited studies claiming to find no long-term effects on relevant socioeconomic outcomes. On the other hand, Ludwig and Miller (2007) and others claim that the program recovers its costs, and then some, through the gains it creates in the educational attainments of participants. As a group, these studies are imprecise about the counterfactuals being estimated. They typically do not discuss the alternative child-care arrangements available to participants at the time they were enrolled. This section presents evidence from evaluations with rigorous methodologies. We discuss studies that address well-defined policy questions and consider the availability of alternative child-care arrangements. These studies find that Head Start has positive short-term effects on measures of cognitive and noncognitive skills. These findings are reinforced by several studies evaluating long-term outcomes, using many different data sets and methodologies, all of which find impacts on substantive adult outcomes.

4.4.1  Overview of Head Start

Head Start is a means-tested federal preschool program founded in 1965. It is the largest ongoing early childhood education program in the United States. Children age three or four are eligible if family income is at or below the poverty line (though there is a designated quota for children whose families are above the poverty line). Children who enter the program at age three receive two years of treatment, mainly delivered in center-based programs. Its objective is to foster cognitive and noncognitive development and school readiness with a "whole child" approach. It pursues these objectives by granting funds to qualified centers. In turn, these centers are required to maintain high performance standards. Performance standards within Head Start mandate minimal quality levels for health, nutrition, and family partnerships. Head Start centers must verify the child's health status and screen for behavioral or mental health problems. Head Start centers also provide services to parents and families in order to improve the "whole" environments of the children.50 Despite its uniform minimum standards, there is substantial heterogeneity in the quality of Head Start centers, both in services and in the skills of the staff. While many categorize Head Start as a high-quality program, we 49. The Head Start Impact Study (HSIS) is reported in Puma et al. (2010, 2012). 50. Administration for Children and Families, Office of Head Start (2009).


cannot make an absolute judgment of "the" effect of Head Start due to the substantial heterogeneity in treatment effects.

Early Head Start

Early Head Start is an offshoot of Head Start. Established in 1994, it serves pregnant women and children under age three who meet Head Start's income-eligibility criteria. All Early Head Start programs offer full-day, full-year treatment and have center-based and/or home-visiting components. Like Head Start, it has a "whole child" approach with the goal of preparing children for future growth and development. Notably, it focuses on nurturing healthy attachments between children and their parents and caregivers. Both Early Head Start and Head Start offer transition services to help children adjust and move smoothly from Early Head Start to Head Start and from Head Start to kindergarten. We do not review results from Early Head Start due to the scarcity of rigorous evaluations, their short-term follow-up, and the high heterogeneity of the treatments offered.51

Comparability with Demonstration and Universal Programs

Like the demonstration programs previously discussed, Head Start is means tested and provides services beyond center-based care. In fact, Head Start shares important features with PPP and ABC, including curricular and extracurricular program components. There is a relationship between Head Start and previous early childhood education programs such as PPP and ABC. Roughly 30 percent of the Head Start Impact Study (HSIS) centers use the HighScope curriculum, which was developed from the PPP curriculum. This curriculum seeks to improve school readiness by targeting age-appropriate developmental tasks such as gross/fine motor, language and literacy, cognitive, and social-emotional development.
It emphasizes the importance of a supportive learning environment and the relationship between caretaker and child.52 Second, ABC and Head Start share extracurricular components, including medical and nutritional services. Eighty-eight percent of the children who participated in HSIS received nutritional services through the program, and some 80 percent received medical services. ABC and Head Start also share operational similarities (Puma et al. 2010, 2012). Forty-five percent of Head Start centers offer care from birth to age five by combining Head Start and Early Head Start.53 Further operational similarities include access to full-day care and transportation to the center. Sixty-eight percent of children who participated in HSIS were offered the option of attending full-day care, and 63 percent had the option of being transported to the center, as in ABC.54 Head Start also has similarities with the universal programs we discuss in section 4.5. It is a wide-ranging program that serves diverse disadvantaged populations. Analyses of Head Start are not subject to the questions of large-scale reproducibility that burden the evidence from demonstration programs.

51. One evaluation of Early Head Start is by Love et al. (2005). They use an instrumental variable approach to assess the effects of program participation on a variety of outcomes at age three. Early Head Start had three types of implementations: (a) center-based programs, (b) home-based programs, and (c) mixed-approach programs. When pooling the sample, they find important gains in mental development, cognition, and some measures of child behavior. Unfortunately, the results are not as clear when the samples are broken down by type of implementation. The available Early Head Start evaluations do not isolate the effects by treatment stream. Furthermore, they fail to provide estimates of the long-term effects of the program because data are not available. Given its similarities with Head Start, future evaluations should discuss whether control contamination is an issue.
52. Puma et al. (2010, 2012).

Early Childhood Education

4.4.2 Data
There are two sources of evidence on Head Start: (a) HSIS, the largest randomized control trial on early childhood education in the United States; and (b) studies based on nationally representative observational data, such as the Panel Study of Income Dynamics ([PSID]; see Panel Study of Income Dynamics 2015), the National Longitudinal Survey of Youth 1979 ([NLSY79]; see Bureau of Labor Statistics 2015), and the Children of the National Longitudinal Survey of Youth ([CNLSY]; see Bureau of Labor Statistics 2011), which record participation in Head Start and have long-term follow-up data. As the largest randomized control trial of an early childhood education program in the United States, HSIS is a preferred source of data for analysts. It does not suffer from the small sample-size problems that plague demonstration programs. Moreover, it is representative of Head Start centers across the nation, which lends generalizability to its results. Yet it suffers from some major limitations that complicate the estimation of meaningful policy parameters, namely: heterogeneous treatments across centers, lack of long-term follow-up, and control contamination.

Heterogeneous Populations and Treatment Alternatives

Head Start provides funding to local centers, which attempt to tailor treatment to the problems of the populations they serve. Thus, the quality of the centers, the populations served, and the alternatives available to parents vary among centers.

Lack of Long-Term Follow-Up

The HSIS has follow-up only until age nine and cannot be used to evaluate long-term effects of Head Start. This limitation is mitigated by the availability of long-term outcomes in nationally representative data such as the PSID, NLSY79, and CNLSY. However, this introduces an additional limitation on evaluations of Head Start, as long-term evaluations need to address the methodological challenges of integrating nonexperimental data with experimental data.

53. Administration for Children and Families, Office of Head Start (2014).
54. Puma et al. (2010).

Control Contamination

An important challenge emerges from the extensive control contamination that is present in HSIS. While the control group was denied treatment in the study centers—that is, the centers participating in HSIS—nothing prevented control (or treatment) families from seeking alternative options, including other centers providing Head Start. In fact, 15 percent of the control group attended other Head Start centers, and some 40 percent of the control group used center-based care. Therefore, estimates of treatment effects that do not account for control contamination compare Head Start to Head Start for many participants. Such estimates—unsurprisingly—are close to zero and do not speak to the efficacy of Head Start compared to the home care provided by parents. We present short-term and long-term evidence on the impacts of Head Start in the following section. We summarize the evidence from all sources in table 4.9.

4.4.3 Short-Term Outcomes
Puma et al. (2012) report a battery of mean differences between the treatment and "control" groups followed in HSIS using data through the age nine follow-up. They report estimates for an age three cohort and an age four cohort. The age three cohort received at least one year of treatment; after the first year of treatment, 63 percent of the treatment group remained at a Head Start center, and 26 percent were in some other center-based care arrangement. The age four cohort received only one year of treatment. For both cohorts, they report short-term positive effects for most measures of cognition, which disappear by age nine. There are some treatment effects for noncognitive skills, but the measures used are unreliable.55 There are positive effects on parenting quality, especially for the age three cohort. Parents of the age three cohort spanked their children 14 percent less than control parents after the first year of treatment; by the age six follow-up, they spanked their children 9 percent less. The authors report that these estimates are significant at the 10 percent level, but do not report exact p-values or standard errors. The control group had access to early childhood education alternatives, including other Head Start centers, so the reported treatment effect does not compare Head Start to home-based child care.

55. Treatment effects on the same measures of noncognitive skills vary in sign depending on whether the measure was parent or teacher reported. Parent-reported measures yield favorable treatment effects, while teacher-reported measures yield unfavorable treatment effects.

Ludwig and Phillips (2008) use cognitive outcomes measured at the end of the first year of treatment and attempt to improve the interpretation of the estimates by statistically adjusting for the presence of control children who attend a Head Start center not in the HSIS study. To account for differences in Head Start enrollment between the treatment and control groups, they use a Bloom (1984) estimator to adjust the intent-to-treat estimates reported in Puma et al. (2005). They find an effect size of .346 for the age three cohort with standard error .074 and an effect size of .319 for the age four cohort with standard error .147.56 Their study does not address control contamination of other types. These estimates can be understood as estimates of the effect of offering Head Start in one center: the impact of Head Start at the center against the next best alternative, which may be another Head Start center. It is not the policy-relevant parameter when considering the effectiveness of providing public early childhood education programs compared to no programs at all.

Two recent studies address control contamination in HSIS more systematically. They relate their estimates to theoretical parameters in order to answer well-defined and relevant policy questions.57 Both studies provide estimates of the average treatment effects of Head Start compared to different alternatives available to parents: (a) other preschool programs, and (b) home care. Their estimates are based on five exhaustive and mutually exclusive groups: (a) those who are always Head Start users (11 percent), (b) those who are always preschool users (11 percent), (c) those who always keep children at home (12 percent), (d) those who enroll in Head Start58 (20 percent), and (e) those who stay at home after randomization into the program (45 percent).59 Identification in both papers relies on strong functional form assumptions.
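Before turning to those richer models, the simpler Bloom (1984) adjustment used by Ludwig and Phillips can be written compactly. This is a sketch in our own notation, not a reproduction of their estimating equations:

```latex
% Hedged sketch of the Bloom (1984) no-show adjustment (our notation).
% ITT is the mean outcome difference between children offered and not offered
% a slot; p_T and p_C are Head Start enrollment rates in the treatment and
% control groups.
\[
\widehat{\Delta}_{\text{Bloom}}
  \;=\; \frac{\widehat{\text{ITT}}}{p_T - p_C},
\qquad
\widehat{\text{ITT}} \;=\; \bar{Y}_{\text{offered}} - \bar{Y}_{\text{not offered}}.
\]
```

Under the usual exclusion and monotonicity conditions, this rescaling recovers the effect on children induced to enroll by the offer; it does not undo contamination from control children who attend non-HSIS Head Start centers or other center-based care.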
Feller et al. (2014) use a version of the standard econometric selection model and rely heavily on normality assumptions on the observed variables driving selection into treatment to identify their reported treatment effects. Kline and Walters (2014) present a much richer interpretive framework but rely on normality to characterize dependence among choices and outcomes, although they do not impose normality on the full model as do Feller et al. (2014). These studies discuss the identification problems present when using a single randomization to identify the effects of multiple choices.60

56. Literacy is measured by the Woodcock-Johnson letter identification test.
57. Feller et al. (2014) and Kline and Walters (2014).
58. "Compliers" in the language of LATE.
59. We take these numbers from Feller et al. (2014). Kline and Walters (2014) report very similar percentages.
60. See Heckman and Vytlacil (2007) for a general analysis of multiple competing choices and the use of instruments in this context.
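A stylized version of the choice structure both papers confront can be sketched as follows (our notation, not a reproduction of either model): each family chooses among Head Start (h), another center (c), and home care (n), and the experiment shifts only the offer of a Head Start slot.

```latex
% Stylized sketch (our notation) of selection among three care options.
% Z is the randomized offer of a Head Start slot; D in {h, c, n} is the chosen
% alternative; Y(d) are potential outcomes. A single binary instrument must
% identify the effects of multiple margins of choice.
\[
D \;=\; \arg\max_{d \in \{h,\,c,\,n\}} U_d(Z),
\qquad
Y \;=\; \sum_{d \in \{h,c,n\}} \mathbf{1}\{D = d\}\, Y(d).
\]
\[
\text{Parameters of interest:}\quad
\mathbb{E}[\,Y(h) - Y(n)\,] \;\;\text{(Head Start vs. home)},
\qquad
\mathbb{E}[\,Y(h) - Y(c)\,] \;\;\text{(Head Start vs. other centers)}.
\]
```

With one binary instrument and three alternatives, these contrasts are not separately identified without further structure, which is why both papers lean on distributional assumptions such as normality.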


Both papers give estimates of the effect of Head Start relative to staying at home, which is the closest available estimate of the parameter assessing the effect of Head Start relative to no treatment at all. The magnitudes of their preferred estimates on cognition differ: 0.23 of a standard deviation in Feller et al. (2014) (standard error .038) and 0.38 of a standard deviation in Kline and Walters (2014) (standard error .047).61 Kline and Walters (2014) find negative selection into the program: individuals who gain the most are the least likely to participate. After correcting for selection, the average treatment effect on the population is as high as 0.47 standard deviations of test scores (standard error .110), which approaches the effect that demonstration programs have on early measures of cognition. Both papers conclude that the effect of Head Start is similar to that of local center-based preschool alternatives, and that both are better than home care. This underscores the importance of carefully defining the alternative against which Head Start is compared.

Another recent study (Zhai, Brooks-Gunn, and Waldfogel 2014) uses HSIS data to evaluate the short-term effects of Head Start. They compare individuals assigned to the treatment group with individuals assigned to the control group. The control group received care from three alternatives: (a) parental care, (b) care from relatives, and (c) care from another center. For comparison, they match individuals in the treatment group to three subsamples of the control group using standard methods for controlling for selection on observables.62 They assess measures of both cognitive and noncognitive behavior, as reported by parents. Their findings on cognition are similar to those of Feller et al. (2014) and Kline and Walters (2014). They find that children who would have been cared for by their parents or relatives benefit the most from Head Start. The effect sizes on the PPVT are .30 (parental care) and .19 (care from relatives) at age three and .15 (parental care) and .30 (care from relatives) at age four, for the respective comparison groups. The evidence is somewhat ambiguous on program effects for noncognitive outcomes, but using parent reports, children generally become less aggressive and hyperactive at ages three and four.63 Teacher-reported measures of noncognitive outcomes show negative treatment effects (see Puma et al. 2010). Zhai, Brooks-Gunn, and Waldfogel (2014) do not report standard errors for their estimates.

61. One reason for this discrepancy is the use of different measures of cognition. Feller et al. (2014) use the Peabody Picture Vocabulary Test (PPVT), while Kline and Walters (2014) use an index of various measures.
62. Inverse probability weighting.
63. Bitler, Hoynes, and Domina (2014) present evidence relevant to our discussion using quantile instrumental variable methods. Children with relatively low skill endowments or from disadvantaged backgrounds benefit the most from treatment in Head Start. A serious limitation of these methods is the assumption of rank preservation in treatment and control distributions. When tested, this assumption is usually rejected (see, e.g., Cunha, Heckman, and Navarro 2005; Kline and Tartari 2015).

4.4.4 Long-Term Outcomes

The HSIS has no long-term follow-up. Evaluating the long-run impacts of Head Start requires the use of nonexperimental methods. We present results from such methodologies and discuss their policy implications.

Currie and Thomas (1995), Garces, Thomas, and Currie (2002), and Deming (2009) use longitudinal data in conventional, but controversial, panel data "fixed-effects" models that assume that the unobserved characteristics driving selection into treatment—and into preschool in general—are constant across time and identical across children within families. They control for access to alternative early education programs to address the problem of control contamination. Currie and Thomas (1995) find short-term effects on cognition for both African American and white children. However, these gains fade out for African American children. Deming (2009) finds short-term effects for African American but not for white children, and also finds a fadeout pattern consistent with that reported in Currie and Thomas (1995). These studies are inconclusive about the effectiveness of the program because they do not consider benefits on the multiple skills known to be important predictors of life outcomes. Garces, Thomas, and Currie (2002) and Deming (2009) measure treatment effects on outcomes during adulthood. Both studies find positive effects on high school completion and college attendance—the former for white enrollees and the latter for African American enrollees. Garces, Thomas, and Currie (2002) document beneficial effects on crime for African American participants, but Deming (2009) finds no effects on crime. Although these studies attempt to account for selection into treatment, they allow for only a single additive unobserved component generating selection within the family and across time.
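As a sketch in our own notation (not the authors' exact specifications), the family fixed-effects models in these studies take a form like:

```latex
% Hedged sketch (our notation) of a sibling fixed-effects model.
% Y_{ij}: outcome of child i in family j; HS_{ij}: Head Start participation;
% X_{ij}: observed controls; mu_j: a family effect assumed constant across
% siblings and over time; eps_{ij}: idiosyncratic error.
\[
Y_{ij} \;=\; \beta\, HS_{ij} \;+\; X_{ij}'\gamma \;+\; \mu_j \;+\; \varepsilon_{ij}.
\]
```

Differencing across siblings removes \(\mu_j\), so \(\beta\) is identified from families in which one sibling attended Head Start and another did not; any selection that varies within families or over time is left in \(\varepsilon_{ij}\) and would bias \(\beta\).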
Therefore, they cannot determine whether the differences in their results are due to heterogeneity in treatment, problems in the specification of the models, differences in the populations, or something else.

Ludwig and Miller (2007) exploit variation in access to technical assistance for implementing Head Start, offered by the Office of Economic Opportunity to 300 poor counties in the 1960s. These counties were 50–100 percent more likely to participate in Head Start than similarly situated counties. The authors find no notable differences in baseline characteristics between the 300 poor counties and their comparison counties. They find that Head Start has beneficial effects on mortality and schooling, although these findings are, at best, suggestive because they are based on limited data. Their reported effects are identified by comparing outcomes in the 300 poor counties with outcomes in other poor counties where alternatives to early childhood education are very limited. Their evidence is consistent with the finding that treatment is especially effective for disadvantaged children.

In the best available study, Carneiro and Ginja (2014) examine the long-term effects of Head Start by exploiting discontinuities in eligibility rules using the NLSY79 (Bureau of Labor Statistics 2015) and CNLSY panel data sets. They show that there are multiple eligibility thresholds across years, states, family size, and family structure. This distinguishes their study from standard regression discontinuity designs with a single threshold. They estimate the marginal effect of relaxing eligibility requirements for different groups of the population. This methodology is important when relating their findings to policy questions because it allows comparison of effects across individuals with different alternatives. The authors report long-term positive effects on health behaviors, such as the number of visits to the doctor, use of medicine, and reduced smoking, as well as on behavioral outcomes, such as grade repetition and special education. They also find that the program reduces obesity at ages twelve and thirteen, depression and obesity at ages sixteen and seventeen, and crime at ages twenty and twenty-one. As in the case of the demonstration programs, Head Start is judged to be effective when it is evaluated using multiple outcomes, rather than focusing solely on cognitive outcomes.

4.4.5 Cost-Benefit Analyses

Although a formal cost-benefit analysis for Head Start is not available, several studies present limited calculations of the social benefits of the program. Currie and Thomas (1995) find that the effects on African American enrollees are not sufficient to recover the costs of the program, while the effects for whites are. Ludwig and Miller (2007), Deming (2009), Kline and Walters (2014), and Carneiro and Ginja (2014) argue that the social returns of the program are positive. They do not account for many relevant benefit components and interpret their results as lower bounds. This evidence indicates that the program may be socially efficient, though it is based on rough calculations and approximations and is therefore less definitive than the evidence on effectiveness from the demonstration programs. Nonetheless, it is consistent with that evidence. An example of this sort of analysis is the study by Kline and Walters (2014), who use the estimated effects of the Tennessee STAR study on earnings to link the short-term effects of Head Start on cognition to earnings.64 Their calculation is approximate because the programs have different objectives and did not serve comparable populations.65
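The flavor of these calculations can be conveyed with a hedged sketch (our notation, not any study's actual procedure): a short-term test-score gain is mapped into a discounted stream of projected earnings gains and compared with program costs.

```latex
% Hedged sketch (our notation) of the extrapolation behind such calculations.
% beta: short-term effect on test scores (standard deviations);
% phi_t: an assumed earnings gain at age t per standard deviation of scores,
% typically borrowed from another study's estimates; C: program cost;
% r: discount rate.
\[
\widehat{B/C} \;=\; \frac{1}{C}\,\sum_{t} \frac{\beta\,\phi_t}{(1+r)^{t}}.
\]
```

The key fragility is \(\phi_t\), which is borrowed from a different program serving a different population.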

64. The earnings estimates for their calculations come from Chetty et al. (2011).
65. This practice is widely used in the literature. Many current analyses of the long-term gains generated by early education use ad hoc relationships between short-term measurements and long-term outcomes to forecast future gains from the program (see Barnett and Masse 2007; Bartik, Gormley, and Adelstein 2012), a practice of questionable value. Elango et al. (2015) present a more principled extrapolation analysis and a discussion of general procedures.

4.4.6 Summary of the Evidence from Head Start

We summarize the estimates for Head Start that are reported in the literature in table 4.9. As previously noted, the counterfactuals identified in these studies are not clearly specified. We also present comparable estimated effects from PPP and ABC by way of comparison. The effects reported for the demonstration programs are typically stronger. It is important to note that: (a) the studies based on HSIS only evaluate the impact of a single year of Head Start; (b) the Head Start population is less disadvantaged than the populations served by ABC and PPP; and (c) the quality offered at Head Start centers is heterogeneous, but on average is probably lower than the quality offered by ABC or PPP. Thus, it is not surprising that even after control contamination is taken into account, and a more clearly defined counterfactual identified, the estimated short-term impacts of Head Start are smaller than the impacts of the demonstration programs.

Long-run studies of Head Start based on observational data show substantial effects on later-life socioeconomic outcomes. These findings reinforce the need to consider multiple skills when evaluating early childhood programs. Dismissing Head Start as a failure because of a documented fadeout of IQ gains ignores the fact that early education has effects on multiple important dimensions of individual lifetimes. This is especially important because these dimensions may be complementary and self-productive. Negative assessments of Head Start ignore an important body of evidence.66

66. An illustrative example is Fox Business News (2014).

4.4.7 The Tennessee Voluntary Pre-Kindergarten Program

An evaluation of a means-tested local program in the United States, the Tennessee Voluntary Pre-Kindergarten Program, has recently captured public attention. This program is not a Head Start program. However, like Head Start, it is large scale and targets children on the basis of socioeconomic status. A handful of sites affiliated with the program are Head Start centers, although it is not clear whether any of these are included in the program's evaluation. The program has been used as evidence against the effectiveness of large-scale preschool programs like Head Start (see Barshay 2015).

The Tennessee Voluntary Pre-Kindergarten Program (TN-VPK) is a statewide prekindergarten program targeting disadvantaged four-year-old children one year before kindergarten. It began as a pilot program in 1998 and became statewide in 2005. More details on its implementation, quality, and funding are reported in appendix B.

The program is evaluated by a randomized control trial. However, the evaluation has major flaws, and the interpretation of its results is clouded by the presence of control contamination. Program implementers requested parental consent after performing the randomization, causing substantial

selective attrition from the study. The subsample for whom consent was received is called the Intensive Substudy. For the first cohort of participants, only 46 percent of the parents in the treatment group and 32 percent of the parents in the control group consented to enter the study. The rates of consent for the second cohort were 74 percent for the treatment group and 68 percent for the control group. This sampling plan creates a major problem of selective attrition. Experimental methods to evaluate the program are therefore invalid, so the evaluators rely on nonexperimental methods (Lipsey et al. 2013; Lipsey, Farran, and Hofer 2015).67

The evaluation of TN-VPK does not account for control contamination. In their sample, 27 percent of the children in the control group attended Head Start or a private, center-based preschool program (Lipsey, Farran, and Hofer 2015). The evaluation does not address these confounds and does not identify a clear counterfactual.

A reduced set of measures was reported for the full sample, including grade repetition, attendance, disciplinary action, and special education. Estimates for these outcomes do not rely on the flawed nonexperimental methodology. The authors find that the treatment group was .77 percentage points less likely to repeat kindergarten and 4 percentage points less likely to repeat a school grade. Short-term effects on cognition for the Intensive Substudy sample fade out or become negative as children age.

This evaluation does not represent strong evidence against the effectiveness of early childhood education programs. Instead, it illustrates that interpreting effects without accounting for flaws in experimental design or without estimating clear counterfactuals produces misleading policy conclusions. It cautions against the use of randomized controlled trials as a gold standard. Evidence from nonexperimental studies should not be outweighed by evidence from a randomized control trial without serious consideration of the methodologies of the individual studies.

Table 4.9  Evidence across studies of the impacts of Head Start
[The table reports, for each study—Currie and Thomas (1995); Garces, Thomas, and Currie (2002); Deming (2009); Ludwig and Miller (2007); Feller et al. (2014); Kline and Walters (2014); Zhai et al. (2014); Carneiro and Ginja (2014)—and for Perry Preschool (PPP) and the Abecedarian Project (ABC), the data set, subpopulation, and years of birth, together with estimated impacts (standard errors in parentheses) on IQ/achievement at ages 3–4, 5–6, and 7–21; behavior at ages 3–4; grade retention; high school graduation (no GED); attending some college; earnings at ages 23–40; idleness; ever being booked for a crime; a behavior index at ages 12–13; and a depression scale at ages 16–17.]
Note: Impacts are in bold whenever they would be significant in a t-test at the 10 percent significance level. SES stands for socioeconomic status; AA: African American. Impacts on IQ/achievement scores are reported in standard deviations. Currie and Thomas (1995) originally report impacts on IQ/achievement in terms of test scores: the PPVT impact at age eight in Currie and Thomas (1995) is calculated using their interaction of the Head Start and Peabody Picture Vocabulary Test coefficients. The standard error for the predicted impact at this age is not reported; our calculations use bootstrapped standard errors. Grade retention is measured at age five in Currie and Thomas (1995) and at age eighteen in all other studies. Earnings in Garces et al. (2002) are measured in logs. Ludwig and Miller (2007) use census data, vital statistics, and the NELS. For the sake of brevity, we limit the estimates we present from Ludwig and Miller (2007) to one per data set: the impact of treatment on mortality is from the vital statistics, the impact on high school completion is from the NELS, and the impact on attending some college is from the census. Impacts on high school completion and college attendance are for children roughly eighteen to twenty-four years old. Feller et al. (2014) originally reported a 95 percent posterior interval of (0.15, 0.30). Impacts reported in Kline and Walters (2014) are estimated from a summary index created from the Peabody Picture Vocabulary Test and Woodcock-Johnson III Preacademic Skills tests taken in spring 2003; this index is standardized to have mean 0 and a standard deviation of 1. The Center for Epidemiological Studies Depression Scale in Carneiro and Ginja (2014) measures symptoms of depression in percentile scores, where higher scores are negative.
a For IQ in Zhai et al. (2014), we report effect sizes on the PPVT at ages three and four (they coincide). For behavior, we report hyperactivity at these same ages. Only Zhai et al. (2014) account for multiple hypothesis testing across similar outcomes. For the studies using HSIS data, all treatment effects are reported as effect sizes and thus are comparable across studies. For estimation results reported separately for the three-year-old and four-year-old cohorts, we use simple averages. For ages three to four, we report the results in Feller et al. (2014), Kline and Walters (2014), and Zhai et al. (2014) measured after the Head Start year. For ages five to six, we report the results in Zhai et al. (2014) measured after the children finish kindergarten. The comparable results in Puma et al. (2012) are 0.135 for ages three to four and 0.085 for ages five to six.
b Impacts are reproduced from the web appendix of Elango et al. (2015). IQ is reported at age three using the Stanford-Binet Intelligence Scale. Grade retention is reported for K–12 schooling. High school graduation is reported at age nineteen. Income is reported at age thirty in 2014 dollars. "Ever booked crime" represents total arrests by age thirty-four.
c Own calculations; see table 4.4. Impacts are in bold whenever they have a significant one-sided permutation p-value. IQ for ABC is reported at ages five and eight using the Wechsler Intelligence Scale.
d Results taken from table 4.7; see the corresponding table note for details. This table only displays results for females from PPP. "Ever booked crime" represents total arrests by age forty.

4.5 Evidence from Large-Scale Programs

Evidence from demonstration programs and Head Start provides a strong case for the effectiveness of means-tested early childhood education in promoting child development. Moreover, the evidence from PPP and ABC shows that programs targeting disadvantaged children are socially and economically efficient. They also support work by mothers with young children. In this section we study large-scale means-tested programs other than Head Start, as well as the evidence from universal programs.68 Proposals have been made for universal programs (Office of the Mayor, New York City 2014) and for different forms of means-tested programs (The White House 2014b).

67. To correct the selection problem caused by differential consent across control and treatment groups, the authors match on observable covariates. However, differential consent changed the composition of each group, and this methodology does not account for the resulting differences in unobserved characteristics.
68. A universal program is available to a general population of children in a local setting (e.g., county, state, country) when the only eligibility requirement is age.

The US government funds a variety of large-scale programs and initiatives. Table 4.10 describes the components of some major sources of federal funding for early childhood initiatives. There are two other major sources of funding: (a) Race to the Top, a source of funding for states, in which they compete on the basis of the quality, outcomes, and progress of their programs, and where states are selected for awards of between 37.5 and 75 million 2014 USD (The White House 2014b); and (b) Preschool for All, an initiative providing 75 billion 2014 USD over ten years targeting low-income (≤ 200 percent of the federal poverty line) four-year-olds, with the aim of expanding the program to moderate-income children. Its goal is to increase the quality and quantity of available preschool and to support voluntary home-visiting programs for the most disadvantaged families by providing grants to states to expand their existing preschool infrastructure and Head Start options (The White House 2014b).

Though the evidence on preschool programs is limited by a dearth of noncognitive and long-term measures, a clear pattern emerges: universal programs are not universally effective. Results from several large-scale programs show that early childhood education is most effective when targeted toward disadvantaged children. Studies of child-care arrangements in the United States indicate that impacts depend on the quality of the program taken up relative to the quality of the next best alternative. Because disadvantaged children typically have lower-quality alternatives than advantaged children, they gain more from early childhood education.
The studies discussed in this section shed light on the potential benefits from universal programs and provide two major insights: (a) though they offer access with no eligibility constraint besides age, universal programs do not produce universal take-up; and (b) disadvantaged children benefit the most from universal programs, a consequence of their having lower-quality alternatives than more advantaged children. There is also a hint that, at current quality levels, universal programs may harm the children of affluent parents, who have better alternatives. The magnitude of effects depends on the quality of the program relative to a child's alternative.69

The rest of this section proceeds as follows. First, we summarize studies of universal subsidies to child care in Quebec, Canada, and Norway (section 4.5.1). Second, we summarize studies of a group of universal preschool programs in Oklahoma, Georgia, and Boston (section 4.5.2). We then summarize the findings of the section (section 4.5.3). We present detailed descriptions of these programs in appendix B.

69. Blau (2003) refers to center-based programs as formal programs and to non-center-based programs as informal programs. He notes that, generally, the quality of the former is higher than that of the latter. This section follows his characterization of child care.

Table 4.10  Federal funding streams for child care

Head Start, 1965–present
Eligibility: Children aged three to five. Family income ≤ 190 percent fed. income level.
Program description: Grants given to centers that provide development services, child care, parenting education, case management, health care (including referrals), nutrition, and family support. Can be home based (which includes weekly home visits and group socialization), center based, family care, and mixed approach.
Program requirements: Centers must follow curricular guidelines and pass teacher/staff qualification requirements and program quality and compliance evaluations.
Scope: Federal appropriation (including local projects and support activities) in 2013: $7.74 billion (2014 USD). Enrollment in 2013 (including migrant programs): 903,679.

Early Head Start, 1994–present
Eligibility: Expectant mothers and children under age three. Family income ≤ 190 percent fed. income level.
Program description: Grants given to centers that provide development services, child care, parenting education, case management, health care (including referrals), nutrition, and family support. Can be home based (which includes weekly home visits and group socialization), center based, and mixed approach.
Program requirements: Centers must follow curricular guidelines and pass teacher/staff qualification requirements and program quality and compliance evaluations.
Scope: Federal appropriation in 2014: $1.37 billion (2014 USD). Enrollment in 2014: 115,826.

Child Care Development Fund (CCDF), 1990–present
Eligibility: Family income ≤ 85 percent of the state median income for a family of the same size. Children under thirteen.
Program description: Funds are granted to states that provide subsidies to families for the purpose of paying for child care.
Program requirements: Few restrictions. Child-care facilities must meet state health/safety regulations. Two percent of funds must be allocated to educating families on child care options.
Scope: CCDF federal-only funding in 2013: $5.10 billion (2014 USD). In 2013 national "average monthly adjusted number of families and children served": 874,200 families and 1,455,100 children.

Individuals with Disabilities Education Act (IDEA) Preschool Grants, 1977–present
Eligibility: Preschool-age (three to five) children who are experiencing developmental delays (as defined by state law) and need special education.
Program description: Funds provided to states on the basis of the state's proportion of disabled children. They must be used on educational programs that promote school readiness and incorporate preliteracy, language, and numeracy skills.
Program requirements: Children with disabilities must be educated with children who are not disabled.
Scope: Federal allocations in 2014: $353 million (2014 USD). Enrollment in 2014: 749,971 children.

Source: HS and EHS: Vogel et al. (2006), Love et al. (2002), and Administration for Children and Families, Office of Head Start (2009). There are some exceptions to the income requirements for special needs children and certain minorities. Furthermore, up to 10 percent of enrollees in each center may have family income higher than the cutoff. IDEA: Administration for Children and Families, Office of Head Start (2014). CCDF: US Department of Education (2015).
Note: This table compares some of the major federal funding streams for public child care. CCDF is also known as the Child Care and Development Block Grant (CCDBG). IDEA was passed in 1990, but was a continuation of the Education for All Handicapped Children Act, which was passed in 1975.

4.5.1 Universal Subsidies to Child Care

Norway

In 1975, the Norwegian parliament approved the Kindergarten Act, a reform that promoted a large-scale expansion of subsidized child care. The reform was universal: all children ages three to six were eligible, regardless of their family background. It led to a staged expansion, inducing time and regional variation across 400 municipalities. The reform assigned responsibility for child-care provision to municipalities, which followed federal quality standards covering, for example, educational content, group size, staff skill composition, and physical environment. As a consequence of the reform, child-care coverage for children ages three to six increased from 10 percent in 1975 to 28 percent in 1979 (Havnes and Mogstad 2011).70

Havnes and Mogstad (2011) exploit regional and time variation across municipalities in the rollout of the reform to identify its effects using a standard difference-in-differences framework. They find positive effects of the program on a battery of long-term outcomes measured when participants were in their mid-thirties, including years of education, college attendance, probability of being a high school dropout, welfare dependency, and single parenthood.71 They present two estimates. First, an intent-to-treat estimate, which simply compares eligible and ineligible children given the time and regional variation. Second, a Bloom estimator that adjusts the intent-to-treat estimate by the increase in child-care coverage.72 In all cases, the effects are larger when adjusting for take-up. Applying the Bloom estimator produces a 7 percent increase in the probability of attending college, a 6 percent decrease in the probability of being a high school dropout, and a 5 percent decrease in the probability of being on welfare. When they decompose results for a subsample of children of high school dropouts and high school graduates, they find that the effects on education are driven primarily by children whose mothers are less educated.
Estimates by gender show that females who received the treatment are less likely to be low earners and more likely to be average earners. This finding aligns with the evidence from ABC, which indicates a positive treatment effect on age-thirty income for women.

70. The two main studies from which we draw results do not provide details on the characteristics of the families of children who used center-based care compared to those that did not. Thus, we cannot characterize the children who took up the program and distinguish them from those who did not. Drange, Havnes, and Sandsør (2012) provide some related description of child-care take-up in Norway. As recently as 1996, relatively disadvantaged children under age six were underrepresented in early childhood education participation.
71. Examples of treatment effects include: an increase of .06 (s.e. .02) years of education; an increase of 1 percent (s.e. .3 percent) in college attendance; a decrease of 1 percent (s.e. .3 percent) in the probability of being a dropout; and a decrease of 1 percent (s.e. .3 percent) in welfare dependency.
72. See Bloom (1984).
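The Bloom adjustment just described can be sketched numerically. The coverage change (10 to 28 percent) is from the text, and the intent-to-treat effect used below is the years-of-education estimate reported in footnote 71; the exercise is illustrative rather than a replication of the study.

```python
# Bloom (1984) estimator: scale the intent-to-treat (ITT) effect by the
# change in take-up that the reform induced. Coverage rose from 10 percent
# (1975) to 28 percent (1979); the ITT effect on years of education (0.06)
# is taken from footnote 71. Purely illustrative, not a replication.

itt_effect = 0.06                       # ITT effect on years of education
coverage_pre, coverage_post = 0.10, 0.28
delta_takeup = coverage_post - coverage_pre

# Effect on the children actually moved into child care by the reform
bloom_effect = itt_effect / delta_takeup
print(round(bloom_effect, 2))           # 0.33: larger than the ITT, as the text notes
```

The estimator divides the population-wide effect by the fraction of children whose care arrangements the reform changed, which is why the adjusted effects are mechanically larger than the intent-to-treat estimates.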


Sneha Elango, Jorge Luis García, James J. Heckman, and Andrés Hojman

Although the authors do not explore the mechanisms driving their results, they provide a set of estimates that shed light on them. Consistent with the discussion so far, they point out the relevance of considering children's next best alternative when the reform rolled out. They show that the reform had no effect on the number of hours mothers work. However, it changed child-care take-up. The authors conclude that the reform crowded out informal child care and increased the quality of the formal child care taken up. Parents sent more children to center-based or formal child care and fewer to informal care. Thus, the positive effects are a consequence of moving children from informal to formal care.

Havnes and Mogstad (2015) expand the analysis of Havnes and Mogstad (2011). They use the characteristics of the children who were affected by the reform and note that relatively disadvantaged children benefited the most from it. They allow for nonlinearity in the difference-in-differences framework of Havnes and Mogstad (2011). Specifically, they explore variation in the effects of the reform on children along the earnings distribution once they become adults. They find that "upper-class children suffer a mean loss of $1.15 for every dollar spent on subsidized child care, whereas children of low-income parents experience an average gain of $1.31 for every dollar spent" (Havnes and Mogstad 2015, 108), which produces an increase in social mobility across the participating cohorts.

The evidence from this reform relates to two of the policy implications for which we present evidence throughout this chapter. First, disadvantaged children benefit the most from early childhood education. In the case of Norway, it is very plausible that the reform crowded out poor informal alternatives for disadvantaged children, resulting in a relatively large improvement in their early environments compared to those of advantaged children.
This interpretation is further supported by the relatively larger effects for children of high school dropouts compared to children of high school graduates. This point relates to the second implication: the quality of children's early environments is fundamental. The reform in Norway made more slots available in formal, center-based care, which is of relatively high quality. This produced gains in short- and long-term outcomes for the neediest children.

Quebec

In 1997, the government of Quebec introduced a universal policy for families with children from birth to age four. Regulated, center-based child care was subsidized to have an effective price of at most 5.00 Canadian dollars73 a day. All children age five have access to free public kindergarten.74

73. 1997 dollars.
74. Classroom size, caregiver education, and similar standards were imposed as part of the reform, one of its objectives being to improve the quality of child care. More details are in appendix B.


Before 1997, only low-income families in Quebec received child-care subsidies: low-income families (≤ 57,680 2014 USD) received a 75 percent tax credit for child-care expenditures (Baker, Gruber, and Milligan 2008). This implies that the gain low-income families received from the 1997 reform was relatively small compared to the gain of high-income families. The reform had three components. First, for children younger than age two, all previously informal child-care centers were certified and their staff trained. Second, for children older than two but younger than kindergarten age, center-based child care was subsidized. Third, kindergarten was made free.

Baker, Gruber, and Milligan (2008) evaluate the effects of the policy by exploiting cross-Canada regional variation around the years of its implementation, comparing the pre- and postpolicy outcomes of families in Quebec with the outcomes of families in the rest of Canada. They find that the effects of these reforms on child behavior and parent-child interactions are negative. The policy caused a sizable increase in maternal labor supply (around 10 percentage points), with its effect mainly experienced by high-income families, for whom the program dramatically changed the cost of child care. As a result, it crowded out parental care, which may be of higher quality than center-based arrangements for some high-income families. The policy increased emotional disorder and physical aggression at ages two and three, and decreased social development from birth to age three. Furthermore, it had negative effects on families in terms of effective parenting and maternal depression when children were between birth and four years old. Offsetting these negative findings, in later work Baker, Gruber, and Milligan (2015) find that the policy had small but beneficial effects for disadvantaged children. These include reduced hyperactivity, anxiety, and aggression at ages two to three.
Effects on noncognitive outcomes are particularly strong for boys. Moreover, Baker, Gruber, and Milligan (2015) find evidence of decreased criminal activity as measured by apprehensions and convictions. The benefits reported in adolescence for disadvantaged boys are consistent with other evidence from programs targeted to disadvantaged families.

The 1997 reform in Quebec was implemented on top of existing subsidies to low-income families. It attracted more affluent families into the program by subsidizing child care without providing high-quality services at the level offered in affluent homes. The negative early-life results arise because: (a) disadvantaged families were already being offered a subsidy before the policy, and centers for children above age three were certified and presumably high quality; and (b) the program crowded out maternal time spent on child care by relatively affluent families. This evidence underscores the importance, in any evaluation, of considering who took up the policy and what their next best alternative would have been in the absence of the policy.
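The differential price change behind this interpretation can be illustrated with simple arithmetic. The 75 percent pre-reform tax credit for low-income families and the 5 CAD/day subsidized price are from the text; the unsubsidized daily price of care used below is a hypothetical figure chosen purely for illustration.

```python
# Why the 1997 Quebec reform changed incentives mainly for high-income
# families. The 75 percent pre-reform tax credit for low-income families
# and the 5 CAD/day post-reform price are from the text; the market price
# below is a hypothetical illustration, not a figure from the study.

market_price = 25.0        # hypothetical unsubsidized daily price (1997 CAD)
post_reform_price = 5.0    # subsidized price for all families after 1997

low_income_pre = market_price * (1 - 0.75)   # 6.25 after the 75% tax credit
high_income_pre = market_price               # 25.00, no pre-reform subsidy

print(low_income_pre - post_reform_price)    # 1.25: small price drop
print(high_income_pre - post_reform_price)   # 20.0: large price drop
```

Under any plausible market price, the out-of-pocket price drop is several times larger for families who previously received no subsidy, which is consistent with the take-up response concentrating among the more affluent.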


4.5.2 Local Universal Programs in the United States

For the universal public programs provided in Georgia and Oklahoma, some data on program take-up by socioeconomic status are available. Universal access to programs does not imply universal take-up. In these programs, low socioeconomic status is measured by eligibility for free or reduced-price lunch, which requires that the child's family income is at or below 185 percent of the federal poverty line. In Georgia, 59 percent of all preschool-age children in the state took up the program. Of these, 60 percent were eligible for free or reduced-price lunch. In Oklahoma, 74 percent of all preschool-age children took up the program. Of these, 61 percent were eligible for free or reduced-price lunch. Take-up is substantially lower among more affluent families.75

Cascio and Schanzenbach (2013) provide further evidence on take-up. By pooling data from Georgia and Oklahoma to make a comparison with the rest of the states in the United States, they find that take-up differs across maternal education levels. Specifically, they find that between four and five out of every ten children enrolled in the public programs would otherwise have been enrolled in private preschools if their mothers had at least some college education. Thus, they project that the increase in preschool attendance in this relatively advantaged group is between 11 and 14 percentage points, compared to an increase of between 19 and 20 points for the pooled sample.

Georgia and Oklahoma sponsor preschool programs that score relatively high on the National Institute for Early Education Research (NIEER) quality index (Cascio and Schanzenbach 2013), which is claimed to measure the quality of a state preschool program.76 Georgia and Oklahoma score high because they require the teachers in every classroom to hold a bachelor's degree and a certificate in early education. They also have class-size requirements: class size is capped at twenty children, and a 1:10 teacher-student ratio is enforced.
Both programs are partially funded through the Preschool for All initiative, though they also receive funding from other sources. Oklahoma's preschools are provided by public schools and receive funding from state and federal sources. Though Georgia's preschools are publicly funded, the services are provided by private centers.

75. Family poverty is defined as family income below 200 percent of the poverty line. Using elementary probability calculations and data on the percentage of children eligible for free or reduced-price lunches (for which eligibility is determined by family income at or below 185 percent of the poverty line), 49 percent of children in Oklahoma and Georgia were in poverty (US Census Bureau 2014). Using the total take-up and take-up by socioeconomic status statistics, the probability of taking up the program for a child in a poor household is 79 percent in Georgia and 99 percent in Oklahoma. Similarly, the probability of taking up the program for a child in a nonpoor household is 40 percent in Georgia and 49 percent in Oklahoma.
76. We note, however, that the Tennessee program previously discussed also had a high NIEER quality index. See Lipsey, Farran, and Hofer (2015). The validity of the NIEER score has not been established.
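The "elementary probability calculations" in footnote 75 amount to a Bayes-rule identity: P(take-up | poor) = P(take-up) × P(poor | take-up) / P(poor). A minimal sketch follows, applying the pooled 49 percent child-poverty figure to both states; since the footnote's reported probabilities evidently rest on state-specific inputs not given in the text, the numbers produced here only approximate them.

```python
# Bayes-rule identity behind footnote 75:
#   P(take-up | poor) = P(take-up) * P(poor | take-up) / P(poor)
# Take-up rates and enrollee composition are from the text. The pooled
# 49 percent child-poverty rate is applied to both states, so these
# figures only approximate the state-specific ones in the footnote.

def takeup_given_poor(takeup_rate, poor_share_of_enrollees, poverty_rate):
    return takeup_rate * poor_share_of_enrollees / poverty_rate

print(round(takeup_given_poor(0.59, 0.60, 0.49), 2))  # Georgia
print(round(takeup_given_poor(0.74, 0.61, 0.49), 2))  # Oklahoma
```

The same identity with 1 − poverty_rate and 1 − poor_share gives the take-up probability for nonpoor children, which is what underlies the footnote's contrast between poor and nonpoor households.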


Cascio and Schanzenbach (2013) evaluate the Georgia and Oklahoma programs using a strategy similar to that of the evaluations of the Norway and Quebec reforms, exploiting regional and time variation across these and the rest of the states in the United States. They estimate intent-to-treat effects of the policy on children up to eighth grade. Their findings indicate that disadvantaged children, as measured by their eligibility for free lunch, have substantial gains in reading and math test scores by fourth grade. The effects on reading vanish by eighth grade, but the effects on math scores remain statistically precise and are economically significant. For advantaged children, the effects become small by fourth grade and vanish by eighth grade.

The authors present evidence on the mechanisms producing the effects. Disadvantaged children spend less time with their mothers, but the quality of the interaction increases because they spend more time reading, playing, and doing other activities together. That is, there is a relatively large improvement in the quality of the early environment for disadvantaged children.

The strategies used to identify the effects of the reforms in Norway and Quebec and the state programs in Georgia and Oklahoma are very similar. They exploit time and regional variation in program rollout. In Norway, the reform was gradual and had time and regional variation across 400 municipalities. Thus, the estimates compare regions that differ in the timing of the policy's implementation. In Quebec, the reform was introduced in the whole province, and the estimates are identified by comparing outcomes in Quebec with those in the rest of Canada. Similarly, the state programs in the United States are evaluated by comparing outcomes in Georgia and Oklahoma with those in the rest of the states. There is a crucial drawback to this strategy, which is inherent in difference-in-differences designs.
If there are any differences in trends of unobserved local characteristics across treatment and comparison regions, then difference-in-differences estimates do not represent the effects of the reform, but rather differences in trends that would have produced these effects even in the absence of the reform. In the example of Quebec, if previous policies had uniquely changed the way the market for female labor grew in that province, and this drove the child-care decisions observed after the reform, then the estimates of program effects on labor supply are contaminated by this preexisting trend.

To assess this concern, Havnes and Mogstad (2011) perform a battery of robustness checks. These include different calculations of standard errors, such as clustering, to allow for various scenarios of unobserved correlation across municipalities; excluding cities from the sample; adding municipal fixed effects; and adding time trends interacted with multiple observed characteristics at the municipality level. Their results are robust to all of these exercises. The fact that the reform in Norway was rolled out at the municipality level provides a large amount of variation with which to perform many forms of sensitivity analysis. Unfortunately, this is not the case for Quebec, as the reform was at the provincial level. Nevertheless, the authors of the Quebec study perform sensitivity analyses and report robust results. Cascio and Schanzenbach (2013) perform sensitivity analysis by controlling for state trends and a battery of observed characteristics. They also explore sensitivity with respect to the window of observations they consider. While these three studies differ in the degree to which they test for sensitivity, all find little evidence of it.

Gormley and Gayer (2005) and Gormley et al. (2005) evaluate Oklahoma's preschool program in a local setting. They use administrative data from Tulsa and exploit a sharp regression discontinuity design based on age eligibility: children are eligible to attend preschool if they are four years of age by September 1 of the school year. Thus, they compare children of very similar ages who were just barely eligible with those who were just barely ineligible. Data include tests measuring cognition for both groups. For the children who were not eligible, they use tests at preschool entry the following year. For the children who were eligible, they use tests at the end of preschool. They report gains of 0.39 and 0.24 standard deviations in language and motor skills, respectively. However, these estimates are short run in nature. The program accelerates academic competence but has no long-run effect. This evidence suggests that children in some form of schooling do better on tests than children not in school. After all children enter school, the effects vanish by grade 3.77

Weiland and Yoshikawa (2013) evaluate a universal preschool program in Boston using a similar strategy. The program served 2,045 children in sixty-nine elementary schools within the city.
Any child turning four years old before September 1 was eligible. Participants received a year of free full-day prekindergarten in an urban public school. The children received common curricula: full implementation of the literacy and language curriculum, Opening the World of Learning, and the mathematics curriculum, Building Blocks. Reports indicate that the curricula were implemented with high fidelity across preschools (Weiland and Yoshikawa 2013). The nature of the data makes it straightforward to compare children who were arbitrarily close to the eligibility cutoff, but still not eligible, with those who were eligible and participated in the program. The reported results are positive for mathematics, reading, and some measures of social skills at the beginning of the first school year immediately following program completion. However, when disaggregated, these positive results show considerable variability. While children eligible for free lunch had impacts on self-control (0.3 effect size), ineligible children had no impacts on this dimension. Impacts on numeracy were very strong for both groups: the effect sizes are .66 and .47, respectively.

We are skeptical about the interpretation of the estimates reported in Gormley and Gayer (2005), Gormley et al. (2005), and Weiland and Yoshikawa (2013). Their reported effects are short run in nature and simply compare exposed children to unexposed children at the end of one year of the program. They do not account for catch-up in scores when the unexposed children eventually enter school. Effects vanish by grade 3 in the Gormley studies. (Weiland and Yoshikawa [2013] only analyze short-term outcomes measured in the fall after preschool completion.) An additional problem with these regression discontinuity studies is the large bandwidth often employed (i.e., a broad band of ages of children on either side of the discontinuity point). There are few children available to identify the impact in the vicinity of the cutoff, and there is selective attrition of children from the samples.

77. See Hill, Gormley, and Adelstein (2012).

4.5.3 Summary of the Evidence from Universal Programs

The evidence on universal programs supports a general finding consistent with the entire body of evidence in this chapter: disadvantaged children benefit more from early childhood education than do advantaged children. This is due to a larger improvement in the quality of the early environment for disadvantaged children compared to advantaged children. When children attend programs with higher-quality care than they would have received at home or in an alternative setting, the effects of the programs are generally positive. Given that disadvantaged children have less access to alternatives, they benefit the most from universal programs. Programs that crowd out high-quality alternatives for advantaged children, as in Quebec, produce weak or even negative effects.

Further research is required to strengthen this body of evidence. In particular, the most rigorous analyses study policy changes and estimate their effects through reduced-form estimates. Some of them shed light on the mechanisms driving the policy's effects by exploring long-term outcomes, effects on maternal labor supply, and so forth. However, this literature could benefit from models that investigate the mechanisms through which estimated effects are generated.

4.6 The Importance of Quality

The studies discussed thus far indicate that when the child-care options available to families are low in quality, center-based policies tend to have positive effects. This is especially true for disadvantaged families, for whom alternatives are of relatively low quality. Following the recent literature, this section uses attendance at center-based care as an indicator of participation in a high-quality program and attendance at non-center-based care as an indicator of participation in a low-quality program. Generally speaking, center-based child-care establishments are required to be certified to be funded or run (see appendix B). Disadvantaged children have less access to center-based child care. All programs found to have positive effects have relatively high quality standards (see appendices A and B). Blau and Currie (2006) present an extensive survey of the market for child care. They find that standards such as low staff-child ratios, small classroom size, and higher levels of teacher education contribute to the effectiveness of child-care centers.

Bernal (2008) and Bernal and Keane (2011) reinforce the evidence on the importance of quality by comparing the effects of center-based and non-center-based arrangements. They use the NLSY79 to examine child-care decisions in the United States and their impacts on parental labor force participation and child development. They analyze the range of child-care options available in the United States, including formal and informal care. They use two methodologies to assess the impact of child care on cognitive and noncognitive development: (a) a fully structural model and (b) an instrumental variables approach. The first paper uses a sample of married women. The second uses a sample of single mothers and exploits exogenous changes in welfare program structures as sources of variation affecting the probability of a child being in child care. The papers show that child care has negative effects on cognition at ages five to eight, with a magnitude of 0.13–0.14 standard deviations and a standard error of .049. The negative effects arise from non-center-based child care, while center-based child care has no effect.
García, Hojman, and Shea (2014) provide new insights using data from a demonstration program, IHDP. Using a methodology similar to that of Bernal and Keane (2011), but a more complete set of measures, they find that: (a) time spent with the mother and center-based child care have positive effects that are very similar in magnitude on average; (b) policies that give access to center-based child care crowd out maternal time; and (c) maternal time has strikingly different consequences for more and less disadvantaged children, reflecting the quality of home interactions. Better home environments promote child development, while adverse home environments retard it.

4.7 Summary

Our analysis is based on three important principles from the literature on the economics of human development: (a) multiple skills beyond just cognition are important and are produced by effective programs; (b) the skill-formation process is dynamic, and early home environments play a major role in shaping children's lives; and (c) answering policy questions requires consideration of the alternatives available to the targeted population.

Our main conclusion is that, at current levels of quality provided, disadvantaged children benefit the most from early childhood education. The services offered improve on what is offered to them at home. The high-quality means-tested demonstration programs that we have examined are socially efficient as measured by benefit-cost ratios and rates of return. There is a strong case for high-quality means-tested early childhood education (using a broad definition of means tested). The evidence for universal programs is somewhat ambiguous. The evidence from Quebec suggests that standard child-care programs supporting the market labor supply of affluent women may harm their children, but may aid the children of disadvantaged families. These conclusions are based on the following bodies of evidence:

1. From our primary analysis of the data on high-quality demonstration programs, we conclude: (a) increases in cognition, as measured by IQ, generally fade out, but do not always disappear; however, gains in early-life noncognitive skills generate success later in life, boosting outcomes such as education, employment, and health, and reducing criminal activity; (b) methodology is available to assess demonstration programs with compromised randomizations, small sample sizes, and attrition; applying it shows that high-quality demonstration programs have positive effects over the life cycle, and these effects survive conservative tests that adjust test statistics for multiple hypothesis testing; (c) when evaluated comprehensively, demonstration programs targeting disadvantaged populations are socially efficient, as measured by their rates of return and benefit-cost ratios.

2. Head Start: (a) Head Start provides heterogeneous treatment to heterogeneous populations; therefore, when assessing its impacts, it is crucial for researchers to study the available alternatives in the settings where children take up treatment; (b) studies accounting for control group contamination (that is, control group families that find alternative early childhood education environments outside of the home) show that the short-run effects of Head Start on cognitive and noncognitive skills are positive and moderate to strong; (c) studies evaluating long-term outcomes from Head Start, based on nationally representative data sets, find that the program has persistent beneficial effects on important later-life outcomes, such as health and education; (d) crude cost-benefit analyses of Head Start hint that the program might be socially efficient, and more comprehensive evaluations likely imply high internal rates of return, as current estimates only include gains in earnings.

3. Universal programs: Disadvantaged children benefit the most from universal programs offered at current quality levels. Advantaged children have enriched environments available to them, and their parents are less likely to use universal programs. In contrast, without access to such programs, disadvantaged children spend time in low-quality environments or informal settings.

290

Sneha Elango, Jorge Luis García, James J. Heckman, and Andrés Hojman

References

Administration for Children and Families, Office of Head Start. 2009. Head Start Program Performance Standards and Other Regulations. Technical Report. Washington, DC: US Department of Health and Human Services. ———. 2014. Head Start Program Facts Fiscal Year 2014. Technical Report. Washington, DC: US Department of Health and Human Services. Baker, M., J. Gruber, and K. Milligan. 2008. “Universal Child Care, Maternal Labor Supply, and Family Well-Being.” Journal of Political Economy 116 (4): 709– 45. ———. 2015. “Non-Cognitive Deficits and Young Adult Outcomes: The Long-Run Impacts of a Universal Child Care Program.” NBER Working Paper no. 21571, Cambridge, MA. Barnett, W. S., and L. N. Masse. 2007. “Comparative Benefit-Cost Analysis of the Abecedarian Program and Its Policy Implications.” Economics of Education Review 26 (1): 113– 25. Barshay, J., and the Hechinger Report. 2015. “Studies Shed Light on Fleeting Benefits of Early Childhood Education.” U.S. News & World Report News, October 5. http://www.usnews.com/news/articles/2015/10/05/studies-shed-light-on-fleeting-benefits-of-early-childhood-education. Bartik, T. J., W. Gormley, and S. Adelstein. 2012. “Earnings Benefits of Tulsa’s Pre-K Program for Different Income Groups.” Economics of Education Review 31 (6): 1143– 61. Belfield, C. R., M. Nores, W. S. Barnett, and L. Schweinhart. 2006. “The High/Scope Perry Preschool Program: Cost-Benefit Analysis Using Data from the Age-40 Follow-Up.” Journal of Human Resources 41 (1): 162– 90. Bernal, R. 2008. “The Effect of Maternal Employment and Child Care on Children’s Cognitive Development.” International Economic Review 49 (4): 1173– 209. Bernal, R., and M. P. Keane. 2011. “Child Care Choices and Children’s Cognitive Achievement: The Case of Single Mothers.” Journal of Labor Economics 29 (3): 459– 512. Bertrand, M., and J. Pan. 2011. “The Trouble with Boys: Social Influences and the Gender Gap in Disruptive Behavior.” NBER Working Paper no.
17541, Cambridge, MA. Bitler, M. P., H. W. Hoynes, and T. Domina. 2014. “Experimental Evidence on Distributional Effects of Head Start.” NBER Working Paper no. 20434, Cambridge, MA. Blau, D. 2003. “Child Care Subsidy Programs.” In Means-Tested Transfer Programs in the United States, edited by R. A. Moffitt, 443– 516. Chicago: University of Chicago Press. Blau, D., and J. Currie. 2006. “Preschool, Daycare, and Afterschool Care: Who’s Minding the Kids?” In Handbook of the Economics of Education, vol. 2 of Handbooks in Economics, edited by E. A. Hanushek and F. Welch, 1163– 278. Amsterdam: Elsevier. Bloom, H. S. 1984. “Accounting for No-Shows in Experimental Evaluation Designs.” Evaluation Review 8 (2): 225– 46. Borghans, L., H. Meijers, and B. ter Weel. 2008. “The Role of Noncognitive Skills in Explaining Cognitive Test Scores.” Economic Inquiry 46 (1): 2– 12. Brooks-Gunn, J., R. Gross, H. Kraemer, D. Spiker, and S. Shapiro. 1992. “Enhancing the Cognitive Outcomes of Low Birth Weight, Premature Infants: For Whom is the Intervention Most Effective?” Pediatrics 89 (6, part 2): 1209– 15. Brooks-Gunn, J., C. M. McCarton, P. H. Casey, M. C. McCormick, C. R. Bauer,

Early Childhood Education


J. C. Bernbaum, J. Tyson, M. Swanson, F. C. Bennett, D. T. Scott, et al. 1994. “Early Intervention in Low-Birth-Weight Premature Infants: Results through Age 5 Years from the Infant Health and Development Program.” Journal of the American Medical Association 272 (16): 1257– 62. Bureau of Labor Statistics. 2011. National Longitudinal Surveys: NLSY79 Children and Young Adults. Washington, DC: US Department of Labor. http://www.bls.gov/nls/nlsy79ch.htm. ———. 2015. National Longitudinal Surveys: The NLSY79. Washington, DC: US Department of Labor. http://www.bls.gov/nls/nlsy79.htm. Campbell, F. A., G. Conti, J. J. Heckman, S. H. Moon, R. Pinto, and E. P. Pungello. 2014. “Early Childhood Investments Substantially Boost Adult Health.” Science 343 (6178): 1478– 85. Carneiro, P., and R. Ginja. 2014. “Long-Term Impacts of Compensatory Preschool on Health and Behavior: Evidence from Head Start.” American Economic Journal: Economic Policy 6 (4): 135– 73. Cascio, E. U., and D. W. Schanzenbach. 2013. “The Impacts of Expanding Access to High-Quality Preschool Education.” NBER Working Paper no. 19735, Cambridge, MA. Caucutt, E. M., and L. J. Lochner. 2012. “Early and Late Human Capital Investments, Borrowing Constraints, and the Family.” NBER Working Paper no. 18493, Cambridge, MA. Chetty, R., J. N. Friedman, N. Hilger, E. Saez, D. W. Schanzenbach, and D. Yagan. 2011. “How Does Your Kindergarten Classroom Affect Your Earnings? Evidence from Project STAR.” Quarterly Journal of Economics 126 (4): 1593– 660. Conti, G., J. J. Heckman, and R. Pinto. Forthcoming. “The Effects of Two Early Childhood Interventions on Health and Healthy Behaviors.” Economic Journal. Conti, G., J. J. Heckman, J. Yi, and J. Zhang. 2015. “Early Health Shocks, Intra-Household Resource Allocation, and Child Outcomes.” Economic Journal 125 (588): F347– 71. Cunha, F. 2015.
“Subjective Rationality, Parenting Styles, and Investments in Children.” In Families in an Era of Increasing Inequality: Diverging Destinies, National Symposium on Family Issues Series, edited by P. R. Amato, A. Booth, S. M. McHale, and J. Van Hook, 83– 94. New York: Springer. Cunha, F., I. T. Elo, and J. Culhane. 2013. “Eliciting Maternal Expectations about the Technology of Cognitive Skill Formation.” NBER Working Paper no. 19144, Cambridge, MA. Cunha, F., and J. J. Heckman. 2007. “The Technology of Skill Formation.” American Economic Review 97 (2): 31– 47. ———. 2008. “Formulating, Identifying and Estimating the Technology of Cognitive and Noncognitive Skill Formation.” Journal of Human Resources 43 (4): 738– 82. ———. 2009. “The Economics and Psychology of Inequality and Human Development.” Journal of the European Economic Association 7 (2– 3): 320– 64. Cunha, F., J. J. Heckman, and S. Navarro. 2005. “Separating Uncertainty from Heterogeneity in Life Cycle Earnings.” The 2004 Hicks Lecture. Oxford Economic Papers 57 (2): 191– 261. Cunha, F., J. J. Heckman, and S. M. Schennach. 2010. “Estimating the Technology of Cognitive and Noncognitive Skill Formation.” Econometrica 78 (3): 883– 931. Currie, J. 2001. “Early Childhood Education Programs.” Journal of Economic Perspectives 15 (2): 213– 38. Currie, J., and D. Thomas. 1995. “Does Head Start Make a Difference?” American Economic Review 85 (3): 341– 64.


Deming, D. 2009. “Early Childhood Intervention and Life-Cycle Skill Development: Evidence from Head Start.” American Economic Journal: Applied Economics 1 (3): 111– 34. Drange, N., T. Havnes, and A. M. J. Sandsør. 2012. “Kindergarten for All: Long Run Effects of a Universal Intervention.” Discussion Paper no. 6986, Institute for the Study of Labor, Bonn. Duncan, G. J., and K. Magnuson. 2013. “Investing in Preschool Programs.” Journal of Economic Perspectives 27 (2): 109– 32. Duncan, G. J., and R. J. Murnane, eds. 2011. Whither Opportunity? Rising Inequality, Schools, and Children’s Life Chances. New York: Russell Sage Foundation. ———. 2014. Restoring Opportunity: The Crisis of Inequality and the Challenge for American Education. Cambridge, MA: Harvard Education Press. Duncan, G. J., and A. J. Sojourner. 2013. “Can Intensive Early Childhood Intervention Programs Eliminate Income-Based Cognitive and Achievement Gaps?” Journal of Human Resources 48 (4): 945– 68. Eckenrode, J., M. Campa, D. W. Luckey, C. R. Henderson, R. Cole, H. Kitzman, E. Anson, K. Sidora-Arcoleo, and D. L. Olds. 2010. “Long-Term Effects of Prenatal and Infancy Nurse Home Visitation on the Life Course of Youths: 19-Year Follow-Up of a Randomized Trial.” Archives of Pediatrics & Adolescent Medicine 164 (1): 9– 15. Elango, S., J. L. García, J. J. Heckman, A. Hojman, D. E. Leaf, M. J. Prados, J. Shea, and J. C. Torcasso. 2015. “The Internal Rate of Return and the Benefit-Cost Ratio of the Carolina Abecedarian Project.” Unpublished Manuscript, Department of Economics, University of Chicago. Feller, A., T. Grindal, L. Miratrix, and L. Page. 2014. “Compared to What? Variation in the Impacts of Early Childhood Education by Alternative Care-Type Setting.” Working Paper, Department of Statistics, Harvard University. Fox Business News. 2014. “Head Start Has Little Effect by Grade School?” Video. http://video.foxbusiness.com/v/3306571481001/head-start-has-little-effect-by-grade-school/?#sp=show-clips.
Garber, H. L. 1988. The Milwaukee Project: Preventing Mental Retardation in Children at Risk. Washington, DC: American Association on Mental Retardation. Garces, E., D. Thomas, and J. Currie. 2002. “Longer-Term Effects of Head Start.” American Economic Review 92 (4): 999– 1012. García, J. L. 2014. “Ability, Character, and Social Mobility.” Working Paper, Department of Economics, University of Chicago. ———. 2015. “Childcare and Parental Investment: Short and Long-Term Effects.” Unpublished Manuscript, Department of Economics, University of Chicago. García, J. L., J. J. Heckman, A. Hojman, D. E. Leaf, M. J. Prados, J. Shea, and J. C. Torcasso. 2016. “Analyzing the Short- and Long-Term Effects of Early Childhood Education on Multiple Dimensions of Human Development.” Department of Economics, University of Chicago. García, J. L., A. Hojman, and J. Shea. 2014. “The Opportunity Cost of Early Childhood Education: Formal, Informal and Maternal Care.” Unpublished Manuscript, Department of Economics, University of Chicago. Gertler, P., J. J. Heckman, R. Pinto, A. Zanolini, C. Vermeersch, S. Walker, S. Chang, and S. M. Grantham-McGregor. 2014. “Labor Market Returns to an Early Childhood Stimulation Intervention in Jamaica.” Science 344 (6187): 998– 1001. Gilhousen, M. R., L. F. Allen, L. M. Lasater, D. M. Farrell, and C. R. Reynolds. 1990. “Veracity and Vicissitude: A Critical Look at the Milwaukee Project.” Journal of School Psychology 28 (4): 285– 99. Gormley, Jr., W. T., and T. Gayer. 2005. “Promoting School Readiness in Oklahoma:


An Evaluation of Tulsa’s Pre-K Program.” Journal of Human Resources 40 (3): 533– 58. Gormley, Jr., W. T., T. Gayer, D. Phillips, and B. Dawson. 2005. “The Effects of Universal Pre-K on Cognitive Development.” Developmental Psychology 41 (6): 872– 84. Gray, S. W., B. K. Ramsey, and R. A. Klaus. 1982. From 3 to 20: The Early Training Project. Baltimore: University Park Press. Gross, R. T., D. Spiker, and C. W. Haynes. 1997. Helping Low Birth Weight, Premature Babies: The Infant Health and Development Program. Stanford, CA: Stanford University Press. Havnes, T., and M. Mogstad. 2011. “No Child Left Behind: Subsidized Child Care and Children’s Long-Run Outcomes.” American Economic Journal: Economic Policy 3 (2): 97– 129. ———. 2015. “Is Universal Child Care Leveling the Playing Field?” Journal of Public Economics 127:100– 14. Heckman, J. J. 1992. “Randomization and Social Policy Evaluation.” In Evaluating Welfare and Training Programs, edited by C. F. Manski and I. Garfinkel, 201– 30. Cambridge, MA: Harvard University Press. ———. 2008. “Schools, Skills and Synapses.” Economic Inquiry 46 (3): 289– 324. ———. 2015. “Analyzing the Impacts of Two Influential Early Childhood Programs on Participants through Midlife.” Proposal submitted to the National Institutes of Health, October 5. Heckman, J. J., N. Hohmann, J. Smith, and M. Khoo. 2000. “Substitution and Dropout Bias in Social Experiments: A Study of an Influential Social Experiment.” Quarterly Journal of Economics 115 (2): 651– 94. Heckman, J. J., M. Holland, T. Oey, D. L. Olds, R. Pinto, and M. Rosales. 2014. “A Reanalysis of the Nurse Family Partnership Program: The Memphis Randomized Control Trial.” Working Paper, Department of Economics, University of Chicago. Heckman, J. J., J. E. Humphries, and T. Kautz, eds. 2014. The Myth of Achievement Tests: The GED and the Role of Character in American Life. Chicago: University of Chicago Press. Heckman, J. J., and T. Kautz. 2012. 
“Hard Evidence on Soft Skills.” Adam Smith Lecture. Labour Economics 19 (4): 451– 64. ———. 2014. “Fostering and Measuring Skills: Interventions that Improve Character and Cognition.” In The Myth of Achievement Tests: The GED and the Role of Character in American Life, edited by J. J. Heckman, J. E. Humphries, and T. Kautz, 341– 430. Chicago: University of Chicago Press. Heckman, J. J., S. Kuperman, and C. Cheng. 2015. “Understanding and Comparing the Mechanisms Producing the Impacts of Major Early Childhood Programs with Long-Term Follow-Up.” Unpublished Manuscript, Department of Economics, University of Chicago. Heckman, J. J., S. H. Moon, R. Pinto, P. A. Savelyev, and A. Q. Yavitz. 2010a. “Analyzing Social Experiments as Implemented: A Reexamination of the Evidence from the HighScope Perry Preschool Program.” Quantitative Economics 1 (1): 1– 46. ———. 2010b. “The Rate of Return to the HighScope Perry Preschool Program.” Journal of Public Economics 94 (1– 2): 114– 28. Heckman, J. J., and S. Mosso. 2014. “The Economics of Human Development and Social Mobility.” Annual Review of Economics 6 (1): 689– 733. Heckman, J. J., and R. Pinto. 2015. “Econometric Mediation Analyses: Identifying the Sources of Treatment Effects from Experimentally Estimated Production


Technologies with Unmeasured and Mismeasured Inputs.” Econometric Reviews 34 (1– 2): 6– 31. Heckman, J. J., R. Pinto, and P. A. Savelyev. 2013. “Understanding the Mechanisms through which an Influential Early Childhood Program Boosted Adult Outcomes.” American Economic Review 103 (6): 2052– 86. Heckman, J. J., and E. J. Vytlacil. 2007. “Econometric Evaluation of Social Programs, Part II: Using the Marginal Treatment Effect to Organize Alternative Economic Estimators to Evaluate Social Programs and to Forecast Their Effects in New Environments.” In Handbook of Econometrics, vol. 6B, edited by J. J. Heckman and E. E. Leamer, 4875– 5143. Amsterdam: Elsevier. Hill, C. J., W. T. Gormley, Jr., and S. Adelstein. 2012. “Do the Short-Term Effects of a Strong Preschool Program Persist?” Working Paper, Center for Research on Children in the United States, Washington, DC. Hojman, A. 2015. “Evidence on the Fade-Out of IQ Gains from Early Childhood Interventions: A Skill Formation Perspective.” Unpublished Manuscript, Center for the Economics of Human Development, University of Chicago. Kagitcibasi, C., D. Sunar, S. Bekman, N. Baydar, and Z. Cemalcilar. 2009. “Continuing Effects of Early Enrichment in Adult Life: The Turkish Early Enrichment Project 22 Years Later.” Journal of Applied Developmental Psychology 30 (6): 764– 79. Kerr, M. A., R. E. Tremblay, L. Pagani, and F. Vitaro. 1997. “Boys’ Behavioral Inhibition and the Risk of Later Delinquency.” Archives of General Psychiatry 54 (9): 809– 16. Kline, P., and M. Tartari. 2015. “Bounding the Labor Supply Responses to a Randomized Welfare Experiment: A Revealed Preference Approach.” NBER Working Paper no. 20838, Cambridge, MA. Kline, P., and C. Walters. 2015. “Evaluating Public Programs with Close Substitutes: The Case of Head Start.” NBER Working Paper no. 21658, Cambridge, MA. Knudsen, E. I., J. J. Heckman, J. Cameron, and J. P. Shonkoff. 2006. 
“Economic, Neurobiological, and Behavioral Perspectives on Building America’s Future Workforce.” Proceedings of the National Academy of Sciences 103 (27): 10155– 62. Lavigueur, S., R. E. Tremblay, and J.-F. Saucier. 1995. “Interactional Processes in Families with Disruptive Boys: Patterns of Direct and Indirect Influence.” Journal of Abnormal Child Psychology 23 (3): 359– 78. Lipsey, M. W., D. C. Farran, and K. G. Hofer. 2015. “A Randomized Control Trial of the Effects of a Statewide Voluntary Prekindergarten Program on Children’s Skills and Behaviors through Third Grade.” Research Report, Peabody Research Institute, Vanderbilt University, Nashville, TN. Lipsey, M. W., K. G. Hofer, N. Dong, D. C. Farran, and C. Bilbrey. 2013. “Evaluation of the Tennessee Voluntary Prekindergarten Program: Kindergarten and First Grade Follow-Up Results from the Randomized Control Design.” Research Report, Peabody Research Institute, Vanderbilt University, Nashville, TN. Love, J. M., E. Eliason Kisker, C. Ross, H. Raikes, J. Constantine, K. Boller, R.  Chazen-Cohen, J. Brooks-Gunn, L. B. Tarullo, C. Brady-Smith, A. Sidle Fuligni, et al. 2005. “The Effectiveness of Early Head Start for 3-Year-Old Children and Their Parents: Lessons for Policy and Programs.” Developmental Psychology 41 (6): 885– 901. Love, J. M., E. E. Kisker, C. M. Ross, P. Z. Schochet, J. Brooks-Gunn, D. Paulsell, K. Boller, J. Constantine, C. Vogel, A. S. Fuligni, and C. Brady-Smith. 2002. “Making A Difference in the Lives of Infants and Toddlers and Their Families: The Impacts of Early Head Start. Volumes I-III: Final Technical Report and Appendixes and Local Contributions to Understanding the Programs and


Their Impacts.” Technical Report no. ED472186, Mathematica Policy Research, Princeton, NJ. Ludwig, J., and D. L. Miller. 2007. “Does Head Start Improve Children’s Life Chances? Evidence from a Regression Discontinuity Approach.” Quarterly Journal of Economics 122 (1): 159– 208. Ludwig, J., and D. A. Phillips. 2008. “Long-Term Effects of Head Start on Low-Income Children.” Annals of the New York Academy of Sciences 1136 (1): 257– 68. Mâsse, L. C., and R. E. Tremblay. 1997. “Behavior of Boys in Kindergarten and the Onset of Substance Use during Adolescence.” Archives of General Psychiatry 54 (1): 62– 68. Mayer, S. E. 1997. What Money Can’t Buy: Family Income and Children’s Life Chances. Cambridge, MA: Harvard University Press. McCormick, M. C., J. Brooks-Gunn, S. L. Buka, J. Goldman, J. Yu, M. Salganik, D. T. Scott, F. C. Bennett, L. L. Kay, J. C. Bernbaum, C. R. Bauer, et al. 2006. “Early Intervention in Low Birth Weight Premature Infants: Results at 18 Years of Age for the Infant Health and Development Program.” Pediatrics 117 (3): 771– 80. McKey, R. H., S. S. Aitken, and A. N. Smith. 1985. “The Impact of Head Start on Children, Families and Communities.” Technical Report, United States Head Start Bureau, Washington, DC. McLanahan, S. 2004. “Diverging Destinies: How Children are Faring under the Second Demographic Transition.” Demography 41 (4): 607– 27. McLanahan, S., and C. Percheski. 2008. “Family Structure and the Reproduction of Inequalities.” Annual Review of Sociology 34 (1): 257– 76. Nagin, D. S., and R. E. Tremblay. 2001. “Parental and Early Childhood Predictors of Persistent Physical Aggression in Boys from Kindergarten to High School.” Archives of General Psychiatry 58 (4): 389– 94. Noll, S., and J. Trent, eds. 2004. Mental Retardation in America: A Historical Reader (The History of Disability). New York: NYU Press. Office of the Mayor, New York City. 2014.
“Ready to Launch: New York City’s Implementation Plan for Free, High-Quality, Full-Day Universal Pre-Kindergarten.” Technical Report, New York Department of Education. Olds, D. L. 2006. “The Nurse-Family Partnership: An Evidence-Based Preventive Intervention.” Infant Mental Health Journal 27 (1): 5– 25. Olds, D. L., C. R. Henderson, R. Chamberlin, and R. Tatelbaum. 1986. “Preventing Child Abuse and Neglect: A Randomized Trial of Nurse Home Visitation.” Pediatrics 78 (1): 65– 78. Olds, D. L., C. R. Henderson, and H. Kitzman. 1994. “Does Prenatal and Infancy Nurse Home Visitation Have Enduring Effects on Qualities of Parental Caregiving and Child Health at 25 to 50 Months of Life?” Pediatrics 93 (1): 89– 98. Page, E. B. 1972. “Miracle in Milwaukee: Raising the IQ.” Educational Researcher 1 (10): 8– 10, 15– 16. Panel Study of Income Dynamics. 2015. PSID: A National Study of Socioeconomics and Health over Lifetimes and across Generations. https://psidonline.isr.umich.edu/default.aspx. Project Head Start. 1969. The Impact of Head Start: An Evaluation of the Effects of Head Start on Children’s Cognitive and Affective Development. New York: Westinghouse Learning Corporation. Puma, M., S. Bell, R. Cook, and C. Heid. 2010. “Head Start Impact Study: Final Report.” Technical Report, Office of Planning, Research and Evaluation, Administration for Children and Families, US Department of Health and Human Services, Washington, DC. Puma, M., S. Bell, R. Cook, C. Heid, P. Broene, F. Jenkins, A. Mashburn, and


J. Downer. 2012. “Third Grade Follow-Up to the Head Start Impact Study: Final Report.” OPRE Report no. 2012-45, Office of Planning, Research and Evaluation, Administration for Children and Families, US Department of Health and Human Services, Washington, DC. Puma, M., S. Bell, R. Cook, C. Heid, and M. Lopez. 2005. Head Start Impact Study: First Year Findings. Technical Report. Washington, DC: Office of Planning, Research and Evaluation, Administration for Children and Families, US Department of Health and Human Services. Putnam, R. D. 2015. Our Kids: The American Dream in Crisis. New York: Simon and Schuster. Raine, A., J. Liu, P. H. Venables, S. A. Mednick, and C. Dalais. 2010. “Cohort Profile: The Mauritius Child Health Project.” International Journal of Epidemiology 39 (6): 1441– 51. Ramey, C. T., G. D. McGinness, L. Cross, A. M. Collier, and S. Barrie-Blackley. 1982. “The Abecedarian Approach to Social Competence: Cognitive and Linguistic Intervention for Disadvantaged Preschoolers.” In The Social Life of Children in a Changing Society, edited by K. M. Borman, 145– 74. Hillsdale, NJ: Lawrence Erlbaum Associates. Reynolds, A. J., and J. A. Temple. 1998. “Extended Early Childhood Intervention and School Achievement: Age 13 Findings from the Chicago Longitudinal Study.” Child Development 69 (1): 231– 46. ———. 2006. “Economic Returns of Investments in Preschool Education.” In A Vision For Universal Preschool Education, edited by E. F. Zigler, W. S. Gilliam, and S. S. Jones, 37– 68. New York: Cambridge University Press. Reynolds, A. J., J. A. Temple, B. A. B. White, S.-R. Ou, and D. L. Robertson. 2011. “Age 26 Cost-Benefit Analysis of the Child-Parent Center Early Education Program.” Child Development 82 (1): 379– 404. Ricciuti, A. E., R. G. St. Pierre, W. Lee, and A. Parsad. 2004. 
“Third National Even Start Evaluation: Follow-Up Findings from the Experimental Design Study.” Technical Report, US Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Washington, DC. Romano, J. P., A. M. Shaikh, and M. Wolf. Forthcoming. “Multiple Testing.” The New Palgrave Dictionary of Economics. Schneider, B., and S.-K. McDonald, eds. 2006. Scale-Up in Education, Volume 2: Issues in Practice. Blue Ridge Summit, PA: Rowman & Littlefield Publishers. Sommer, R., and B. A. Sommer. 1983. “Mystery in Milwaukee: Early Intervention, IQ, and Psychology Textbooks.” American Psychologist 38 (9): 982– 85. St. Pierre, R. G., J. I. Layzer, B. D. Goodson, and L. S. Bernstein. 1997. “National Impact Evaluation of the Comprehensive Child Development Program: Final Report.” Technical Report. Cambridge, MA: Abt Associates Inc. ———. 1999. “The Effectiveness of Comprehensive, Case Management Interventions: Evidence from the National Evaluation of the Comprehensive Child Development Program.” American Journal of Evaluation 20 (1): 15– 34. The White House. 2014a. “The Economics of Early Childhood Investments.” Washington, DC: Executive Office of the President of the United States. ———. 2014b. “Fact Sheet: Invest in US: The White House Summit on Early Childhood Education.” Washington, DC: Office of the Press Secretary. US Census Bureau. 2014. “American Community Survey.” Data Set, United States Census Bureau, Washington, DC. US Department of Education. 2015. “Preschool Grants for Children with Disabilities: Funding Status.” Technical Report, US Department of Education, Washington, DC. http://www2.ed.gov/programs/oseppsg/funding.html. Vogel, C. A., N. Aikens, A. Burwick, L. Hawkinson, A. Richardson, L. Mendenko, and R. Chazan-Cohen. 2006. “Findings from the Survey of Early Head Start Programs: Communities, Programs, and Families. Final Report.” Technical Report no. ED498072, US Department of Health and Human Services, Washington, DC. Weikart, D. P. 1967. “Preliminary Results from a Longitudinal Study of Disadvantaged Preschool Children.” ERIC Report no. ED 030 490, presented at the 1967 convention of the Council for Exceptional Children, St. Louis, MO. ———. 1970. Longitudinal Results of the Ypsilanti Perry Preschool Project, volume 1 of Monographs of the High/Scope Educational Research Foundation. Ypsilanti, MI: High/Scope Educational Research Foundation. Weiland, C., and H. Yoshikawa. 2013. “Impacts of a Prekindergarten Program on Children’s Mathematics, Language, Literacy, Executive Function, and Emotional Skills.” Child Development 84 (6): 2112– 30. White, J. L., T. E. Moffitt, A. Caspi, D. J. Bartusch, D. J. Needles, and M. Stouthamer-Loeber. 1994. “Measuring Impulsivity and Examining Its Relationship to Delinquency.” Journal of Abnormal Psychology 103 (2): 192– 205. Yoshikawa, H., C. Weiland, J. Brooks-Gunn, M. R. Burchinal, L. M. Espinosa, W. T. Gormley, J. Ludwig, K. A. Magnuson, D. Phillips, and M. J. Zaslow. 2013. “Investing in Our Future: The Evidence Base on Preschool Education.” Technical Report, Society for Research in Child Development, Ann Arbor, MI. Zhai, F. H., J. Brooks-Gunn, and J. Waldfogel. 2014. “Head Start’s Impact is Contingent on Alternative Type of Care in Comparison Group.” Developmental Psychology 50 (12): 2572– 86. Zigler, E., and S. Muenchow. 1994. Head Start: The Inside Story of America’s Most Successful Educational Experiment. New York: Basic Books.

Contributors

Burt S. Barnow Trachtenberg School of Public Policy and Public Administration George Washington University 805 21st St NW, Room 601T Washington, DC 20052

Jorge Luis García Center for the Economics of Human Development The University of Chicago 1126 East 59th Street Chicago, IL 60637

Robert Collinson Robert F. Wagner Graduate School of Public Service New York University 295 Lafayette Street New York, NY 10012

James J. Heckman Department of Economics The University of Chicago 1126 East 59th Street Chicago, IL 60637

Mark Duggan Department of Economics Stanford University 579 Serra Mall Stanford, CA 94305–6072

Sneha Elango Center for the Economics of Human Development The University of Chicago 1126 East 59th Street Chicago, IL 60637

Ingrid Gould Ellen Robert F. Wagner Graduate School of Public Service New York University 295 Lafayette Street New York, NY 10012

Andrés Hojman Center for the Economics of Human Development The University of Chicago 1126 East 59th Street Chicago, IL 60637

Melissa S. Kearney Department of Economics University of Maryland 3105 Tydings Hall College Park, MD 20742

Jens Ludwig Harris School of Public Policy The University of Chicago 1155 East 60th Street Chicago, IL 60637


Robert A. Moffitt Department of Economics Johns Hopkins University 3400 North Charles Street Baltimore, MD 21218

Stephanie Rennane Department of Economics University of Maryland 3115N Tydings Hall College Park, MD 20742

Jeffrey Smith Department of Economics University of Michigan 238 Lorch Hall 611 Tappan Street Ann Arbor, MI 48109–1220

Author Index

Aaron, H. J., 86, 86n27, 116 Abadie, A., 174n92 Adams, C., 137 Adelstein, S., 274n65, 286n77 Aidala, A., 116 Aitken, S. S., 267 Aizer, A., 9n10, 22, 25, 30, 44, 45 Akerlof, G. A., 4, 32 Allin, S., 35 Anderson, K., 213, 214 Andersson, F., 113, 186, 191, 193, 196, 197, 199, 200, 200n125, 201, 203, 203n127, 204, 205, 214, 215 Angrist, J. D., 102n36, 174n92, 184, 190 Ashenfelter, O., 178, 212n135 Autor, D., 5n5, 45 Baker, M., 283 Balducchi, D., 198n121 Barnett, W. S., 265n46, 274n65 Barnow, B. S., 128, 128n2, 131n6, 132n10, 133, 138n20, 138n21, 139n24, 140n26, 142n33, 142n34, 142n36, 142n37, 144, 146, 147, 148, 148n42, 150n47, 156n54, 162, 162n65, 181n100, 187n111, 190, 191, 193n116, 210, 211, 218 Barshay, J., 275 Bartik, T., 143n38, 210n134, 274n65 Barton, D. M., 101, 105, 105n40 Bassett, G., 174n92 Bauer, C., 66 Baum-Snow, N., 106n43, 110, 115

Bayer, P., 93 Belfield, C. R., 262 Bell, S., 197, 216 Ben-Shalom, Y., 12 Berk, G., 196n120 Berkowitz, E. D., 5, 6, 9 Bernal, R., 288 Bertrand, M., 43, 258n40 Bitler, M., 174, 174n92, 272n63 Black, D., 174, 188, 194n117 Blank, D., 142n37 Blau, D., 236n7, 237n11, 279n69, 288 Bloom, H., 112, 184n107, 190, 215n136, 271, 281n72 Bonastia, C., 108 Boo, K., 194 Borden, W., 142n36 Borghans, L., 254 Bound, J., 46, 190 Bradley, D., 140 Brewer, M., 62 Briggs, X. d. S., 116 Brooks-Gunn, J., 237n13, 244n30, 256, 258, 258n41, 272 Brown, C., 190 Brown, J., 62 Buescher, A. V., 35 Burghardt, J., 183n105, 190, 191, 207, 207n129, 210 Burkhauser, R. V., 2, 46, 213, 214 Burstein, N., 209 Burtless, G., 181n99




Burton, L., 90 Bushway, S., 187n112 Busso, M., 187, 201 Butler, W., 128 Campbell, F. A., 250n34, 253, 253n38, 254, 259, 260 Card, D., 195 Carlson, D., 102, 107, 109n47, 111 Carneiro, P., 176n94, 187n113, 273 Cascio, E. U., 238n14, 283, 285, 286 Caucutt, E. M., 243 Cave, G., 207 Cheng, C., 246 Chetty, R., 114, 115n52, 274n64 Chyn, E., 115, 119 Clague, E., 131n6 Clements, N., 174, 174n92, 176 Coe, N., 41n30, 46n31 Cohen, E., 188 Collinson, R. A., 102, 110 Conti, G., 250n34, 263 Cook, R., 137 Cook, T., 187 Courty, P., 210, 211 Crépon, B., 180n98, 194 Culhane, J., 243n25 Cullen, J. B., 36n28, 45 Cummings, J. L., 106 Cunha, F., 240, 242, 242n22, 243n25, 272n63 Currie, J. M., 86, 101, 102n36, 105, 107n44, 112, 211, 237n11, 262, 273, 274, 288 Dahl, G. B., 38, 39 Dahlby, B., 193 Daly, M. C., 2 D’Amico, R., 140n26, 141, 142n34, 143, 144, 144n39, 145, 146, 148, 148n42 Darden, J. T., 116 Davidson, C., 194 Davies, P. S., 40 Decker, P., 196n120, 216n137 Dehejia, R., 186 Deming, D., 256, 273, 274 Desai, M. A., 106 Deshpande, M., 38, 40, 41 Desmond, M., 91, 94, 97 Devine, T., 138n19, 213 DeWitt, L., 5, 6, 9 Dharmapala, D., 106 Diamond, P., 32, 33, 47

Dickinson, K., 187n111 DiNardo, J., 187, 201 DiPasquale, D., 106 Dixit, A., 210 Djebbari, H., 174, 174n92 Domina, T., 272n63 Doolittle, F., 182, 182n101, 182n104, 189, 215 Drange, N., 281n70 Dubin, R. A., 93 Duggan, M., 3, 3n4, 5n5, 9, 37, 38, 42, 44, 45 Duncan, G. J., 235n1, 243 Earls, F., 118 Eberts, R., 132n7, 143n38, 151, 210n134 Eberwein, C., 192n115 Eckenrode, J., 245n31 Edin, K., 105n41 Elango, S., 253, 259, 260, 264, 266, 274n65 Ellen, I. G., 94, 97, 98, 108, 109, 110, 116 Elo, I. T., 243n25 Evans, G. W., 91 Evans, W. N., 39 Eyster, L., 187, 209 Fagnoni, C., 142n37 Falk, G., 59 Farran, D. C., 238n16, 278, 284n76 Feller, A., 237n13, 244n30, 271, 271n57, 271n59, 272, 272n61 Ferber, R., 181n99 Ferreira, F., 93 Fichtner, A., 144, 144n39, 148n42 Finkel, M., 73, 73n18, 90 Fischer, W., 61 Fisher, R., 173n91 Fisk, W. J., 90 Flores-Lagunes, A., 207n129 Ford, R., 180n98 Forslund, A., 194 Fraker, T., 49, 186 Franklin, G., 133 French, E., 7 Frieden, B., 68 Friedman, L., 66 Frölich, M., 187, 187n113, 201 Frost, R., 173n91 Gahvari, F., 86 Galdo, J., 188 Galiani, S., 110 Gallo, F., 138n18

Author Index Ganong, P., 102, 110 Garber, H. L., 245n31 Garces, E., 273 García, J. L., 250n35, 251, 253, 261, 264, 288 Garfinkel, I., 130n4 Garrett, B., 18, 43 Garthwaite, C. L., 39 Gayer, T., 286, 287 Gechter, M., 182n103 Gelbach, J., 174, 174n92 Gertler, P., 245n31, 253n38 Gilby, E., 11 Gilhousen, M. R., 245n31 Ginja, R., 273 Glaeser, E. L., 61 Glazerman, S., 191, 192, 207, 207n129 Glied, S., 18, 43 Gonzales, A., 207n129 Gordon, N., 9n10, 22, 30, 44, 45 Gormley, W. T., Jr., 274n65, 286, 286n77, 287 Goux, D., 91 Gray, S. W., 246, 256 Greenberg, D., 173n90, 190, 191, 193, 195, 202, 209 Gritz, M., 198n121 Gross, R. T., 250 Gruber, J., 283 Gueron, J., 184 Gyourko, J., 60, 61 Ham, J., 192n115 Hansen, K., 107, 176n94 Hanushek, E., 90, 91, 105 Harkness, J., 111, 113n48 Haveman, R., 190 Havnes, T., 281, 281n70, 282, 285 Haynes, C. W., 250 Hays, R. A., 67 Heald, L., 142n37 Heckman, J. J., 138n19, 174, 174n92, 176, 176n94, 178, 178n95, 179n97, 181, 181n99, 182, 182n104, 183, 183n105, 184, 184n107, 186, 186n110, 187n113, 188, 191, 194, 196, 197, 199, 200, 203, 210, 211, 212, 213, 236n3, 238n15, 240, 240n18, 241, 242, 242n20, 242n21, 242n22, 242n23, 242n24, 243, 244n26, 244n27, 244n28, 245n31, 246, 250n34, 252, 253, 253n38, 254, 256n39, 259, 259n42, 259n43, 260, 261, 262, 263, 264n44, 266, 271n60, 272n63


Heinberg, J., 193n116 Heinrich, C., 175, 196, 199, 200, 200n125, 201, 202, 202n126, 203, 204, 205, 210 Hemmeter, J., 11, 40 Henderson, C. R., 245n31 Hendra, R., 180n98 Hendren, N., 114 Hill, C. J., 215n136, 286n77 Hirano, K., 201 Hirsch, A. R., 67 Hirsch, W., 181n99 Hobbie, R., 128, 150n47 Hofer, K. G., 238n16, 278, 284n76 Hojman, A., 253, 253n38, 256, 257, 288 Hollenbeck, K., 190, 196, 199, 200n125, 202, 204, 205 Holupka, C. S., 111 Holzer, H., 132n7 Horn, K. M., 69, 74, 77, 106, 108, 109 Hotz, V. J., 175n93, 182, 182n103, 190 Hoynes, H., 174, 174n92, 272n63 Huang, W.-J., 143n38, 190, 210n134 Hubbard, R., 11 Huber, E., 187, 188, 201 Humphries, J. E., 252, 254 Hunt, D. B., 60, 63n5, 108, 108n46 Hurst, E., 34 Iacus, S., 187n111 Ichimura, H., 197 Imbens, G. W., 102n36, 174n92, 175n93, 182n103, 184, 201 Imberman, S., 45 Jacob, B. A., 92, 95, 102, 105, 107, 111, 113, 113n48, 115, 117, 118, 119 Jacobson, L., 179, 209 Johnson, B., 187n112 Johnson, G., 194n118 Johnson, T., 187n11, 198n121 Johnston, J., 137n16, 138n18 Joseph, M. L., 94 Kagitcibasi, C., 245n31 Kain, J. F., 91 Kapustin, M., 111, 113, 118 Kassabian, D., 188 Katz, L. F., 114 Kautz, T., 242n20, 242n23, 252, 254, 256n39 Keane, M. P., 288 Kearney, M., 3, 3n4, 9, 9n10, 22, 25, 30, 37, 38, 42, 44, 45


Kemple, J., 182n104, 189, 215 Kerr, M. A., 258n40 Kesselman, J., 127n1, 131n5 Kessler, R. C., 114 King, C. T., 131n6, 139n24, 140n26, 142n34, 142n37, 144, 146, 147, 148, 148n42, 211 King, G., 187n111 Kitzman, H., 245n31 Klaus, R. A., 246, 256 Klerman, J., 175n93 Kleven, H. J., 32, 33 Kline, P., 237n13, 244n30, 271, 271n57, 271n59, 272, 272n61, 272n63, 274 Kling, J. R., 114 Kluve, J., 195 Knudsen, E. I., 235 Koenke, R., 174n92 Kopczuk, W., 32, 33 Kornfeld, R., 190 Kostol, A. R., 38 Kramer, L., 131n6 Krepcio, K., 196n120 Krolikowski, P., 179n96 Krueger, A., 182, 190, 194 Kubik, J. D., 43, 44 Kuperman, S., 246 Lakey, J., 180n98 LaLonde, R., 127n1, 128n2, 175, 176, 178, 178n95, 179, 181n99, 184, 186, 191, 192n115, 195, 196, 199, 209, 219 Lam, K., 73, 73n18 Lavigueur, S., 258n40 Lechner, M., 175, 186, 187, 187n113, 192, 201, 202, 216 Lee, K. O., 98 Lefgren, L., 113n48 Lei-Gomez, Q., 90 Lein, L., 105n41 Lennen, M. C., 86 Lens, M. C., 110 Leventhal, T., 90, 118 Levitan, S., 138n18 Liebman, J. B., 114 Linder, S., 46 Lipsey, M. W., 238n16, 256, 278, 284n76 Lise, J., 194 Lochner, L., 39, 194, 243 Løken, K. V., 96 Long, D., 207n130 Loprest, P., 38 Love, J. M., 266n48

Lower-Basch, E., 172n89 Lubell, J. M., 107 Ludwig, J., 63n5, 92, 95, 100, 102, 105, 107, 111, 113, 114, 117, 118, 267, 271, 273, 274 Maestas, N., 7, 47 Mallach, A., 102 Mallar, C., 187n113, 192, 207n130 Malpezzi, S., 106n43 Mangum, G., 133n11 Mani, A., 97 Mann, D. R., 48, 49 Marion, J., 106n43, 110, 115 Marschke, G., 210, 211 Mâsse, L. C., 258n40 Masse, L. N., 265n46, 274n65 Mathiowetz, N., 190 Maurin, E., 91 Mayer, S. E., 241, 243 Maynard, R., 186 Mayo, S. K., 104, 108 McCall, B., 194n117, 216, 218, 219 McConnell, S., 183n105, 190–91, 192, 207, 216n137 McCormick, M. C., 258n41 McCrary, J., 187, 201 McDonald, S.-K., 252 McKey, R. H., 267 McLanahan, S., 235n1, 236n2, 236n3 McMillan, R., 93 Meehan, E. J., 66 Meijers, H., 254 Mendell, M. J., 90 Meyer, B. D., 38 Michalopoulos, C., 191, 195, 202 Mikelson, K., 131 Miller, D. L., 267, 273, 274 Milligan, K., 283 Mills, G., 102, 102n37, 106, 106n42, 107, 109, 111, 113, 118 Mitchell, J. P., 65, 66 Moffitt, R. A., 212n135 Mogstad, M., 38, 96, 281, 282, 285 Moore, Q., 148n43, 216, 217 Mortimer, J., 182n103 Mosso, S., 240n18, 241, 242, 242n21, 242n24, 243 Mueller-Smith, M., 199n123 Muenchow, S., 246, 252 Mueser, P., 175, 196, 200n125, 201, 202, 202n126, 204

Mullen, K., 7, 47 Muller, S., 182n103 Murnane, R. J., 235n1, 243 Murphy, A., 110 Musgrave, P., 130 Musgrave, R., 130 Nagin, D. S., 258n40 Navarro, S., 186n110, 272n63 Neumann, T., 207n129 Neumark, D., 46 Newman, S., 90, 108, 111, 113n48, 118 Neyman, J., 173n91 Nichols, A. L., 32, 46, 87 Nightingale, D., 128n2, 131 Noll, S., 246n32 O’Flaherty, B., 97, 98 Olds, D. L., 238, 245n31 O’Leary, C., 131n6 Olsen, E. O., 63n5, 68, 73, 87, 88, 94, 100, 101, 104n39, 105, 105n40, 111 Onstott, M., 173n90 O’Regan, K. M., 69, 74, 77, 106, 110 Orr, L., 106, 114, 216 Painter, G., 98 Pan, J., 43, 258n40 Pantano, J., 110 Park, J., 217n138 Parsons, D., 33 Patel, N., 142n32 Pederson, J., 180n98 Percheski, C., 236n3 Perez-Johnson, I., 148n43, 216, 216n137, 217 Perry, C., 128, 131n6 Phillips, D. A., 271 Pinto, R., 250n34, 259, 259n42, 259n43, 261, 263 Pischke, J.-S., 184 Plesca, M., 179, 201, 215, 216 Porro, G., 187n111 Powers, E. T., 35, 46 Puhani, P., 187n112 Puma, M., 209, 237n12, 244n29, 267n49, 268–69, 268n52, 269n54, 270, 271, 272 Putnam, R. D., 236n4 Quandt, R., 173n91 Quigley, J. M., 61, 101

Radin, B., 211 Raine, A., 245n31 Ralson, H., 184 Ramey, C. T., 246 Ramsey, B. K., 246, 256 Raphael, S., 61, 101 Raudenbush, S. W., 118 Rawlins, L., 137 Raymond, J., 213, 214 Reiger, A. J., 104 Ressler, S., 35 Reynolds, A. J., 238, 266n48 Riccio, J., 112, 215n136 Ricciuti, A. E., 245n31 Ridder, G., 201 Riley, G. F., 17, 18 Ripley, R., 133 Rivkin, S. G., 91 Robb, R., 176 Robins, P., 191, 193, 195, 202, 209 Romano, J. P., 253n38 Rosen, E., 94 Rosen, H. S., 63, 104 Rosenbaum, J. E., 103n38 Rosholm, M., 180n98, 219 Roy, A. D., 173n91 Rubin, D. B., 102n36, 173n91, 184 Rubinowitz, L. S., 103n38 Rupp, K., 17, 18, 35, 40, 45 Rutledge, M., 41 Salzman, J., 141 Sampson, R. J., 118 Sanbonmatsu, L., 103, 106, 108, 114, 115n52 Sandsør, M. J., 281n70 Santillano, R., 148n43, 216, 217 Sard, B., 61 Saucier, J.-F., 258n40 Savelyev, P. A., 259, 259n43, 261 Savner, S., 142n32 Schanzenbach, D. W., 238n14, 283, 285, 286 Schennach, S. M., 242, 242n22 Schill, M. H., 66, 71 Schmidt, L., 43, 45 Schnare, A. B., 108 Schneider, B., 252 Schochet, P., 183n105, 190, 191, 207, 207n129, 210 Scholz, K., 190 Schwartz, A. E., 108, 109


Schwartz, A. F., 67, 70 Seitz, S., 194 Sevak, P., 43 Shaikh, A. M., 253n38 Shea, J., 253, 288 Sheshinski, E., 32, 33, 47 Shroder, M. D., 104, 107, 110, 111, 173n90 Sianesi, B., 181 Sinai, T., 60 Singhal, M., 106 Skinner, J., 11 Slocum, L. A., 187n112 Smith, A. N., 267 Smith, J., 174, 174n92, 176, 178, 178n95, 179n97, 181, 181n99, 182, 182n104, 184, 184n107, 186, 188, 189, 190, 191, 194, 194n117, 196, 200, 201, 210, 211, 212, 213, 215, 216, 218, 219, 221 Sommer, B. A., 245n31 Sommer, R., 245n31 Song, J., 7 Spaulding, S., 150n48 Spence, L. H., 71 Spiker, D., 250 Stabile, M., 35 Stapleton, D.C., 12, 45, 188 Steffen, B., 107 St. Pierre, R. G., 245n31 Straits, R., 131n6 Strand, A., 7, 7n8, 47 Sullivan, D., 179, 209 Sullivan, J. X., 38 Susin, S., 111 Svarer, M., 180n98 Taber, C., 182n104, 184n107, 194, 212 Tartari, M., 272n63 Temple, J. A., 266n48 Ter Weel, B., 254 Thomas, D., 273, 274 Thompkins, A., 48, 49 Thornton, C., 207n130 Todd, P., 184, 186, 197 Torcasso, J. C., 253n38 Traeger, L., 182, 182n101 Tremblay, R. E., 258n40 Trent, J., 246n32 Troske, K., 200n125, 204 Trutko, J., 128, 138n21, 142n33, 156n54, 162, 162n65, 193n116

Vale, L. J., 66, 70, 71, 96 Vandell, K., 106 Van Horn, C., 144, 144n39, 148n42, 196n120 Verma, N., 112 Vivalt, E., 182n103 Von Hoffman, A., 85, 90 Vytlacil, E., 187n113, 244n26, 271n60 Wahba, S., 186 Waldfogel, J., 237n13, 244n30, 256, 272 Wallace, G., 190 Wallace, J., 182n104, 189, 215 Walters, C., 237n13, 244n30, 271, 271n57, 271n59, 272, 272n61, 274 Wandner, S., 131n6, 151, 196n120 Weber, A., 195 Weikart, D. P., 246, 256 Weiland, C., 286, 287 Wen, P., 9 Wenchao, J., 62 West, R., 187n111 Whalley, A., 189 White, J. L., 258 White, M., 180n98 Wiehler, S., 192 Wilson, J., 211 Wiseman, M., 5, 9 Wiswall, M., 96 Wittenburg, D., 9, 39, 48, 48n32, 49 Wixon, B., 7n8 Wolf, M., 253n38 Wolpin, K., 184 Woodbury, S., 194 Wright, G., 66 Wu, A. Y., 46n31 Wunsch, C., 175, 186, 187, 194n117, 201, 202, 216, 218, 218n139, 219 Yelowitz, A., 47, 48, 101, 102n36, 105, 107n44, 112 Yoshikawa, H., 286, 287 Zeckhauser, R. J., 32, 87 Zeldes, S. P., 11 Zhai, F. H., 237n13, 244n30, 256, 272 Zigler, E., 246, 252 Ziliak, J. P., 34, 209

Subject Index

Note: Page numbers followed by “f” or “t” refer to figures or tables, respectively.

ABC. See Carolina Abecedarian Project (ABC) Aid to Families with Dependent Children (AFDC), SSI program and, 43–44 American Job Centers (AJCs), 140–41 Area Redevelopment Act (ARA) of 1961, 132 Assistance, housing. See Housing assistance, low-income Boston, universal preschool program in, 286 Boys, Supplemental Security Income and, 42–43 Carolina Abecedarian Project (ABC), 245, 246–48t, 264f; cost-benefit analysis of, 264–66, 285t; decomposition of treatment effects of, on male adult outcomes, 263f; decomposition of treatment effects of, on male and female (pooled) adult incomes, 261; differences in treatments by gender and, 258; health outcomes and, 261; life-cycle outcomes of, 260t; long-term outcomes of, 259; overview of, 246–51; treatment effects on early-life skills for females of, 255t; treatment effects on early-life skills for males of, 255t. See also Early childhood programs

Community Development Block Grant program, 64n8 Comprehensive Employment and Training Act (CETA) of 1973, 135–37, 138 Concentrated Employment Program, 134t Continuing disability reviews (CDRs), SSI and, 10–11 Disability Benefits Reform Act (DBRA), 8–9 Disability determination, SSI: for adults, 6–8; for children, 8–9 Early childhood demonstration programs: characteristics of, 245–51; connecting short-term and long-term effects of, 259–62; effects of, on IQ, achievement test scores, and conscientiousness, 253–56, 254t, 255t; evidence from large-scale, 278–79; fadeout effects of, for cognitive skills, 256–58; long-term outcomes of, 259; possible evidence limitations of, 252–53; summary table of, 246–48t Early childhood programs, 235–40, 289; arguments for subsidizing, 242–43; common features in study of, 245–46; economic case for universal, 238; federal funding streams for, 280t; formation of skills over life cycle model for, 240–42, 241f; framework for interpreting evidence of, 240–44; importance of quality and, 287–88; local universal, in US, 284–87; in Norway, 281–82; policy evaluation questions for, 243–44; in Quebec, 282–83; summary and basic features, 239t; universal, 281–87, 289. See also Carolina Abecedarian Project (ABC); Early Training Program (ETP); Head Start; Infant Health and Development Program (IHDP); Perry Preschool Project (PPP) Early Head Start, 268 Early Training Program (ETP), 245; fadeout effects of, for cognitive skills, 256–58; overview of, 246–51; summary table of, 246–48t; treatment effects on early-life skills for females of, 255t; treatment effects on early-life skills for males of, 255t. See also Early childhood programs Earned Income Tax Credit (EITC), 3 Economic Dislocation and Worker Adjustment Assistance Act (EDWAAA) of 1988, 138 Economic Opportunity Act of 1964, 134t EHAP. See Experimental Housing Allowance Program (EHAP) Eligible training provider (ETP) lists, 143–44 Employment and training programs, 127–30; adult education basic grants to states and, 172; Area Redevelopment Act of 1961, 132; categories of, 128; characteristics of participants in, 155–63; Comprehensive Employment and Training Act of 1973, 135–37, 138; cost-benefit analysis issues for, 191–93; current funding levels for, 163–70, 164t; data and measurement issues for, 188–91; economic justifications for government involvement in, 130–31; evaluation issues for, 172–88; expenditures and enrollments of, 151–55, 152–53t; general equilibrium effects of, 193–95; history of, in US, 131–32; Job Training Partnership Act of 1982, 135, 137–39, 138n18; major (1963–1973), 134t; Manpower Development and Training Act of 1962, 132–33, 134t; matching participants to services issues of, 214–19; meta-analyses of evaluations of, 195–96; participation issues of, 211–14; Pell grants and, 170–71, 171n87; performance management issues of, 209–11; review of research on impacts of, 196–209; Supplemental Nutrition Assistance Program Employment and Training (SNAP E&T), 172; Temporary Assistance for Needy Families (TANF) and, 171; Wagner-Peyser Act of 1933, 132, 166; Workforce Innovation and Opportunity Act of 2014, 135, 148–50; Workforce Investment Act of 1998, 139–48, 139n23 ES (Employment Service). See Wagner-Peyser Employment Service (ES) ETP. See Early Training Program (ETP) Experimental Housing Allowance Program (EHAP), 104–5; review of effects of, on access to different neighborhoods, 108–10 Externalities: from housing consumption, 90–92; neighborhood, and neighborhood access, 92–95 Fair market rents (FMRs), 64n7, 73 Federal benefit rate (FBR), 11–12 Georgia, universal child care in, 284–87 Head Start, 252, 266–67, 289; comparability with demonstration and universal programs of, 268–69; cost-benefit analyses of, 274; data sources for evidence on, 269–70; long-term outcomes of, 273–74; overview of, 267–69; short-term outcomes of, 270–72; summary of evidence from, 275, 276–77t. See also Early childhood programs Help through Industry Retraining and Employment (HIRE), 136 Homeless Veterans Reintegration Program (HVRP), current funding for, 169–70 HOME program, 64n8 Housing Act of 1937, 60, 70 Housing Act of 1949, 92–93 Housing affordability, 61–62; review of effects of means-tested housing programs on, 105–6 Housing assistance, low-income: characteristics of households served, 75–79, 76–78t; eligible number of households for, 80–83; federal spending on, 83, 84t; income and substitution effects of, 91–92, 92f; income volatility and, 97; “need” concept and, 81–82; overlap of, with other subsidy programs, 79–80; targeting of, 98–99; trends, 74–75, 75f. See also Housing subsidies, low-income Housing consumption: externality issues for, 90–92; labor supply and, 91; productivity and, 91 Housing programs: areas of future research for, 117–20; categories of, 64; effects of, on access to different neighborhoods, 108–10; federal spending on, 83, 84t; history of, 64–70; income eligibility rules for, 70–72, 72t; intellectual justification for, 116–17; neighborhood conditions and, 92–95; proliferation of means-tested programs for, 60–61; public concern about conditions of, 69; rent requirements of, 72–74, 72t; review of indirect effects of, 111–16; tax subsidies for, 59–60. See also Low-Income Housing Tax Credit (LIHTC) program Housing programs, means-tested, 61; children’s outcomes and, 112–13; income volatility and, 97; labor supply and, 91, 112; motivations for, 61–63; narrow distribution of resources of, 95–98; productivity and, 91; residential mobility and, 106–8; review of effects of, on access to different neighborhoods and, 108–10; review of effects of, on housing affordability, 105–6; review of effects of, on housing quality, 100–105; review of effects of, on residential mobility, 106–8; targeting of, 98–99; work effort and, 111–12. See also Low-Income Housing Tax Credit (LIHTC) program Housing quality: project-based vs. tenant-based subsidies and, 103–4; review of effects of means-tested housing programs on, 105–6 Housing subsidies, low-income: justification issues for, 85–90; overlap of, with other transfer programs, 79–80. See also Housing assistance, low-income Housing vouchers, 94–95, 97; children’s outcomes and, 113–15; history of, 70; housing quality and, 102–3; project-based subsidies vs., 89–90, 89f How the Other Half Lives (Riis), 60


HVRP (Homeless Veterans Reintegration Program), current funding for, 169–70 IFAs (individual functional assessments), 8–9 Income eligibility, for housing programs, 70–72, 72t Income volatility, housing assistance programs and, 97 Indian and Native American Program, current funding for, 169 Individual functional assessments (IFAs), 8–9 Individual Training Accounts (ITAs), 139, 141, 142, 142n32 Infant Health and Development Program (IHDP), 245; overview of, 246–51; summary table of, 246–48t; treatment effect heterogeneity by socioeconomic status and, 258–59; treatment effects on early-life skills for females of, 255t; treatment effects on early-life skills for males of, 255t. See also Early childhood programs In-kind housing assistance, 62 IQ: dissipation of initial gains of, 259, 260t; effects of early childhood demonstration programs on, 253–56; treatment effects on, 254t, 255t ITAs (Individual Training Accounts), 245 Job Corps program, 134t, 212; review of research on impacts of, 206–8 Job Opportunities in the Business Sector program, 134t Job Training for Employment in High Growth Industries Grants, current funding for, 167–68 Job Training Partnership Act (JTPA) of 1982, 135, 137–39; literature reviews of, 138n18 Labor supply: housing consumption and, 91, 92f; means-tested housing programs and, 91–92 Lawfully admitted permanent residents (LAPRs), 15 Low-income housing subsidies. See Housing subsidies, low-income Low-Income Housing Tax Credit (LIHTC) program, 59, 117; administration of, 69; creation of, 68–69; effects of, on housing affordability, 106; effects of, on neighborhoods, 115–16; funding sources for, 69; growth in, 60–61. See also Housing programs; Housing programs, means-tested Manpower Development and Training Act (MDTA) of 1962, 132–33, 134t Means-testing housing programs. See Housing programs, means-tested Medicaid, 3; SSI program and, 44–45 Merit goods, 130 Migrant and Seasonal Farmworker Program, current funding for, 168 Mobility. See Residential mobility Moving to Opportunity (MTO) study, 63–64, 64n6, 102–3 Moving to Work (MTW) demonstration program, 73, 117 Negative Income Tax experiment, 105 Neighborhood conditions, housing programs and, 92–95 Neighborhoods, access to different, and review of effects of housing programs on, 108–10 Norway, universal child care in, 281–82 Oklahoma, universal child care in, 284–87 One-Stop Career Centers, 140–41 Operation Mainstream, 134t Pell grants, employment and training programs and, 170–71, 171n87 Perry Preschool Project (PPP), 245; cost-benefit analysis of, 264–66, 285t; decomposition of treatment effects of, on male adult outcomes, 261f, 263f; differences in treatments by gender and, 258; fadeout effects of, for cognitive skills, 257f; health outcomes and, 261; life-cycle outcomes of, 260t; long-term outcomes of, 259; overview of, 246–51; summary table of, 246–48t; treatment effects on early-life skills for females of, 255t; treatment effects on early-life skills for males of, 255t. See also Early childhood programs Personal Responsibility and Work Opportunity Reconciliation Act (PRWORA) of 1996, 15

Private industry councils (PICs), 138 Privately owned subsidized housing, history of, 67–69 Private Sector Initiative Program (PSIP), 136 Productivity: housing consumption and, 91; means-tested housing programs and, 91–92 Project NetWork, 48 PRWORA (Personal Responsibility and Work Opportunity Reconciliation Act), 15 PSIP (Private Sector Initiative Program), 136 Public Employment Program, 134t Public housing: history of, 65–67; studies of conditions of families in and outside of, 101–2 Public housing authorities (PHAs), 70–71, 73n19, 73n21, 98, 108n46 Public housing demolitions, children’s outcomes and, 115 Public service employment (PSE) positions, 136–37, 136n14 Qualified allocation plans (QAPs), 69 Quality Housing and Work Responsibility Act of 1998, 71, 73 Quebec, universal child care in, 282–83 Reintegration of Ex-Offenders (RExO) program, current funding for, 168 Rent requirements, for housing programs, 72–74, 72t Residential mobility: review of effects of means-tested housing programs on, 106–8; welfare-to-work experiment and, 107 Resources, subsidy, concentrating vs. dispersing, 95–98 Riis, Jacob, 60 Senior Community Service Employment Program (SCSEP), current funding level for, 166 Skill Training Improvement Program (STIP), 136 Small Area Fair Markets demonstration, 110 Smith-Hughes Act of 1917, 134t SNAP (Supplemental Nutrition Assistance Program), 3 SNAP E&T (Supplemental Nutrition Assistance Program Employment and Training), 172 Social cost of slums, 63 Social Security Administration (SSA), 5 SSI. See Supplemental Security Income (SSI) State Partnership Initiative, 48 STIP (Skill Training Improvement Program), 136 Subsidized housing, privately owned, history of, 67–69 Subsidy resources, concentrating vs. dispersing, 95–98 Sullivan v. Zebley, 8, 9, 11, 37, 43 Supplemental Nutrition Assistance Program (SNAP), 3 Supplemental Nutrition Assistance Program Employment and Training (SNAP E&T), 172 Supplemental Security Income (SSI): administration of, 5; Aid to Families with Dependent Children and, 43–44; antipoverty safety net studies of, 4; benefits and costs to participants of, 35–36; benefits awarded by age categories, 22t; benefits to disabled children and, 5–6; boys and, 42–43; caseload trends of, 18–22, 19f; categorical eligibility requirements of, 32–34; citizenship and residency requirements of, 15; continuing disability reviews and, 10–11; defined, 1; defining features of, 3–4; disability determination for adults and, 6–8; disability determination for children and, 8–9; enrollment in other government programs and, 30–31, 31t; evaluations of demonstration programs for increasing work among beneficiaries of, 48–49; evidence on interactions with Aid to Families with Dependent Children and, 43–44; evidence on interactions with Medicaid and, 44–45; evidence on interactions with TANF and, 43–44; evidence on working-age and elderly adult participation in, 45–48; geographic variation of enrollment in, 26–30, 27–29f; impact of child participation on long-term outcomes, 39–42; impact of child participation on short-term outcomes, 37–39; interactions with other government programs and, 17–18; low-income children with disabilities and, 3, 5–6; means testing and, 11–15; origins, 5; percentage diagnosed with disabilities and, 23f; program interactions and, 43–45; program spillovers of, 36; qualifying diagnoses for, 23–26, 24t; as safety net, 2–3; shifting age distribution of recipients of, 6; spending on, 1–2; state supplementation of, 15–17, 16t; TANF vs., 3; theoretical issues of, 32; trends in participants of, 20–22, 20f; work and savings disincentives of, 34 Tagging, 4, 32 T-ATET, 197–99, 200 Tax credit rents, 73–74 Temporary Assistance for Needy Families (TANF), 3; SSI program and, 3, 43–44 Temporary Assistance for Needy Families (TANF), employment and training programs and, 171 Tenant-based subsidies, 94 Tennessee Voluntary Pre-Kindergarten Program (TN-VPK), 275–78 Ticket to Work (TTW) program, 48 Trade Adjustment Assistance (TAA): current level of funding for, 166–67; review of research on impacts of, 208–9 Training programs. See Employment and training programs United States (US): history of employment and training programs in, 131–32; local universal child care in, 284–87 Universal child care programs, 289; in Boston, 286; in Norway, 281–82; in Quebec, 282–83; summary of evidence from, 287; in US, 284–87. See also Early childhood demonstration programs; Early childhood programs Veterans’ Employment and Training Service (VETS), current funding level for, 167 Vocational education, 128n3 Vouchers. See Housing vouchers Wagner-Peyser Act of 1933, 132, 166 Wagner-Peyser Employment Service (ES), 198; current funding level for, 166 W-ATET, 196–98, 200


Welfare to Work (WtW) experiment, 105–6; residential mobility and, 107 WIA Adult and Dislocated Worker programs, current funding level for, 165 WIA Youth program, current funding level for, 165 Workforce Innovation and Opportunity Act (WIOA) of 2014, 135, 148–50 Workforce Investment Act (WIA) of 1998, 135, 139–48, 139n23; accountability goal of, 142; administrative structure of, 147; characteristics of recent exiters of, 155–59, 156t; eligibility requirement changes of, 144; evolving service orientation of, 147–48; guiding principles of, 139–40; individual empowerment feature of, 142; market mechanisms and, 148; outcomes for exiters of, 159–63, 161t; performance measures of, 142–43; required levels of services of, 141; review of research on impacts of, 196–206; services received by exiters of, 159, 160t; studies of implementation of, 144–47; universal access feature of, 142; use of eligible training provider lists, 143–44 Work Incentive Program, 134t Young Adult Conservation Corps (YACC), 136 YouthBuild program, current funding for, 169 Youth Community Conservation and Improvement Project (YCCIP), 136 Youth Employment and Demonstration Projects Act (YEDPA) of 1977, 135–36 Youth Employment and Training Program (YETP), 136 Youth Transition Demonstration (YTD) project, 48–49