Producer Dynamics: New Evidence from Micro Data 9780226172576


English Pages 624 [613] Year 2009

Producer Dynamics

Studies in Income and Wealth Volume 68

National Bureau of Economic Research Conference on Research in Income and Wealth

Producer Dynamics New Evidence from Micro Data

Edited by

Timothy Dunne, J. Bradford Jensen, and Mark J. Roberts

The University of Chicago Press Chicago and London

TIMOTHY DUNNE is a senior economic advisor in the Research Department at the Federal Reserve Bank of Cleveland. J. BRADFORD JENSEN is a senior fellow at the Peterson Institute for International Economics and an associate professor at the McDonough School of Business at Georgetown University and a research associate of the NBER. MARK J. ROBERTS is professor of economics at Pennsylvania State University and a research associate of the NBER.

The University of Chicago Press, Chicago 60637
The University of Chicago Press, Ltd., London
© 2009 by the National Bureau of Economic Research
All rights reserved. Published 2009
Printed in the United States of America
18 17 16 15 14 13 12 11 10 09    1 2 3 4 5
ISBN-13: 978-0-226-17256-9 (cloth)
ISBN-10: 0-226-17256-2 (cloth)

Library of Congress Cataloging-in-Publication Data Producer dynamics : new evidence from micro data / edited by Timothy Dunne, J. Bradford Jensen, and Mark J. Roberts p. cm. — (Studies in income and wealth ; v. 68) Includes index. “This volume contains revised versions of most of the papers presented at the Conference on Research in Income and Wealth entitled ‘Producer Dynamics: New Evidence from Micro Data,’ held in Bethesda, Maryland, on April 8–9, 2005”—Preface. ISBN-13: 978-0-226-17256-9 (cloth : alk. paper) ISBN-10: 0-226-17256-2 (cloth : alk. paper) 1. Commerce— Econometric models—Congresses. 2. Industrial productivity— Econometric models—Congresses. I. Dunne, Timothy. II. Jensen, J. Bradford. III. Roberts, Mark J. HF1008.P77 2009 338.501'5195—dc22 2008009629

The paper used in this publication meets the minimum requirements of the American National Standard for Information Sciences - Permanence of Paper for Printed Library Materials, ANSI Z39.48-1992.

National Bureau of Economic Research

Officers
Elizabeth E. Bailey, chairman
John S. Clarkeson, vice-chairman
Martin Feldstein, president and chief executive officer
Susan Colligan, vice president for administration and budget and corporate secretary
Robert Mednick, treasurer
Kelly Horak, controller and assistant corporate secretary
Gerardine Johnson, assistant corporate secretary

Directors at Large
Peter C. Aldrich
Elizabeth E. Bailey
Richard B. Berner
John H. Biggs
John S. Clarkeson
Don R. Conlan
Kathleen B. Cooper
Charles H. Dallara
George C. Eads
Jessica P. Einhorn
Martin Feldstein
Roger W. Ferguson, Jr.
Jacob A. Frenkel
Judith M. Gueron
Robert S. Hamada
Karen N. Horn
John Lipsky
Laurence H. Meyer
Michael H. Moskow
Alicia H. Munnell
Rudolph A. Oswald
Robert T. Parry
Marina v. N. Whitman
Martin B. Zimmerman

Directors by University Appointment
George Akerlof, California, Berkeley
Jagdish Bhagwati, Columbia
Glen G. Cain, Wisconsin
Ray C. Fair, Yale
Franklin Fisher, Massachusetts Institute of Technology
Mark Grinblatt, California, Los Angeles
Saul H. Hymans, Michigan
Marjorie B. McElroy, Duke
Joel Mokyr, Northwestern
Andrew Postlewaite, Pennsylvania
Uwe E. Reinhardt, Princeton
Nathan Rosenberg, Stanford
Craig Swan, Minnesota
David B. Yoffie, Harvard
Arnold Zellner (Director Emeritus), Chicago

Directors by Appointment of Other Organizations
Jean-Paul Chavas, American Agricultural Economics Association
Gail D. Fosler, The Conference Board
Martin Gruber, American Finance Association
Timothy W. Guinnane, Economic History Association
Arthur B. Kennickell, American Statistical Association
Thea Lee, American Federation of Labor and Congress of Industrial Organizations
William W. Lewis, Committee for Economic Development
Robert Mednick, American Institute of Certified Public Accountants
Angelo Melino, Canadian Economics Association
Harvey Rosenblum, National Association for Business Economics
John J. Siegfried, American Economic Association

Directors Emeriti
Andrew Brimmer
Carl F. Christ
George Hatsopoulos
Lawrence R. Klein
Franklin A. Lindsay
Paul W. McCracken
Peter G. Peterson
Richard N. Rosett
Eli Shapiro
Arnold Zellner

Relation of the Directors to the Work and Publications of the National Bureau of Economic Research

1. The object of the NBER is to ascertain and present to the economics profession, and to the public more generally, important economic facts and their interpretation in a scientific manner without policy recommendations. The Board of Directors is charged with the responsibility of ensuring that the work of the NBER is carried on in strict conformity with this object.

2. The President shall establish an internal review process to ensure that book manuscripts proposed for publication do not contain policy recommendations. This shall apply both to the proceedings of conferences and to manuscripts by a single author or by one or more coauthors but shall not apply to authors of comments at NBER conferences who are not NBER affiliates.

3. No book manuscript reporting research shall be published by the NBER until the President has sent to each member of the Board a notice that a manuscript is recommended for publication and that in the President’s opinion it is suitable for publication in accordance with the above principles of the NBER. Such notification will include a table of contents and an abstract or summary of the manuscript’s content, a list of contributors if applicable, and a response form for use by Directors who desire a copy of the manuscript for review. Each manuscript shall contain a summary drawing attention to the nature and treatment of the problem studied and the main conclusions reached.

4. No volume shall be published until forty-five days have elapsed from the above notification of intention to publish it. During this period a copy shall be sent to any Director requesting it, and if any Director objects to publication on the grounds that the manuscript contains policy recommendations, the objection will be presented to the author(s) or editor(s). In case of dispute, all members of the Board shall be notified, and the President shall appoint an ad hoc committee of the Board to decide the matter; thirty days additional shall be granted for this purpose.

5. The President shall present annually to the Board a report describing the internal manuscript review process, any objections made by Directors before publication or by anyone after publication, any disputes about such matters, and how they were handled.

6. Publications of the NBER issued for informational purposes concerning the work of the Bureau, or issued to inform the public of the activities at the Bureau, including but not limited to the NBER Digest and Reporter, shall be consistent with the object stated in paragraph 1. They shall contain a specific disclaimer noting that they have not passed through the review procedures required in this resolution. The Executive Committee of the Board is charged with the review of all such publications from time to time.

7. NBER working papers and manuscripts distributed on the Bureau’s web site are not deemed to be publications for the purpose of this resolution, but they shall be consistent with the object stated in paragraph 1. Working papers shall contain a specific disclaimer noting that they have not passed through the review procedures required in this resolution. The NBER’s web site shall contain a similar disclaimer. The President shall establish an internal review process to ensure that the working papers and the web site do not contain policy recommendations, and shall report annually to the Board on this process and any concerns raised in connection with it.

8. Unless otherwise determined by the Board or exempted by the terms of paragraphs 6 and 7, a copy of this resolution shall be printed in each NBER publication as described in paragraph 2 above.

Contents

Prefatory Note

Introduction
Timothy Dunne, J. Bradford Jensen, and Mark J. Roberts

I. CROSS-COUNTRY COMPARISON OF PRODUCER DYNAMICS

1. Measuring and Analyzing Cross-Country Differences in Firm Dynamics
Eric Bartelsman, John Haltiwanger, and Stefano Scarpetta
Comment: Timothy Dunne

II. EMPLOYMENT DYNAMICS

2. Studying the Labor Market with the Job Openings and Labor Turnover Survey
R. Jason Faberman

3. What Can We Learn About Firm Recruitment from the Job Openings and Labor Turnover Survey?
Éva Nagypál

4. Business Employment Dynamics
Richard L. Clayton and James R. Spletzer

5. The LEHD Infrastructure Files and the Creation of the Quarterly Workforce Indicators
John M. Abowd, Bryce E. Stephens, Lars Vilhuber, Fredrik Andersson, Kevin L. McKinney, Marc Roemer, and Simon Woodcock
Comment: Katharine G. Abraham

III. SECTOR STUDIES OF PRODUCER TURNOVER

6. The Role of Retail Chains: National, Regional, and Industry Results
Ronald S. Jarmin, Shawn D. Klimek, and Javier Miranda
Comment: Jeffrey R. Campbell

7. Entry, Exit, and Labor Productivity in U.K. Retailing: Evidence from Micro Data
Jonathan Haskel and Raffaella Sadun

8. The Dynamics of Market Structure and Market Size in Two Health Services Industries
Timothy Dunne, Shawn D. Klimek, Mark J. Roberts, and Daniel Yi Xu

9. Measuring the Dynamics of Young and Small Businesses: Integrating the Employer and Nonemployer Universes
Steven J. Davis, John Haltiwanger, Ronald S. Jarmin, C. J. Krizan, Javier Miranda, Alfred Nucci, and Kristin Sandusky
Comment: Thomas J. Holmes

10. Producer Dynamics in Agriculture: Empirical Evidence
Mary Clare Ahearn, Penni Korb, and Jet Yee
Comment: Spiro E. Stefanou

IV. EMPLOYER-EMPLOYEE DYNAMICS

11. Ownership Change, Productivity, and Human Capital: New Evidence from Matched Employer-Employee Data
Donald S. Siegel, Kenneth L. Simons, and Tomas Lindstrom
Comment: Judith K. Hellerstein

12. The Link between Human Capital, Mass Layoffs, and Firm Deaths
John M. Abowd, Kevin L. McKinney, and Lars Vilhuber

13. The Role of Fringe Benefits in Employer and Workforce Dynamics
Anja Decressin, Tomeka Hill, Kristin McCue, and Martha Stinson
Comment: Dan A. Black

V. PRODUCER DYNAMICS IN INTERNATIONAL MARKETS

14. Importers, Exporters, and Multinationals: A Portrait of Firms in the U.S. that Trade Goods
Andrew B. Bernard, J. Bradford Jensen, and Peter K. Schott
Comment: James Harrigan

15. The Impact of Trade on Plant Scale, Production-Run Length, and Diversification
John Baldwin and Wulong Gu
Comment: James Tybout

Contributors
Author Index
Subject Index

Prefatory Note

This volume contains revised versions of most of the papers presented at the Conference on Research in Income and Wealth entitled “Producer Dynamics: New Evidence from Micro Data,” held in Bethesda, Maryland, on April 8–9, 2005.

Funds for the Conference on Research in Income and Wealth are supplied by the Bureau of Economic Analysis, the Bureau of Labor Statistics, the Census Bureau, the Federal Reserve Board, Statistics of Income/IRS, and Statistics Canada. We are indebted to these organizations for their support.

We thank Timothy Dunne, J. Bradford Jensen, and Mark J. Roberts, who served as conference organizers and editors of the volume. We also thank the staff of the NBER for their assistance in organizing the conference and preparing this volume.

Executive Committee, January 2007
John M. Abowd
Susanto Basu
Ernst R. Berndt
Carol A. Corrado
Robert C. Feenstra
John Greenlees
John C. Haltiwanger
Michael J. Harper
Charles R. Hulten, chair
Ronald Jarmin
J. Bradford Jensen
Lawrence F. Katz
J. Steven Landefeld
Brent R. Moulton
Thomas B. Petska
Mark J. Roberts
Matthew Shapiro
David W. Wilcox


Introduction
Producer Dynamics

Timothy Dunne, J. Bradford Jensen, and Mark J. Roberts

Timothy Dunne is a senior economic advisor in the Research Department of the Federal Reserve Bank of Cleveland. J. Bradford Jensen is an associate professor in the McDonough School of Business, Georgetown University, and a research associate of the National Bureau of Economic Research. Mark J. Roberts is a professor of economics at Pennsylvania State University, and a research associate of the National Bureau of Economic Research.

The process of firm entry, growth, and exit has always been an integral part of the mechanism of resource reallocation in a market economy. Spurred by developments in micro data construction by government statistical agencies and access to these data by researchers, the empirical analysis of producer dynamics has become a major focus of economic research over the last fifteen years. The crucial input that has made the empirical study of producer dynamics possible is comprehensive longitudinal micro data that allow researchers to track new firms over their lifetimes. Using these data for a large number of countries, researchers have identified links between the characteristics of firms and their subsequent success or failure that provide a better understanding of the sources of firm and worker dynamics and their implications for the long-run growth and performance of a market economy. In recognition of its importance to public policymaking, the primary U.S. statistical agencies, the Census Bureau and the Bureau of Labor Statistics (BLS), have recently begun to produce official statistics that measure the dynamic movements of firms in and out of business and workers in and out of jobs.

The development of new data resources and empirical facts on producer dynamics has affected many research fields in economics, including industrial organization, labor, growth, macro, and international trade. Since the initial measurement studies of Dunne, Roberts, and Samuelson (1988, 1989), the longitudinal data sets have been exploited by industrial organization economists to study the competitive effects of producer turnover and why the process differs across industries and time periods. Building on the work of Davis, Haltiwanger, and Schuh (1996), an enormous literature, both empirical and theoretical, has developed in labor and macro economics to measure and explain the gross employment flows due to job creation and job destruction by firms. Baily, Hulten, and Campbell (1992) and Griliches and Regev (1995) show how sectoral and industry productivity gains can be traced back to productivity differences that exist at the micro level combined with the exit of low productivity firms and the entry and growth of higher productivity firms. The intertemporal pattern of lumpy plant-level investment present in the micro data (Doms and Dunne 1998) has been analyzed by macro economists as a source of aggregate investment fluctuations (Caballero, Engel, and Haltiwanger 1995). International economists have also studied how trade flows are shaped by both the growth of existing exporters and importers and the flows of producers in and out of international markets (Bernard and Jensen 1995; Bernard, Eaton, Jensen, and Kortum 2003; Das, Roberts, and Tybout 2007). None of these lines of research could have developed without the use of firm- and plant-level longitudinal surveys and censuses conducted by government statistical agencies.

This volume is the result of a two-day conference in April 2005 devoted to the measurement and explanation of producer dynamics. The meeting was sponsored by the Conference on Research in Income and Wealth (CRIW) and had as its primary goal, as do all CRIW conferences, encouraging interaction between the statistical agencies that are developing the longitudinal firm-level data series and data users from academia, government, and the private sector. The timing was motivated by the development of several new micro-data sets that provide much more comprehensive coverage of U.S.
firms and plants than has been previously available. These include: the Longitudinal Business Database (LBD), constructed at the Center for Economic Studies of the Census Bureau; the Business Employment Dynamics (BED) and Job Openings and Labor Turnover Survey (JOLTS) programs at the Bureau of Labor Statistics; and the matched worker-employer database under construction as part of the Longitudinal Employer-Household Dynamics (LEHD) program at the Census Bureau. These data sets are also the major source of new government statistics on producer and employment dynamics. The BLS produces quarterly statistics on gross job gains and gross job losses for private sector employers through its BED program. The BED is constructed from state unemployment insurance records and provides job creation and job destruction statistics by industry, state, and firm size. Complementing the BED job flows data is worker flow data from the relatively new JOLTS program at BLS. JOLTS is a monthly survey of roughly 16,000 nonfarm establishments that measures job vacancies, new
hires, and separations. It provides higher frequency data that is more timely than the BED but has less geographic detail. The Census Bureau has also institutionalized a program to construct Quarterly Workforce Indicators (QWI) that summarize employment dynamics in local labor markets and are based on the data from the LEHD project. The QWI reports information on both job and worker flows down to the county level and for detailed industries. These are some of the first government statistics to summarize the dynamic patterns of producer-level adjustment in the U.S. economy. These programs are discussed in chapters in this volume. Each of these data sets is a significant new resource, and together they are going to be a major source of our knowledge of producer dynamics in the U.S. economy for at least the next decade. The chapters also include analysis of longitudinal micro data sets from Canada, the OECD countries, Sweden, and the United Kingdom, which provide useful sources for comparison. Other chapters in this volume are designed to disseminate information on these data sources within the research community, provide a reference source for future users of the data, and present new empirical results that extend the measurement and analysis of producer dynamics to sectors of the economy beyond manufacturing, to a broader range of countries, to firm transitions in international markets, and to linkages between firm and worker turnover. All of these are areas where empirical research on producer dynamics is in its infancy.

Cross-Country Comparison of Producer Dynamics

The volume is divided into five sections based on the type of data that is used in each chapter. The first section reports the results of a project undertaken by Eric Bartelsman, John Haltiwanger, and Stefano Scarpetta to develop comparable cross-country data on firm entry, exit, and turnover.
Over the last decade there has been tremendous effort to develop statistics on producer dynamics in many countries, but the efforts are largely independent and reflect idiosyncrasies in each country’s data collection process. The usual problems of comparability that exist when analyzing data from different countries’ national accounts are compounded when cross-country comparisons on firm turnover are attempted. The unit of analysis (establishment, firm, line of business), the population of firms under study, the definitions of entry and exit, and the variables used to measure entry and exit (producer counts, employment, sales) often differ across countries. In this chapter, the authors report results from a large research project bringing together researchers from twenty-four countries to standardize data definitions and construct comparable statistics on producer dynamics and productivity. Even with the extensive coordination in the construction of the individual country data, measurement differences still exist, but the authors are
able to draw some broad comparisons. Looking across countries, annual entry and exit rates are substantial in most cases, averaging between 5 percent and 10 percent of the business population. Somewhat surprisingly, countries that are often believed to have rigidities that impede the development of new businesses have relatively high entry and exit rates. For example, France has entry and exit rates quite similar to the United States and Canada. Eastern European countries, in general, are found to have extensive restructuring in their business populations with very high entry rates of new businesses. Less than one-half of new firms survive through their seventh year in most countries studied. Bartelsman, Haltiwanger, and Scarpetta also document the micro-level sources of productivity growth through a set of productivity decompositions. The goal is to identify the relative contributions to labor productivity growth of entering and exiting firms, within-firm productivity changes, and between-firm reallocation in shares. The findings show that the within-firm changes in productivity and net entry are the major sources of labor productivity growth in most countries.

Employment Dynamics

A second significant line of data construction and research over the last decade has focused on the patterns of employment dynamics: the movement of workers in and out of jobs and the creation and destruction of employment positions. The Bureau of Labor Statistics and Census Bureau have made considerable progress in developing new data surveys and augmenting existing data programs to produce information on employment dynamics. The second section of this volume contains four chapters that discuss and utilize these new data series. The first two chapters, by Jason Faberman and Éva Nagypál, report on a new BLS survey: the Job Openings and Labor Turnover Survey (JOLTS).
This survey provides information on labor force dynamics by surveying establishments monthly about vacancies, hiring, and separations. Faberman presents an overview of the JOLTS program and an analysis of establishment-level vacancies and employment flows. A particular strength of the JOLTS program is that it produces a new series on job vacancies that is much less idiosyncratic than the help-wanted indices used in previous studies. The micro data also show a much more complex adjustment process than that observed in the aggregate series. Cyclical variation in separations is driven more by shifts in the distribution of growth rates of establishments than by changes in the average separation rates across the distribution of establishments. Establishments that are contracting or expanding have greater hiring and separation rates than stable establishments. While these patterns in labor turnover are related systematically to local unemployment conditions, differences in state unemployment rates explain little of the overall variation in establishment-level employment flows.
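The contrast drawn above, between shifts in the distribution of establishment growth rates and changes in separation rates within that distribution, has the structure of a shift-share calculation. The sketch below only illustrates that arithmetic and is not the chapter's method: the three growth bins, the employment shares, and the separation rates are hypothetical numbers chosen for the example, and the midpoint weighting is one common convention assumed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GrowthBin:
    """One bin of the establishment growth-rate distribution (hypothetical)."""
    share: float     # share of employment at establishments in this bin
    sep_rate: float  # separations per employee within the bin


def aggregate_rate(bins):
    """Aggregate separation rate: share-weighted sum of bin-level rates."""
    return sum(b.share * b.sep_rate for b in bins.values())


def shift_share(before, after):
    """Split the change in the aggregate rate into two parts.

    distribution_shift: effect of employment moving across bins
    within_bin: effect of separation rates changing inside each bin
    Midpoint weights are used so the two parts sum exactly to the total.
    """
    distribution_shift = 0.0
    within_bin = 0.0
    for name, b0 in before.items():
        b1 = after[name]
        distribution_shift += (b1.share - b0.share) * 0.5 * (b0.sep_rate + b1.sep_rate)
        within_bin += (b1.sep_rate - b0.sep_rate) * 0.5 * (b0.share + b1.share)
    return distribution_shift, within_bin


# Hypothetical distributions for an expansion period and a downturn period.
before = {
    "contracting": GrowthBin(share=0.25, sep_rate=0.08),
    "stable": GrowthBin(share=0.50, sep_rate=0.02),
    "expanding": GrowthBin(share=0.25, sep_rate=0.04),
}
after = {
    "contracting": GrowthBin(share=0.35, sep_rate=0.08),
    "stable": GrowthBin(share=0.45, sep_rate=0.02),
    "expanding": GrowthBin(share=0.20, sep_rate=0.04),
}

shift, within = shift_share(before, after)
total = aggregate_rate(after) - aggregate_rate(before)
print(f"total change {total:.4f} = shift {shift:.4f} + within {within:.4f}")
```

In this made-up case the bin-level rates are held fixed, so the entire rise in the aggregate separation rate comes from more employment sitting in contracting establishments.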

Nagypál raises a number of important measurement issues in the JOLTS data. First, she identifies the large discrepancy in employment growth over the period 2000 to 2004 between JOLTS and other BLS establishment surveys. It is due primarily to the understatement of separations in the earliest JOLTS surveys. Over time, BLS has made improvements to the survey to reduce the problem, though Nagypál reports that large discrepancies remain at the industry level. Nagypál also discusses a number of measurement issues with regard to vacancies. JOLTS only measures vacancies that are to be filled within a thirty-day period. Hiring environments where vacancy postings substantially precede the actual hiring date are excluded from the data. JOLTS also measures vacancies as a stock of positions and misses short-duration vacancies. The magnitude of the measurement error will be larger in sectors and time periods with high arrival rates of job candidates. Each of these issues will cause a systematic understatement of vacancies in the data. The final step of the author’s analysis estimates a simple matching function from the JOLTS data, and she finds that the matching function differs markedly across industries.

The third chapter in this section, by Richard Clayton and James Spletzer, provides an overview of the Business Employment Dynamics (BED) database at the BLS. This database has been constructed from state unemployment insurance records through the Quarterly Census of Employment and Wages (QCEW) program. The BED contains data on virtually all private business establishments in the United States from 1992 onwards and produces statistics on quarterly job creation and destruction due to plant openings, expansions, contractions, and closings. Clayton and Spletzer provide a detailed analysis of job creation and destruction in the 2001 recession and the subsequent years.
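The four BED categories named above (openings, expansions, contractions, and closings) can be made concrete with a minimal sketch. It assumes establishment-level employment counts for two consecutive quarters that are already linked by an identifier. The function name, the dictionary layout, and the sample counts are hypothetical, and the record linkage and seasonal adjustment that an official series would require are not shown.

```python
def classify_job_flows(prev, curr):
    """Tally gross job gains and losses between two quarters.

    prev, curr: dict mapping establishment id -> employment count.
    An establishment that is missing, or has zero employment, is treated
    as not operating in that quarter (an assumption of this sketch).
    """
    flows = {"openings": 0, "expansions": 0, "contractions": 0, "closings": 0}
    for est in set(prev) | set(curr):
        e0 = prev.get(est, 0)
        e1 = curr.get(est, 0)
        change = e1 - e0
        if change > 0:
            # Jobs gained: a new establishment or a growing one.
            key = "openings" if e0 == 0 else "expansions"
            flows[key] += change
        elif change < 0:
            # Jobs lost: a shut-down establishment or a shrinking one.
            key = "closings" if e1 == 0 else "contractions"
            flows[key] += -change
    gains = flows["openings"] + flows["expansions"]
    losses = flows["contractions"] + flows["closings"]
    return flows, gains, losses


# Hypothetical employment counts for five establishments.
prev = {"A": 10, "B": 5, "C": 8, "D": 20}
curr = {"A": 14, "C": 6, "D": 20, "E": 3}

flows, gains, losses = classify_job_flows(prev, curr)
print(flows, "gross gains:", gains, "gross losses:", losses, "net:", gains - losses)
```

The sample is built so that total employment is the same in both quarters. The net change is zero even though gross gains and gross losses are both positive, which is the kind of churn a net employment figure alone would hide.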
Job destruction initially rose sharply but then fell back to prerecession levels quickly. In contrast, the drop in job creation persisted. To better understand the sources of job flows during the 2001 recession, the authors examine the underlying micro changes and find that most of the decline in employment is due to concentrated increases in job creation and destruction in a relatively small number of establishments.

The final chapter in this section, by John Abowd, Bryce Stephens, Lars Vilhuber, Fredrik Andersson, Kevin McKinney, Marc Roemer, and Simon Woodcock, presents detailed documentation of the LEHD data sources and the methods used to construct the QWI. The QWI represents a major new statistical initiative by the Census Bureau to construct job flow statistics for county and MSA-level labor markets. The data underlying the QWI are drawn from the LEHD database, which combines employer and employee information. The QWI reports statistics on job creation, new hires, separations, and earnings for all employees and new hires disaggregated by industry, geography, and worker characteristics such as age and
gender. This level of detail is far greater than any other currently available statistics on employment flows. In addition to the creation of the underlying micro data set, the QWI project has invested heavily in the development of disclosure techniques that preserve the confidentiality of the data but allow for the release of very disaggregated summary statistics. Overall, the chapter provides a valuable reference source for users of the QWI and the LEHD.

Sector Studies of Producer Turnover

The earliest studies of producer dynamics focus on the manufacturing sector because this tends to be the sector that is most consistently surveyed and has the best micro data on producers. In addition, almost all studies of producer dynamics use data on firms with paid employees and ignore nonemployer firms. The third section of this volume contains chapters that look beyond the traditional data sources, focusing on producer dynamics in retailing, service industries, and agriculture, and extending the measurement of producer dynamics to the nonemployer segment of the business universe. The chapter by Ron Jarmin, Shawn Klimek, and Javier Miranda documents the entry and exit of establishments and firms in the U.S. retail sector based on analysis of the Census Bureau’s newly developed Longitudinal Business Database (LBD). The LBD covers all establishments with at least one paid employee and all industrial sectors of the economy for the period 1976 through 2005. While the LBD contains limited information on the establishment’s characteristics and activities, it can be linked with other Census Bureau establishment data, which considerably enhances the scope and depth of the available information. This new data has the potential to enhance our understanding of such topics as job creation and destruction, firm turnover, the life cycle of establishments, and changes in the industrial structure of the U.S. economy.
In their chapter, Jarmin, Klimek, and Miranda document the overall changes in employment and the number of establishments in the retail sector, focusing on differences between chain stores and individually owned establishments. Over the last several decades there has been a fundamental shift in the organizational structure of the industry, with a significant expansion of stores owned by multi-store firms and a decline of individually owned stores. The chapter shows that firm turnover has declined over time in most retail industries but differs systematically by market size and ownership structure. Metro areas have the highest producer turnover while rural areas have the lowest turnover. Independently owned stores experience higher turnover compared to chain stores, but there is little difference in turnover across different types of chain stores: local, regional, or national chains. Continuing with the analysis of the retail sector, Jonathan Haskel and
Raffaella Sadun document producer dynamics and productivity growth in U.K. retailing. Store entry and exit rates are quite high in the United Kingdom, averaging 10 to 15 percent per year over the period 1998 to 2003. These rates are similar across most retail industries with the exception of pharmacy stores, which have much lower rates. The chapter decomposes changes in sectoral productivity between 1998 and 2003 and finds that entry and exit play an important role in accounting for the productivity growth in U.K. retailing. These findings suggest that producer turnover in U.K. retailing enhances productivity by replacing lower productivity exiting firms with higher productivity entering firms. One complication in the U.K. micro data is that the surveys collect information from different reporting levels, making it difficult to combine data on firm-level productivity with store-level entry and exit measures.

The chapter by Timothy Dunne, Shawn Klimek, Mark Roberts, and Daniel Yi Xu models the entry and exit flows in two medical services industries, dentists and chiropractors, using data for small geographic markets in the United States. They provide some of the first evidence on producer dynamics in healthcare industries using the U.S. Census Bureau’s Census of Services. In the industrial organization literature, researchers have used models of entry to explain differences in the number of firms across markets of different size. While useful for understanding long-run market structure, the models do not explain differences in entry and exit flows across markets. The authors use a dynamic model that recognizes the different costs faced by incumbent producers and potential entrants, and specify entry and exit flow regressions consistent with the dynamic framework. They find an important role for past market structure and the number of potential entrants as determinants of the level of producer turnover; this supports the dynamic framework.
A common theme of virtually all papers on producer turnover is that they focus on firms or establishments with paid employees. In the United States in 2000, almost 75 percent of all firms (15.4 million out of 20.8 million) had no employees. The chapter by Steven Davis, John Haltiwanger, Ron Jarmin, C. J. Krizan, Javier Miranda, Al Nucci, and Kristin Sandusky represents the first effort to measure producer dynamics for this segment of the business population. A key contribution of the project is that it not only documents producer turnover in the nonemployer segment but also identifies transitions between nonemployer and employer firms. Of the 2.3 million employer businesses in their industry sample in 2000, 11 percent can be linked to a nonemployer business that existed between 1992 and 2000. However, it is rare for a nonemployer firm to become an employer firm. Of the almost 7.4 million nonemployer firms in the industries under study in 1994, only 3 percent became an employer firm by 1997. This data source provides enormous potential for a better understanding of the evolution of young and small firms. For example, the study shows that fluctuations in


Timothy Dunne, J. Bradford Jensen, and Mark J. Roberts

nonemployer size, measured in terms of revenue, from year to year are much larger for nonemployer firms than employer firms; but as nonemployer firms age and grow, the volatility of their revenue stream declines. This latter finding is similar to age and volatility patterns observed in the employer data.

The final chapter on sectoral patterns of producer turnover provides the first statistics on producer dynamics in the agriculture sector. Mary Ahearn, Penni Korb, and Jet Yee utilize data from the U.S. Census of Agriculture from 1978 to 1997 to provide new statistics on the entry, exit, and growth of farms. Entry and exit rates are measured by the number of farms, the volume of sales, and the acreage of land under cultivation. The main patterns show considerable turnover of farms over the entire period. Average annual entry and exit rates appear higher than those reported for other sectors of the U.S. economy, especially when one considers weighted measures such as sales or acreage share of entering and exiting farms. In their data, entry and exit include the sale and purchase of farmland; entry and exit statistics can therefore reflect sales or leases of an existing farm and do not directly correspond to the movement of land in or out of agricultural production. The authors document patterns of producer dynamics that differ from those found in many manufacturing sectors. Older cohorts have relatively low shares of sales and land and there is only a slight increase in the average size of farms as a cohort ages. Within a cohort, small continuing farms actually tend to shrink over time while larger farms have higher growth. This is the opposite of the pattern seen in manufacturing, where there is a strong inverse relationship between growth and size, conditional on a firm remaining in business.
Employer-Employee Dynamics

A broader view of labor market dynamics integrates producer decisions to enter and exit production and expand and contract the employment positions within a firm, with the worker’s decisions to move in and out of existing employment positions. Both are a potentially important source of labor market flows, but the data requirements to measure these separate sources are demanding. Section four of this volume includes chapters that use linked employer and employee data to present a more detailed picture of worker turnover and the human capital present at a workplace.

The first chapter, by Don Siegel, Ken Simon, and Tomas Lindstrom, uses matched employer-employee data from the Swedish manufacturing sector to study how corporate ownership changes affect the performance of the firm and the composition of the firm’s workforce. They find that plants undergoing ownership changes have lower labor and total factor productivity prior to the ownership change but that productivity rises to industry norms after the ownership change. The composition of the plant’s


workforce also changes. Average age, worker experience, and the percent of college-educated employees rise in the plant after a change in ownership, while the share of women falls slightly. Overall, it appears that in the downsizing of these operations, plants shed workers with short job tenures and these are more likely to be younger and female workers. The two remaining chapters in this section use data from the LEHD. John Abowd, Kevin McKinney, and Lars Vilhuber utilize the LEHD to measure the human capital embodied in a firm’s workforce and relate it to the performance of the firm. The authors construct an index of human capital for each worker in a plant by decomposing the employee’s wage into a firm component and a worker component. For each employer, they construct the distribution of human capital for the workforce and examine if this is correlated with the probability a firm undergoes a mass layoff or closes. They find that mass layoffs and firm failure are much more likely in firms with a large proportion of low human capital workers. Finally, firms that do not fail generally upgrade the human capital of their workforce. Anja Decressin, Tomeka Hill, Kristin McCue, and Martha Stinson leverage the richness of the LEHD data set by augmenting the LEHD with publicly-available data on employee benefits (collected in IRS Form 5500) offered by different companies. This allows them to combine measures of worker characteristics and employer characteristics with information on nonwage compensation, including health plans and defined benefit and defined contribution pension plans. The authors show that the level of benefits offered by the firm is negatively correlated with employee turnover, but this largely reflects underlying differences in the human capital of the workforce. Firms that offer benefits have higher-skilled workers and these skilled workers have lower turnover rates. 
Moreover, firms offering benefits have higher labor productivity and are more likely to survive, even after controlling for worker and firm characteristics and wage compensation.

Producer Dynamics in International Markets

Research in international trade has recognized the importance of firm heterogeneity in productivity and profitability as factors that affect the decision to participate in international markets. The final section of this volume contains two chapters that use micro data to study transitions of firms into and out of import and export markets. Andy Bernard, Brad Jensen, and Peter Schott develop a new data set on import and export activity of U.S. firms, and provide a set of stylized facts on participation patterns. The authors combine transaction-level records of imports collected by U.S. Customs with firm-level exports collected by the U.S. Census Bureau for the period 1993 to 2000. They link these observations with the LBD, which will allow researchers to incorporate a large set of firm characteristics from the LBD into the analysis of micro trade flows. An important feature of


these data is that they allow the authors to distinguish between related-party and arm’s-length transactions. Thus, one can measure the flow of cross-border goods within multinational firms. The chapter documents a number of striking patterns. The fraction of firms engaged in trade is small but growing, at two to three percent of the total number of firms in the United States. However, these importing/exporting firms are large, accounting for approximately 40 percent of private sector employment in the United States. Ninety percent of import and export activity involves multinational firms, and related-party transactions make up approximately one-half of imports and one-third of exports. The authors also analyze the employment dynamics of firms involved in trade. Firms that export had higher employment growth than nonexporters, and firms that entered the export or import market between 1993 and 2000 experienced very high employment growth rates. In contrast, firms that stopped exporting and/or importing suffered declines in employment.

The final chapter, by John Baldwin and Wulong Gu, explores the impact of trade liberalization resulting from the 1989 Canada-U.S. Free Trade Agreement on the decision of Canadian manufacturers to enter or exit the export market. Using a theoretical framework in which producers differ in their productivity, they characterize the determinants of a firm’s decision to enter the export market. Firms export depending on their relative cost advantage: the most efficient firms produce for the domestic and export markets, less efficient firms produce only for the domestic market, and the least efficient firms close. Trade liberalization increases the size of the market and results in greater firm specialization. Exporting firms withdraw from some product markets and expand the volume of output in their remaining products.
Nonexporting firms, however, do not benefit from this increase in market size and instead face increased competition and, on average, become smaller. Using micro data for Canadian manufacturing plants, the authors test the predictions of the model using tariff rate changes as a measure of trade liberalization. They find that nonexporting firms reduce the number of product lines and decrease plant size in response to a lowering of tariffs. They also find that exporting firms become more specialized and larger, but these changes are not strongly correlated with industry-specific tariff reductions.

Conclusion

Producer dynamics can be viewed from many perspectives, including movements of firms or plants in or out of production, transitions between different geographic or product markets, or shifts of an entrepreneur from self-employment to employer status. Regardless of the focus, the decisions of firms to change the nature of their production are an important mechanism contributing to the reallocation of resources. The chapters in this volume have developed and utilized a number of important micro data sets that provide a window on this diverse set of producer transitions. A recent report by the National Research Council, “Understanding Business Dynamics: An Integrated Data System for America’s Future” (Haltiwanger, Lynch, and Mackie 2007), presents a blueprint for further development of the U.S. data system to allow more accurate and timely measurement of the dynamic forces at work in the economy. Among the recommendations in the report is one to encourage the interaction of the statistical agencies that create the producer micro data and the researchers from academia, business, and government who analyze it. The chapters in this volume provide ample evidence of the knowledge that can be gained by researchers working with the statistical agencies to document and analyze the dynamic process of firm entry, growth, and exit. In many areas, particularly the service sector, the nonemployer universe, and the international arena, measurement issues have only recently begun to be addressed and much work remains.

The recent efforts of the U.S. statistical agencies to produce new statistics that document the flows of workers and firms in a timely and consistent way are another important avenue through which knowledge of the process of producer dynamics is expanding. The Business Employment Dynamics (BED) program at the BLS and the Quarterly Workforce Indicators (QWI) program at the Census Bureau are providing timely information on dynamic aspects of the U.S. economy that complement the traditional focus on aggregate statistics at a point in time. Still, the series are relatively new and a better understanding of the economic forces that drive the dynamic patterns in the producer data is needed.

References

Baily, M. N., C. Hulten, and D. Campbell. 1992. Productivity dynamics in manufacturing plants. Brookings Papers on Economic Activity: Microeconomics: 187–249.

Bernard, A., J. Eaton, J. B. Jensen, and S. S. Kortum. 2003. Plants and productivity in international trade. American Economic Review 93 (4): 1268–90.

Bernard, A., and J. B. Jensen. 1995. Exporters, jobs, and wages in U.S. manufacturing, 1976–1987. Brookings Papers on Economic Activity: Microeconomics: 67–119.

Caballero, R., E. M. R. A. Engel, and J. Haltiwanger. 1995. Plant-level adjustment and aggregate investment dynamics. Brookings Papers on Economic Activity 1995 (2): 1–54.

Das, S., M. J. Roberts, and J. R. Tybout. 2007. Market entry costs, producer heterogeneity, and export dynamics. Econometrica 75 (3): 837–73.

Davis, S. J., J. Haltiwanger, and S. Schuh. 1996. Job creation and destruction. Cambridge, MA: The MIT Press.

Doms, M., and T. Dunne. 1998. Capital adjustment patterns in manufacturing plants. Review of Economic Dynamics (1): 409–29.

Dunne, T., M. J. Roberts, and L. Samuelson. 1988. Patterns of firm entry and exit in U.S. manufacturing industries. The RAND Journal of Economics 19 (4): 495–515.

Dunne, T., M. J. Roberts, and L. Samuelson. 1989. The growth and failure of U.S. manufacturing plants. The Quarterly Journal of Economics 104 (4): 671–98.

Griliches, Z., and H. Regev. 1995. Firm productivity in Israeli industry 1979–1988. Journal of Econometrics 65:175–203.

Haltiwanger, J., L. M. Lynch, and C. Mackie, eds. 2007. Understanding business dynamics: An integrated data system for America’s future. Report to the Committee on National Statistics, The National Research Council. Washington, D.C.: The National Academies Press.

1 Measuring and Analyzing Cross-Country Differences in Firm Dynamics

Eric Bartelsman, John Haltiwanger, and Stefano Scarpetta

Eric Bartelsman is a professor of economics at the VU Amsterdam, and a research fellow of the Tinbergen Institute. John Haltiwanger is a professor of economics at the University of Maryland, and a research associate of the National Bureau of Economic Research. Stefano Scarpetta is Head of the Country Studies Division III in the Economics Department of the Organization for Economic Cooperation and Development (OECD), and a Research Fellow and Deputy Program Director at IZA. We would like to thank our discussant Timothy Dunne and the participants in the 2005 NBER Conference on Research on Income and Wealth on “Producer Dynamics: New Evidence from Micro Data” (April, Washington, D.C.) for insightful and useful comments. We are grateful to the World Bank for financial support of this project and to Karin Bouwmeester, Helena Schweiger, and Victor Sulla for excellent research assistance. The views expressed in this chapter are those of the authors and should not be held to represent those of the institutions of affiliation.

1.1 Introduction

Cross-country comparisons and analysis of firm dynamics are inherently interesting, but also inherently difficult. Such comparisons are important because they provide insights into the efficiency with which resources are allocated in the economy and its effects on output, productivity, and employment. Empirical evidence for developed economies shows that healthy market economies typically exhibit a high pace of churning of outputs and inputs across businesses.1 Moreover, the evidence shows that this churning is productivity enhancing as outputs and inputs are being reallocated from less productive to more productive businesses. These findings raise the question as to whether differences in economic performance across countries can be accounted for by differences in the efficiency of the churning process across countries. A closely related question is whether certain regulations and institutions in different markets affect the churning process in a manner that slows the reallocation of resources towards more productive uses.

1. See Caves (1998); Bartelsman and Doms (2000); Ahn (2000); and Foster, Haltiwanger, and Krizan (2001) for surveys.

In this chapter, we adopt the working hypothesis that policy and institutions affecting the business climate (broadly defined) may have important implications for the magnitude but also the effectiveness of firm dynamics and resource reallocation. While individual country studies can provide important insights into this issue by looking at within-country variation in performance of sectors or individual firms, another way to test the hypothesis is to link firm performance across countries that differ in their regulatory and policy settings. This strategy, however, involves an ongoing measurement and research agenda to develop comparable measures of firm dynamics across countries that can be directly related to business climate conditions. The interest in this type of analysis is rapidly spreading beyond the industrial countries and involves many developing and emerging economies that are struggling with regulatory reforms to stimulate private investment and productivity growth.2

In principle, using firm-level data to assess cross-country differences in economic performance is attractive. It avoids some of the problems typically affecting macro analyses. For example, interpreting the observed persistent differences in income per capita across countries or even growth rates of GDP and productivity has been a challenge for a long time. This is not because of the lack of candidate explanations, but rather because of the overwhelming number of possible factors.
As such, the finding of a statistically significant correlation between cross-country differences in economic performance and any possible policy, institutional, or structural variable is fraught with problems of interpretation given the (many) omitted variables.3 It would be misleading to argue that the firm dynamics approach overcomes the omitted variable and associated unobserved heterogeneity problems that afflict macro analyses. But the firm-level approach potentially offers a tighter theoretical link between specific institutional measures and relevant outcomes. For example, indicators of firm dynamics allow testing whether regulatory distortions that impinge on entry costs indeed affect the pace and nature of firm entry.

2. A number of works have recently explored the role of firm dynamics for productivity and growth in developing and emerging economies. They include Eslava et al. (2004); Roberts and Tybout (1997); Aw, Chung, and Roberts (2003); and Brown and Earle (2004).

3. This explains the difficulty in obtaining robust empirical results from macro growth regressions (e.g., Barro and Sala-i-Martin [1995] and Doppelhofer, Miller, and Sala-i-Martin [2004]). See Scarpetta (2004) for recent attempts at estimating macro growth regressions for the OECD countries.

In practice, cross-country comparisons of measures of firm dynamics suffer from significant definitional and measurement problems. Changes at the firm level take different forms, and no single indicator is likely to capture this complexity in a way that can be related to all regulatory or institutional issues in a meaningful way. This conceptual problem is often compounded by measurement problems induced by cross-country differences in coverage, unit of observation, classification of activity, and data quality.

The combination of conceptual and measurement problems can be illustrated by considering the most basic measures of firm dynamics, the rates of firm entry and exit, and comparing them across countries with indicators of economic performance. Figure 1.1 shows the rank ordering of countries according to gross firm turnover (entry plus exit rates) and GDP per capita levels and growth rates.4 We consider these rank orderings for a set of countries for which we have harmonized statistics on firm turnover rates. The rank orderings of GDP per capita levels and growth rates are quite plausible. But while the rough order of magnitude reported in figure 1.1 for firm turnover is also reasonable, the rank ordering across countries of the firm turnover rates is more difficult to interpret. Relatively high firm turnover rates are observed both in countries with high income levels and/or high growth rates as well as in poorer and/or slow-growth countries (and vice versa).5 We argue in the chapter that this is not only because it is unclear whether there is an unequivocal relationship between firm turnover and economic performance, but also because measurement problems could affect the cross-country comparisons of firm turnover.

In this chapter, we review the measurement and analytical challenges of handling firm-level data so as to provide a user’s guide on how to construct and how to compare measures of firm dynamics across countries. In broad terms, we have three basic messages. First, it is very important to make every attempt to harmonize the indicators of firm dynamics by imposing the same metadata requirements and aggregation methods on the raw firm-level data. Second, while harmonization is necessary, it is far from sufficient.
As illustrated in figure 1.1, some core cross-country comparisons will be problematic not only because of remaining possible measurement problems, but also because some firm-level indicators cannot be unequivocally linked to better or worse economic performance. However, the third message is that there are ways to overcome at least some of the measurement problems. While the details differ depending on the type of measure and question of interest, we show that by using measurement or analytic methods that amount to some form of difference-in-difference approach, the problems we identify can be significantly reduced.

4. This chapter draws on firm-level indicators for a sample of countries that participated in the distributed micro-data analysis. We made every attempt to harmonize the statistics by providing detailed protocols and programs to researchers with access to the confidential micro-level data sets in their countries. The indicators in our database are built up from these (confidential) micro-level sources.

5. Note that the correlation between firm turnover and the GDP/capita measures is low (–0.22 using GDP per capita levels; and 0.18 using GDP per capita growth).

Fig. 1.1 Comparisons of GDP per capita growth and firm turnover: A, GDP per capita and firm turnover, 1996; B, GDP per capita growth and firm turnover, 1990–2003
Note: For transition economies (Estonia, Latvia, Hungary, Romania, and Slovenia), 1996–2003.

The chapter proceeds as follows. In section 1.2, we describe the distributed micro-data analysis that we advocate and have used in our cross-country comparison project. As we make clear, the problems illustrated in figure 1.1 are much worse if there is not an attempt at harmonization. In section 1.3, we describe the data collected in the World Bank and Organization for Economic Cooperation and Development (OECD) firm-level projects. In section 1.4, we provide a canonical representation of the possible sources of measurement problems in using firm-level statistics for comparative purposes. We use this representation to help us think through what types of comparisons are likely to be robust and what types of comparisons will not be robust to measurement error of different types. Sections 1.5 and 1.6 explore cross-country comparisons that can be made using our harmonized data. We present basic facts from these data, which are of interest in their own right, but discuss them in light of the measurement challenges we have described. In section 1.5, we first present the distribution of firms by size; we then document the magnitude and key features of firm dynamics (entry and exit of firms) and, finally, we study post-entry performance of different cohorts of new firms. In section 1.6 we analyze the effectiveness of creative destruction for productivity growth. We distinguish the productivity contribution coming from the process of creative destruction (entry and exit of firms) from that stemming from within-firm efficiency improvements and reallocation of resources across incumbents. In the last section, 1.7, we draw conclusions and discuss next steps for this approach for cross-country comparisons of firm dynamics. In this discussion, we present some ideas of the dos and don’ts of working with firm-level data for purposes of constructing and analyzing cross-country measures of firm dynamics.

1.2 Distributed Micro-Data Analysis

The indicators used for cross-country comparisons in this chapter have been collected by a network of researchers with access to (confidential) micro data. The construction of the indicators in each country followed a common methodology and led to cross-country harmonized metadata. This collection method is an attempt to generate comparable cross-country statistics. It is part of a long tradition of statistical harmonization that has resulted in a wide variety of cross-country sources of economic data, ranging from national accounts information to internationally harmonized surveys. Over the past decades, much institutional effort has been devoted to harmonizing national accounts data across countries in order to allow meaningful cross-country comparisons.
While the nominal and real indicators of GDP available in each country’s national accounts are generally comparable over time, divergence between exchange rates and purchasing power has often clouded cross-country comparisons. Several sources (including the OECD for its member countries and the World Bank for a larger set of countries) now provide Purchasing Power Parity indicators (PPPs) to convert various expenditure components of GDP into internationally comparable units.

Significant efforts have also been made to produce comparable statistics at the sectoral level (e.g., the OECD Structural Analysis database [STAN], the United Nations Industrial Development Organization [UNIDO], and more recently, the EU KLEMS database). While the main underlying sources of these data are sectoral disaggregations from national accounts, other sources such as labor accounts and production statistics are generally used to fill holes. Essentially, these data sets are top down, in that sectoral output and compensation add up to national accounts totals, up to various adjustments (such as owner-occupied housing). These adjustments are often not well known, and applied researchers using these data (Bernard and Jones 1996a; Griffith, Redding, and Van Reenen 2000; Nicoletti and Scarpetta 2003) generally take these sectoral data as given.

Comparable micro-level data sets are even less frequent, and comparability issues are generally more severe. However, several attempts have been made to harmonize household panel surveys and labor force surveys to improve cross-country comparability. The Luxembourg Income Study, the European Community Household Panel (ECHP), and the Integrated Public Use Microdata Series (IPUMS) data sets are all examples of this effort to compile and use comparable micro data sets. Standardized Labor Force Surveys, following International Labour Organization (ILO) definitions, are also available for a large set of countries.

At the firm level, no comprehensive survey exists with data for multiple countries, nor are there international data sets that contain micro-level data for comprehensive samples of firms.6 The EU Statistical Office (EUROSTAT) has recently made a major effort in assembling a data set on firm demographics for a number of EU member countries, using common definitions and classifications.7 The data collection is based on existing data sources and some idiosyncrasies in the data cannot be eliminated. At the same time, the World Bank has been collecting data on relatively small samples of firms in more than fifty developing and emerging economies worldwide (World Bank 2004).8 These data are often limited to a few industries and do not allow tracking firm dynamics.

1.2.1 How to Collect and Compare Firm-Level Data

A data set consisting of stacked micro-level data sets from multiple countries will contain the necessary information lacking from either single-country micro data sets or multiple-country sectoral data sets.
Unfortunately, owing to the legal requirement of maintaining confidentiality of firms’ responses in many countries, micro data sets from individual countries cannot be stacked for analysis. Creating public use data from the underlying sources is a possible workaround for disseminating otherwise confidential data. For firm-level data, a public-use data set made through randomization or micro-aggregation is often not feasible without the loss of necessary information. Another possible workaround is to create a data set consisting of results from single-country studies that become the input for a meta-analysis. For example, a collection of results from single-country studies on the link between Information and Communication Technology (ICT) and growth at the firm level was presented in a recent volume of the OECD (2004). However, the combination of results of analyses from single-country studies will not provide a solution if the focus of the analysis is not identical or if methodologies differ significantly.

6. Commercially published data sets such as Compustat or Amadeus provide panel data on financial information of publicly traded corporations.

7. See EUROSTAT (2004). The Eurostat data focus on eleven European countries over the period 1997–2000, and consider all firms, including those with zero employees.

8. This data collection is based on Investment Climate Assessment (ICA) surveys, including information on firm characteristics and performance as well as perceptions of managers about the regulatory and political environment in which they operate. A discussion of the advantages and disadvantages of the alternative approaches, as well as of how key findings from the ICA data set relate to those from the type of firm-level data used here, is provided in Haltiwanger and Schweiger (2004). Recent works that have used the ICA data to study firm performance include Bastos and Nasir (2004); Dollar, Hallward-Driemeier, and Mengistae (2003); and Hallward-Driemeier, Wallsten, and Xu (2003).

In the World Bank and OECD firm-level projects, a hybrid approach was followed that mitigates many of the discussed problems. Given the impossibility of stacking together firm-level data for different countries, a common protocol was used to extract a set of detailed indicators from the raw data of each country. The protocol was designed after face-to-face meetings with country experts and collection of metadata describing each country’s data sets.9 The protocol was then run on micro-level data sets in each country separately by experienced researchers. The decentralized output was combined and provided the information necessary for the cross-country analysis. This approach was first developed for the OECD firm-level growth project and is known as distributed micro-data analysis (Bartelsman 2004). It requires tighter coordination and allows less flexibility in research design in each country than meta-analysis, where the methodology and output may vary across samples.10 The method of distributed micro-data analysis maintains the advantages of multicountry studies with aggregated data because the output provided by each country consists of indicators aggregated to a prespecified level of detail that passes disclosure in all countries.
The method also maintains information on behavior of agents residing in micro data because the computed indicators on the (joint) distribution of variable(s) are designed to capture hypothesized behavior. While not allowing the full flexibility of research design available with multicountry stacked micro data, distributed micro-data analysis gives a skilled researcher the ability to use cross-country variation to identify behavioral relationships.

9. In addition to the authors of this chapter, the researchers involved in the distributed micro-data analysis network for the various projects are: John Baldwin (Canada); Tor Erickson (Denmark); Seppo Laaksonen, Mika Maliranta, and Satu Nurmi (Finland); Bruno Crépon and Richard Duhautois (France); Thorsten Schank (Germany); Fabiano Schivardi (Italy); Karin Bouwmeester, Ellen Hoogenboom, and Robert Sparrow (the Netherlands); Pedro Portugal Dias (Portugal); Ylva Heden (Sweden); Jonathan Haskel, Matthew Barnes, and Ralf Martin (United Kingdom); Ron Jarmin and Javier Miranda (United States); Gabriel Sánchez (Argentina); Marc Muendler and Adriana Schor (Brazil); Andrea Repetto (Chile); Maurice Kugler (Colombia and Venezuela); David Kaplan (Mexico); John Earle (Hungary and Romania); Mihails Hazans (Latvia); Raul Eamets and Jaan Maaso (Estonia); Mark Roberts (Korea, Indonesia, and Taiwan [China]); Milan Vodopivec (Slovenia).

10. The methodology for the International Wage Flexibility Project (Dickens and Groshen 2003) evolved over time from meta-analysis to a more coordinated system with centralized research protocols, distributed computation, and centralized analysis, and now is very similar to distributed micro-data analysis.

Eric Bartelsman, John Haltiwanger, and Stefano Scarpetta

1.3 Description of the Data

The firm-level project organized by the World Bank involves fourteen countries (Estonia, Hungary, Latvia, Romania, Slovenia, Argentina, Brazil, Chile, Colombia, Mexico, Venezuela, Indonesia, South Korea, and Taiwan [China]). This project complements a previous OECD study that collected, following the same procedure, firm-level data for ten industrial countries: Canada, Denmark, Germany, Finland, France, Italy, the Netherlands, Portugal, United Kingdom, and the United States. Both projects use a common analytical framework that involves the harmonization, to the extent possible, of key concepts (e.g., entry, exit, or the definition of the unit of measurement) and the definition of common methods to compute the indicators.

The distributed micro-data analysis was conducted for two separate themes. The first theme focused on firm demographics, and collected indicators such as entry and exit, job flows, size distribution, and firm survival. The second theme gathered indicators of productivity distributions and correlates of productivity. In particular, information was collected on the distribution of labor and/or total factor productivity by industry and year, and on the decomposition of productivity growth into within-firm and reallocation components. Further, information was collected on the averages of firm-level variables by productivity quartile, industry, and year.
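The workflow of distributed micro-data analysis described above can be sketched in a few lines of code. The following is a minimal illustration under stated assumptions, not the protocol actually used in the OECD or World Bank projects: the record fields (`industry`, `year`, `employment`), the two hypothetical countries "A" and "B", and the disclosure rule `MIN_CELL_SIZE` are all invented for the example. The point it shows is that only aggregated cells leave each country, and the central step stacks those cells rather than the firm-level records.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical firm-level records, held separately in each country.
# Field names and values are illustrative assumptions.
MICRO_DATA = {
    "A": [
        {"industry": "15", "year": 1995, "employment": 12},
        {"industry": "15", "year": 1995, "employment": 40},
        {"industry": "15", "year": 1995, "employment": 8},
        {"industry": "17", "year": 1995, "employment": 5},
    ],
    "B": [
        {"industry": "15", "year": 1995, "employment": 100},
        {"industry": "15", "year": 1995, "employment": 20},
        {"industry": "15", "year": 1995, "employment": 30},
        {"industry": "15", "year": 1995, "employment": 50},
        {"industry": "17", "year": 1995, "employment": 7},
        {"industry": "17", "year": 1995, "employment": 9},
    ],
}

MIN_CELL_SIZE = 3  # assumed disclosure rule: suppress cells with fewer firms


def run_protocol(records):
    """Common protocol, run locally: aggregate firms to industry-year cells."""
    cells = defaultdict(list)
    for rec in records:
        cells[(rec["industry"], rec["year"])].append(rec["employment"])
    output = {}
    for key, employment in cells.items():
        if len(employment) < MIN_CELL_SIZE:
            continue  # this cell would not pass disclosure
        output[key] = {"n_firms": len(employment), "mean_emp": mean(employment)}
    return output


def combine(country_outputs):
    """Central step: stack the aggregated indicators, never the micro data."""
    rows = []
    for country, cells in sorted(country_outputs.items()):
        for (industry, year), stats in sorted(cells.items()):
            rows.append((country, industry, year,
                         stats["n_firms"], stats["mean_emp"]))
    return rows


outputs = {country: run_protocol(recs) for country, recs in MICRO_DATA.items()}
panel = combine(outputs)
```

In this sketch `panel` contains one row per country, industry, and year cell that passes the assumed disclosure threshold; the small industry "17" cells are dropped in both countries before anything is combined.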
The key features of the micro-data underlying the analysis are as follows:

Unit of observation: Data used tend to conform to the following definition (EUROSTAT 1998): “an organizational unit producing goods or services which benefits from a certain degree of autonomy in decision-making, especially for the allocation of its current resources.” Generally, this will be above the establishment level. However, firms that have operating units in multiple countries will have at least one unit counted in each country. Of course, it may well be that the national boundaries that generate a statistical split-up of a firm in fact split a firm in a real sense as well. Also related to the unit of analysis is the issue of mergers and acquisitions. Only in some countries does the business register keep close track of such organizational changes within and between firms. In addition, ownership structures themselves may vary across countries because of tax considerations or other factors that influence how business activities are organized within the structure of defined legal entities.

Size threshold: While some registers include even single-person businesses (firms without employees), others omit firms smaller than a certain size,

Measuring and Analyzing Cross-Country Differences in Firm Dynamics


usually in terms of the number of employees (businesses without employees), but sometimes in terms of other measures such as sales (as is the case in the data for France). Data used in this study exclude single-person businesses.11 However, because smaller firms tend to have more volatile firm dynamics, remaining differences in the threshold across different country data sets should be taken into account in the international comparison.12

Period of analysis: Firm-level data are on an annual basis, with varying time spans covered.

Sectoral coverage: Special efforts have been made to organize the data along a common industry classification (ISIC Rev. 3) that matches the OECD-STAN database. In the panel data sets constructed to generate the tabulations, firms were allocated to the one STAN sector that most closely fit their operations over the complete time span. In countries where the data collection by the statistical agency varied across major sectors (e.g., construction, industry, services), a firm that switched between major sectors could not be tracked as a continuing firm but ended up creating an exit in one sector and an entry in another. For industrial and transition economies, the data cover the entire nonagricultural business sector, while for most of Latin America and East Asia data cover the manufacturing sector only.

Unresolved data problems: An unresolved problem relates to the artificiality of national boundaries to a business unit. As an example, say that the optimal size of a local activity unit is reached when it serves an area with ten million inhabitants. In smaller nations, one activity unit must be supported by the administrative activities of a business unit. If the EU boundaries were to disappear, the business unit could potentially serve twenty-seven activity units. 
This geographic consideration may help explain why we observe a larger average firm size in a country like the United States in our sample, although this is not the case in another large country, Brazil. From a policy perspective, this difference may point towards aligning regulations in a manner that would allow business units to enjoy transnational scale economies in meeting administrative requirements. Also related to the unit of analysis is the issue of mergers and acquisitions: only in some countries have business registers been keeping track of such organizational changes within and between firms in the most recent years.

11. The share of firms without employees is large in most countries for which data are available (see EUROSTAT 2004). Their inclusion in the analysis of firm demographics is problematic for a number of reasons, however. Zero-employee firms may include part-time activities and formally self-employed people who work regular hours on a long-term basis for a sole client, thus appearing more like dependent employees for most purposes. To the extent that people involved in this false self-employment have little intention to expand their business or innovate, they are of limited interest for studies investigating the role of the entrepreneurial process for technological change, employment growth, and economic performance. In some countries/sectors, the amount of false self-employment may be quite sizeable, and possibly depends on different regulations affecting hiring and firing costs as well as taxes on labor use.

12. The productivity data are collected at different levels of aggregation in different countries and very few are able to work at more than one level. A sensitivity analysis of the productivity decompositions suggests, however, that this issue does not significantly affect the results.

1.3.1 The Source of the Data: Firm Demographics

The analysis of firm demographics is based on business registers (Canada, Denmark, Finland, Netherlands, United Kingdom, United States, Estonia, Latvia, Romania, and Slovenia), social security databases (Argentina, Germany, Italy, and Mexico), or corporate tax rolls (France, Hungary) (table 1.1). Enterprise census data were used for Brazil, Korea, and Taiwan (China), while annual industry surveys, albeit generally not the best source for firm demographics owing to sampling and reporting issues, were used for Chile, Colombia, and Venezuela. Data for Portugal are drawn from an employment-based register containing information on both establishments and firms, while data for the three East Asian countries are from censuses of manufacturing firms. All these databases allow firms to be tracked through time because addition or removal of firms from the registers (at least in principle) reflects the actual entry and exit of firms. However, the three- to five-year frequency of the manufacturing censuses in East Asia precludes computing many of the demographics indicators.

1.3.2 The Source of the Data: Productivity Decompositions

The productivity analysis requires information on output, employment, and possibly other productive inputs such as intermediate materials and capital services. For this reason, enterprise surveys were used for most countries. Using these source data, indicators are calculated on labor and/or total factor productivity disaggregated by STAN industry and year, and on the decomposition of productivity growth into within-firm and reallocation components. 
The underlying source data and availability of the indicators are provided in table 1.2.

Indicators Collected

Depending on the availability of output and input measures, we have calculated different indicators of labor and total factor productivity. A number of issues emerged in the calculation of labor and total factor productivity, including:
• Labor input was generally based on the number of employees with no correction for hours worked.
• Sales and gross output data do not include correction for inventory accumulation.
• Capital stock, in countries where available, is based on book values.

Table 1.1  Data sources used for firm demographics

| Country | Source | Period | Sectors | Availability of survival data | Threshold |
| Canada | Business register | 1984–1998 | All Economy | No | Emp ≥ 1 |
| Denmark | Business register | 1981–1994 | All | No | Emp ≥ 1 |
| Finland | Business register | 1988–1998 | All | Yes | Emp ≥ 1 |
| France | Fiscal database | 1989–1997 | All | Yes | Turnover: Man: Euro 0.58m; Serv: Euro 0.17m |
| Germany (West) | Social security | 1977–1999 | All but civil service, self-employed | Yes | Emp ≥ 1 |
| Italy | Social security | 1986–1994 | All | Yes | Emp ≥ 1 |
| Netherlands | Business register | 1987–1997 | All | Yes | None |
| Portugal | Employment-based register | 1983–1998 | All but public administration | Yes | Emp ≥ 1 |
| U.K. | Business register | 1980–1998 | Manufacturing | Yes | Emp ≥ 1 |
| U.S. | Business register | 1988–1997 | Private businesses | Yes | Emp ≥ 1 |
| Argentina | Register, based on Integrated System of Pensions | 1995–2002 | All | Yes | Emp ≥ 1 |
| Brazil | Census | 1996–2001 | Manufacturing | Yes | Emp ≥ 1 |
| Chile | Annual Industry Survey (ENIA) | 1979–1999 | Manufacturing | Yes | Emp ≥ 10 |
| Colombia | Annual Manufacturing survey (EAM) | 1982–1998 | Manufacturing | Yes | Emp ≥ 10 |
| Estonia | Business Register | 1995–2001 | All | Yes | Emp ≥ 1 |
| Hungary | Fiscal register (APEH) | 1992–2001 | All | Yes | Emp ≥ 1 |
| Indonesia | Manufacturing survey | 1990–1995 | Manufacturing | No | Emp ≥ 10 |
| Korea | Census | 1983–1993 (3 years) | Manufacturing | No | Emp ≥ 5 |
| Latvia | Business register | 1996–2002 | All | Yes | Emp ≥ 1 |
| Mexico | Social security | 1985–2001 | All | Yes | Emp ≥ 1 |
| Romania | Business register | 1992–2001 | All | Yes | Emp ≥ 1 |
| Slovenia | Business register | 1992–2001 | All | Yes | Emp ≥ 1 |
| Taiwan (China) | Census | 1986–1991 (2 years) | Manufacturing | No | Emp ≥ 1 |
| Venezuela | Annual Industrial Survey | 1995–2000 | Manufacturing | No | Emp ≥ 1; sample for 1–15 |

Table 1.2  Summary of the data used for productivity decompositions

| Country | Source | Unit | Threshold |
| Finland | Census | Firm | Emp > 5 |
| France | Fiscal database with additional information from enterprise surveys | Firm | Turnover €0.58m |
| Germany (W) | Survey | Plant | Emp 1 |
| Italy | Survey | Firm | Turnover €5m |
| Netherlands | Survey | Firm | Emp > 20, emp < 20 → Sample |
| Portugal | Employment-based register | Firm | Emp 1 |
| U.K. | Survey | Estab | Emp 100, emp 100 → Sample |
| U.S. | Census | Estab | Emp 1 |
| Argentina | Annual Industrial Survey (INDEC) | Estab | Emp ≥ 9 and $2m threshold |
| Brazil | Annual Industrial Survey | Estab | Emp ≥ 30 sample of 10–29 |
| Chile | Annual Industry Survey (ENIA) | Plant | Emp ≥ 10 |
| Colombia | Annual Manufacturing survey (EAM) | Estab | Emp ≥ 10 |
| Estonia | Business Register | Firm | Emp ≥ 1 |
| Hungary | Fiscal register (APEH) | Plant | Emp ≥ 1 |
| Indonesia | Manufacturing survey | Firm | Emp ≥ 10 |
| Korea (Rep.) | Census | Firm | Emp ≥ 5 |
| Latvia | Business register | Firm | Emp ≥ 1 |
| Romania | Business register | Firm | Emp ≥ 1 |
| Slovenia | Business register | Firm | Emp ≥ 1 |
| Taiwan (China) | Census | Firm | Emp ≥ 1 |
| Venezuela | Annual Industrial Survey | Firm | Emp ≥ 1; sample for 1–15 |

The source table also reports, by country, the first and last periods covered, the coverage (Mfg, Serv), and the productivity measures available (LPV, LPQ, TFP, MFP).

Note: Mfg = manufacturing; Serv = business services; LPV = labor productivity based on value added; LPQ = labor productivity based on gross output; Emp = employment.


• Total Factor Productivity (TFP) at the firm level is the log of deflated output (measured as value added) minus the weighted sum of the logs of labor and capital, where the weights are industry-specific and the same for all countries. The weights were calculated using the expenditure shares of inputs for an industry using the cross-country average from the OECD-STAN database. In the World Bank project, TFP was also computed using country- and industry-specific average expenditure shares of firms.
• Multifactor Productivity (MFP) calculations use expenditure shares for labor, capital, and materials.
• Labor productivity estimates are based either on deflated gross output (LPQ) or on deflated value added (LPV). Similarly, MFP estimates are based on deflated gross output and TFP estimates are based on deflated value added.
• Deflators for output, value added, and materials are at the two- to three-digit industry level, usually based on National Accounts sources.

Using common factor shares across countries for a particular industry allows, in principle, for cross-country comparisons of productivity levels. However, different measurement units for the inputs, notably capital, make cross-country comparisons of TFP or MFP levels problematic. To benchmark the levels of TFP and MFP, the measured units of capital are adjusted with a multiplicative factor, such that value added minus payroll (or gross output minus payroll and materials expenditures) represents a return to capital of eight percent.13

13. This adjustment is similar to the arbitrary adjustments to TFP made by Bernard and Jones (1996b) in order to compare apples and oranges.

1.4 A Canonical Representation of the Measurement Problems

As discussed in the previous section, despite all efforts to harmonize the data, measurement issues remain that can affect cross-country comparisons. In reviewing such measurement issues we use the following simple notation: the indicator $I$ is some aggregate of a (vector of) variables $X$, with aggregation taking place across units (firms or establishments) $f$ that are elements of the (sub)population $\Omega$:

(1)  $I = A[X_f \mid f \in \Omega]$.

For simplicity, we drop all subscripts (i.e., for countries as well as for disaggregated groupings), such as industry or size-class. These disaggregations are dealt with by adding an appropriate subscript to $I$ and $X$, and by aggregating over individual firms in an appropriately defined subset of $\Omega$. With this notational framework, we assess measurement problems for a host of indicators. In particular, we consider various aggregator functions, A[..],


such as sums, means, variances, covariances, or statistical analyses yielding reduced-form or structural coefficients (e.g., the aggregator function could be the ordinary least squares (OLS) estimator from a multivariate regression).14 The variable itself may be an aggregation of a function of one or more micro-level variables, such as a ratio (e.g., output per unit of labor) or a transformation using firm-level observations from multiple periods, such as a first difference. Alternatively, the indicator may be a function of aggregated variables (e.g., aggregate productivity as ratio of aggregate output to labor). Finally, the indicators may vary by the (sub)set of firms over which the aggregation takes place. For example, the typical productivity decompositions (see following equations) focus on the contribution to aggregate productivity growth of different sets of firms (e.g., continuing, exiting, and entering firms).

14. The latter possibility includes treating estimated parameters from studies of individual countries as aggregate indicators.

Measurement errors can be analysed in a typical errors-in-variable framework, such as:

(2)  $X = X^{*} + \varepsilon$,

where the observed value, $X$, is equal to the actual value, $X^{*}$, plus an error term. For the computed indicators, a necessary extension to the framework is that the observed and the actual set of firms, $\Omega$ and $\Omega^{*}$, may differ as well:

(3)  $\Omega = \Omega^{*} + \psi$,

where $\psi$ is a general form of disturbance to the correct or actual set of firms in $\Omega^{*}$. The disturbance takes away, or adds, units to the actual set. A simple example is when the focus of the analysis is on firms in a given industry, but some firms are erroneously classified in this industry even if they largely operate in another industry. Similarly, the actual set of continuing firms needed for decompositions of productivity growth is given by the intersection of the actual sets of firms at time $t$ and firms at time $t-s$. Through errors applied to the actual sets at $t$ or $t-s$ the observed set of continuers may deviate from the actual, as will the complementary sets of observed exiting firms and entrants.

As an added complication, it may be that the observed set differs from the actual set, but that the actual set is a statistical sample drawn from the actual universe. Or it may be that the observed set is a statistical sample drawn from the observed universe, which itself is a noisy version of the actual universe. We abstract from this by taking the sampling scheme and the errors in classification to both be represented by $\psi$, regardless of the order in which the sampling process and the errors drive a wedge between the actual universe and the observed set of firms.


Once a differentiation is made between the location of the errors, namely in the measurement of the variable(s) at the micro level or in the sampling or registration of the micro-units over which aggregation is made, the effects of the various measurement problems can be traced and different forms of errors may be compared. It should be stressed that while these two types of errors affect the analysis of firm-level data in each individual country, differences in the characteristics of these errors across countries influence cross-country comparisons even more. In the remainder of this subsection, we explore some examples of how measurement error in the measures of economic activity and measurement error due to sample selection can affect the measures of the distribution of output, employment, productivity levels, and productivity dynamics drawn from firm-level data.

1.4.1 Mean or Sum

Both measurement errors discussed above affect aggregation indicators such as the mean or sum of firm-level data. We first discuss the case when the set disturbance $\psi$ generates random errors in obtaining the observed set of firms from the actual set. When the indicator of interest is the mean employment per firm, we get a consistent estimate by taking a normal average.15 Without measurement error of the firm-level variable, the variance of this estimator of the first moment is negligible, given the generally large size of available samples (often 90 percent or more of total employment is in the sample). With classical measurement error in employment at the firm level, ε increases the standard deviation of the (unbiased) estimate of the first moment. The estimate of mean firm-size across industries is unbiased, as the extra firms allocated to one industry represent a loss in another and, on average, the effect will be zero. With measurement error ε proportional to size, for example because of weighted sampling by size strata, sample weights are needed to get a consistent estimate of the first moment of the firm-size distribution.

When the indicator of interest is the difference between the mean (or the sum) of two different level measures (e.g., labor productivity can be viewed as the difference of the [log] of aggregate output and employment), the previous remarks apply. The differencing neither solves the problem nor creates further problems if the expected value of the measurement error of both measures is zero. But the variance of the estimated mean is the sum of the two classical measurement error variances, so, in this example, we have a noisier estimate of mean productivity. We need to take this into account when comparing productivity levels across countries. Having an estimate of the variance of ε would, however, help to assess whether differences in mean productivity across countries are significant.

15. It should be stressed that, given data availability, we define labor input as the number of employees and do not control for hours worked.
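The role of sample weights under size-stratified sampling, noted in section 1.4.1, can be illustrated with a small deterministic example. The population, the two strata, and the sampling rates below are hypothetical values chosen for the illustration, and the systematic "every k-th firm" draw is an assumption made to keep the example reproducible; it is not the sampling design of any particular country.

```python
# Hypothetical population: (size stratum, employment) for each firm.
population = [("small", 2)] * 90 + [("large", 200)] * 10

# Assumed sampling scheme: every large firm, one in ten small firms.
sampling_rate = {"small": 0.1, "large": 1.0}


def draw_sample(pop, rates):
    """Deterministic systematic sample: keep every k-th firm in each stratum."""
    sample, seen = [], {}
    for stratum, emp in pop:
        seen[stratum] = seen.get(stratum, 0) + 1
        step = round(1 / rates[stratum])
        if seen[stratum] % step == 0:
            sample.append((stratum, emp))
    return sample


def unweighted_mean(sample):
    """Naive average over sampled firms; over-represents the large stratum."""
    return sum(emp for _, emp in sample) / len(sample)


def weighted_mean(sample, rates):
    """Weight each firm by the inverse of its sampling probability."""
    weights = [1 / rates[stratum] for stratum, _ in sample]
    total = sum(w * emp for w, (_, emp) in zip(weights, sample))
    return total / sum(weights)


sample = draw_sample(population, sampling_rate)
true_mean = sum(emp for _, emp in population) / len(population)
naive = unweighted_mean(sample)
weighted = weighted_mean(sample, sampling_rate)
```

Under these assumptions the unweighted sample mean is several times the population mean because large firms are over-sampled, while the inverse-probability weighted mean recovers the population value.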


1.4.2 Mean or Sum; Endogenous (Sub)samples

The measures of mean firm-size and number of firms by size class fall into this category, and will be noisy owing to misclassification. The size-class criterion used to split the subsample is not independent of ε: firms with positive noise are more likely to be above a threshold, firms with negative noise more likely below. This is a typical problem (e.g., the well-known result of nonclassical measurement errors of dichotomous 0–1 indicator variables built from continuous variables with classical noise). A typical solution for this type of problem is to base the classification of firm-size on average employment in two periods. However, using only firms observed in two periods may, depending on the indicator of interest, introduce a selection bias. This problem of interaction between ε and the characteristic used to make the (sub)samples in aggregation shows up for means by quartiles, for job flows, and for other such splits with endogenous classification. The problem is exacerbated if sampling errors (the set disturbance $\psi$) vary systematically with the same characteristics. In principle, weighted results can overcome this problem, but in many cases the at-risk population for the analysis is above a minimum size threshold.

1.4.3 Mean or Sum; Longitudinal Linkages and Measures of Change

If aggregations are to be made over subsamples that are based on longitudinal linkages over time, such as entry/exit/continuer status, the sampling noise becomes quite important. For example, if we consider the employment of entering, exiting, and continuing firms, the measurement error in firm-level employment is coupled with possible mismeasurement of the status variables due to poor longitudinal IDs. 
In addition to measurement error in the firm-level indicator and status variables, sample selection can play a large role here since under-sampled groups may exhibit very different firm dynamics.16

16. Martin (2005) provides details on how sample weights should be used for computing productivity contributions from exit and entry.

1.4.4 Higher Moments (Variances)

In computing the variance of the distribution of our firm-level variables (e.g., employment) we start by assuming no sampling errors. The estimated variance of the variable will be the true variance in the universe plus the variance of ε. Without knowing the distribution of micro-level measurement errors, higher moments cannot be compared directly across countries. One practical solution is to compute the variance of the distribution of employment averaged over two periods (e.g., the decomposition of productivity by Griliches and Regev [1995]). The difference between this estimate of the variance and (the average of) the variances estimated from the two annual samples equals half the variance of ε. In other words, if the underlying true variance of our variable does not change over the two periods, the reduction in variance moving from the standard to the two-period average variance is a consistent estimate of $0.5 \cdot \mathrm{var}(\varepsilon)$. However, this approach only works for calculating the variance of the cross-sectional distribution of the firm-level variable for continuing firms. We would also need to find out how the exit or entrant subsamples affect the variance of the full annual distribution of firm employment. No correction can be made for measurement error of employment for these firms.

A closely related alternative is available if the distribution of the measurement error is common to all firms in a country. In this case, after disaggregating the data by, for example, industry, a difference-in-difference comparison of the relative cross-industry variances for different countries can be made.

Next consider a divergence between the observed and actual sample. If the sampling errors vary systematically by firm-size, we need to do appropriate weighting. If the sample varies, not because of sampling rules, but owing to error, this only matters if the errors are correlated with employment. If they are correlated, no consistent estimate can be made of higher moments of the employment distribution.

1.4.5 Higher Moments (Covariances and Correlations)

All of the previously mentioned problems apply to covariances, correlations, and, by association, estimates from regressions or other related multivariate statistical procedures. The problem with covariances is more complex since we must now deal with the covariance between the measurement error of two variables (either the same variable at different points in time or different variables at the same unit of time). Classical measurement error will bias any given correlation, but in many cases the measurement error may be systematic in complex ways. 
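The two-period averaging argument of section 1.4.4 can be checked with a short simulation. This is a sketch under stated assumptions rather than a procedure taken from the projects: true employment is held fixed across the two periods, the errors are classical and independent across periods, and all numerical values (`TRUE_SD`, `NOISE_SD`, the number of firms, the random seed) are hypothetical.

```python
import random
from statistics import pvariance

rng = random.Random(0)   # fixed seed so the sketch is reproducible
N_FIRMS = 20000
TRUE_SD = 10.0           # assumed spread of true employment X*
NOISE_SD = 4.0           # assumed spread of the error, so var(eps) = 16

# True employment of continuing firms, constant over the two periods.
true_emp = [rng.gauss(50.0, TRUE_SD) for _ in range(N_FIRMS)]

# Observed employment in each period: X = X* + eps, independent errors.
obs_1 = [x + rng.gauss(0.0, NOISE_SD) for x in true_emp]
obs_2 = [x + rng.gauss(0.0, NOISE_SD) for x in true_emp]
two_period_avg = [(a + b) / 2 for a, b in zip(obs_1, obs_2)]

# Average of the two annual variances: roughly var(X*) + var(eps).
var_annual = (pvariance(obs_1) + pvariance(obs_2)) / 2
# Variance of the two-period average: roughly var(X*) + var(eps) / 2.
var_avg = pvariance(two_period_avg)

half_noise_var = var_annual - var_avg                 # estimates 0.5 * var(eps)
noise_var_estimate = 2 * half_noise_var               # estimates var(eps)
true_var_estimate = var_annual - noise_var_estimate   # estimates var(X*)
```

With these assumed values, `noise_var_estimate` should lie close to 16 and `true_var_estimate` close to 100. As the text notes, the correction is only available for firms observed in both periods, so entrants and exiting firms are outside its scope.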
While the general intuition is that the classical measurement error implies lower covariances and correlations, in this setting the measurement error may yield different results. For example, one key question with firm-level data is whether more productive businesses have higher market shares. Classical measurement error in output measures will yield spuriously high covariances between the output share of a business and its measure of productivity, while classical measurement error in labor input will result in spuriously low covariances between employment share of a business and its measure of productivity. The previously mentioned issue needs to be addressed in particular for the indicator of the gap between weighted and unweighted productivity. The gap is proportional to the covariance between labor productivity and firm employment. If output and labor input are both measured with (classical) error, the gap will be underestimated, with the underestimation dependent on the variance of the measurement error in labor input. In this case, an estimate of the variance of the measurement error of the firm level


variable will be useful to know how to adjust cross-country differences in the estimated gap. If instead, the statistical agency uses labor productivity as an analytical ratio to edit the underlying micro data, then the measurement error of productivity and labor may be uncorrelated, so that the gap measure will be unbiased. In either case, computing the covariance between the cross-section of the time average of productivity and the cross-section of the time average of employment will produce a gap estimate with a lower bias, because the mean measurement error goes to zero as more periods are added. Further, difference-in-differences approaches, for example looking at relative movements between gaps in different industries, and comparing this across countries or over time, will provide robust estimates if the measurement error process of the firm-level variable does not change over time or across industries.

1.5 Assessing the Process of Creative Destruction

We start our review by looking at the distribution of firms by size, for the total business sector and the subsectors. We then turn to the analysis of firm demographics: the entry and exit of firms and their impact on employment. Finally, we look at the evolution of cohorts of new firms over the initial years of their life. In all cases, our objectives are to present some of the basic facts that emerge from the newly developed cross-country data and also to evaluate the measurement and inference problems that emerge from such comparisons. In all our analysis we look at simple cross-country comparisons, but also at within-country variations along different dimensions (size, industry). 
We claim that the difference-in-difference approach is essential to extract valuable information from our distributed micro-data analysis for at least two reasons:
• First, despite our efforts to harmonize the data across countries, there remain some differences in key dimensions: size or output thresholds that exclude micro-units, differences in sectoral coverage, and in some cases differences in the definition of the unit of observation. These differences may all contribute to limit simple cross-country comparisons using single indicators of the creative destruction process.
• Second, and probably more importantly, simple cross-country comparisons on specific dimensions of the process of creative destruction may be misleading or inadequate. Differences in market structures and in institutions may lead to differences in the nature of creative destruction rather than in its absolute magnitude. For example, high barriers to entry may not reduce the overall magnitude of firm turnover


but rather the composition of entrant and exiting firms. Facing high entry costs, new firms may choose to either enter very small and avoid the bite of regulations (especially in developing countries), or enter with a large size and smooth the entry costs over a larger capital investment. This may lead to bimodal distributions of firm entry by size but not a lower total entry rate. Likewise, in countries with high barriers to entry (and in turn high implied survival probabilities of marginal incumbents), the average productivity of entrants will rise while the average productivity of incumbents and exiting businesses will fall. Similar predictions apply to policies that subsidize incumbents and/or restrict exit in some fashion. These institutional distortions might yield a larger gap in productivity between entering and exiting businesses, but this gap is not by itself sufficient to gauge the contribution or efficiency of the creative destruction process.

In the empirical analysis presented in the remainder of this section and in the next section we focus on:
• The period from 1989 onward, and use period averages instead of data for individual years to minimize business cycle effects and possible measurement problems.17
• Twenty-three aggregate industries that cover the entire business sector while maximizing country coverage from the forty-two three-digit (ISIC Rev. 3) industries that are available in some databases.18

17. For Finland, we use the sample 1992–1998 because in the first years of the 1990s a number of large firms changed legal form in Finland, thus obtaining a different firm code in the business register. This reregistration would inflate firm turnover rates for large firms and distort the assessment of firm characteristics among entrants and exiting firms.

18. These twenty-three industries also correspond to the sectoral disaggregation of the OECD Structural Analysis (STAN) database. See www.oecd.org/data/stan.htm

1.5.1 Indicators Collected

The use of annual data on firm dynamics implies a significant volatility in the resulting indicators. In order to limit the possible impact of measurement problems, it was decided to use definitions of continuing, entering, and exiting firms on the basis of three (rather than the usual two) time periods. Thus, the tabulations of firm demographics are based on the following variables:

$\text{Entry}_{i,s,t}$: The number of firms entering in industry $i$, in size class $s$, and in year $t$. Also tabulated, if available, was the number of employees in entering firms. Entrant firms (and their employees) were those observed as (out, in, in) in the register at time ($t-1$, $t$, $t+1$).

$\text{Exit}_{i,s,t}$: The number of firms, and related employees, that leave the register. Exiting firms were those observed as (in, in, out) in the register at time ($t-1$, $t$, $t+1$).


Eric Bartelsman, John Haltiwanger, and Stefano Scarpetta

One-year firms_{i,s,t}: The number of firms, and of employees in those firms, that were present in the register for only one year. These firms were those observed as (out, in, out) in the register at time (t – 1, t, t + 1).

Continuing firms_{i,s,t}: The number of firms and employees that were in the register in a given year, as well as in the previous and subsequent year. These firms were observed as (in, in, in) in the register at time (t – 1, t, t + 1).

In practice, a number of complications arise in constructing and interpreting data that conform to the definitions of continuing, entering, and exiting firms described above. In particular, the one-year category, in principle, represents short-lived firms that are observed in time t but not in adjacent time periods and could therefore be treated as an additional piece of information in evaluating firm demographics. However, in some databases this category also includes measurement errors and possibly ill-defined data. Thus, the total number of firms in our analysis excludes these one-year firms.

Given the method of defining continuing, entering, and exiting firms, a change in the stock of continuing firms (C) relates to entry (E) and exit (X) in the following way:

(4)  $$C_t = C_{t-1} + E_{t-1} - X_t.$$

This has implications for the appropriate measure of firm turnover. Given that continuing, entering, exiting, and one-year firms (O) all exist in time t, the total number of firms (T) is:

(5)  $$T_t = C_t + E_t + X_t + O_t.$$

From this, the change in the total number of firms between two years, taking into account equation 4, can be written as:

(6)  $$T_t - T_{t-1} = E_t - X_{t-1} + O_t - O_{t-1}.$$
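The three-period definitions and the accounting identities in equations (4) and (6) can be checked mechanically. The following is a minimal sketch, assuming a register stored as a mapping from a firm identifier to the set of years in which the firm appears; the function names and the toy register are hypothetical and are not part of the original tabulations.

```python
from collections import Counter


def classify(prev, curr, nxt):
    """Label a firm in year t from its register presence at (t-1, t, t+1)."""
    if not curr:
        return None              # not in the register in year t
    if prev and nxt:
        return "continuing"      # (in, in, in)
    if nxt:
        return "entry"           # (out, in, in)
    if prev:
        return "exit"            # (in, in, out)
    return "one_year"            # (out, in, out)


def tabulate(presence, year):
    """Count firms by category in `year`; `presence` maps firm id -> set of years."""
    counts = Counter({"continuing": 0, "entry": 0, "exit": 0, "one_year": 0})
    for years in presence.values():
        label = classify(year - 1 in years, year in years, year + 1 in years)
        if label is not None:
            counts[label] += 1
    return counts


def identities_hold(presence, year):
    """Check equations (4) and (6) between `year - 1` and `year`."""
    now, before = tabulate(presence, year), tabulate(presence, year - 1)
    eq4 = now["continuing"] == before["continuing"] + before["entry"] - now["exit"]
    total_change = sum(now.values()) - sum(before.values())
    eq6 = total_change == (now["entry"] - before["exit"]
                           + now["one_year"] - before["one_year"])
    return eq4 and eq6


# Hypothetical toy register: firm ids and years are made up for illustration.
register = {
    "a": {1, 2, 3, 4, 5},
    "b": {3, 4, 5},
    "c": {1, 2, 3},
    "d": {3},
    "e": {2, 3, 4},
}
counts_year3 = tabulate(register, 3)
print(dict(counts_year3))
print(identities_hold(register, 3))
```

Years outside the observed window are simply treated as "out", so the identities hold by construction for any register coded this way; what the sketch cannot address is spurious entry and exit caused by broken longitudinal links.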

Assuming that the one-year firms are measured with random noise, the difference between the numbers of these firms in years t and t – 1 is expected to be equal to zero. Thus, a turnover measure that is consistent with the contribution of net entry to changes in the total number of firms should be based on the sum of contemporaneous entry and lagged exit.

The above indicators were split into eight firm-size classes, including the class of firms without employees.19 The data thus allow detailed comparisons of firm-size distributions between industries and countries.20 Further,

19. The eight size classes are as follows: no employees, 1–9 employees, 10–19, 20–49, 50–99, 100–249, 250–499, and 500 or more. For the OECD countries there are only six size groups, with the two groups between one and twenty combined and the groups between 100 and 500 combined.
20. Available data also allow the calculation of total job turnover and the fraction of it due to the entry and exit of firms.

Measuring and Analyzing Cross-Country Differences in Firm Dynamics


the collected data allow for survival analysis for a selection of countries over varying time periods.

1.5.2 The Distribution of Firms by Size

Firm size is an important dimension in our analysis for several reasons. The empirical literature suggests that small firms tend to be affected by greater churning, but also have greater potential for expansion.21 Thus, a distribution of firms skewed towards small units may imply higher entry and exit, but also greater post-entry growth of successful firms. Alternatively, it may point to a sectoral specialization of the given country towards newer industries, where churning tends to be larger and more firms experiment with different technologies. Another relevant factor is that small businesses may not be subject to the same regulations as large businesses, because they may be exempted from certain laws or regulations (e.g., labor regulations) or because they can more easily avoid them in countries with weak enforcement. In addition, the distribution of firms by size is likely to be influenced by the overall dimension of the internal market (especially for firms in nontradable sectors), as well as by the business environment in which firms operate, which can discourage firm expansion.

The analysis of firm size raises clear problems for cross-country comparability related to sample selection. First, for most of the countries in our sample, the data cover all firms with at least one employee, but the cutoff size is five employees in South Korea22 and ten employees in Chile, Colombia, and Indonesia. For France and Italy, the data exclude firms with sales below a certain threshold. Second, even amongst the countries for which data cover all firms with at least one employee, the unit of reference is the plant instead of the firm in some countries, and the definition of both may vary across countries.
Finally, from a sectoral perspective, community services and utilities are more difficult to compare, given the important role of the public sector, whose coverage changes from country to country, and of regulation in these sectors.

Table 1.3 presents the share of firms, and associated employment, in the first two classes of our size distribution: firms with fewer than twenty employees (panel A) and firms with twenty to forty-nine employees (panel B). The table suggests that in all countries the population of firms is dominated by micro and small units. Micro units (fewer than twenty employees) account for at least 80 percent of the total firm population. Their share in total employment is much lower and ranges from less than 15 percent in some transition economies (e.g., Romania), which still reflects the

21. See Sutton (1997) for a review of the literature.
22. The annual enterprise survey in Venezuela is representative of all firms with at least fifteen employees, and only includes a random sample of firms below this threshold. In our analysis, we have used the data for Venezuela with reference to firms with twenty or more employees, given the lack of coverage for the lower size classes.

Table 1.3    Small firms across broad sectors and countries, 1990s

Columns: (1) total economy; (2) nonagriculture business sector^a; (3) manufacturing; (4) total business services. A hyphen marks a cell with no entry.

Panel A: The share of micro firms in the total population of firms and in total employment (firms with fewer than 20 employees as a percentage of total)

                           Firms                     Employment
Country             (1)    (2)    (3)    (4)    (1)    (2)    (3)    (4)
Industrial countries
Denmark            91.3   89.5   76.6   92.3   32.7   31.1   17.6   35.0
France             82.1   82.3   77.9   82.0   15.9   16.0   19.9   13.6
Italy              93.8   93.8   88.6   96.0   35.9   39.6   31.3   36.4
Netherlands        96.3   96.5   88.3   97.1   31.8   36.8   18.3   32.9
Finland            93.6   92.7   85.4   95.3   29.5   32.7   13.5   39.1
West Germany       89.6   85.8   83.3      -   25.8   23.8   16.6      -
Portugal           89.2   88.9   75.3   93.8   32.2   31.4   18.9   42.9
U.K.                  -      -   81.3      -      -      -   12.4      -
U.S.               88.0   88.0   72.6   88.7   18.4   19.3    6.7   19.9
Latin America
Brazil                -      -   82.4      -      -      -   17.7      -
Mexico             90.1   90.0   82.8   92.2   23.2   24.5   13.9   28.5
Argentina          90.0   89.4   82.1   91.2   27.7   27.7   21.3   27.7
Transition economies
Slovenia           87.7   88.0   71.6   93.1   13.4   13.5    5.1   26.0
Hungary            84.4   85.5   71.1   90.8   16.0   16.4    8.8   23.6
Estonia            80.6   81.3   64.6   87.1   22.8   22.6   11.5   34.2
Latvia             87.7   87.7   87.8   87.6   24.7   24.8   26.9   24.2
Romania            90.9   91.5   77.1   95.6   12.9   12.8    4.2   31.6
East Asia
Korea^b               -      -   57.0      -      -      -   11.1      -
Taiwan (China)        -      -   82.5      -      -      -   26.6      -

Panel B: The share of small firms in the total population of firms and in total employment (firms with 20–49 employees as a percentage of firms with 20 or more employees)

                           Firms                     Employment
Country             (1)    (2)    (3)    (4)    (1)    (2)    (3)    (4)
Industrial countries
Denmark            67.6   66.9   60.4   69.7   22.5   22.9   19.5   22.0
France             53.2   53.3   63.0   49.9   12.9   12.9   20.0   11.2
Italy              67.3   69.4   67.0   65.5   20.0   22.8   23.0   15.6
Netherlands        58.8   62.9   53.9   58.5   15.3   18.6   14.2   13.9
Finland            61.0   62.0   54.3   65.2   16.3   18.9   10.3   19.0
West Germany       59.0   60.7   54.0      -   17.2   17.7   12.8      -
Portugal           64.0   63.5   59.1   69.2   22.6   22.0   21.5   22.9
U.K.                  -      -   51.2      -      -      -   11.4      -
U.S.               62.7   65.0   55.0   63.1   12.2   13.5    7.3   12.7
Latin America
Chile                 -      -   51.4      -      -      -   15.3      -
Colombia              -      -   49.0      -      -      -   13.9      -
Mexico             59.0   58.9   51.2   62.9   15.1   16.0   11.5   17.1
Brazil                -      -   58.7      -      -      -   15.0      -
Venezuela             -      -   24.9      -      -      -    4.5      -
Argentina          61.1   61.7   59.8   60.6   18.4   18.6   19.0   16.8
Transition economies
Slovenia           38.5   38.4   29.2   49.8    7.4    7.2    4.6   12.4
Hungary            54.6   56.2   48.3   61.9   12.9   12.5   10.1   14.3
Romania            45.3   46.2   39.2   55.1    5.7    5.5    3.3   11.2
Estonia            62.4   62.6   55.5   66.9   22.1   21.3   17.0   27.1
Latvia             58.1   57.9   60.1   58.0   17.8   17.9   20.1   17.5
East Asia
Korea                 -      -   59.4      -      -      -   17.3      -
Indonesia             -      -   49.6      -      -      -    7.3      -
Taiwan (China)        -      -   65.8      -      -      -   25.2      -

^a This aggregate excludes agriculture (ISIC 1–5) and community services (ISIC3: 75–79).
^b In Korea, data cover firms with 5 or more employees.


presence of large (formerly or still) state-owned firms inherited from the central plan period, to less than 20 percent in the United States and around 30 percent or more in some small European economies.

To check the robustness of these results, we also look at the incidence of small firms (i.e., the population of firms with twenty to forty-nine employees over the total population of firms with twenty or more employees). This allows for a larger country sample and greater comparability, as it is not affected by differences in the threshold of micro units. Small firms account for about 50 percent of the total population of firms with twenty or more employees, again with the exception of the transition economies (e.g., Romania and Slovenia) still dominated by large firms. It is also important to note that the rank ordering of countries obtained by focusing on the share of micro units (fewer than twenty) is only loosely correlated with the rank order of the same countries based on the share of small firms (twenty to forty-nine).23

Cross-country differences in firm size may reflect specialization towards industries with a small efficient scale. To assess the role of sectoral specialization versus within-sector differences, we first look at the average firm size across industries in table 1.4. The first column of the table presents the cross-country average size for each industry, and the other columns present the country/industry average relative to the industry cross-country average. If technological factors were predominant in determining firm size across countries, we should find the values in the country columns to be concentrated around one. If, on the contrary, the size differences were explained mainly by country-specific factors inducing a consistent bias within industries, then we would expect the countries with an overall value above (below) the average (i.e., in the “Total” category) to be characterized by values generally above (below) one in the subsectors.
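The normalization behind table 1.4 can be sketched in a few lines. The example below is only illustrative: it assumes the cross-country industry average is an unweighted mean over the countries reporting that industry (the text does not state the weighting), and the country names, industries, and sizes are made up.

```python
def relative_sizes(avg_size):
    """Map avg_size[country][industry] to its ratio over the cross-country industry mean."""
    industries = {ind for by_ind in avg_size.values() for ind in by_ind}
    benchmark = {}
    for ind in industries:
        values = [by_ind[ind] for by_ind in avg_size.values() if ind in by_ind]
        # Assumption: unweighted mean across the countries that report the industry.
        benchmark[ind] = sum(values) / len(values)
    return {
        country: {ind: size / benchmark[ind] for ind, size in by_ind.items()}
        for country, by_ind in avg_size.items()
    }


# Hypothetical average firm sizes (employees per firm), for illustration only.
toy_sizes = {
    "Country A": {"food": 100.0, "textiles": 60.0},
    "Country B": {"food": 200.0, "textiles": 40.0},
    "Country C": {"food": 150.0},
}
ratios = relative_sizes(toy_sizes)
for country, by_ind in ratios.items():
    print(country, {ind: round(r, 2) for ind, r in by_ind.items()})
```

Ratios clustered around one across a country's industries would point to technology as the main driver of size; ratios consistently above or below one would point to country-specific factors.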
Among industrial countries, the United States has a very high proportion of industries with an above-average firm size, both in manufacturing and in business services. The Western European countries tend to have smaller firms in most industries, with several exceptions in heavy industries (e.g., Germany and Portugal), high-tech industries (e.g., Finland and, to a lesser extent, France and Italy), some of the low-tech industries (e.g., United Kingdom), and basic services (e.g., France and Portugal). Thus, it is not possible to map differences in firm size across countries according to either the overall size of the country (apart from the United States), the underlying technological level of the industry, or its degree of maturity.

Another way to shed light on country-specific factors versus industry-specific technological factors is to use a shift-and-share decomposition. The decomposition identifies the component due to cross-country differences in firm size within each sector, the component due to differences in the sectoral composition across countries, and a cross term that can be

23. The country rank correlation is only 0.3.

0.67 0.76 0.87 1.01 0.79 0.82 0.92 0.89 1.04 0.87 0.82 0.71 0.76 0.68 1.71 0.82 1.00 0.95

143

115

81

125

870 218 107

125 272

86 153

140

202

204

148

0.87

Industrial

77 173 137

118

Crosscountry average

1.04

0.99

1.14

0.52

1.17 1.32

1.20 1.24

1.10 0.96 1.13

1.09

1.20

1.25

1.00

1.49 1.40 1.14

1.20

Other countries

0.77

0.65 0.26

0.10 0.58 0.81

0.66

0.71

0.55

0.93

0.44 0.27 0.69

0.76

Denmark

0.72

0.92

1.08

0.79

0.84

0.78 0.65 0.81

0.59

0.68

0.68

0.57

1.02 0.48 0.72

1.11

France

0.62

0.82

0.53

1.27

0.69 0.49

0.61 0.52

0.34 0.80 0.69

0.71

0.58

0.50

0.66

0.84 0.44 0.63

0.84

Italy

0.17

0.06

0.14

0.02

0.27 0.18

0.29 0.32

0.42 0.25

0.29

0.25

0.22

0.33

0.21 0.13 0.34

0.38

Netherlands

0.95

1.39

0.57

1.99

0.78 0.84

0.97 1.45

0.92 1.01 0.96

2.02

1.73

0.78

1.15

0.78 1.12

1.01

Finland

0.82

0.33 1.54 1.20

0.86

0.91

0.89

0.62

0.54 1.59 0.94

0.88

West Germany

Within-industry average firm size, firms with 20 or more employees (as a share of cross-country sectoral average)

Agriculture, hunting, forestry and fishing Mining and quarrying Total manufacturing Food products, beverages, and tobacco Textiles, textile products, leather, and footwear Wood and products of wood and cork Publishing, printing, and reproduction of recorded media Coke, refined petroleum products, and nuclear fuel Chemicals and chemical products Rubber and plastics products Other non-metallic mineral products Basic metals Fabricated metal products, except machinery and equipment Machinery and equipment, N.E.C. Office, accounting, and computing machinery Electrical machinery and apparatus, nec Radio, television, and communication equipment Medical, precision, and optical instruments

Total economy

Country

Table 1.4

0.90

1.58

0.98

0.46

0.74 0.43

0.66 0.37

2.52 0.52 0.64

0.58

0.70

0.76

0.61

0.74 0.39 0.62

0.72

Portugal

1.12

1.41

4.52

1.21 1.11

1.12 1.29

1.54 2.34 1.56

1.60

1.00

1.58

2.99

1.00 1.14 1.71

1.32

U.S.

2.13 (continued )

0.85 0.82

1.31 0.59

0.84

0.86

1.07

1.53

1.03

U.K.

(continued)

Agriculture, hunting, forestry, and fishing Mining and quarrying Total manufacturing Food products, beverages, and tobacco Textiles, textile products, leather, and footwear Wood and products of wood and cork

Total economy

Country

Motor vehicles, trailers, and semi-trailers Other transport equipment Manufacturing nec; recycling Electricity, gas, and water supply Construction Services Market services Wholesale and retail trade; restaurants and hotels Transport, storage, and communication Finance, insurance, real estate, and business services Community, social, and personal services

Country

Table 1.4

1.00 1.07 0.94

250

135

123

0.81 0.88 0.99 0.83

0.76

0.75

0.85

1.19

Colombia

0.93

79

Chile

0.94 1.21 0.87 1.24 0.79 0.95 0.94

Industrial

316 305 92 505 75 111 107

Crosscountry average

1.01

0.98

0.99

1.56 0.75 1.01

1.00

Mexico

1.06

0.91

1.00

1.09

1.04 0.85 1.13 0.62 1.32 1.07 1.08

1.72

2.50

1.35

2.05 3.14 1.49

1.40

0.95

1.24

1.18

1.20 1.23 1.08

1.11

1.21

1.58

0.71

0.76

1.60 0.85 0.60 1.55 0.76 1.13 0.98

Italy

3.43

2.82

0.95

1.49

Indonesia

1.04

1.29

0.38

1.45

0.92 0.89 0.95 0.15 0.93 1.30 1.36

France

Hungary

0.92

0.98

0.41

0.83

0.77 0.37 0.75 0.84 0.82

Denmark

Slovenia

Other countries

0.77

0.91

0.77

0.79

Korea

0.70

0.71

1.10

0.57

Taiwan (China)

0.52

0.37

0.21

0.36

0.10 0.08 0.88 0.29 0.32 0.43 0.37

Netherlands

0.70

0.97

0.83

0.83 2.92 0.73

0.72

Estonia

0.85

0.75

1.21

0.30 0.68 0.73 0.29 0.98 0.90 1.15

Finland

0.84

0.88

1.28

0.50 0.86

Brazil

0.41

0.88 0.36 0.74

1.06

0.81

0.97

0.42 0.65

0.83

3.00

1.77

2.20

1.55

1.23

0.86

1.36

2.45 4.16 1.39 0.97 0.83 1.35 1.24

U.S

0.71

0.73

0.90

0.71 0.71 0.70

0.85

Argentina

0.97

U.K.

Romania

0.58

1.52

3.64

0.76

0.53 0.58 0.56 5.38 0.97 0.80 0.88

Portugal

Latvia

West Germany

Publishing, printing, and reproduction of recorded media Coke, refined petroleum products, and nuclear fuel Chemicals and chemical products Rubber and plastics products Other non-metallic mineral products Basic metals Fabricated metal products, except machinery and equipment Machinery and equipment, N.E.C. Office, accounting, and computing machinery Electrical machinery and apparatus, nec Radio, television, and communication equipment Medical, precision, and optical instruments Motor vehicles, trailers, and semi-trailers Other transport equipment Manufacturing nec; recycling Electricity, gas, and water supply Construction Services Market services Wholesale and retail trade; restaurants and hotels Transport, storage, and communication Finance, insurance, Real Estate, and business services Community, social, and personal services 0.85 0.31 0.63 0.90 1.05 0.67 0.94 0.52 0.35 0.58 0.70 0.50 0.38 0.39 0.68

0.95

0.16 0.56 0.82 0.83 1.03

0.98 0.62

0.18

0.52

0.43

0.43

0.30 0.62 0.80

0.67 1.10 1.32 0.35 1.89 1.13 1.20 1.47 1.26 0.56 0.77

0.95 0.46 1.15 1.48

1.04

0.80

1.48

0.27

1.79 1.23

0.44 1.37 1.82 1.45 1.79

1.62

1.41 0.57 1.06 0.48 0.86 0.99 0.90

0.98

0.08 0.68 0.88 0.98 0.69

0.83

0.75

0.80

1.77

1.12

1.01 0.58 1.04 0.90 0.87 1.19 1.27

0.82

1.14

1.40

1.22

1.02 0.72

7.32 1.42 0.91 1.33 1.04

0.84

0.71 0.68 1.86

2.64

2.37

1.15

1.81 1.05

0.12 0.86 1.98 0.80 0.94

1.39

0.61 1.11 0.91

0.67

0.88

0.52

0.91

0.91 0.53

0.18 0.56 0.70 0.76 0.52

0.64

0.31 0.30 0.67

0.46

0.41

0.90

0.56 0.35

0.52 0.62 0.51 0.25

0.51

0.50

0.43

0.62

0.65

0.52 0.74 1.37 0.37 0.75 0.65 0.69

0.74

0.88

0.89

0.25

0.69 0.59

1.70 0.58 0.48 0.79 0.21

0.54

0.83 0.31 0.77

0.73

0.85

0.75

0.92

0.92 0.79

0.39 0.68 0.84 0.60 0.66

0.99

0.87

0.73

0.65

1.16

0.10 0.15 0.56 0.17 1.35 0.90 0.93

1.14

0.82

0.22

0.73

0.71 0.84

0.03 0.28 0.49 0.79 0.10

0.67

1.45

1.18

1.72

1.14

4.52 3.94 2.91 1.65 2.41 1.40 1.44

2.78

1.93

4.10

0.81

2.27 6.47

2.08 3.01 2.66 3.44 5.19

2.61

0.93

1.24

0.58

0.88

0.43 0.24 0.52 0.39 0.99 0.99 1.00

0.39

0.93

0.38

0.32

0.65 0.43

0.38 0.56 0.60 0.80 0.50

0.74


Table 1.5    Shift and share analysis of the determinants of firm size

                  Contribution coming from differences in:
                  Sectoral       Average size   Interaction between
Country           composition    of firms       sectoral comp. and size    Total
Denmark            0.14          –0.03          –0.09                       0.01
France             0.08          –0.05          –0.05                      –0.02
Italy             –0.02          –0.17          –0.01                      –0.20
Netherlands        0.01          –0.13          –0.04                      –0.16
Finland           –0.02          –0.05          –0.02                      –0.09
Portugal          –0.05          –0.04           0.02                      –0.07
U.K.              –0.01          –0.02          –0.03                      –0.06
U.S.               0.00           0.42          –0.07                       0.34
Canada             0.01           0.03          –0.02                       0.01
Brazil             0.00          –0.08          –0.01                      –0.09
Mexico             0.06          –0.06          –0.02                      –0.02
Argentina          0.04          –0.14          –0.02                      –0.12
Slovenia           0.01           0.30          –0.07                       0.24
Hungary            0.01           0.14          –0.02                       0.12
Estonia           –0.03           0.07           0.02                       0.06
Latvia            –0.03          –0.20           0.04                      –0.20
Romania            0.08           0.97          –0.36                       0.68
Korea              0.04           0.12           0.02                       0.18
Taiwan (China)     0.03          –0.14          –0.03                      –0.14

Note: The Total represents the percentage deviation of average size from the cross-country average; the other columns decompose the total into subcomponents.
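The three columns of table 1.5 can be reproduced from sector-level data with a shift-and-share decomposition. The sketch below is a minimal illustration under stated assumptions: each term is divided by the benchmark average size so that the total reads as a percentage deviation (as the table note describes), the benchmark sizes and shares are taken as given, and all names and numbers are hypothetical.

```python
def shift_share(sizes, shares, bench_sizes, bench_shares):
    """Decompose a country's deviation from the benchmark average firm size.

    sizes[i], shares[i]: average firm size and share of firms in sector i of the country.
    bench_sizes[i], bench_shares[i]: the same quantities for the cross-country benchmark.
    """
    sectors = list(bench_sizes)
    s_bar = sum(bench_shares[i] * bench_sizes[i] for i in sectors)  # benchmark average size
    s_j = sum(shares[i] * sizes[i] for i in sectors)                # country average size
    composition = sum((shares[i] - bench_shares[i]) * bench_sizes[i] for i in sectors)
    within = sum((sizes[i] - bench_sizes[i]) * bench_shares[i] for i in sectors)
    interaction = sum((sizes[i] - bench_sizes[i]) * (shares[i] - bench_shares[i])
                      for i in sectors)
    # Assumption: normalize by s_bar to express each term as a share of the benchmark.
    return {
        "composition": composition / s_bar,
        "within": within / s_bar,
        "interaction": interaction / s_bar,
        "total": (s_j - s_bar) / s_bar,
    }


# Hypothetical two-sector example; numbers are made up for illustration.
bench_sizes = {"manufacturing": 100.0, "services": 50.0}
bench_shares = {"manufacturing": 0.4, "services": 0.6}
country_sizes = {"manufacturing": 120.0, "services": 40.0}
country_shares = {"manufacturing": 0.5, "services": 0.5}

result = shift_share(country_sizes, country_shares, bench_sizes, bench_shares)
print({name: round(value, 3) for name, value in result.items()})
```

By construction the three components sum to the total, which matches table 1.5 up to rounding: for Italy, –0.02 – 0.17 – 0.01 = –0.20.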

interpreted loosely as an indicator of covariance: if it is positive, size and sectoral composition deviate from the benchmark in the same direction.24

The decomposition (table 1.5) suggests that within-sector differences generally play the most important role in explaining differences in overall size across countries: this component is much larger (in absolute terms) than the sectoral composition component in many countries.25 The within-industry size component is particularly large in the United States, confirming the idea that a larger internal market tends to promote larger firms, but also in some transition economies (e.g., Romania). However, the sectoral composition also plays an important role in some small European countries such as Denmark and Portugal, as well as in a relatively larger country such as France and an emerging economy like Mexico.26

All in all, differences in average firm size seem to be largely driven by within-sector differences, although in some countries sectoral specialization also plays a significant role. Smaller countries tend to have a size distribution skewed towards smaller firms, but the average size of firms does not map precisely onto the overall dimension of the domestic market.

24. The decomposition is as follows:

$$s_j - s = \sum_i \omega_{ij} s_{ij} - \sum_i \omega_i s_i = \sum_i (\omega_{ij} - \omega_i)\, s_i + \sum_i (s_{ij} - s_i)\, \omega_i + \sum_i (s_{ij} - s_i)(\omega_{ij} - \omega_i),$$

where $s_j$ is the average firm size in country j, $s_{ij}$ is the average firm size in subsector i of country j, and $\omega_{ij}$ is the share of firms in subsector i with respect to the total number of firms in country j; $s$ is the overall mean across countries, $s_i$ is the cross-country average firm size in subsector i, and $\omega_i$ is the share of subsector i in the overall number of firms.
25. In a sensitivity analysis, we have also replicated the decomposition for the sample of OECD countries and the non-OECD countries (including also Hungary and Mexico) separately. The results are broadly unchanged in the two subsamples. Moreover, we have replicated the decomposition at a finer level of sectoral disaggregation and again the results are broadly unchanged.

1.5.3 Gross and Net Firm Flows

The second step in our analysis is to look at the magnitude and characteristics of firm creation and destruction. We present entry and exit rates for all firms with at least one employee, and for those firms with twenty or more employees, to avoid comparability problems related to size cutoffs in some country data. As discussed in the previous section, we focus on time averages (1989 onwards) rather than annual data to minimize possible measurement problems.

Figure 1.2 shows entry and exit rates for the business sector and for manufacturing. The results point to a high degree of turbulence in all countries (and confirm one of the regularities pointed out by Geroski [1995] for industrial economies). Many firms enter and exit most markets every year. Limiting the tabulations to firms with at least twenty employees to maximize the country coverage, total firm turnover (entry plus exit rates)27 is between 3 and 8 percent in most industrial countries and more than 10 percent in some of the transition economies. If we extend the analysis to include micro units (one to nineteen employees), we observe total firm turnover rates between one-fifth and one-fourth of all firms.
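The rates quoted above follow the definitions in note 27: entrants are divided by incumbents plus entrants in the same year, and exits by the incumbents of the previous year. A minimal sketch of that arithmetic is given below; the function names and the counts are hypothetical, and "incumbents" is read here as firms already active before the year in question.

```python
def entry_rate(entrants, incumbents):
    """New firms over incumbents plus entrants producing in the same year."""
    return entrants / (incumbents + entrants)


def exit_rate(exits, incumbents_previous_year):
    """Exiting firms over the population of origin (previous-year incumbents)."""
    return exits / incumbents_previous_year


def turnover_rate(entrants, incumbents, exits, incumbents_previous_year):
    """Total firm turnover: entry rate plus exit rate."""
    return entry_rate(entrants, incumbents) + exit_rate(exits, incumbents_previous_year)


def net_entry_rate(entrants, incumbents, exits, incumbents_previous_year):
    """Net entry: entry rate minus exit rate."""
    return entry_rate(entrants, incumbents) - exit_rate(exits, incumbents_previous_year)


# Hypothetical counts for one country-year, for illustration only.
turnover = turnover_rate(entrants=8, incumbents=92, exits=6, incumbents_previous_year=100)
net_entry = net_entry_rate(entrants=8, incumbents=92, exits=6, incumbents_previous_year=100)
print(round(turnover, 3), round(net_entry, 3))
```

In this toy case gross turnover is 14 percent of firms while net entry is only 2 percent, which illustrates how small net flows can coexist with large gross flows.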
These data also confirm previous findings that in all countries net entry (entry minus exit) is far less important than the gross flows of entry and exit that generate it. This suggests that the entry of new firms in the market is largely driven by a search process rather than by an increase in the number of competitors in the market (a point also highlighted by Audretsch [1995]).

There are also interesting differences across countries. The Latin American region shows a wide variety of experiences; for example, while Mexico

26. The decomposition also suggests that the two elements of the decomposition are not highly correlated; the interaction term is negative in most cases, and the signs of the two elements of the decomposition also tend to differ in most cases. In other words, there is no clear link between size structure and a sectoral specialization tilted towards activities naturally characterized by large firms (see Davis and Henrekson [1999] for a discussion).
27. The entry rate is defined as the number of new firms divided by the total number of incumbent and entrant firms producing in a given year; the exit rate is defined as the number of firms exiting the market in a given year divided by the population of origin (i.e., the incumbents in the previous year).

Fig. 1.2 Firm turnover rates in broad sectors, 1990s: A, manufacturing, firms with 20 or more employees; B, manufacturing, firms with at least 1 employee; C, total business sector, firms with 20 or more employees; D, total business sector, firms with at least 1 employee

and the manufacturing sector of Brazil show vigorous firm turnover, Argentina shows less turbulence, closer to the values observed in some continental European countries. The transition economies of Central and Eastern Europe provide other interesting features. In most of these countries, firm entry largely outpaced firm exit, while more balanced patterns are found in other countries. Obviously this is related to the process of transition to a market economy, and is not sustainable over the longer run. Still, it points to the fact that new firms not only displaced obsolete incumbents in the transition phase but also filled in new markets that were either nonexistent or poorly populated in the past.

As stressed in the previous section, differences in sample selection and measurement error in longitudinal linkages can yield spurious differences in measures of firm turnover. It is very difficult to assess this problem without detailed information about the statistical processing in each of these countries, as well as within-country validation studies. Instead, our approach is to consider related measures of firm dynamics that, in some fashion, attempt to overcome these measurement concerns.

We begin our inquiry into the validity of the turnover data by first weighting firm turnover by employment and then comparing the size of entrant firms with that of the average incumbent. If we focus on the entire population of firms with at least one employee, we see that less than 10 percent of employment is, on average, involved in firm creation and destruction. The difference between unweighted and employment-weighted firm turnover rates arises from the fact that both entrants and exiting firms are generally smaller than incumbents. For most countries, new firms are only 20 to 60 percent of the average size of incumbents. But the small size of entrants relative to the average incumbent is driven by different factors across countries.
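The gap between unweighted and employment-weighted turnover, and the relative size of entrants, can be illustrated with a small calculation. This is only a sketch under simplifying assumptions: both rates use all firms (or all employment) in the year as the denominator, "incumbents" are taken to be continuing firms, and the firm records are invented for illustration.

```python
def turnover_summary(firms):
    """Summarize turnover from (status, employment) records.

    status is one of "entry", "exit", or "continuing".
    """
    movers = [emp for status, emp in firms if status in ("entry", "exit")]
    entrants = [emp for status, emp in firms if status == "entry"]
    incumbents = [emp for status, emp in firms if status == "continuing"]
    total_employment = sum(emp for _, emp in firms)
    mean_entrant = sum(entrants) / len(entrants)
    mean_incumbent = sum(incumbents) / len(incumbents)
    return {
        # Share of firms that enter or exit (simplified common denominator).
        "unweighted_turnover": len(movers) / len(firms),
        # Share of employment in entering or exiting firms.
        "employment_weighted_turnover": sum(movers) / total_employment,
        # Average entrant size relative to the average continuing firm.
        "relative_entrant_size": mean_entrant / mean_incumbent,
    }


# Hypothetical firm records, for illustration only.
toy_firms = [("entry", 5), ("entry", 3), ("exit", 4)]
toy_firms += [("continuing", emp) for emp in (40, 60, 20, 30, 50, 10, 28)]

summary = turnover_summary(toy_firms)
print({name: round(value, 3) for name, value in summary.items()})
```

Here 30 percent of firms turn over but less than 5 percent of employment is involved, because the entering and exiting firms are small relative to continuing firms.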
In particular, we observe that entrant firms are relatively smaller in the United States than in most of the other industrial countries. This is in part due to the larger market of the United States, which leads to a larger average size of incumbents.28 But the wider gap between entry size and the minimum efficient size in the United States may also reflect economic and institutional factors (e.g., the relatively low entry and exit costs may increase incentives to start up relatively small businesses). In the transition economies, new firms are substantially different from most of the existing firms, which were inherited from the centrally planned period. Indeed, the net entry of firms (entry rate minus exit rate) is particularly large among micro units (fewer than twenty employees); under the centrally planned system there were relatively few of these micro firms, whose numbers exploded during the transition in most business service activities.

Unfortunately, the observed differences in the relative size of entrants across countries may still reflect longitudinal linkage problems. If, in some countries, spurious entry is more prevalent and the continuing businesses that are spuriously labeled entrants are larger than true entrants, then this will increase the relative size of entrants in the country.

An alternative approach to overcoming measurement problems in firm turnover measures is to disaggregate by some key business characteristic and compare within-country variations in firm turnover. One interesting characteristic in this context is obviously business size. Figure 1.3 presents entry rates by different size classes in manufacturing. In most countries, entry rates tend to decline with firm size, consistent with the view that firms tend to enter small, test the market, and, if successful, expand to reach the minimum efficient scale.
But in some European countries, we observe a flattening of the entry rate for firms with more than twenty employees, or even a U-shaped relation whereby entry rates tend to increase for larger firms compared with small firms.29 It is interesting to note that the countries where we observe the flattening of the entry rates are generally those characterized by relatively high administrative costs to set up a business.30 The latter may induce firms to enter very small (and thus partly avoid some of the entry costs that kick in at a given size) or to enter at a larger size and thus spread these fixed entry costs over a larger investment plan. This is only a working hypothesis, which is, however, corroborated by more detailed econometric analysis (see Scarpetta et al. 2002). Of course,

28. Geographical considerations are also likely to affect the average size of firms; firms with plants spread across different U.S. states are recorded as single units, while establishments belonging to the same firm but located in different EU states are recorded as separate units.
29. Focusing on the total business sector suggests a more monotonic relationship between entry rate and size classes; however, the steepness of the downward relation is less marked in those countries where we observe a flattening or even a U-shaped relation in manufacturing.
30. For example, France (3.5), Italy (4.6), the Netherlands (1.6), and Finland (1.8) all have indicators of the administrative costs of setting up a business (least regulated = 0, most regulated = 6) well above those of the United States (0.7) or the United Kingdom (0.8). See Nicoletti, Scarpetta, and Boylaud (1999) for details on these indicators.

Fig. 1.3 Entry rates by firm size, manufacturing, 1990s

Note: Data for Finland are from 1992 to 1998.

the specific difference-in-difference approach works only if the measurement error in firm turnover does not vary systematically by size class. Longitudinal linkage problems interacting with sample selection problems that vary by size may be a problem in some countries.

Another dimension that can be used for this difference-in-difference approach is clearly the industry. Sectoral variation within and between countries may reflect a rich mix of the technological, cost, and demand factors driving firm dynamics, as well as market structure and institutions in the country. Table 1.6 presents sectoral gross firm turnover rates (entry plus exit rates weighted by employment) normalized by the overall cross-country industry average. As before, if technological and cost factors were predominant in determining the heterogeneity of firm dynamics across countries, we should find that the values in the country columns of table 1.6 are concentrated around one.

The first element to report is that the variability of turnover rates for the same industry across countries is comparable in magnitude to that across industries within each country. Turnover rates (especially if weighted by employment) are somewhat higher in the service sector (especially in trade) than in manufacturing.31 However, in most countries, some high-tech industries with rapid technological change and market experimentation had relatively high entry rates in the 1990s (e.g., office, accounting, and computing machinery; and radio, TV, and communication equipment). Transition countries, as well as Mexico, tend to have greater firm churning than industrial countries, on average.

The finding of important industry effects that hold across countries suggests a possible future avenue for the difference-in-difference approach to shed light on the role of institutions in shaping firm dynamics. Taking U.S. firm dynamics as a benchmark for the underlying churning that is required by technological and cost factors, it is possible to compare the cross-industry variation in the United States with that of other countries with stricter business regulations. If these regulations were indeed constraining firm dynamics, we should observe smaller variance in countries with stricter regulations.
Some recent studies have found preliminary evidence that this is indeed the case.32

1.5.4 The Post-Entry Performance of Firms

Another useful way to characterize firm dynamics is to examine the post-entry performance of firms. Understanding post-entry performance sheds light on the market selection process that separates successful entrant firms that survive and prosper from others that stagnate and eventually exit. In addition, post-entry performance is a measure that exploits variation that may be less subject to measurement error. Conditional on

31. In Italy, however, there appear to be only small differences in churning between manufacturing and services. This is particularly evident for the employment-weighted turnover and likely reflects the small differences in average firm size between manufacturing and services. The lower turnover rate in the French service sector compared with that in manufacturing likely reflects the size threshold in the French data, which tends to be more binding in the service sector than in manufacturing. As an indication, the French data also suggest a higher average size of firms in the service sector than in manufacturing, in contrast with all other countries.

32. See Micco and Pages (2006); Klapper, Laeven, and Rajan (2006).

1.02 0.90 0.89 0.89 0.94 0.85 0.87 0.75 0.94 0.87 0.89 0.76 0.83 0.94 0.96 0.97 1.00 0.97

17.6

22.2

19.8

19.6

19.7 15.5 16.8

18.1 18.0

19.1 17.5

24.1

17.6

19.5

17.1

0.90

Industrial

19.8 18.9 19.0

21.6

Crosscountry average

1.04

1.00

1.05

1.05

1.19 1.09

1.19 1.30

1.38 1.09 1.19

1.21

1.27

1.11

1.19

0.97 1.17 1.18

1.16

Other countries

0.98

0.92 0.98

1.17 0.88

0.95

0.93

1.02

1.06

0.98 0.79 0.97

0.91

Denmark

1.12

0.95

1.08

1.06

0.97

1.18 1.17 0.94

0.99

0.80

0.88

1.06

1.08 0.85 1.06

0.89

France

1.00

0.80

0.81

1.02

0.73 0.74

0.67 0.59

0.51 0.83 0.78

0.71

0.69

0.79

0.83

0.78 0.61 0.77

0.74

Italy

0.67

0.95

0.94

0.86

0.70 0.63

0.76 0.81

0.63 0.93 0.72

0.77

0.62

0.80

0.54

0.66 1.21 0.72

0.77

Netherlands

0.74

0.93

0.88

0.94

0.87 1.08

0.94 0.80

0.94 0.86 0.93

0.84

0.92

0.90

0.95

0.94 0.91

0.96

Finland

Gross firm turnover across countries and sectors (as a ratio of cross-country industry average)

Agriculture, hunting, forestry, and fishing Mining and quarrying Total manufacturing Food products, beverages, and tobacco Textiles, textile products, leather, and footwear Wood and products of wood and cork Publishing, printing, and reproduction of recorded media Coke, refined petroleum products, and nuclear fuel Chemicals and chemical products Rubber and plastics products Other non-metallic mineral products Basic metals Fabricated metal products, except machinery and equipment Machinery and equipment, N.E.C. Office, accounting, and computing machinery Electrical machinery and apparatus, Nec Radio, television, and communication equipment Medical, precision, and optical instruments

Total economy

Table 1.6

0.53

0.53 0.67 0.67

0.66

0.49

0.73

0.57

1.14 0.47 0.61

0.76

West Germany

1.01

1.17

1.03

2.49

1.01

0.95 0.86

0.88 0.89

0.87

1.13

1.03

1.36 1.02 0.96

1.00

Portugal

1.57

1.37

1.39

1.36

1.04 1.37

1.48 0.74

1.12

1.12

1.13

1.43

1.24

U.K.

0.90

0.86

0.48

0.69 0.83

0.88 0.55

0.69 1.01 0.95

0.91

0.91

1.02

0.89

1.03 1.12 0.91

1.05

Canada

0.82 (continued )

0.93

0.83

0.98

0.77 0.82

0.83 0.81

0.82 0.95 0.98

0.91

1.05

1.19

0.98

1.09 1.07 0.93

0.93

U.S.

(continued)

Agriculture, hunting, forestry, and fishing Mining and quarrying Total manufacturing Food products, beverages, and tobacco Textiles, textile products, leather, and footwear Wood and products of wood and cork

Total economy

Motor vehicles, trailers, and semi-trailers Other transport equipment Manufacturing Nec; recycling Electricity, gas, and water supply Construction Services Bus sector services Wholesale and retail trade; restaurants and hotels Transport, and storage, and communication Finance, insurance, Real Estate, and business services Community, social and personal services

Table 1.6

0.88 0.95 0.94 0.82

22.9

24.0

23.9

21.7

0.92 1.27 1.29 1.23 1.30 1.48

17.6

22.2

19.8

1.22

Mexico

1.20

1.08

1.08

1.16

1.10 1.08 1.18 1.00 1.23 1.14 1.13

Other countries

19.8 18.9 19.0

21.6

Crosscountry average

0.92 0.94 0.88 1.00 0.85 0.89 0.90

Industrial

17.5 20.7 20.4 13.5 23.3 22.7 23.3

Crosscountry average

1.13

1.02

1.06

1.27 1.00 1.12

1.21

Slovenia

0.63

1.03

0.80

0.95

0.97 0.89 0.82 0.89 0.95

Denmark

0.56

0.74

0.69

0.76

0.81 0.76 0.73 0.58 0.83 0.70 0.75

Italy

1.39

1.03

1.32

1.00 1.05 1.17

1.21

Hungary

0.85

0.91

0.76

0.78

0.95 0.91 0.88 1.47 0.81 0.84 0.83

France

0.96

0.65

0.78

0.77 0.35 0.81

0.81

Estonia

0.63

0.92

0.70

0.66

0.64 0.72 0.81 1.80 0.57 0.76 0.78

Netherlands

1.29

1.31

1.51

1.16 1.31

Brazil

0.90

0.94

0.88

0.93 0.97 0.85 0.54 1.04 0.92 0.90

Finland

1.42

1.31

1.54

2.15 1.48

1.30

Latvia

0.84

0.61 0.44 0.62

West Germany

0.97

1.13

1.74

0.93

0.69 0.71 0.64 1.85 1.10 0.97 0.95

1.51 1.39 1.48

U.K.

1.19

0.98

1.26

Romania

Portugal

0.67

0.90

1.00

0.95

0.94 0.92 1.00 0.54 0.97 0.87 0.94

U.S.

0.79

0.86

0.86

0.56 0.85 0.78

0.75

Argentina

1.31

0.92

0.96

0.99

0.83 0.99 0.93 1.39 0.94 1.04 0.97

Canada

Publishing, printing, and reproduction of recorded media Coke, refined petroleum products, and nuclear fuel Chemicals and chemical products Rubber and plastics products Other non-metallic mineral products Basic metals Fabricated metal products, except machinery and equipment Machinery and equipment, N.E.C. Office, accounting, and computing machinery Electrical machinery and apparatus, Nec Radio, television, and communication equipment Medical, precision, and optical instruments Motor vehicles, trailers, and semi-trailers Other transport equipment Manufacturing Nec; recycling Electricity, gas, and water supply Construction Services Bus sector services Wholesale and retail trade; restaurants and hotels Transport and storage and communication Finance, insurance, Real Estate, and business services Community, social, and personal services 0.92 1.28 1.12 1.11 0.60 1.11 1.23 1.19

1.26 1.10 1.41 0.69 1.76 1.16 1.16 1.18 1.10 1.11 1.03

17.1

17.5 20.7 20.4 13.5 23.3 22.7 23.3

22.9

24.0

23.9

21.7

1.60

1.04

1.18

1.27

0.93

19.5

1.27 1.12

1.06

1.32

19.1 17.5

0.98 1.11

17.6

1.33 1.25

18.1 18.0

1.81 0.96 1.39

0.73

1.07 1.11 1.22

19.7 15.5 16.8

1.30

24.1

1.25

19.6

1.39

1.18

1.09

1.21

1.04 1.32 1.24 1.43 1.10 1.23 1.20

1.09

1.00

1.14

0.94

1.20 1.03

1.32 1.36

1.16 1.12 1.25

1.26

0.83

0.79

0.71

0.82

0.77 0.69 0.86 0.90 0.74 0.82 0.80

0.73

0.96

0.86

0.69

0.70 0.62

0.86 2.78

5.07 0.57 1.02

0.76

1.31 1.21 1.22

1.38

1.38

1.27

1.22

1.20 1.19

1.18 1.25

0.54 1.42 1.22

1.20

1.32

1.17

1.12

1.22

1.27 1.31 1.41 2.01 1.22 1.23 1.20

1.54

1.27

1.09

1.36

1.56 1.78

1.72 2.32

2.49 1.44 1.29

1.52

1.32

1.28

1.14

1.26

0.92 0.92 1.03 1.23 0.95 1.27 1.24

1.05

0.95

1.14

1.33

1.08 0.96

1.10 0.93

0.55 1.32 1.12

1.23

0.57

0.73

0.92

0.82

0.63 0.77 0.77 0.48 1.03 0.75 0.82

0.60

0.74

0.69

1.16

0.71 0.76

0.74 0.68

0.86 0.73 0.76

0.75


Eric Bartelsman, John Haltiwanger, and Stefano Scarpetta

the sample or register capturing an entrant firm, there is a reasonable chance that the sample or register will be able to follow the firm over time. Figure 1.4 presents nonparametric (graphic) estimates of survivor rates. The survivor rate specifies the proportion of firms from a cohort of entrants that still exist at a given age. In the figure, the survival rates are averaged over different entry cohorts (those that entered the market in the late 1980s and 1990s) to minimize possible business cycle effects and possible measurement problems. Looking at cross-country differences in survivor rates, about 10 percent (Slovenia) to more than 30 percent (in Mexico) of entering firms leave the market within the first two years (fig. 1.4). Conditional on overcoming the initial years, the prospect of firms improves in the subsequent period; firms that remain in the business after the first two years have a 40 to 80 percent chance of surviving for five more years. Nevertheless, only about 30 to 50

Fig. 1.4 Firm survival at different lifetimes, 1990s: A, manufacturing; B, total business sector

Measuring and Analyzing Cross-Country Differences in Firm Dynamics


percent of total entering firms in a given year survive beyond the seventh year in industrial and Latin American countries, while higher survival rates are found in transition economies.33 For most countries, the rank ordering of survival is similar whether using a two-year, four-year, or seven-year horizon, suggesting that there is an important country effect that impacts the survival function. However, there are a few interesting exceptions. The United States has relatively low survival rates at the two-year horizon but relatively higher survival rates at the seven-year horizon. This pattern might reflect the relatively rapid cleansing of poorly performing firms in the United States. Table 1.7 provides details on the survival rates at four years of age across industries and countries. The structure of the table is similar to those presented previously. Notably, the variation across countries is more systematic than that across industries. Across industries, between 60 and 80 percent of firms survive after four years, while for example, the survival rate in office and computing equipment deviates across countries from 40 percent below to 40 percent above the cross-country average of 70 percent. The total employment in each given cohort tends to increase in the initial years because failures are highly concentrated among its smallest units and because of the significant growth of survivors. These facts are best presented by looking at gains in average firm size amongst surviving firms. Given differences in data collection, the reference average size of entrants is that at duration one for industrial countries and duration zero for other countries, but excluding firms with zero employment. The choice for the industrial countries is dictated by the fact that entrant firms include zero-employee firms. 
For example, in the United States, the time when the firm is registered and when its employment is recorded differ, giving rise to the possibility that firms are recorded as having zero employees in the entry year and positive employment in the second year.34 This, however, may represent an overcorrection as it eliminates employment growth in firms with positive employment at registration. Figure 1.5 shows the evolution in average firm size of survivors as they age, corrected for possible changes in entry size of the actual survivors by age. In the figure, the average size of survivors at different duration is compared with that at entry. The difference in post-entry behavior of firms in the United States35 compared with the western European countries is partially

33. Survivor rates for firms with twenty or more employees at age one are similar to those observed in the newly compiled EUROSTAT firm-level database (EUROSTAT 2004).

34. However, recent work by the U.S. Census Bureau shows that even after correcting for the zero-employee problem, the size expansion of entrant firms in the United States exceeds that in other industrial countries by a wide margin. The growth in firm size in the ensuing years shows that the United States continues to perform much better than other OECD countries.

35. The results for the United States are consistent with the evidence in Audretsch (1995). He found that the four-year employment growth among surviving firms was about 90 percent. See also Dunne, Roberts, and Samuelson (1988, 1989).

1.05 1.00 1.02 0.96 1.04 0.98 1.05 1.02 0.98 1.02 0.99 1.01 1.01 0.88 0.93 0.92 0.96 0.99 0.98

0.69

0.59

0.64

0.69

0.73 0.69 0.73 0.68 0.69

0.69 0.73

0.70

0.74

0.71

0.77

0.70 0.65

Industrial

0.69 0.67

Crosscountry average

1.01 1.01

1.04

1.08

1.06

1.10

0.99 0.99

0.96 0.99 1.01 0.98 1.01

1.01

0.97

1.03

0.98

0.94 1.00

Other countries

0.87 0.78

1.03

0.99

0.90

0.92

0.95 0.96

1.05 0.89 0.91 0.96 0.94

0.97

0.95

0.95

0.92

1.07 0.94

Finland

1.03 1.00

0.88

0.86

1.01

1.03

1.05 0.96 0.97 1.08

0.92

1.10

0.98

1.10

0.91 1.02

France

0.72 0.77

0.70

0.73

0.71

0.61

0.90 0.70

0.67 0.88 0.96 0.76 0.85

0.82

0.86

0.75

0.69

0.79

U.K.

1.20 1.09 1.00 1.11

1.04

1.12

0.99

1.10

1.14 1.10

West Germany

1.08 1.05

0.92

1.00

1.00

1.05

1.05 1.00

1.23 1.11 1.00 1.10 1.08

1.03

1.09

1.04

1.08

1.10 1.04

Italy

Survival rate (4 years of age) across countries and industries (as a ratio to cross-country sectoral average)

Mining and quarrying Total manufacturing Food products, beverages, and tobacco Textiles, textile products, leather, and footwear Wood and products of wood and cork Publishing, printing, and reproduction of recorded media Coke, refined petroleum products, and nuclear fuel Chemicals and chemical products Rubber and plastics products Other non-metallic mineral products Basic metals Fabricated metal products, except machinery and equipment Machinery and equipment, N.E.C. Office, accounting, and computing machinery Electrical machinery and apparatus, Nec Radio, television, and communication equipment Medical, precision, and optical instruments Motor vehicles, trailers, and semi-trailers Other transport equipment

Table 1.7

1.05 1.14

1.08

0.91

1.00

1.03

1.08 1.09

1.13 1.04 1.06 1.08 1.15

1.03

1.19

1.02

1.01

1.15 1.07

Netherlands

1.29 1.25

1.15

1.00

0.99

1.13

1.12 1.29

1.37 1.14 1.06 1.16 1.01

1.13

1.04

1.14

1.35

1.11 1.12

Portugal

0.92 0.95

0.95

0.95

0.91

0.80

1.00 0.99

0.79 1.01 0.90 0.97 0.93

0.94

0.99

0.81

0.91

0.85 0.95

U.S.

1.01

0.70

0.65

Total nonagricultural business sector

Mining and quarrying Total manufacturing Food products, beverages, and tobacco Textiles, textile products, leather, and footwear Wood and products of wood and cork Publishing, printing, and reproduction of recorded media Coke, refined petroleum products, and nuclear fuel Chemicals and chemical products Rubber and plastics products Other non-metallic mineral products Basic metals

0.98

0.66

1.05 1.00 1.02 0.96 1.04 0.98 1.05 1.02 0.98 1.02 0.99

0.69

0.59

0.64

0.69

0.73 0.69 0.73

0.68 0.69

Industrial

0.69 0.67

Crosscountry average

1.02

0.64

1.02

1.02 1.01 1.07 1.02

0.66 0.82 0.64 0.66

Manufacturing Nec; Recycling Electricity, gas, and water supply Construction Market services Wholesale and retail trade; restaurants and hotels Transport, storage, and communication Finance, insurance, Real Estate, and business services

0.98 1.01

0.96 0.99 1.01

1.01

0.97

1.03

0.98

0.94 1.00

Other countries

1.11

0.95 1.14

0.95

1.01

1.19

1.02

0.49 1.05

Estonia

0.99

0.99

1.02

0.98

0.98 0.99 0.94 0.98

1.04 0.97

0.97 1.04 1.10

1.04

1.08

1.21

1.03

1.11 1.10

Hungary

1.00

1.01

1.22

0.91

0.93 1.14 1.00 0.99

1.17 1.35

1.14 1.07 1.12

1.08

1.04

1.30

1.09

0.98 1.11

Latvia

0.99

0.85

1.05

1.01

0.99 0.98 1.00 0.96

1.09 1.03

1.37 1.09 1.05

1.06

1.09

Romania

0.82

0.78

1.22 1.32

1.37 0.95 1.20

1.23

1.26

1.20

1.15

1.40 1.22

Slovenia

1.05

1.00

1.00

1.02

1.14 1.01 1.10 1.01

0.89 0.90

0.83 1.02 0.94

0.93

0.83

0.91

0.86

0.84 0.89

Argentina

1.04

1.01

1.04

1.03

1.04 1.00 1.03 1.02

0.98 1.13

0.93 1.00 1.02

1.09

1.13

1.08

1.03

1.04

Chile

1.16

1.16

1.07

1.07

1.11 0.99 1.18 1.14

0.83 0.92

1.11 1.00 0.90

1.02

0.77

0.87

0.95

0.87

0.92 0.86 0.81

0.77

0.69

0.80

0.80

0.69 0.76

Mexico

0.97

0.95

0.94

0.96

0.92 0.95 0.98 0.96

0.74 0.78 (continued )

Colombia

1.13

1.10

0.45

1.12

1.29 1.01 1.18 1.09

0.93 0.92 0.96 0.99 0.98 1.02 1.01 1.07 1.02 1.02 0.98 1.01

0.74

0.71

0.77

0.70 0.65 0.66 0.82 0.64 0.66

0.64

0.66

0.70

0.65

Total nonagricultural business sector 1.02

0.88

0.70

Industrial

1.01 1.01

Crosscountry average

0.69 0.73

(continued)

Fabricated metal products, except machinery and equipment Machinery and equipment, N.E.C. Office, accounting, and computing machinery Electrical machinery and apparatus, Nec Radio, television, and communication equipment Medical, precision, and optical instruments Motor vehicles, trailers, and semi-trailers Other transport equipment Manufacturing Nec; recycling Electricity, gas, and water supply Construction Market services Wholesale and retail trade; restaurants and hotels Transport, storage, and communication Finance, insurance, Real Estate, and business services

Table 1.7

0.99

0.99

1.02

0.98

1.01 1.01 0.98 0.99 0.94 0.98

1.04

1.08

1.06

1.10

0.99 0.99

Other countries

1.09

1.06

1.15

1.06

1.07 1.37 1.05 0.95 1.16 1.07

1.30

0.95

1.02

1.42

1.12 1.01

Estonia

1.10

1.06

1.11

1.07

1.14 1.13 1.11 0.98 1.16 1.06

1.07

1.07

1.06

1.16

1.12 1.09

Hungary

1.15

1.13

1.22

1.13

1.43 1.43 1.20 1.12 1.21 1.12

1.15

1.27

1.05

1.10

1.21 0.96

Latvia

1.00

1.00

1.04

0.98

1.14 1.21 1.11 1.05 1.17 0.96

1.01

1.07

1.10

1.02

1.09 1.03

Romania

1.23

1.20

1.14

1.20

1.16 1.06 1.17 1.06 1.31 1.19

1.12

1.22

1.13

1.22

1.27 1.20

Slovenia

0.88

0.91

0.98

0.87

0.95 0.83 0.89 0.95 0.66 0.89

0.99

0.86

0.93

0.60

0.85 0.86

Argentina

1.07

0.96 0.88 1.07

1.04

1.06

1.14

1.42

1.00 0.97

Chile

0.90

0.83 0.88 0.78

0.81

1.04

0.98

1.42

0.82 0.75

Colombia

0.67

0.75

0.78

0.74

0.81 0.76 0.70 0.88 0.32 0.73

0.70

Mexico

Fig. 1.5 Average firm size relative to entry, by age: A, manufacturing; B, total business sector

due to the larger gap between the size at entry and the average firm size of incumbents (i.e., there is greater scope for expansion among young ventures in U.S. markets than in Europe). In turn, the smaller relative size of entrants can be taken to indicate a greater degree of experimentation, with firms starting small and, if successful, expanding rapidly to approach the minimum efficient scale.36 Latin American countries also display a wide range of post-entry performance. Argentina has very limited post-entry expansion of successful firms in manufacturing, while in Mexico selection among small firms is stronger than in all other countries. However, post-entry growth of successful firms in Mexico is also very strong, pointing to a vigorous market selection process but also to sizeable rewards for successful new firms.

36. This greater experimentation of small firms in the U.S. market may also help explain the evidence of lower than average productivity at entry.


Transition economies also behave differently from most other countries with respect to firm survival. They tend to show higher survivor rates and large post-entry growth of successful firms, which supports the hypothesis that new firms enjoyed a period of relatively low market contestability, especially in new, sparsely populated markets. Romania is a clear outlier among transition economies: not only are failure rates higher than in the other countries, but even successful entrants have more limited opportunities to expand.

1.6 The Effects of Creative Destruction on Productivity

1.6.1 Reallocation and Productivity: Growth versus Level Comparisons

In the previous two sections we have presented evidence of significant cross-country differences in firm characteristics, market dynamics, and post-entry performance. These differences cannot be fully explained by differences in the sectoral composition of the economy; rather, they point to salient differences in market characteristics and in the business environment. The next obvious question is, do these differences matter for aggregate performance? We address this question in a number of ways. First, we examine the connection between productivity growth and the reallocation dynamics that we have documented in the prior sections. We are particularly interested in the contribution of entering and exiting businesses as well as the contribution of the reallocation of activity among continuing businesses. However, this analysis of dynamic efficiency, while inherently interesting, is fraught with interpretational and measurement difficulties. We attempt to overcome some aspects of these difficulties by exploiting sectoral variation within countries and then, in turn, comparing such sectoral differences across countries. In addition, we explore static efficiency by viewing a cross-sectional decomposition of productivity. The latter turns out to be simpler and more robust in terms of theoretical predictions and measurement problems.
The approach taken in much of the empirical literature is to use accounting decompositions that decompose aggregate growth into components that reflect the contributions of productivity growth within continuing firms, the firm turnover process, and the reallocation of resources across continuing firms. The decompositions are correct in an accounting sense but interpreting the results is, as noted, fraught with challenges. Part of the problem here is to develop tight links between theoretical models of productivity enhancing reallocation and these empirical decompositions. One way to think about the empirical decompositions is that they provide a set of moments that models should match. Lentz and Mortensen (2005) take this approach by using a model of reallocation where the key frictions are in the labor market (via search frictions). Levinsohn and Petrin (2005)


provide a related useful benchmark by showing that in a model without frictions and without entry and exit, aggregate productivity growth is given by the weighted average of the productivity growth of continuing firms. In other words, without frictions there is no contribution of reallocation to aggregate productivity growth. These two studies remind us that the role of reallocation in productivity growth is inherently related to underlying frictions in the markets. For example, for net entry to be important it must be the case that it is costly for firms to enter and exit. If it is costless in terms of time and resources for firms to enter or exit, we would not observe any difference in the productivity at the margin between entering and exiting businesses. While frictions are at the core of this connection between productivity and reallocation, precisely how these frictions interact with that connection is complicated, both conceptually and in the cross-country evidence. We turn to these issues and findings now.

1.6.2 Reallocation and Productivity Growth

Let's define the sector-wide productivity level in year t, $P_t$, as:

\[ P_t = \sum_i \theta_{it} \, p_{it} \tag{7} \]

where $\theta_{it}$ is the input share of firm i, and $P_t$ and $p_{it}$ are productivity measures.37 In this chapter we focus on labor productivity based on gross output data, although other measures are available for a subset of countries/sectors. We also use a decomposition suggested by Baily, Hulten, and Campbell (BHC henceforth, 1992) and in turn modified by Foster, Haltiwanger, and Krizan (FHK henceforth, 2001). BHC and FHK decompose aggregate (or industry-level) productivity growth into five components, commonly called the within effect, between effect, cross effect, entry effect, and exit effect, as shown in order below:

\[ \Delta P_t = \sum_{i \in C} \theta_{it-k} \, \Delta p_{it} + \sum_{i \in C} \Delta\theta_{it} \, (p_{it-k} - P_{t-k}) + \sum_{i \in C} \Delta\theta_{it} \, \Delta p_{it} + \sum_{i \in N} \theta_{it} \, (p_{it} - P_{t-k}) - \sum_{i \in X} \theta_{it-k} \, (p_{it-k} - P_{t-k}) \tag{8} \]

where $\Delta$ means changes over the k-year interval between the first year (t − k) and the last year (t); $\theta_{it}$ is as before; C, N, and X are sets of continuing, entering, and exiting firms, respectively; and $P_{t-k}$ is the aggregate (i.e., weighted average) productivity level of the sector as of the first year (t − k).38

37. A variety of measures of productivity have been used in the literature including labor productivity, measures of total factor productivity that vary from estimated residuals from production functions to Divisia index approaches to multilateral index number approaches.

38. The shares are usually based on employment in decompositions of labor productivity and on output in decompositions of total factor productivity.

The FHK method uses the first year's values for a continuing firm's share


($\theta_{it-k}$), its productivity level ($p_{it-k}$), and the sector-wide average productivity level ($P_{t-k}$). One potential problem with this method is that, in the presence of measurement error in assessing market shares and relative productivity levels in the base year, the correlation between changes in productivity and changes in market share could be spurious, affecting the within- and between-firm effects. To tackle these potential problems, we have also used the approach proposed by Griliches and Regev (1995), which uses the time averages of the first and last years for these terms ($\theta_i$, $p_i$, and $P$). As a result the cross-effect, or covariance term, disappears from the decomposition. The averaging of market shares reduces the influence of possible measurement errors, but the interpretation of the different terms of the decomposition is less clear-cut, as the time averaging makes the within-effect term affected by changes in the firms' shares over time, and the between-effect term affected by changes in productivity over time. The results obtained using this method are qualitatively similar to those obtained using the FHK method and are not presented in the chapter. As a final sensitivity analysis, we also use the method proposed by Baldwin and Gu (BG henceforth, 2002) that uses, as a reference for the calculations of the relative productivity of the different groups, the average productivity of exiting firms. With this method, the contribution from exiting firms disappears and the entry component is positive if, on average, the productivity of entrants is higher than that of the firms they are supposed to replace (the exiting firms). In all of these decompositions, the baseline analysis is based on five-year rolling windows for all periods and industries for which data are available. We also present results for three-year rolling windows and test the hypothesis that the contribution from entry changes with the time horizon considered.
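To make the accounting concrete, the following is a minimal sketch of the FHK decomposition in equation (8) and of the Griliches and Regev variant, applied to a small made-up sector. The firm identifiers, shares, and productivity levels are hypothetical, and the function names are illustrative rather than taken from the chapter. The sketch assumes that input shares sum to one in each year, which is what makes the components add up to the change in aggregate productivity.

```python
from math import isclose


def aggregate(firms):
    """Share-weighted productivity level, as in equation (7): sum of theta * p."""
    return sum(theta * p for theta, p in firms.values())


def fhk(first, last):
    """FHK decomposition; each dict maps firm id -> (input share, productivity)."""
    p_base = aggregate(first)              # sector productivity in year t-k
    cont = first.keys() & last.keys()      # continuing firms (C)
    entrants = last.keys() - first.keys()  # entering firms (N)
    exiters = first.keys() - last.keys()   # exiting firms (X)
    return {
        "within": sum(first[i][0] * (last[i][1] - first[i][1]) for i in cont),
        "between": sum((last[i][0] - first[i][0]) * (first[i][1] - p_base) for i in cont),
        "cross": sum((last[i][0] - first[i][0]) * (last[i][1] - first[i][1]) for i in cont),
        "entry": sum(last[i][0] * (last[i][1] - p_base) for i in entrants),
        "exit": -sum(first[i][0] * (first[i][1] - p_base) for i in exiters),
    }


def griliches_regev(first, last):
    """Variant using time averages of shares and productivity; no cross term."""
    p_bar = 0.5 * (aggregate(first) + aggregate(last))
    cont = first.keys() & last.keys()
    entrants = last.keys() - first.keys()
    exiters = first.keys() - last.keys()
    return {
        "within": sum(0.5 * (first[i][0] + last[i][0]) * (last[i][1] - first[i][1])
                      for i in cont),
        "between": sum((last[i][0] - first[i][0])
                       * (0.5 * (first[i][1] + last[i][1]) - p_bar) for i in cont),
        "entry": sum(last[i][0] * (last[i][1] - p_bar) for i in entrants),
        "exit": -sum(first[i][0] * (first[i][1] - p_bar) for i in exiters),
    }


# Hypothetical sector: firm "x" exits and firm "n" enters between the two years.
first_year = {"a": (0.5, 10.0), "b": (0.3, 8.0), "x": (0.2, 5.0)}
last_year = {"a": (0.5, 12.0), "b": (0.2, 8.0), "n": (0.3, 9.0)}

delta_p = aggregate(last_year) - aggregate(first_year)
for name, parts in (("FHK", fhk(first_year, last_year)),
                    ("GR", griliches_regev(first_year, last_year))):
    # Both sets of components should add up to the aggregate change.
    assert isclose(sum(parts.values()), delta_p, abs_tol=1e-12)
    print(name, {k: round(v, 3) for k, v in parts.items()})
```

In this example the entry term is positive under the FHK base-year reference but negative under the time-averaged reference, even though the data are identical, because the entrant's productivity lies between the first-year and the averaged sector levels.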
However, care has to be taken in interpreting the entry and exit components, as they do not always reflect a comparison between productivity levels at the same point in time. For example, in the version of the FHK decomposition used here, the entry component comprises the difference between average productivity among entrants at the end of the three- to five-year period and overall productivity at the beginning. Therefore, a positive entry component does not necessarily mean that productivity among entering firms is above average in relation to their contemporaries. Before discussing the results of these decompositions, it is important to note that their interpretation is not always straightforward from a theoretical, as well as a measurement, point of view. The working hypothesis that poor market structure and institutions will distort the contribution of the creative destruction process has complex implications when using these basic accounting decompositions. The reason is that distortions may affect the reallocation dynamics on different margins in a variety of ways. For example, artificially high barriers to entry will lead to reduced firm turnover


and to a less efficient allocation of resources. But given the high barrier to entry (and in turn the implied ability of marginal incumbents to increase survival probabilities), the average productivity of entrants will rise while the average productivity of incumbents and exiting businesses will fall. Similar predictions apply to policies that subsidize incumbents and/or restrict exit in some fashion. The point is that institutional distortions might yield a larger gap in productivity between entering and exiting businesses, which will contribute to larger net entry term in the previous decompositions. Alternatively, some types of distortions in market structure and institutions might make the entry and exit process less rational (i.e., less driven by market fundamentals but more by random factors). Such randomness may be associated with either a higher or lower pace of churning. Pure randomness would, in principle, increase the pace of churning, but the random factors might be correlated with other factors (e.g., firm size) and thus the impact would be to distort the relationship between churning and such factors with less clear predictions on the overall pace of churning. In any event, such randomness would imply less systematic differences between entering, exiting, and incumbent businesses—in the extreme when all entry and exit is random there should be no differences between entering, exiting, and incumbent businesses.39 Another related problem is that a business climate that encourages more market experimentation might have a larger long-run contribution but a smaller short-run contribution from the creative destruction process. That is, the greater market experimentation may be associated with more risk and uncertainty in the short run so that it is only after the trial and error process of the experimentation has worked its way out (through learning and selection effects) that the productivity payoff is realized. 
Thus, a business climate that encourages market experimentation might have a lower short-run contribution from entry and exit but a higher long-run contribution from entry and exit. In terms of these decompositions, the horizon over which the decomposition is measured may therefore have a major effect on the contribution of net entry in a specific country in a manner that is idiosyncratic to that country, and may thus affect any cross-country comparisons.40 In short, the gap between the productivity of entering and exiting businesses is not by itself sufficient to gauge the contribution or efficiency of the creative destruction process. In addition, different types of distortions might be acting simultaneously in a country. It might be that some policies act to subsidize incumbents (preferential treatment for incumbents), other policies artificially increase the barriers to entry (poorly functioning financial markets and/or regulatory barriers), while still other policies make exit more random for some types of businesses (e.g., poorly functioning financial markets for young and small businesses). As such, there might be too little churning on some dimensions and too much on others, and the gap between entering and exiting businesses might be too large on some margins and too small on others.

39. Oviedo (2005) models the randomness of institutional enforcement as a way of capturing variations in institutional quality across countries. She shows that such randomness reduces the link between firm turnover and allocative efficiency.

40. Foster, Haltiwanger, and Krizan (2001) found large differences in the contribution of entry and exit between five- and ten-year horizons in the United States. Their analysis suggests that this is because entering cohorts in the United States are very heterogeneous. The selection of the least productive entrants in the first several years as well as the relatively greater increases in productivity for surviving entrants, relative to more mature incumbents over the same period, imply that the impact of net entry is much larger at a ten-year horizon than a five-year horizon. They show that this holds even taking into account the inherently higher share of activity accounted for by entering and exiting businesses over a longer horizon.

With all of these caveats in mind, figure 1.6 presents the decomposition of labor productivity growth in the total business sector and figure 1.7 presents the decomposition of labor productivity growth for manufacturing over the 1990s for a large sample of countries. A number of elements emerge from these decompositions:

• Productivity growth is largely driven by within-firm performance. In industrial and emerging economies (outside transition), productivity growth within each firm accounts for the bulk of overall labor productivity growth. This is particularly the case if one focuses on the three-year horizon (not reported). Over the longer run (i.e., five-year horizon),

Fig. 1.6 Firm-level labor productivity decomposition for the total business sector

Notes: Chile 1985–1999; Estonia 2000–2001; West Germany 2000–2002; Latvia 2001–2002; Portugal 1991–1994. Excluding Brazil and Venezuela. Within: within-firm productivity growth. Between: productivity growth due to reallocation of labor across existing firms. Entry: productivity growth due to entry of new firms. Exit: productivity growth due to exit of firms. Firm turnover: entry plus exit rates.

Fig. 1.7 Firm-level labor productivity decomposition for manufacturing

Notes: Argentina 1995–2001; Chile 1985–1999; Colombia 1987–1998; Estonia 2000–2001; Finland 2000–2002; France 1990–1995; West Germany 2000–2002; Korea 1988 and 1993; Latvia 2001–2002; Netherlands 1992–2001; Portugal 1991–1994; Slovenia 1997–2001; Taiwan 1986, 1991, and 1996; U.K. 2000–2001; U.S. 1992 and 1997. Excluding Brazil and Venezuela.

reallocation (and, in particular, the entry component) plays a stronger role in promoting productivity growth.

• The impact on productivity via the reallocation of output across existing enterprises (the between effect) varies significantly across countries. It is generally positive but small. This factor should be assessed together with the covariance (or cross) term, which combines changes in productivity with changes in employment shares. The covariance term is negative in most countries, including the transition economies. This implies that firms experiencing an increase in productivity were also losing market share (i.e., their productivity growth was associated with restructuring and downsizing rather than expansion). This negative cross term is also potentially associated with adjustment costs of labor. That is, in any given cross section there are some businesses that have recently had a productivity shock but, due to adjustment costs, have not (at least fully) adjusted their labor inputs. For businesses with a recent positive shock, the higher productivity raises desired labor demand, so we will see such businesses increase employment; but due to diminishing returns (in the presence of any fixed factors at the micro level), that expansion is accompanied by a decrease in productivity.

• Finally, the contribution of net entry to overall labor productivity growth is positive in most countries, accounting for between 20 percent and 50 percent of total productivity growth. The exit effect is always positive (i.e., the least productive firms exit the market, which raises the average productivity of those that survive). Data for European countries show that new firms typically make a positive contribution to overall productivity growth, although the effect is generally of small magnitude. By contrast, entries make a negative contribution in the United States for most industries.
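The five components discussed in the bullets above can be illustrated with a small numerical sketch. The code below assumes the standard form of the Foster, Haltiwanger, and Krizan (2001) decomposition with base-year aggregate productivity as the benchmark; the firm identifiers, shares, and productivity levels are invented for illustration and are not taken from the harmonized database.

```python
# Toy sketch of a five-term dynamic decomposition (FHK-style).
# All firms and numbers are hypothetical.

def fhk_decomposition(base, end):
    """base and end map firm id -> (employment share, labor productivity)."""
    p0 = sum(s * p for s, p in base.values())  # base-year aggregate productivity
    continuers = base.keys() & end.keys()
    within = sum(base[i][0] * (end[i][1] - base[i][1]) for i in continuers)
    between = sum((end[i][0] - base[i][0]) * (base[i][1] - p0) for i in continuers)
    cross = sum((end[i][0] - base[i][0]) * (end[i][1] - base[i][1]) for i in continuers)
    entry = sum(s * (p - p0) for i, (s, p) in end.items() if i not in base)
    exit_ = -sum(s * (p - p0) for i, (s, p) in base.items() if i not in end)
    return {"within": within, "between": between, "cross": cross,
            "entry": entry, "exit": exit_}

# Firm A raises productivity while shrinking; firm C (low productivity) exits;
# firm D enters slightly above the base-year average.
base = {"A": (0.5, 1.0), "B": (0.3, 1.2), "C": (0.2, 0.6)}
end = {"A": (0.4, 1.3), "B": (0.4, 1.2), "D": (0.2, 1.0)}

parts = fhk_decomposition(base, end)
growth = sum(s * p for s, p in end.values()) - sum(s * p for s, p in base.values())

for name, value in parts.items():
    print(f"{name:8s} {value:+.3f}")
print(f"{'total':8s} {sum(parts.values()):+.3f}  (aggregate change {growth:+.3f})")
```

In this toy case the cross term is negative because the firm with rising productivity is also downsizing, and the exit term is positive because the exiting firm is below the base-year average, which mirrors the qualitative patterns described above.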
Interpreting these findings without more information is difficult. The weak performance of entrants in the United States might reflect greater experimentation, so that for each entering cohort there is more selection and potentially more learning by doing.41 In transition economies, in all but one country (Hungary over the three-year horizon), the entry of new firms makes a positive and often strong contribution to productivity. For most countries, while the contribution of net entry is positive, it is less than proportionate relative to the share of employment accounted for by firm turnover.

41. Some evidence in favor of this interpretation is provided in Haltiwanger, Jarmin, and Schank (2003); Foster, Haltiwanger, and Krizan (2001, 2002); and Bartelsman and Scarpetta (2004). The former work provides evidence of greater market experimentation in the United States relative to Germany. The latter shows that as the horizon lengthens in the United States, the contribution of net entry rises disproportionately. Moreover, Foster, Haltiwanger, and Krizan (2001, 2002) show that the increased contribution of net entry is due both to selection of the low-productivity entrants and to learning by doing among successful entrants.

Measuring and Analyzing Cross-Country Differences in Firm Dynamics

An open question is whether the observed differences across countries are accounted for by differences in market institutions and policies or whether they reflect different circumstances and/or problems of measurement. As discussed above, drawing such inferences from cross-country evidence is difficult, given that the policy environment may affect outcomes in a variety of ways and given the measurement problems. Consider, for example, measurement error in firm turnover that yields too high a measure of turnover for a country because of longitudinal linkage problems. Other things equal, spuriously high firm turnover will increase the share of activity associated with entering and exiting businesses and therefore increase the contribution of net entry to productivity growth. However, this same measurement error is likely to affect the measured differences in productivity between continuing, entering, and exiting businesses. If the true relationship is such that exiting businesses are less productive than continuing businesses, spurious entry and exit will tend to reduce this difference, since some of the measured exiting businesses will in fact be continuing businesses. For entry, the relationship is potentially more complicated, and it also depends on interpretation and on differences across countries in the nature of their dynamics. For a country where entrants are immediately more productive than continuers, spurious entry and exit will tend to reduce the gap and therefore decrease the contribution of net entry. For a country where entrants tend to be less productive than incumbents at entry, perhaps due to market experimentation, as in the United States, spurious entry and exit will decrease the negative gap and therefore increase the contribution of net entry (since it will reduce a negative effect).
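A rough way to see how linkage errors feed into the decomposition is to relabel one continuing firm as a spurious exit plus a spurious entry and recompute the net entry term. The sketch below is only an illustration under stated assumptions: it uses an FHK-style net entry term with base-year aggregate productivity as the benchmark, and the firms, shares, and productivity levels are invented.

```python
# Toy sketch: a broken longitudinal link turns one continuer into a
# spurious exit plus a spurious entry. All firms and numbers are hypothetical.

def fhk_net_entry(base, end):
    """Net entry term with base-year aggregate productivity as the benchmark."""
    p0 = sum(s * p for s, p in base.values())
    entry = sum(s * (p - p0) for i, (s, p) in end.items() if i not in base)
    exit_ = sum(s * (p - p0) for i, (s, p) in base.items() if i not in end)
    return entry - exit_

def exiter_productivity(base, end):
    """Employment-weighted average base-year productivity of measured exiters."""
    exiters = [(s, p) for i, (s, p) in base.items() if i not in end]
    return sum(s * p for s, p in exiters) / sum(s for s, _ in exiters)

base = {"A": (0.5, 1.0), "B": (0.3, 1.2), "C": (0.2, 0.6)}
end_true = {"A": (0.4, 1.3), "B": (0.4, 1.2), "D": (0.2, 1.0)}
# Same economy, but firm B's identifier changed, so its base-year link is lost.
end_mislinked = {"A": (0.4, 1.3), "B_new": (0.4, 1.2), "D": (0.2, 1.0)}

true_ne = fhk_net_entry(base, end_true)
spurious_ne = fhk_net_entry(base, end_mislinked)

print(f"net entry, correct links: {true_ne:+.3f}")
print(f"net entry, broken link:   {spurious_ne:+.3f}")
print(f"exiter productivity, correct links: {exiter_productivity(base, end_true):.2f}")
print(f"exiter productivity, broken link:   {exiter_productivity(base, end_mislinked):.2f}")
```

In this particular example the mislinked firm is a productive, growing continuer, so measured net entry rises and measured exiters look closer to incumbents. With a different mislinked firm the net effect could go the other way, which is consistent with the point that the effects can work in opposite directions.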
One set of countries where these measurement and interpretation problems appear to be interacting in interesting ways is the transition economies (Estonia, Hungary, Latvia, Romania, and Slovenia). In these countries, there is a very high rate of firm turnover as a share of total employment, and entry accounts for a large share of productivity growth (though less than proportionate to the share of turnover). The large contribution of entry partly reflects the large rate of firm turnover, but it also reflects, by construction, a positive gap between entrants' and incumbents' productivity. In interpreting the latter finding, it is useful to put it in the context of the high pace of turnover. In general, it is difficult to interpret differences across countries in the magnitude of the gap between entering and exiting businesses. For example, this gap might reflect fundamentals driving market selection, with new businesses adopting the latest business practices (or, in transition economies, new businesses adopting market business practices relative to incumbents), or it might reflect a very high entry barrier so that only very productive new businesses enter. However, the latter explanation might suggest that firm turnover rates should be lower, which does not appear to be the case for the transition economies. Still, for these economies the contribution of net entry is far from proportionate, suggesting that there is substantial churning of businesses via entry and exit that is not productivity-enhancing.

Our data also allow checking the sensitivity of the contribution of firm entry to differences in the time horizon. Table 1.8 presents the difference in the components of the decomposition as the horizon increases from three to five years for selected countries. To make the three- and five-year components comparable, the components have all been annualized. For the selected countries, lengthening the horizon increases the annual contribution of net entry, decreases the annual contribution of the between component, and has a mixed impact on the within component. The increase in the net entry component is largest for the transition economies, with a relatively large increase of almost three percent for Estonia. For the transition economies at least, these findings are consistent with the hypothesis that learning and selection effects increase the contribution of net entry over a longer horizon.

There is also an important sectoral dimension to the process of restructuring, reallocation, and creative destruction. Figure 1.8 presents the productivity decompositions for two groups of industries in manufacturing: (a) the low-technology industries, and (b) the medium-high-technology industries. The large negative cross term discussed previously (i.e., the fact that firms with strong productivity growth downsized) is evident in low-tech industries, while in medium-high-tech industries this effect, albeit still present, seems to be smaller. Even more interesting, the contribution of new firms to productivity growth is modest in low-tech industries, and even largely negative in a few countries, including the United States.
But the entry effect is strongly positive in medium-high-tech industries. This result suggests an important role for new firms in an area characterized by stronger technological change. Given our focus on measurement issues in this chapter, these findings provide another illustration of why exploiting the

Table 1.8 Time horizon differences

Difference in component from 5 to 3 years

Country      Net entry   Between   Within
Argentina      0.001     –0.001     0.028
Chile          0.002     –0.005    –0.007
Colombia       0.001     –0.005    –0.004
Estonia        0.028     –0.006    –0.007
Latvia         0.019     –0.009     0.027
Slovenia       0.007     –0.001     0.001

Fig. 1.8 Productivity decomposition by technology groups: (a) low-tech industries; (b) medium- and high-tech industries

Notes: Argentina 1995–2001; Chile 1985–1999; Colombia 1987–1998; Estonia 2000–2001; Finland 2000–2002; France 1990–1995; West Germany 2000–2002; Korea 1988 and 1993; Latvia 2001–2002; Netherlands 1992–2001; Portugal 1991–1994; Slovenia 1997–2001; Taiwan 1986, 1991, and 1996; U.K. 2000–2001; U.S. 1992 and 1997. Excluding Brazil and Venezuela.

Table 1.9 Accounting for the differences between FHK and BG decompositions

Country            Net entry     Exit/entry share   Incumbent/exit
                   difference    difference         productivity difference
Argentina          –0.01         –0.012              0.098
Chile              –0.007        –0.022              0.432
Colombia            0.003         0.008              0.627
Estonia            –0.001        –0.031              0.28
Finland            –0.002        –0.013              0.251
France              0.003         0.034              0.107
Korea, Republic    –0.042        –0.122              0.495
Latvia              0.000        –0.001             –0.037
Netherlands         0.001         0.028              0.025
Portugal           –0.011        –0.039              0.394
Slovenia            0.010         0.059              0.252
Taiwan (China)     –0.014        –0.077              0.264
U.K.                0.005         0.148              0.051
U.S.                0.002         0.012              0.299
West Germany        0.000         0.001              0.274

Notes: The reported figures are the time series averages. The first column is the product of the second and third column. However, since the reported figures are averages over time, the identity may appear not to hold (the product of the averages is not the same as the average of the product).

cross-industry variation within countries is a useful approach in cross-country analysis.

Table 1.9 presents the difference in the net entry component (annualized) for the FHK and BG methodologies. Recall that a key difference is that FHK use the initial average productivity of all plants as the benchmark against which entering and exiting plants' productivity is compared, while BG use the exiters' productivity. Foster, Haltiwanger, and Krizan (2001) motivate their approach as having desirable accounting properties (i.e., entering plants contribute positively to industry productivity growth over time if they are above the initial average, while exiting plants contribute positively to industry productivity growth if they are below the initial average). Baldwin and Gu (2003) motivate their approach as being more appropriate to the extent that entrants are displacing exiting plants, so the correct reference group for entrants is the exiting businesses they are displacing.42 For most countries the difference is small. It is intuitive that the effects should in general be small because for both methods the net entry term depends critically on the difference between average productivity of entering and exiting businesses. In other words, both the entry and the exit term subtract off whatever base is used, so at first glance it might appear that the base is irrelevant (the base term in each component cancels out in the net). Consistent with this perspective, computing the difference between the FHK and BG net entry terms yields:

(9)  $\mathrm{FHK} - \mathrm{BG} = \Bigl(\sum_{i \in X} \theta_{it-k} - \sum_{i \in N} \theta_{it}\Bigr)\bigl(P_{t-k} - P^{X}_{t-k}\bigr)$

where $P_{t-k}$ is the average productivity of incumbents and $P^{X}_{t-k}$ is the average productivity of exiting businesses in the base year. Thus, if the share of activity (in this case employment) accounted for by entering and exiting businesses is the same, then the difference is zero. As seen in section 1.5, for most countries the share of activity accounted for by entry is about the same as that for exit, typically with the latter slightly larger since exiting businesses tend to be larger than entering businesses. Thus, this difference in weights does not matter for most countries. However, for Korea, and to a lesser extent Portugal and Taiwan (China), the share of employment accounted for by exit is substantially less than the share of employment accounted for by entry, leading to larger differences between the two decomposition methods. This difference yields an especially big effect in Korea given that the gap between incumbents and exiting businesses is also large.

42. One technical limitation of this alternative is that it implies, in turn, that the benchmark for the between component is the productivity of the exiters, which is difficult to motivate.

To conclude this discussion of dynamic decompositions, it is worth highlighting the range of problems in drawing inferences from cross-country comparisons of the contribution of net entry. For one, these decompositions depend critically on accurately measuring the extent of entry and exit. As we have noted, spurious entry and exit will have complex implications for the contribution of net entry, with effects working in potentially opposite directions. For another, the horizon may play a critical role in these decompositions, and horizon effects arguably differ across countries (and industries). The horizon problems are mitigated if very long differences are used (e.g., ten years), but this in turn poses problems of data limitations and measurement (e.g., the measurement problems may be worse over a longer horizon). We believe that these dynamic decompositions highlight some interesting patterns that appear to reflect rich actual differences in firm dynamics.
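The identity in equation (9) can be checked numerically. The following is a minimal sketch under stated assumptions: the net entry term is written with an arbitrary benchmark, the FHK benchmark is taken to be base-year share-weighted productivity of all firms, and the BG benchmark is taken to be the employment-weighted average productivity of exiters. The function name and the toy data are hypothetical.

```python
# Toy check of the FHK-minus-BG net entry identity. All numbers are hypothetical.

def net_entry(base, end, benchmark):
    """Net entry term: entrants and exiters measured against a common benchmark."""
    entry = sum(s * (p - benchmark) for i, (s, p) in end.items() if i not in base)
    exit_ = sum(s * (p - benchmark) for i, (s, p) in base.items() if i not in end)
    return entry - exit_

# firm id -> (employment share, labor productivity)
base = {"A": (0.5, 1.0), "B": (0.3, 1.2), "C": (0.2, 0.6)}
end = {"A": (0.4, 1.3), "B": (0.35, 1.2), "D": (0.25, 1.0)}

exiters = [(s, p) for i, (s, p) in base.items() if i not in end]
entrants = [(s, p) for i, (s, p) in end.items() if i not in base]

p_base = sum(s * p for s, p in base.values())  # assumed FHK benchmark
p_exit = sum(s * p for s, p in exiters) / sum(s for s, _ in exiters)  # assumed BG benchmark

fhk = net_entry(base, end, p_base)
bg = net_entry(base, end, p_exit)

exit_share = sum(s for s, _ in exiters)
entry_share = sum(s for s, _ in entrants)
rhs = (exit_share - entry_share) * (p_base - p_exit)

print(f"FHK - BG          = {fhk - bg:+.4f}")
print(f"share gap x prod. = {rhs:+.4f}")
```

Here the entry share exceeds the exit share and exiters are less productive than the base-year average, so the difference is negative, the same sign pattern as the Korea row of table 1.9. Setting the two shares equal makes the difference vanish regardless of the productivity gap.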

1.6.3 The Cross-Sectional Efficiency of the Allocation of Activity

So far, the creative destruction process has been discussed mostly from the point of view of productivity growth. This is natural in this context since the creative destruction process is inherently dynamic. However, as discussed previously at some length, measurement and interpretation problems raise questions about the comparisons of dynamic decompositions across countries. An alternative approach that is simpler and more robust is to ask the question: are resources allocated efficiently in a sector/country in the cross section at a given point in time? Dynamics can also be examined here to the extent that the efficiency of the cross-sectional allocation of businesses can vary over time.

This approach is based upon a simple cross-sectional decomposition of productivity levels developed by Olley and Pakes (1996). They note that in the cross section, the level of productivity for a sector at a point in time can be decomposed as follows:

(10)  $P_t = (1/N_t)\sum_i P_{it} + \sum_i \Delta\theta_{it}\,\Delta P_{it}$

where N is the number of businesses in the sector and $\Delta$ is the operator that represents the cross-sectional deviation of the firm-level measure from the industry simple average. The simple interpretation of this decomposition is that aggregate productivity can be decomposed into two terms: the unweighted average of firm-level productivity, plus a cross term that reflects the cross-sectional efficiency of the allocation of activity. The cross term captures allocative efficiency since it reflects the extent to which firms with greater efficiency have a greater market share. This simple decomposition is very easy to implement and essentially involves just measuring the unweighted average productivity versus the weighted average productivity. Measurement problems make comparisons of the levels of either of these measures across sectors or countries very problematic, but taking the difference between these two measures reflects a form of a difference-in-difference approach. Beyond measurement advantages, this approach also has the related virtue that theoretical predictions are more straightforward. Distortions to market structure and institutions unambiguously imply that the difference between weighted and unweighted productivity (or equivalently the cross term) should be smaller.

With these remarks in mind, figure 1.9 shows the measure of the gap between weighted and unweighted average productivity for a sample of countries. The results are obtained by applying the Olley-Pakes (OP) decomposition at the industry level and then taking the weighted average across industries for the countries in the harmonized database. For virtually all countries, the gap is positive, suggesting that resources are allocated to more productive businesses.
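The cross-sectional decomposition is simple enough to sketch in a few lines. The example below assumes employment shares that sum to one within the sector and uses invented numbers; the function name is hypothetical.

```python
# Toy sketch of the Olley-Pakes cross-sectional decomposition.
# Shares are assumed to sum to one; all numbers are hypothetical.

def olley_pakes(shares, productivity):
    """Return (weighted average, unweighted average, cross term)."""
    n = len(shares)
    mean_p = sum(productivity) / n  # unweighted average productivity
    mean_s = sum(shares) / n        # equals 1/n when shares sum to one
    cross = sum((s - mean_s) * (p - mean_p) for s, p in zip(shares, productivity))
    weighted = sum(s * p for s, p in zip(shares, productivity))
    return weighted, mean_p, cross

shares = [0.5, 0.3, 0.2]        # larger firms listed first
productivity = [1.4, 1.0, 0.6]  # and they are also more productive

weighted, unweighted, cross = olley_pakes(shares, productivity)
print(f"weighted average   {weighted:.3f}")
print(f"unweighted average {unweighted:.3f}")
print(f"cross term         {cross:.3f}")
print(f"cross / weighted   {cross / weighted:.1%}")
```

Because the more productive firms hold larger shares in this toy sector, the cross term is positive and equals the gap between the weighted and unweighted averages. Reversing the productivity list would make it negative.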
The South East Asian economies are on top, followed by the United States, while the Latin American countries (except Argentina) show higher productivity boosts through resource allocation than the EU, but lower than in Asia. The transition economies are generally weaker in terms of this measure of allocative efficiency. For many countries, the gap is not only positive but large. For the Asian economies and the United States, the allocative efficiency term accounts for about 50 percent or more of labor productivity. In the EU, the productivity boost is smaller, ranging from 15 to 38 percent. The findings in figure 1.9 are striking and suggest that this measurement approach has great potential in a cross-country context. Moreover, the allocative efficiency measures can be computed for different years or for specific industries and/or other classifications of firms, suggesting that a

Fig. 1.9 The gap between weighted and unweighted labor productivity, 1990s

Note: Based on the three-year differences.

pooled country, firm-type data set of allocative efficiency measures would be valuable for further analysis. Note, however, that the allocative efficiency measures are not without problems and limitations. A key problem is that the measures by construction do not permit decomposing the contribution of entering, exiting, and continuing businesses. As such, in an analysis of the impact of institutions on reallocation and productivity dynamics, these allocative efficiency measures cannot be used to investigate the impact of institutions on such measures of firm dynamics and, in turn, the contribution of those effects to productivity. Measurement error will also cloud the interpretation of the allocative efficiency measures. Classical measurement error in productivity at the micro level that is uncorrelated with market share will tend to drive the allocative efficiency to zero. Classical measurement error in productivity that is also correlated with market share (put differently, classical measurement error in output measures at the micro level) will work in the opposite direction.

1.7 Concluding Remarks

In this chapter we assess the measurement and analytic challenges for studying firm dynamics within and across countries. We use recently collected indicators of firm dynamics for a sample of more than twenty countries. Our cross-country data set has been assembled with great care paid to the harmonization of key concepts. Such harmonization is essential to conduct meaningful comparisons, but we acknowledge that our effort should probably be extended, as there remain measurement problems. While simple comparisons of firm dynamics across countries remain difficult to interpret, interesting inferences can be made by examining multiple indicators and by carefully considering the nature of the measurement errors. Since many of these errors are country-specific, using some form of

difference-in-difference approach that eliminates overall country-specific effects helps enormously. Bearing in mind these measurement problems, there is evidence in our data of a significant heterogeneity of firms in each market and country. This heterogeneity is manifested in large disparities in firm size, firm growth, and productivity performance. In more detail, we found:

• The average size of incumbent firms varies widely across sectors and countries. Differences in firm size are largely driven by within-sector differences, although in some countries sectoral specialization also plays a significant role. Smaller countries tend to have a size distribution skewed towards smaller firms, but the average size of firms does not map precisely with the overall dimension of the domestic market. An important message emerging from our analysis is that in the empirical analysis of firm dynamics, differences in the size composition across sectors and countries ought to be controlled for.

• Firm churning, taken at face value, is large; gross firm turnover is in the range of 10 to 20 percent of all firms in industrial countries, and even more in transition and other emerging economies. Entering, but also exiting, firms tend to be small, and thus firm flows affect only about 5 to 10 percent of total employment. This may suggest that the entry of small firms is relatively easy while larger-scale entry is more difficult, but survival among small firms is also more difficult: many small newcomers fail before reaching the efficient scale of production. Given the measurement and interpretation issues related to firm turnover data, we suggest exploring the variation in firm turnover across sectors and firms of different sizes to shed some light on the different nature of creative destruction.

• Market selection is pretty harsh. About 20 to 40 percent of entering firms fail within the first two years of life. Confirming previous results, failure rates decline with duration; conditional on surviving the first few years, the probability of survival becomes higher. But only about 40 to 50 percent of total entering firms in a given cohort survive beyond the seventh year.

• Successful entrants expand rapidly. Surviving firms are not only relatively larger but also tend to grow rapidly. The combined effect of exits being concentrated among the smallest units and the growth of survivors makes the average size of a given cohort increase rapidly towards the efficient scale. Measuring the post-entry performance within countries appears to be somewhat more robust than the analysis of firm dynamics, since it implies following a cohort over time within a country.

• Creative destruction is important for promoting productivity growth. While the continuous process of restructuring and upgrading by incumbents is essential to boost aggregate productivity, the entry of new

firms and the exit of obsolete units also play an important role. In virtually all of our countries, the net entry process contributes positively to productivity growth. While measurement and interpretation problems associated with firm turnover cloud rank orderings across countries, within-country variations in the contribution of firm turnover to productivity growth may be an interesting avenue of research. For example, we observe a stronger contribution of net entry to productivity growth in high-technology industries compared with low-technology ones, and the differences between these two groups vary significantly across countries. This in turn may suggest a different role of creative destruction in promoting technological adoption and experimentation. Moreover, this pattern helps highlight the usefulness of exploiting the cross-industry variation within countries and, in turn, comparing that cross-industry variation across countries.

• Allocative efficiency is important in productivity levels, rank ordering of countries, and in productivity growth. Allocative efficiency can be measured using cross-sectional data within a country or industry by using the covariance between market share and efficiency (i.e., measures of productivity). In using this measure, we find that virtually all countries exhibit positive allocative efficiency. Further, the rank ordering of countries on this basis appears more reasonable than other measures of the contribution of the reallocation process to growth.

References

Ahn, S. 2000. Firm dynamics and productivity growth: A review of micro evidence from OECD countries. OECD Economics Department Working Paper no. 297. Paris: Organization for Economic Cooperation and Development.
Audretsch, D. B. 1995. Innovation, growth and survival. International Journal of Industrial Organization 13:441–57.
Aw, B. Y., S. Chung, and M. Roberts. 2003. Productivity, output, and failure: A comparison of Taiwanese and Korean manufacturers. Economic Journal 113 (491): F485–F510.
Baily, M. N., C. Hulten, and D. Campbell. 1992. Productivity dynamics in manufacturing establishments. Brookings Papers on Economic Activity: Microeconomics: 187–249. Washington, D.C.: Brookings Institution.
Baldwin, J., and W. Gu. 2002. Plant turnover and productivity growth in Canadian manufacturing. OECD Science, Technology, and Industry Working Papers 2002/2. OECD Publishing.
Barro, R. J., and X. Sala-i-Martin. 1995. Economic growth. New York: McGraw-Hill.
Bartelsman, E. J. 2004. The analysis of microdata from an international perspective. OECD Statistics Directorate, STD/CSTAT, 12. Paris: Organization for Economic Cooperation and Development.

Bartelsman, E., and S. Scarpetta. 2004. Experimentation within and between firms: Any role for policy and institutions? Paper presented at the 2004 American Economic Association Meeting. 3–5 January, San Diego.
Bartelsman, E. J., and M. Doms. 2000. Understanding productivity: Lessons from longitudinal microdata. Journal of Economic Literature 38 (3): 569–95.
Bartelsman, E. J., J. Haltiwanger, and S. Scarpetta. 2004. Distributed analysis of firm-level data from industrial and developing countries. Mimeograph.
Bastos, F., and J. Nasir. 2004. Productivity and the investment climate: What matters most? World Bank Policy Research Working Paper no. 3335.
Bernard, A., and C. J. Jones. 1996a. Productivity across industries and countries: Time series theory and evidence. The Review of Economics and Statistics 78 (1): 135–46.
———. 1996b. Productivity and convergence across U.S. states and industries. Empirical Economics 21:113–35.
Brown, D. J., and J. S. Earle. 2004. Economic reforms and productivity-enhancing reallocation in post-Soviet transition. Upjohn Institute Staff Working Paper no. 04-98.
Caves, R. E. 1998. Industrial organization and new findings on the turnover and mobility of firms. Journal of Economic Literature 36 (4): 1947–82.
Davis, S. J., J. Haltiwanger, and S. Schuh. 1996. Job creation and destruction. Cambridge, MA: The MIT Press.
Davis, S. J., and M. Henrekson. 1999. Explaining national differences in the size and industry distribution of employment. Small Business Economics 12:59–83.
Dickens, W. T., and E. L. Groshen. 2003. Status of the international wage flexibility project after the authors' conference. The Brookings Institution and the Federal Reserve Bank of New York, May.
Dollar, D., M. Hallward-Driemeier, and T. Mengistae. 2003. Investment climate and firm performance in developing economies. World Bank Policy Research Working Paper no. 3323. Washington, D.C.: World Bank, Development Research Group.
Dunne, T., M. Roberts, and L. Samuelson. 1989. The growth and failure of U.S. manufacturing plants. Quarterly Journal of Economics 104:671–98.
———. 1988. Patterns of firm entry and exit in U.S. manufacturing industries. RAND Journal of Economics 19 (4): 495–515.
Doppelhofer, G., R. Miller, and X. Sala-i-Martin. 2004. Determinants of long-term growth: A Bayesian averaging of classical estimates (BACE) approach. American Economic Review 94 (4): 813–35.
Eslava, M., J. Haltiwanger, A. Kugler, and M. Kugler. 2004. The effects of structural reforms on productivity and profitability enhancing reallocation. Journal of Development Economics 75 (2): 333–71.
EUROSTAT. 1998. Enterprises in Europe, data 1994–95. Fifth Report European Commission.
———. 2004. Business demography in Europe—Results for 10 member states and Norway. Luxembourg: European Commission.
Foster, L., J. Haltiwanger, and C. J. Krizan. 2001. Aggregate productivity growth: Lessons from microeconomic evidence. In New developments in productivity analysis, ed. Edward Dean, Michael Harper, and Charles Hulten, 303–418. Chicago: University of Chicago Press.
———. 2002. The link between aggregate and micro productivity growth: Evidence from retail trade. NBER Working Paper no. 9120. Cambridge, MA: National Bureau of Economic Research, August.
Geroski, P. 1995. What do we know about entry? International Journal of Industrial Organization 13:421–40.

Griliches, Z., and H. Regev. 1995. Firm productivity in Israeli industry: 1979–1988. Journal of Econometrics 65 (1): 175–203.
Griffith, R., S. Redding, and J. Van Reenen. 2000. Mapping the two faces of R&D: Productivity growth in a panel of OECD industries. Institute for Fiscal Studies Working Papers no. 2.
Hallward-Driemeier, M., S. Wallsten, and L. C. Xu. 2004. The investment climate and the firm: Firm-level evidence from China. World Bank Policy Research Working Paper no. 3003.
Haltiwanger, J., R. Jarmin, and T. Schank. 2003. Productivity, investment in ICT and market experimentation: Micro evidence from Germany and the U.S. Center for Economic Studies Working Paper no. 03-06.
Haltiwanger, J., and H. Schweiger. 2004. Firm performance and the business climate: Where does ICA fit in? Mimeograph.
Klapper, L., L. Laeven, and R. Rajan. 2006. Entry regulation as a barrier to entrepreneurship. Journal of Financial Economics 82:591–629.
Levinsohn, A., and J. Petrin. 2005. Measuring industry productivity growth using plant-level data. Mimeograph.
Lentz, R., and D. Mortensen. 2005. Productivity growth and worker reallocation. International Economic Review 46 (3): 731–49.
Martin, R. 2005. Providing evidence based on business micro data: Methods and results. London School of Economics. Unpublished Manuscript.
Micco, A., and C. Pages. 2006. The economic effects of employment protection: Evidence from international industry-level data. IZA Discussion Papers no. 2433.
Nicoletti, G., and S. Scarpetta. 2003. Regulation, productivity and growth: OECD evidence. Economic Policy 18 (36): 9–72.
Nicoletti, G., S. Scarpetta, and O. Boylaud. 1999. Summary indicators of product market regulation with an extension to employment protection legislation. OECD Economics Department Working Paper no. 226. Paris: Organization for Economic Cooperation and Development.
Olley, G. S., and A. Pakes. 1996. The dynamics of productivity in the telecommunications equipment industry. Econometrica 64 (6): 1263–97.
Organization for Economic Cooperation and Development (OECD). 2004. The economic impact of ICT: Measurement, evidence, and implications. Paris: OECD.
Oviedo, A. M. 2005. Doing business in developing economies: The effect of regulation and institutional quality on the productivity distribution. Unpublished Manuscript.
Roberts, M., and J. Tybout. 1997. Producer turnover and productivity growth in developing countries. The World Bank Research Observer 12 (1): 1–18.
Scarpetta, S., ed. 2004. The sources of economic growth in OECD countries. Paris: OECD. http://ariel.sourceoecd.org/vl1234676/cl68/nw1/rpsv/cgi-bin/fulltextew.pl?prpsv/ij/oecdthemes/99980134/v2003n1/s11/p11.idx
Scarpetta, S., P. Hemmings, T. Tressel, and J. Woo. 2002. The role of policy and institutions for productivity and firm dynamics: Evidence from micro and industry data. OECD Economics Department Working Papers, no. 329.
Sutton, J. 1997. Gibrat's legacy. Journal of Economic Literature 35 (1): 40–59.
World Bank. 2004. World development report: A better investment climate for everyone. Washington, D.C.: World Bank.

Measuring and Analyzing Cross-Country Differences in Firm Dynamics

Comment


Timothy Dunne

Over the last twenty years, there has been substantial growth in the empirical literature on firm dynamics. This literature has documented the tremendous churning of firms through the entry and exit process. It is now a well-established fact that industry gross entry and exit rates and the concomitant labor flows exceed net rates by a substantial amount. The impact of this churning process of firms has been examined in a number of distinct literatures.

The chapter by Bartelsman, Haltiwanger, and Scarpetta is an important addition to the literature on firm dynamics and the microeconomics of productivity. First, the chapter provides a detailed comparison of the patterns of firm dynamics across a wide range of countries and focuses on the role of firm dynamics in the evolution of industry productivity across countries. Second, the chapter takes a relatively novel empirical approach to a cross-country comparison project by working with individual researchers from each country to homogenize data construction methodologies. This is important, as the measurement of firm turnover can vary markedly across countries.

A main contribution of this chapter is the development of the cross-country data set on firm dynamics. Most data on firm dynamics are generated as a by-product of a country’s administrative data collection systems or from business registers used as the basis of statistical frames in national statistics systems. The data on business dynamics are constructed by linking these cross-sectional data sources across time to create a panel structure on businesses. The definitions of what a firm is, when it is considered an entrant or an exit, how to deal with mergers and acquisitions, and other such issues defining the life of the firm in the data are often determined by the administrative data collection systems (e.g., tax or unemployment insurance systems) or the nature of the data collected by the statistical agency.
This creates challenges not only for using the systems to measure firm dynamics within an individual country but also for comparing statistics across countries. For example, the inclusion rules for very small firms in business registers often differ across countries. Since firm turnover in very small firms can be quite high, differences in inclusion rules can greatly affect the firm turnover rates. One can see how such size cutoff differences affect firm turnover statistics by comparing the panels in figure 1.2. The methods that countries use to handle mergers and acquisitions in various business registers can differ as well. These differences in measurement affect the entry and exit statistics produced and make cross-country comparisons from existing studies of firm dynamics statistics problematic. A real contribution of this chapter is that the authors have made a serious attempt to make their cross-country data more comparable by developing a set of measurement protocols and having researchers in various countries apply these protocols to the underlying microdata. This approach is referred to in the chapter as the analysis of distributed microdata. Although differences in measurement procedures certainly remain across countries and the authors are careful to point these out, this chapter reports on the development of the most comprehensive and comparable set of cross-country industry-level statistics on firm turnover and a related set of productivity decompositions to date.

Timothy Dunne is a senior economic advisor in the Research Department at the Federal Reserve Bank of Cleveland.

Besides the basic data development contributions the chapter makes, it is also loaded with new facts about firm dynamics. In all countries, the turnover of firms (entry plus exit) greatly exceeds the net entry rates. These high turnover rates occur in large countries and small countries and in high income and moderate income countries alike. Surprisingly, a country like France, often thought to have institutions that restrict firm dynamics, has firm turnover rates similar to those of the United States (fig. 1.2). In fact, the United States, perceived to have low institutional barriers to the development of new firms, is usually ranked toward the middle of the distribution of countries with regard to firm turnover. Overall, industrial countries have lower firm turnover rates than less-developed countries, and manufacturing industries generally have lower turnover rates than service industries. What these striking patterns imply for thinking about the evolution of industries is that models of industry competition need to focus on equilibrium firm turnover (such as the models developed by Hopenhayn [1992] and Asplund and Nocke [2006]) and not simply on the equilibrium number of producers in a market.
Firm turnover is high, and it is a persistent feature across countries and across industries.

The cross-industry and cross-country turnover patterns presented in the chapter raise the question of whether the variation in country-industry turnover rates is driven primarily by industry or country effects. Strong industry effects suggest that industry-specific technologies are an important driver of firm turnover. Alternatively, if country effects dominate, this suggests that country-specific institutional factors may play an important role. To be sure, strong country effects are also consistent with persistent differences in measurement procedures across countries. I analyze this issue using a simple model and the statistics presented in table 1.6 of the chapter. The model estimated is

$$y_{ci} = \alpha_c + \beta_i + \varepsilon_{ic},$$

where $y_{ci}$ is firm turnover in country $c$ and industry $i$, $\alpha_c$ represents a set of country effects, $\beta_i$ controls for industry effects, and $\varepsilon_{ic}$ is the error term. The adjusted $R^2$ from the model estimated with both industry and country effects is .348, the adjusted $R^2$ with country effects only is .246, and with industry effects only the adjusted $R^2$ is .079. Both sets of controls are statistically significant at conventional levels. The results indicate that country effects explain more of the variation in country-industry turnover than industry effects do. This finding also holds if one focuses only on the industrial countries in the sample. This suggests that the differences in turnover rates across countries are not simply driven by differences in industrial mix, but that there are either systematic differences in firm turnover across countries or perhaps systematic differences in measurement. The authors are careful throughout the chapter to emphasize this latter possibility. Even with this caveat, a surprising result is the relatively low amount of the variation explained by industry controls. In a comparison of job flow data between Canada and the United States, Baldwin, Dunne, and Haltiwanger (1998) find that industry effects play a dominant role in explaining cross country-industry differences in job turnover.

The chapter finishes up with a set of cross-country labor productivity decompositions that show the relative importance of within-firm changes in productivity, between-firm shifts in productivity, and the contribution of firm turnover to overall changes in productivity. This analysis shows the novelty of the distributed microdata approach, as researchers in each country were sent computer programs to run on the microdata. The authors of the chapter only have access to a small subset of the underlying microdata used in these productivity decompositions. As previous studies have found, the within-firm component dominates the between-firm component in explaining productivity growth of continuing firms in most countries. Entry and exit account for 20 to 50 percent of labor productivity growth across countries. Exit has the most consistent effect, as the failure of low productivity firms boosts aggregate productivity in all countries.
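The comparison of country and industry effects described earlier in this comment can be illustrated with a short sketch. It regresses turnover on an intercept plus country dummies, industry dummies, or both, and reports the adjusted R-squared for each specification. The function names (`solve`, `adjusted_r2`) and the turnover figures are hypothetical and invented for illustration; they are not the table 1.6 statistics, and this is not necessarily the estimation code used for the results quoted above.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def adjusted_r2(y, factors):
    """Adjusted R^2 from OLS of y on an intercept plus dummies for each factor.

    `factors` is a list of label sequences, e.g. [countries] or
    [countries, industries]. One level per factor is dropped so the
    design matrix is not collinear with the intercept.
    """
    n = len(y)
    cols = [[1.0] * n]  # intercept
    for labels in factors:
        for level in sorted(set(labels))[1:]:
            cols.append([1.0 if g == level else 0.0 for g in labels])
    k = len(cols)
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)
    fitted = [sum(beta[j] * cols[j][i] for j in range(k)) for i in range(n)]
    mean = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean) ** 2 for yi in y)
    return 1.0 - (ss_res / (n - k)) / (ss_tot / (n - 1))


# Invented turnover rates (percent) for three countries and three industries.
countries = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
industries = ["x", "y", "z", "x", "y", "z", "x", "y", "z"]
turnover = [10.3, 10.8, 12.1, 19.9, 21.2, 21.7, 30.0, 31.1, 31.9]

adj_both = adjusted_r2(turnover, [countries, industries])
adj_country = adjusted_r2(turnover, [countries])
adj_industry = adjusted_r2(turnover, [industries])
print(adj_both, adj_country, adj_industry)
```

In this invented sample the country dummies account for almost all of the variation, so the ranking of the three adjusted R-squared values mirrors the qualitative pattern reported in the comment, although the magnitudes are not comparable.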
The productivity analysis illustrates the important role that firm dynamics play across a wide range of countries in the evolution of aggregate productivity growth. Overall, the chapter makes an important contribution to the empirical literature on producer dynamics. It provides many new facts and offers a novel approach to analyzing cross-country data based on confidential firm and establishment-level records.

References

Asplund, M., and V. Nocke. 2006. Firm turnover in imperfectly competitive markets. Review of Economic Studies 73 (2): 295–327.
Baldwin, J., T. Dunne, and J. Haltiwanger. 1998. A comparison of job creation and job destruction in Canada and the United States. The Review of Economics and Statistics 80 (3): 347–56.
Hopenhayn, H. 1992. Entry, exit and firm dynamics in long-run equilibrium. Econometrica 60 (5): 1127–50.

2 Studying the Labor Market with the Job Openings and Labor Turnover Survey

R. Jason Faberman

2.1 Introduction

In recent years, the Bureau of Labor Statistics (BLS) has released several new data products that describe the dynamics of the labor market. One of these is the Job Openings and Labor Turnover Survey (JOLTS). The survey is the only existing data source to measure vacancies, hires, and separations at the establishment level at a regular (monthly) frequency in the United States. The public data were released in 2002, and with its aggregate estimates, the JOLTS has already provided valuable insight into the behavior of worker recruiting and worker turnover.

This chapter details the characteristics of the JOLTS data and provides some descriptive evidence at both the aggregate and establishment level. The discussion is primarily for researchers wishing to use the data in their own studies. Accordingly, it characterizes the data scope, composition, measurement, and estimation, as well as the research potential these data have. The chapter also presents some basic evidence on the aggregate and establishment-level relations of vacancies and worker flows to state-level unemployment and other labor market conditions.

R. Jason Faberman is an economist at the Federal Reserve Bank of Philadelphia. I thank Eva Nagypál, Steve Davis, John Haltiwanger, Jim Spletzer, Rick Clayton, John Wohlford, and the editors and conference participants for this volume for helpful comments on this chapter. Any remaining errors are my own. Work on this project was done while I was an employee at the Bureau of Labor Statistics. The views expressed are my own and do not necessarily reflect the positions of the U.S. Bureau of Labor Statistics, the Federal Reserve Bank of Philadelphia, or the Federal Reserve System.

The JOLTS is an evolution of earlier data series, notably the BLS Labor Turnover Survey.¹ The survey also builds on the research on vacancies, worker turnover, and unemployment done by Abraham (1987), Blanchard and Diamond (1989, 1990), and others, as well as theories of labor market search and matching.² This research, and the rapidly developing research that has followed, underscores the importance of understanding labor market dynamics. Accordingly, the BLS designed the JOLTS to capture these dynamics.

The result is a high-frequency, timely survey with several major advantages over previous data. The first is its reporting of hires and separations directly by an establishment. Other data (e.g., administrative wage records, the Current Population Survey [CPS]) forced researchers to infer these flows from observed changes in a worker’s employment status. The second is its reporting of job openings or vacancies directly by an establishment. Previously, researchers had to rely on indices (such as the Help Wanted Index) for a measure of vacancies. This approach did not lend itself to studying vacancy behavior at the micro level, which was an issue because theories of labor market search often model behavior at the level of workers and firms. The final advantage is its distinction between quits and layoffs. The two types of separations have opposing cyclical patterns, and in general, they represent voluntary and involuntary severances, respectively.

Existing research using JOLTS is currently sparse, but thanks to the ballooning of research on the theory and evidence of labor dynamics, it is expanding rapidly.³ Clark (2004) summarizes the aggregate evidence since the inception of JOLTS. Hall (2005a) and Shimer (2007a) use the JOLTS data to study whether standard theories of labor market search can match the volatility of vacancies relative to unemployment. Valetta (2005) uses the JOLTS data to study the Beveridge Curve.
Besides this chapter, Davis, Faberman, and Haltiwanger (2006, 2007) and Faberman and Nagypál (2007) are the first to present analyses of the establishment-level JOLTS data. The data have also become popular with the press and various industry and policy groups. In all, the JOLTS data complement existing data and can vastly improve our understanding of the labor market.

The following section defines the concepts and terminology used throughout the chapter, discusses the data sample and estimation process, and highlights the survey’s research strengths and limitations. The next section explores the relation between vacancies and unemployment at both the aggregate and establishment level. An exploration of the relations between worker flows and aggregate and local labor market conditions comes next. The final section concludes and discusses potential avenues of future research.

1. The Labor Turnover Survey measured vacancies, accessions, and separations for the manufacturing industry; the BLS discontinued the survey in 1982. See Davis and Haltiwanger (1998) and Clark and Hyson (2001) for more on this survey.
2. See, for example, Pissarides (1985) and Mortensen and Pissarides (1994).
3. Davis and Haltiwanger (1999) review the empirical work on labor dynamics, while Mortensen and Pissarides (1999) and Rogerson, Shimer, and Wright (2005) review the theoretical work on labor search.

2.2 Data and Measurement

2.2.1 Source Data

The BLS uses the JOLTS data to publish monthly estimates of job openings (i.e., vacancies), hires, and separations, with separations reported as quits, layoffs and discharges, and other separations (e.g., retirements).⁴ The data start in December 2000 and are updated monthly, with the latest estimates available within two months of a month’s end. The survey covers all nonfarm establishments, the same sample frame as the Current Employment Statistics (CES) survey. The aggregate estimates are available nationally, for four major regions, and by two-digit North American Industry Classification System (NAICS) sector.⁵ The BLS reports JOLTS estimates in levels and as rates.

The primary unit of observation for the JOLTS survey is the establishment, which covers the operations of a firm at a single physical location. Firms can have one or more establishments. As with the CES, the JOLTS coverage of nonfarm payrolls means that it generally excludes the self-employed and nonprofit organizations not covered under a state unemployment insurance program. The JOLTS data are a sample of roughly 16,000 establishments surveyed each month. Establishments report their employment, hires, separations (broken out by type), and job openings for the month within the framework of the survey definitions. The survey consists of overlapping panels that are each sampled for eighteen months, and is weighted so that its employment estimates match those of the CES.⁶

For the analyses in this chapter, I use the JOLTS establishment data pooled over the December 2000–January 2005 period. For most aggregate statistics, I use the unrestricted sample of all observations. For the establishment-level analyses, I use a restricted sample of all observations with positive employment reported in two consecutive months.
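The sample restriction just described can be sketched as a simple filter. The example assumes that months are coded as consecutive integers and that an observation is kept when the same establishment reports positive employment in that month and in the preceding one. That reading is an assumption of the sketch; the record layout, field names, and the `restrict_sample` helper are hypothetical and do not reflect the actual JOLTS microdata format.

```python
def restrict_sample(observations):
    """Keep observations with positive employment in the current and previous month.

    `observations` is a list of dicts with hypothetical keys:
    'estab' (establishment id), 'month' (integer month index), and
    'employment' (reported employment, or None if not reported).
    """
    employment = {(o["estab"], o["month"]): o["employment"] for o in observations}

    def positive(key):
        value = employment.get(key)
        return value is not None and value > 0

    return [
        o for o in observations
        if positive((o["estab"], o["month"]))
        and positive((o["estab"], o["month"] - 1))
    ]


# Invented records: month 0 stands for the first month of the pooled data.
observations = [
    {"estab": 1, "month": 0, "employment": 50},
    {"estab": 1, "month": 1, "employment": 52},
    {"estab": 1, "month": 2, "employment": 0},
    {"estab": 1, "month": 3, "employment": 51},
    {"estab": 2, "month": 1, "employment": 10},
    {"estab": 2, "month": 3, "employment": 12},
]
restricted = restrict_sample(observations)
print([(o["estab"], o["month"]) for o in restricted])
```

Under this reading, no observation from the first month of the data can be retained, because it has no preceding month to pair with.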
This restriction minimizes the potential spurious effects of outliers and inconsistent data reporters. The resulting sample contains 372,288 observations, which represent 92.8 percent of the pooled observations (and 92.3 percent of the pooled employment). Due to the requirement of reporting in consecutive months, the restricted sample excludes the December 2000 observations.⁷ Results in my analyses are all sample-weighted, and where noted, also employment-weighted. Estimates are not seasonally adjusted, unless otherwise noted.

4. The published statistics are available at http://www.bls.gov/jlt/home.htm
5. The NAICS replaces the older Standard Industrial Classification (SIC) system. The most notable change in NAICS is its classification of the service sector into several separate sectors, such as information, professional and business services, education and health, and leisure and hospitality. In general, two-digit NAICS sectors correspond to major SIC industry sectors (e.g., manufacturing, services).
6. See Crankshaw and Stamas (2000) for details on the JOLTS sample weighting procedure.
7. Even with the noted restrictions, the aggregate estimates from the unrestricted and restricted samples match each other very closely.

2.2.2 Concepts and Definitions

The JOLTS survey form has four major data elements: employment, hires, separations, and job openings, with separations broken into three subcategories. Elements differ in their timing, and their definitions are precise about what they do (and do not) capture. The definitions are designed so that the BLS can measure changes in employment dynamics as accurately as possible and minimize respondent confusion in reporting.

1. Employment. Establishments report their employment for the pay period that includes the twelfth of the month. As such, it is a point-in-time measure of the employment level. An individual is counted as employed if they are on an establishment’s payroll. The reference period and definition are standard for all federal statistical establishment surveys and allow the BLS to accurately benchmark the survey to the CES.
2. Hires. Hires are new additions to the workforce of an establishment. They include new hires, rehires, seasonal and short-term hires, recalls after a layoff lasting more than seven days, and transfers from other worksites. The JOLTS hires are a flow measure that is meant to capture all occurrences between the first and last day of the month.
3. Separations. Separations are subtractions from the workforce of an establishment. These removals include quits, layoffs lasting more than seven days, firings and other discharges, terminations of short-term and seasonal workers, retirements, and transfers to other worksites. The JOLTS separations are also a flow measure meant to capture all occurrences between the first and last day of the month.
4. Quits. Quits are the subset of separations initiated by an employee.
5. Layoffs and discharges. Layoffs and discharges are the subset of separations initiated by the employer that include all layoffs lasting more than seven days, firings and other discharges, and terminations of short-term and seasonal workers.
6. Other separations. Other separations include retirements, transfers, and all other separations not covered by the previous two categories.
7. Job openings (or vacancies). These are all unfilled, posted positions available at an establishment on the last day of the month. The vacancy must be for a specific position where work can start within thirty days, and an active recruiting process must be underway for the position. Vacancies are a point-in-time estimate, and their definition has two notable measurement implications. First, JOLTS does not capture vacancies for hires that start more than a month after their posting. Second, JOLTS does not capture vacancies that are both posted and filled within the month. Note that the unemployment measure from the Current Population Survey (CPS), which is also a point-in-time measure, has a similar feature, since it must deal with individuals who both enter and leave unemployment between survey periods.⁸

Hires and separations are expressed as rates by dividing each by employment. The vacancy rate is slightly different. It uses the sum of vacancies and employment in its denominator, making this rate a fraction of filled and unfilled jobs. This is analogous to the unemployment rate, which uses the labor force as its denominator (i.e., it is a fraction of employed and unemployed labor). Given the definitions of employment and worker flows, an individual who stops receiving a paycheck may not count as part of employment, but also may not count as a separation. Examples of this occurrence include teachers, temporary help workers retained but not assigned to a particular job (i.e., on call), and layoffs of less than seven days.⁹

2.2.3 Some Notes on Research with the JOLTS Data

The published JOLTS data have already provided interesting evidence about the labor market, yet the survey remains relatively new and continues to evolve. The passage of time will lengthen the time series, making the survey even more useful in understanding the cyclical behavior of worker flows and vacancies.

Researchers should be aware that the JOLTS sample is only representative nationwide, by major industry, and by region. With a sample size of 16,000 establishments, exploiting the data at finer industrial or geographic detail will likely face issues of precision and selection. The multiple reference periods for employment, worker flows, and vacancies can complicate some research studies (Davis, Faberman, and Haltiwanger [2007], however, have one method to deal with the timing issue).
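The rate definitions given in section 2.2.2 (hires and separations divided by employment, vacancies divided by vacancies plus employment, and the analogous unemployment rate) can be written out as a short sketch. The function names and the establishment figures below are hypothetical and chosen only to make the arithmetic easy to follow; they are not JOLTS estimates.

```python
def flow_rate(flow, employment):
    """Hires or separations expressed as a fraction of employment."""
    return flow / employment


def vacancy_rate(vacancies, employment):
    """Vacancies as a fraction of filled plus unfilled jobs."""
    return vacancies / (vacancies + employment)


def unemployment_rate(unemployed, employed):
    """Unemployed as a fraction of the labor force (employed plus unemployed)."""
    return unemployed / (unemployed + employed)


# Invented establishment-month record, for illustration only.
employment, hires, separations, vacancies = 200, 8, 6, 5

hires_rate = flow_rate(hires, employment)              # 8 / 200
separations_rate = flow_rate(separations, employment)  # 6 / 200
openings_rate = vacancy_rate(vacancies, employment)    # 5 / 205

print(hires_rate, separations_rate, openings_rate)
```

Because vacancies appear in the denominator, the vacancy rate is always slightly smaller than the simple ratio of vacancies to employment, just as the unemployment rate is smaller than the ratio of unemployed to employed workers.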
The survey does not have data on wages or other establishment characteristics, though it may be possible to link JOLTS data to other microdata sources, such as the Quarterly Census of Employment and Wages, to obtain this information.

A significant issue for JOLTS is the accurate measurement of hires and separations. Nagypál discusses some of these issues later in this volume, while Wohlford et al. (2003) and Faberman (2005) report BLS research aimed at understanding and improving measurement. An important finding from this work is that the measurement of hires and separations is not as simple as theory would dictate. As noted earlier, the relations between hires, separations, and the level of employment are complicated by the fact that employed workers can exist empirically in one of two states: employed and working, or employed but not working (where working is defined as on the payroll). Other complications also exist; for instance, hires may occur months prior to the start of work.¹⁰ These nuances make hires and separations more difficult to measure than a point-in-time count of employees on payroll.

Figure 2.1 illustrates the possible transitions a worker can undertake (and the relative difficulty of measuring each) based on internal analyses by BLS program staff. As one might expect, the easiest flows to measure are those where an employed and working individual either is hired or separates. Flows that deal with employed individuals not currently on payroll are where measurement difficulties arise, with the greatest difficulties occurring when an individual separates from a job match during a period of nonwork. Wohlford et al. (2003) find that separations are disproportionately harder to measure, creating an asymmetry between the measurement issues of hires and separations. Faberman (2005) further finds that contracting establishments are less likely than other establishments to respond to the survey. This asymmetry in turn results in a disparity between the CES employment trend and the cumulative difference between JOLTS hires and separations in the aggregate data.

Fig. 2.1 Measurement issues with labor turnover and employment

8. “Active recruiting” in the JOLTS is a very broad definition that includes networking and word-of-mouth recruiting. The time aggregation issue (i.e., the posting and filling of vacancies within the month) may have notable macroeconomic implications, as Shimer (2007) argues is the case with unemployment. Davis, Faberman, and Haltiwanger (2007) study the effects of time aggregation on the JOLTS vacancy measure.
9. In light of this issue, the JOLTS has separate surveys for education and temporary help establishments.
10. The JOLTS defines a hire when the work is actually started, and asks respondents not to count a hire until that time.


The BLS has taken steps (such as the creation of separate survey forms for schools and temporary help firms) to improve worker flow measurement. The BLS also continues research on JOLTS data measurement, which is obviously important for improving data quality, but can also prove useful in understanding how the employment behavior of establishments translates into the measured statistics.

2.3 Vacancies and the Beveridge Curve

2.3.1 Aggregate Relations

The publicly available JOLTS estimates present a wealth of new evidence for the aggregate labor market. While the time series is relatively short, it spans a recession and slow labor market recovery, allowing researchers a glimpse of the cyclical behavior of vacancies and labor turnover. The National Bureau of Economic Research (NBER) states that a recession begins in March 2001 and ends in November 2001, though losses in payroll employment (based on CES estimates) continue through August 2003.

Figure 2.2 illustrates the aggregate behavior of vacancies and unemployment between December 2000 and January 2005. The unemployment rate estimates come from the CPS. Throughout the period, the two move in opposite directions, and the patterns are consistent with the behavior of employment growth during this period. In 2001, unemployment rises while vacancies fall. Unemployment rates hover around 6 percent and vacancy rates remain near 2 percent for most of 2002 and 2003. Beginning in mid-2003, the unemployment rate begins to fall while the vacancy rate starts to rise; these patterns continue into the beginning of 2005.

Fig. 2.2 Vacancy and unemployment rates, December 2000–January 2005
Source: Vacancies are from public JOLTS nonfarm estimates and unemployment is from the CPS. Both are seasonally adjusted.

An important relation in the theory of labor search and matching is the Beveridge Curve, which predicts that the cyclical movements of vacancies and unemployment should be inversely related. Figure 2.3 plots the aggregate Beveridge Curve, with the JOLTS vacancy rate on the vertical axis and the CPS unemployment rate on the horizontal axis. The solid line represents the quadratic trend of the monthly vacancy-unemployment relation over the sample period. The dotted line charts the path of the vacancy-unemployment relation. The labor market begins the period relatively tight, with a ratio of vacancies to unemployment of 0.85. Vacancies then fall as unemployment rises, leading to a movement downward along the trend line. This pattern continues until mid-2003, when the unemployment rate peaks and the vacancy rate reaches a trough. At this point, the ratio of vacancies to unemployment is at a low of 0.38. The relation then loops around and moves back up along the trend line, with labor market tightness increasing as a result. Given the economic downturn and recovery during this period, the evidence is consistent with the theoretical predictions of the Beveridge Curve.

Fig. 2.3 Vacancy vs. unemployment rates (Beveridge Curve), December 2000–January 2005
Source: Vacancies are from public JOLTS nonfarm estimates and unemployment is from the CPS. Both are seasonally adjusted.
Notes: The dotted line represents the time-series path of the unemployment-vacancies relation, while the solid line represents the quadratic trend of the relation.

One can also use the aggregate JOLTS estimates to evaluate the magnitudes, volatility, and comovement of worker flows and vacancies. Table 2.1 presents the aggregate means, standard deviations, and correlations (contemporaneous and dynamic) of vacancies, hires, and separations with relevant labor market variables (i.e., employment growth and unemployment). The vacancy rate averages 2.4 percent. It is the most volatile and persistent of the JOLTS statistics. It is strongly negatively correlated with unemployment, strongly positively correlated with hires, and to a lesser extent, positively correlated with employment growth. The dynamic correlations of vacancies to unemployment remain persistently high for both lagging and leading values, with the contemporaneous correlation being the strongest. The dynamic correlations of vacancies to net growth are significant and positive for lagging values of net growth, but insignificant, and in some cases

Table 2.1

Vacancy and labor turnover aggregate summary statistics

                              Vacancies  Hires      Separations  Quits      Layoffs
                              (Vt)       (Ht)       (St)         (Qt)       (Lt)
Mean                          0.024      0.033      0.032        0.018      0.014
(Standard deviation)          (0.003)    (0.002)    (0.002)      (0.002)    (0.001)
Correlation with
  Unemployment (Ut)           –0.97**    –0.78**    –0.77**      –0.93**    0.05
  Net growth (Nt)             0.22       0.54**     –0.29**      0.06       –0.75**
  Vacancies (Vt)              1.00       0.82**     0.73**       0.92**     –0.12
  Hires (Ht)                             1.00       0.68**       0.83**     –0.13
Autocorrelations
  AR(1)                       0.97**     0.77**     0.78**       0.93**     0.37**
  AR(2)                       0.94**     0.68**     0.79**       0.91**     0.37**
  AR(3)                       0.90**     0.63**     0.64**       0.84**     0.00
Dynamic correlations with unemployment
  Ut–3                        –0.86**    –0.60**    –0.88**      –0.88**    –0.50**
  Ut–2                        –0.91**    –0.67**    –0.85**      –0.91**    –0.31**
  Ut–1                        –0.95**    –0.73**    –0.84**      –0.92**    –0.21**
  Ut                          –0.97**    –0.78**    –0.77**      –0.93**    0.05
  Ut+1                        –0.96**    –0.84**    –0.74**      –0.92**    0.09
  Ut+2                        –0.95**    –0.85**    –0.69**      –0.90**    0.16
  Ut+3                        –0.93**    –0.89**    –0.60**      –0.84**    0.24
Dynamic correlations with net growth
  Nt–3                        0.43**     0.45**     0.13         0.31**     –0.25
  Nt–2                        0.37**     0.38**     –0.04        0.21       –0.49**
  Nt–1                        0.36**     0.37**     0.02         0.21       –0.37**
  Nt                          0.22       0.54**     –0.29**      0.06       –0.75**
  Nt+1                        0.05       0.14       –0.25        –0.09      –0.38**
  Nt+2                        –0.14      0.04       –0.42**      –0.29**    –0.38**
  Nt+3                        –0.28      –0.09      –0.41**      –0.39**    –0.18

Source: Author’s calculations based on public JOLTS and CPS aggregate data (seasonally adjusted).
Notes: Net growth rates are the difference between the hires and separations rates. Statistics are based on data from December 2000 through January 2005. Asterisks (**) denote significance at the 5 percent level.
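The kinds of statistics reported in Table 2.1 can be sketched as follows. The example computes net growth as the hires rate minus the separations rate (as in the table notes), a dynamic correlation between one series and the lagged or leading values of another, and a lag-k autocorrelation. Reading the AR(k) rows as lag-k autocorrelations is an assumption of the sketch; the function names and the monthly rates are invented for illustration and are not JOLTS estimates.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def dynamic_corr(x, y, k):
    """Correlation of x[t] with y[t + k]; k < 0 uses lagged y, k > 0 uses leading y."""
    if k >= 0:
        return pearson(x[: len(x) - k], y[k:])
    return pearson(x[-k:], y[: len(y) + k])


def autocorr(x, k):
    """Correlation of x[t] with x[t - k]."""
    return dynamic_corr(x, x, -k)


# Invented monthly rates, for illustration only.
hires = [0.034, 0.033, 0.031, 0.032, 0.035, 0.036]
separations = [0.032, 0.033, 0.033, 0.031, 0.032, 0.033]
vacancies = [0.027, 0.025, 0.022, 0.021, 0.023, 0.026]

# Net growth as defined in the table notes: hires rate minus separations rate.
net_growth = [h - s for h, s in zip(hires, separations)]

for k in (-1, 0, 1):
    print(k, round(dynamic_corr(vacancies, net_growth, k), 2))
print(round(autocorr(vacancies, 1), 2))
```

With only a few dozen monthly observations, as in the sample period covered here, each additional lag or lead removes a noticeable share of the usable pairs, so correlations at longer horizons rest on fewer data points.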


negative, for leading values of net growth, implying that growth is a good predictor of vacancies, but vacancies are not a good predictor of growth.

Table 2.2 lists the summary statistics for vacancies, hires, and separations by industry and region. Vacancy rates vary considerably by industry, though industries with high worker turnover are not necessarily the industries with the highest vacancy rates. Instead, vacancy rates tend to be highest in industries with considerable expansions during the sample period, such as professional and business services, and education and health services. Education and health services have the highest vacancy rate despite also having some of the lowest turnover rates.¹¹ Manufacturing, which underwent a large employment decline over this period, has one of the lowest vacancy rates (along with construction and resources). To a lesser extent, vacancies vary by region. In general, the South and West, which have relatively high employment growth, have higher rates of vacancies.

2.3.2 Vacancy Postings and the Local Labor Market

Since the JOLTS data are collected at the establishment level, they are especially powerful for a micro-level study. Most theories of labor market search model the relation of vacancies to unemployment as the outcome of firm-level decisions of whether to post vacancies in response to current labor market conditions. Theory predicts that, controlling for outside factors, the negative aggregate relation of unemployment to vacancies should also hold at the micro level. To test this, I estimate the relation of establishment vacancy rates to local (i.e., state) unemployment rates.¹²

I start with the basic statistical properties of establishment-level vacancies, particularly since empirical evidence on them is sparse. Table 2.3 lists these properties for the pooled estimates of vacancy rates for establishment $i$ in state $j$ at month $t$ ($V_{ijt}$).
The table lists separate vacancy rate statistics for all observations and for the subsample of observations with at least one vacancy reported. Statistics are employment-weighted. Only 12 percent of establishment-month observations have a vacancy posted at the end of the month, though these represent 53 percent of employment. This statistic is somewhat misleading, however, since at the monthly frequency many establishments have no net change in employment (79 percent) or hires (81 percent), and likely do not need a vacancy posting. Nevertheless, conditional on changing employment levels, only 34 percent of establishmentmonth observations (representing 67 percent of employment) have a 11. Davis, Faberman, and Haltiwanger (2007) note that the JOLTS vacancy rates tend to be higher in industries with more formal hiring practices. 12. Note that there is a timing difference in the reporting of vacancies and unemployment for a given month. Reported vacancies are those posted at the end of the month, while the unemployed are those who actively looked for work in the four weeks prior to the week of the 19th. This is true for both national and state-level unemployment. Thus, the vacancy rates used in this study will lead unemployment rates by about two weeks.

Table 2.2 Vacancy and labor turnover summary statistics by industry and region

| | Vacancies (Vj) | Hires (Hj) | Separations (Sj) | Quits (Qj) | Layoffs (Lj) | Quit share (Qj/Sj) |
|---|---|---|---|---|---|---|
| Major industry | | | | | | |
| Resources | 0.011 (0.003) | 0.031 (0.008) | 0.031 (0.006) | 0.013 (0.004) | 0.013 (0.006) | 0.421 |
| Construction | 0.014 (0.004) | 0.054 (0.013) | 0.055 (0.007) | 0.020 (0.004) | 0.033 (0.008) | 0.370 |
| Manufacturing | 0.014 (0.003) | 0.022 (0.004) | 0.027 (0.004) | 0.012 (0.002) | 0.012 (0.003) | 0.445 |
| Transportation and utilities | 0.016 (0.003) | 0.025 (0.005) | 0.026 (0.003) | 0.013 (0.002) | 0.011 (0.003) | 0.500 |
| Retail trade | 0.019 (0.004) | 0.044 (0.009) | 0.043 (0.007) | 0.027 (0.005) | 0.013 (0.005) | 0.626 |
| Information | 0.020 (0.005) | 0.021 (0.004) | 0.023 (0.005) | 0.013 (0.003) | 0.008 (0.003) | 0.577 |
| Financial activities | 0.021 (0.002) | 0.022 (0.004) | 0.023 (0.004) | 0.013 (0.003) | 0.007 (0.002) | 0.589 |
| Professional and business services | 0.029 (0.005) | 0.043 (0.006) | 0.039 (0.007) | 0.020 (0.004) | 0.016 (0.004) | 0.512 |
| Education and health | 0.033 (0.005) | 0.027 (0.005) | 0.023 (0.004) | 0.015 (0.003) | 0.007 (0.002) | 0.638 |
| Leisure and hospitality | 0.028 (0.006) | 0.063 (0.013) | 0.059 (0.011) | 0.039 (0.008) | 0.018 (0.005) | 0.661 |
| Other services | 0.019 (0.004) | 0.032 (0.007) | 0.032 (0.009) | 0.019 (0.004) | 0.011 (0.006) | 0.593 |
| Government | 0.018 (0.003) | 0.015 (0.005) | 0.012 (0.004) | 0.006 (0.002) | 0.004 (0.002) | 0.488 |
| Region | | | | | | |
| Northeast | 0.021 (0.003) | 0.029 (0.006) | 0.028 (0.005) | 0.014 (0.003) | 0.012 (0.003) | 0.498 |
| Midwest | 0.020 (0.003) | 0.032 (0.006) | 0.031 (0.005) | 0.017 (0.004) | 0.012 (0.002) | 0.549 |
| South | 0.023 (0.003) | 0.035 (0.005) | 0.034 (0.004) | 0.020 (0.003) | 0.012 (0.002) | 0.585 |
| West | 0.022 (0.004) | 0.033 (0.005) | 0.033 (0.004) | 0.018 (0.003) | 0.013 (0.002) | 0.545 |
| Across-industry correlations with | | | | | | |
| Net growth (Nj) | 0.74** | 0.23 | 0.05 | 0.21 | –0.20 | 0.47 |
| Vacancies (Vj) | 1.00 | 0.33 | 0.21 | 0.38 | –0.07 | 0.66** |
| Hires (Hj) | | 1.00 | 0.98** | 0.94** | 0.80** | 0.32 |

Source: Author’s tabulations from JOLTS data.

Notes: Net growth rates are the difference between the hires and separations rates. Means are reported, with standard deviations in parentheses. Statistics are based on data from December 2000 through January 2005. **Significant at the 5 percent level.


Table 2.3 Local unemployment and establishment vacancy summary statistics

| | All establishments | Establishments with positive vacancies only |
|---|---|---|
| Mean | 0.021 | 0.040 |
| Standard deviation | 0.039 | 0.046 |
| Median | 0.003 | 0.026 |
| 10th, 90th percentiles | 0.000, 0.063 | 0.005, 0.089 |
| Number of observations | 372,288 | 175,981 |
| Share of employment [establishments] with Vijt > 0 | 0.533 [0.122] | n.a. |
| Share of employment [establishments] with Vijt > 0, given Net ≠ 0 | 0.674 [0.336] | n.a. |
| Percent of variation explained by | | |
| Month effects | 0.5 | 0.8 |
| State effects | 0.7 | 0.6 |
| Establishment effects | 40.7 | 66.0 |

Source: Author’s tabulations from pooled JOLTS microdata.

Notes: Estimates are based on data from December 2000 through January 2005. Estimates (except the share of establishments with positive vacancies) are weighted by employment. n.a. = not applicable.

vacancy posted at the end of the month. The vacancy rate for these observations is nearly double the rate for all observations. When looking at these statistics, remember that the JOLTS vacancy definition does not capture long-term vacancy postings or vacancies that are posted and filled within the month. Nevertheless, the statistics may reflect the fact that establishments, with some frequency, hire through channels less formal than a posted vacancy, or that some establishments may have relatively short vacancy durations. Davis, Faberman, and Haltiwanger (2007) explore these conjectures.

Table 2.3 also shows that state and month differences each account for less than 1 percent of the establishment-level vacancy variation. Establishment effects account for 41 percent of the variation of all vacancies and 66 percent of the variation conditional on an establishment’s posting of at least one vacancy. This suggests that much of the micro-level variation stems from different vacancy-posting behaviors among establishments rather than varying behaviors within local labor markets, or during certain points in the business cycle.

To explore the relation between establishment vacancy postings and state unemployment, I regress establishment vacancy rates on state unemployment rates. The unemployment rates come from the BLS Local Area Unemployment Statistics (LAUS) program, which uses the CPS and other data sources to produce its estimates. In terms of magnitudes, unemployment rates for many states are similar to the national rate, though the average rates for several states are several percentage points higher or lower than the national rate. The cyclical volatility of unemployment for some states

Studying the Labor Market: the Job Openings and Labor Turnover Survey

Fig. 2.4 Establishment vacancies and their relation to local unemployment

Source: Author’s estimation of establishment vacancy rates on a fourth-order polynomial of the state unemployment rate using JOLTS establishment microdata and LAUS unemployment estimates. State and establishment fixed effects are used where noted. See text for details.

also tends to be higher than the volatility at the national level. To allow for a nonlinear relation, I use a fourth-order polynomial of unemployment. Nonparametric analyses of the data (not reported here) suggest that a polynomial of this order fits the data well. I weight the regressions by employment and run separate regressions that include state and establishment fixed effects.13

The predicted relations of vacancies to unemployment from these regressions are in figure 2.4. There are separate predicted trends for the unconditional relation, the relation with state effects removed, and the relation with establishment effects removed. As theory predicts, vacancy postings are inversely related to the local unemployment rate. The polynomial coefficients for each regression are all jointly significant at the 5 percent level. The relation is steeper once I control for state or establishment effects. This is likely due to the large variation in trend unemployment rates across states, suggesting that not controlling for this trend variation understates the responsiveness of vacancies to unemployment. It also suggests that the covariation of vacancies and unemployment occurs more from time variation within states than from level differences across states. Controlling for establishment rather than state effects, however, makes little difference for the results. This suggests that much of the between-establishment variation in the relation is between states, and not necessarily between establishments within states. Overall, the results suggest that a Beveridge Curve relation in fact exists at both the establishment and aggregate levels.

13. Note that state fixed effects are a subset of establishment fixed effects, in the sense that establishments cannot change their location in the data.

2.4 Worker Flows and the Labor Market

2.4.1 Aggregate Evidence

I now focus on the JOLTS worker flow estimates. Figure 2.5 plots the time series of aggregate hires and separations rates over the sample period. Their patterns reflect the downturn and recovery during this time. Hires decline during the recession and remain low through mid-2003. The hiring rate then begins a gradual, steady increase through the start of 2005. Separations are high throughout most of 2001. They then decrease in early 2002, and reach a low in mid-2003. Separations then increase gradually through the end of the sample period, even though net growth is strong during this time; evidence not reported here shows that movements in the quits rate drive this increase.

In figure 2.6, I plot quarterly worker flow rates calculated from the JOLTS against the gross job gains and gross job losses estimates from the Business Employment Dynamics (BED) program.14 Hires and gross job gains move together for the most part, though hiring has a more pronounced decline during the 2001 recession and a more pronounced rise during 2004. Gross job losses, relative to separations, show a considerably larger rise during the 2001 recession and a decline thereafter, whereas separations begin to rise again starting in mid-2003. The difference between the two series at the end of the period can be attributed to the increase in the quits rate during this time.

As with vacancies, the aggregate estimates of worker flows are summarized in tables 2.1 and 2.2. Table 2.1 shows that over this period the hires rate averages 3.3 percent, while the separations rate averages 3.2 percent. More than half (54 percent) of separations, on average, are quits. Hires and separations are both negatively correlated with unemployment; the latter correlation comes primarily from a negative correlation of quits with unemployment. Layoffs are uncorrelated with unemployment, but strongly negatively correlated with employment growth, leading to a negative correlation between growth and total separations. Hires are positively correlated with growth, but quits are essentially uncorrelated with growth. Hires, quits, and vacancies are all strongly positively correlated with each other. Hires and quits exhibit considerable persistence, while layoffs exhibit little to no persistence. The latter is consistent with the notion that

14. For more on the BED, see Spletzer et al. (2004), as well as chapter 4 by Clayton and Spletzer in this volume.

Fig. 2.5 Hires and separations rates, December 2000–January 2005

Source: Public JOLTS nonfarm estimates, seasonally adjusted.

Fig. 2.6 Quarterly worker flow and job flow rates, JOLTS and BED data

Source: Quarterly worker flows are from the published JOLTS estimates and quarterly job flows are from the published BED statistics. All estimates are seasonally adjusted.


layoffs tend to be episodic events rather than persistent, dynamic processes. The dynamic correlations suggest that hires are a leading factor for lower future unemployment. The contemporaneous correlation between quits and unemployment is stronger than either the lagging or leading dynamic correlations. The same can be said of the contemporaneous correlation between layoffs and employment growth and their dynamic correlations.

Because of the short sample period, one should interpret the time-series correlations with caution. Nevertheless, the patterns illustrated (particularly by quits and layoffs) shed some light on the cyclical behavior of worker flows. Hires and quits are clearly procyclical, though the latter are more related to unemployment than job growth. Layoffs, on the other hand, are countercyclical, but only with respect to job growth; they have little relation to the stock of unemployment. This evidence has implications for the recent debate on whether recessions are primarily periods of high job loss or reduced hiring. Hall (2005b) and Shimer (2007b) argue that the job-finding rate, and not necessarily the separations rate, drives cyclical movements in unemployment. The correlations in table 2.1 support that claim, but only to the extent that movements in the quits rate drive the relationship between separations and unemployment. This suggests that separations and the job-finding rate are not mutually exclusive explanations, and that the relative importance of separations versus the job-finding rate may depend critically on the cyclical behavior of employer-to-employer transitions (described by Fallick and Fleischman 2004; Nagypál 2005) since quits tend to dominate these flows.

Table 2.2 illustrates that worker flow patterns vary widely by industry and, to a lesser extent, by region. The industry evidence is consistent with the findings of Anderson and Meyer (1994) and Burgess, Lane, and Stevens (2000). Turnover is highest in seasonal industries, such as construction and leisure and hospitality, and low in other industries, such as manufacturing and government. Turnover is also slightly higher in the South and West than in the Northeast and Midwest. Industries and regions also vary widely in the share of their separations accounted for by quits. The majority of separations tend to be layoffs in goods-producing industries (resources, construction, manufacturing) and quits in other industries, such as services and retail trade. A large fraction of separations in the Northeast and Midwest, where shares of goods-producing industries are relatively high, are layoffs. The across-industry correlations suggest that both vacancies and growth are positively related to the share of separations made up by quits. Intuitively, expanding industries should have fewer layoffs, all else equal. The correlations also illustrate that high-turnover industries tend to have high rates of hires, quits, and layoffs.
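The across-industry correlations can be roughly checked against the industry means printed in table 2.2. The sketch below is illustrative only. It computes unweighted Pearson correlations across the twelve industries from the rounded means, which is an assumption, since the chapter does not say how the published correlations were weighted. The results should therefore land near, but not exactly on, the reported 0.66 (vacancies with quit share) and 0.80 (hires with layoffs). The function and variable names are hypothetical.

```python
import math

# Industry means transcribed from table 2.2, in the table's row order
# (Resources first, Government last).
vacancies = [0.011, 0.014, 0.014, 0.016, 0.019, 0.020,
             0.021, 0.029, 0.033, 0.028, 0.019, 0.018]
hires = [0.031, 0.054, 0.022, 0.025, 0.044, 0.021,
         0.022, 0.043, 0.027, 0.063, 0.032, 0.015]
layoffs = [0.013, 0.033, 0.012, 0.011, 0.013, 0.008,
           0.007, 0.016, 0.007, 0.018, 0.011, 0.004]
quit_share = [0.421, 0.370, 0.445, 0.500, 0.626, 0.577,
              0.589, 0.512, 0.638, 0.661, 0.593, 0.488]


def pearson(x, y):
    """Unweighted Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r_vacancies_quit_share = pearson(vacancies, quit_share)
r_hires_layoffs = pearson(hires, layoffs)
print(f"vacancies vs. quit share: {r_vacancies_quit_share:.2f}")
print(f"hires vs. layoffs:        {r_hires_layoffs:.2f}")
```

Both correlations come out positive and of the same order as the published figures, which is all a check based on rounded inputs can establish.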


2.4.2 Worker Flows and Establishment Growth

Hires, quits, and layoffs are the result of continuous, dynamic interactions between workers and firms. In any period, a worker with a better job offer may choose to quit a successful, expanding firm at the same time a declining firm looks to hire new employees as it restructures its workforce. Anecdotal evidence of such occurrences is quite common. Yet, even with aggregate data on labor turnover, it is difficult to know what role, if any, such interactions play in the cyclical behavior of hires and separations.

Another advantage of the JOLTS microdata is their ability to illustrate the relationship between establishment-level employment behavior and the aggregate behavior of worker flows. When, how, and to what extent establishments create or destroy jobs has been a topic of research for nearly two decades (e.g., Dunne, Roberts, and Samuelson 1989a, 1989b; Davis and Haltiwanger 1990, 1992). Evidence from this research shows that large fractions of establishments simultaneously create and destroy jobs each period. There is little evidence, though, on the relation between these establishment-level decisions and patterns of worker turnover.

To explore this relation, I split the JOLTS microdata into three groups: establishments with expanding employment (i.e., more hires than separations); establishments with contracting employment (i.e., more separations than hires); and establishments with constant employment (i.e., either offsetting hires and separations or no turnover at all). I then calculate the aggregate monthly labor turnover estimates for each group, using factors calculated from the public JOLTS estimates to seasonally adjust the data.

Figures 2.7 and 2.8 show the patterns of hires and separations, respectively, by type of employment change. The figures show analogous pictures. Expanding establishments have high hires rates, while contracting establishments have high separations rates. These rates are also considerably more volatile than the other labor turnover series, with standard deviations that are between 1.5 and 3.6 times greater than those for the other groups. Establishments with no employment change have the lowest hires and separations rates. Their rates are also the least volatile. This evidence suggests that the relation of establishment-level hires and separations to net growth is nonlinear and nonmonotonic: contracting establishments have more hires and expanding establishments have more separations than establishments with no employment change.

Finally, even though figure 2.5 shows a long, persistent drop in hiring during the downturn and a mild pickup in separations during the recession, the series depicted in figures 2.7 and 2.8 show little to no cyclical variation; the only exception is a moderate movement of the separations rate among contracting establishments during the 2001 recession and during the 2003–2004 recovery period. How can the evidence in the two figures be reconciled? As Davis, Faberman, and

Fig. 2.7 Hiring rates by type of establishment-level employment change

Source: Author’s tabulations of JOLTS microdata. Estimates are seasonally adjusted using factors from the aggregate public estimates.

Fig. 2.8 Separation rates by type of establishment-level employment change

Source: Author’s tabulations of JOLTS microdata. Estimates are seasonally adjusted using factors from the aggregate public estimates.


Fig. 2.9 Quit rates by type of establishment-level employment change

Source: Author’s tabulations of JOLTS microdata. Estimates are seasonally adjusted using factors from the aggregate public estimates.

Haltiwanger (2006) illustrate, cyclical shifts in the distribution of establishment growth account for the differences between the figures.15

Figures 2.9 and 2.10 show two notable caveats for quits and layoffs. In figure 2.9, the quits rate mimics the procyclical behavior of its aggregate estimates among contracting establishments and, to a lesser extent, among expanding establishments. In figure 2.10, layoffs among contracting establishments exhibit a mild spike in late 2001, but are otherwise acyclical.

Table 2.4 summarizes worker flow rates for different intervals of the growth rate distribution. Quit rates exceed layoff rates for all but the largest contractions, but remain relatively high for all contracting establishments. Only job losses at establishments with large contractions are dominated by layoffs. Finally, there is an asymmetry between the tails of the growth rate distribution: separations at rapidly expanding establishments are considerably higher than hires at rapidly contracting establishments (0.034 versus 0.018 in table 2.4). This may suggest that a shakeout process within the hiring patterns of expanding establishments exists, but further research is warranted.

2.4.3 Worker Flow Relations to the Local Labor Market

Understanding how worker flows relate to local labor market conditions can be an important aspect of understanding their aggregate movements.

15. Davis, Faberman, and Haltiwanger (2006) also note that the patterns illustrated are robust to size, industry, and establishment controls.
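The reconciliation offered in section 2.4.2, that shifts in the distribution of establishment growth drive the aggregate series, rests on an accounting identity: the aggregate hires rate equals the employment-share-weighted sum of the group hires rates. The sketch below illustrates that identity with made-up shares and rates (they are not JOLTS estimates), and splits the change in the aggregate into a composition part and a within-group part. All names and numbers are hypothetical.

```python
# Hypothetical hires rates by establishment group, held fixed over the cycle.
GROUPS = ("contracting", "constant", "expanding")
hires_rate = {"contracting": 0.02, "constant": 0.01, "expanding": 0.10}

# Hypothetical employment shares in an expansion month and a downturn month.
share_expansion = {"contracting": 0.25, "constant": 0.40, "expanding": 0.35}
share_downturn = {"contracting": 0.35, "constant": 0.40, "expanding": 0.25}


def aggregate_rate(shares, rates):
    """Aggregate rate as the employment-share-weighted sum of group rates."""
    return sum(shares[g] * rates[g] for g in GROUPS)


def decompose(shares0, rates0, shares1, rates1):
    """Split the change in the aggregate rate into two parts.

    Period-average weights are used, so the composition and within-group
    parts sum exactly to the total change.
    """
    composition = sum((shares1[g] - shares0[g]) * (rates0[g] + rates1[g]) / 2
                      for g in GROUPS)
    within = sum((rates1[g] - rates0[g]) * (shares0[g] + shares1[g]) / 2
                 for g in GROUPS)
    return composition, within


agg_expansion = aggregate_rate(share_expansion, hires_rate)
agg_downturn = aggregate_rate(share_downturn, hires_rate)
composition, within = decompose(share_expansion, hires_rate,
                                share_downturn, hires_rate)
print(f"aggregate hires rate: {agg_expansion:.3f} -> {agg_downturn:.3f}")
print(f"composition part: {composition:.3f}, within-group part: {within:.3f}")
```

In this example the aggregate hires rate falls even though no group's rate changes, because employment shifts from expanding toward contracting establishments. That is the mechanism by which flat group-level series can coexist with a cyclical aggregate.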


Fig. 2.10 Layoff rates by type of establishment-level employment change

Source: Author’s tabulations of JOLTS microdata. Estimates are seasonally adjusted using factors from the aggregate public estimates.

Table 2.4 Labor turnover rates by establishment growth rate interval

| Net growth interval (Nijt) | Hiring rate (Hijt) | Separations rate (Sijt) | Quits rate (Qijt) | Layoffs rate (Lijt) |
|---|---|---|---|---|
| (–2, –0.3) | 0.018 | 0.554 | 0.132 | 0.393 |
| [–0.3, –0.1) | 0.028 | 0.191 | 0.089 | 0.088 |
| [–0.1, 0) | 0.017 | 0.039 | 0.023 | 0.013 |
| 0 | 0.011 | 0.011 | 0.008 | 0.003 |
| (0, 0.1) | 0.042 | 0.019 | 0.013 | 0.005 |
| [0.1, 0.3) | 0.199 | 0.037 | 0.024 | 0.017 |
| [0.3, 2) | 0.541 | 0.034 | 0.020 | 0.013 |

Source: Author’s tabulations from pooled JOLTS microdata.

Notes: Estimates are based on data from December 2000 through January 2005. Estimates are weighted by employment.
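A tabulation of the kind shown in table 2.4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's procedure: each record carries a single employment figure used as the denominator for every rate, net growth is taken as the hires rate minus the separations rate, and the records themselves are invented. The three-way split into contracting, constant, and expanding establishments used in section 2.4.2 is the same idea with coarser bins.

```python
from collections import defaultdict


def growth_interval(net_growth):
    """Map a net growth rate to the interval labels used in table 2.4."""
    if net_growth < -0.3:
        return "(-2, -0.3)"
    if net_growth < -0.1:
        return "[-0.3, -0.1)"
    if net_growth < 0:
        return "[-0.1, 0)"
    if net_growth == 0:
        return "0"
    if net_growth < 0.1:
        return "(0, 0.1)"
    if net_growth < 0.3:
        return "[0.1, 0.3)"
    return "[0.3, 2)"


def tabulate(records):
    """Employment-weighted turnover rates by net growth interval.

    Each record is (employment, hires, quits, layoffs, other_separations)
    for one establishment-month. Summing flows and dividing by summed
    employment gives the employment-weighted mean of the establishment rates.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for employment, hires, quits, layoffs, other in records:
        separations = quits + layoffs + other
        # Net growth is the hires rate minus the separations rate.
        label = growth_interval((hires - separations) / employment)
        cell = totals[label]
        cell["employment"] += employment
        cell["hires"] += hires
        cell["separations"] += separations
        cell["quits"] += quits
        cell["layoffs"] += layoffs
    return {
        label: {flow: cell[flow] / cell["employment"]
                for flow in ("hires", "separations", "quits", "layoffs")}
        for label, cell in totals.items()
    }


# Hypothetical establishment-month records, for illustration only.
records = [
    (100, 2, 10, 30, 0),   # sharp contraction
    (100, 0, 4, 1, 0),     # mild contraction
    (50, 1, 1, 0, 0),      # offsetting hires and separations
    (300, 0, 0, 0, 0),     # no turnover at all
    (200, 30, 4, 2, 0),    # expansion
]
rates = tabulate(records)
for label, row in rates.items():
    print(label, {flow: round(rate, 3) for flow, rate in row.items()})
```

Note that the zero-growth bin pools establishments with offsetting flows and establishments with no turnover at all, which is why its hires and separations rates are small but not zero.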

One basic yet important question the JOLTS microdata can address is how local worker flow rates relate to the local unemployment rate. Table 2.5 reports the basic relations of pooled establishment-month observations of hires (Hijt), quits (Qijt), and layoffs and discharges (Lijt) to state-level labor market statistics. These statistics include the state unemployment rate (Ujt), its change from the previous month (ΔUjt), and the state employment growth rate (Njt) (obtained from the CES). The reported correlations appear very weak, yet nearly all are significant at the 5 percent


Table 2.5 Establishment labor turnover variation and local labor market conditions

| | Hiring rate (Hijt) | Quits rate (Qijt) | Layoffs rate (Lijt) |
|---|---|---|---|
| Pooled correlation with | | | |
| Net growth rate (Njt) | 0.026** | 0.008** | –0.009** |
| Unemployment (Ujt) | –0.025** | –0.036** | 0.001 |
| Unemployment change (ΔUjt) | –0.012** | –0.010** | 0.009** |
| Percent of variation explained by | | | |
| Establishment effects | 28.5 | 27.9 | 21.0 |
| State × month effects | 1.9 | 2.2 | 1.1 |

Source: Author’s tabulations from pooled JOLTS microdata (worker flows), supplemented by LAUS state data (unemployment), and CES state data (net growth).

Notes: Estimates are based on data from December 2000 through January 2005. All estimates are weighted by employment. The variations explained are from the regression of each worker flow estimate on either 14,573 establishment effects or 1,887 state × month effects. **Significant at the 5 percent level.

level. This is a consequence of using pooled establishment observations, which tend to have large idiosyncratic components to their variation regardless of the variable examined. Therefore, the most relevant characteristics of these correlations are their sign and their magnitudes relative to each other.16 Establishment fixed effects only explain between 21 and 29 percent of the variations of these flows; state-month effects explain 1 to 2 percent.

The evidence suggests a procyclical pattern for establishment hires and quits and a countercyclical pattern for layoffs: higher growth, lower unemployment, and decreases in unemployment at the state level are related to more hires and more quits. Layoffs are negatively related to growth and positively related to increases in unemployment, but consistent with the national evidence, they are essentially uncorrelated with the unemployment rate.

I also estimate the establishment-level relations of hires, quits, and layoffs to the change in the state unemployment rate. I focus on the change rather than the level because it is more comparable to a flow measure.17 In the previous section, vacancies were a stock measure, so the level of unemployment was the appropriate metric. I regress each establishment-month observation on a fourth-order polynomial of ΔUjt, weighting the regressions by employment, separately for each of the three labor turnover rates.18

16. Ideally, I would calculate state-level worker flow estimates and use them to estimate the reported correlations. Unfortunately, the JOLTS sample size and weighting structure do not allow for reliable estimates below the detail of its four geographic regions.

17. Note that the change in unemployment is the net effect of the flows into unemployment and flows out of unemployment.

18. As with the regressions of section 2.3.2, the fourth-order polynomial results are consistent with similar nonparametric fits of the data.
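The regressions described here, an employment-weighted fit of a turnover rate on a fourth-order polynomial with state or establishment effects removed, can be sketched as below. The chapter does not describe its estimation code, so this is only one standard way to do it: the fixed effects are absorbed by subtracting weighted group means (the within transformation) and the coefficients come from the weighted normal equations. The data are synthetic, and all function and variable names are hypothetical.

```python
import random


def weighted_demean(values, groups, weights):
    """Subtract the weighted mean of each group from its members."""
    num, den = {}, {}
    for v, g, w in zip(values, groups, weights):
        num[g] = num.get(g, 0.0) + w * v
        den[g] = den.get(g, 0.0) + w
    return [v - num[g] / den[g] for v, g in zip(values, groups)]


def solve(a, b):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_polynomial_fe(y, x, weights, groups, order=4):
    """Weighted least squares of y on x, x^2, ..., x^order, group effects removed."""
    cols = [weighted_demean([xi ** k for xi in x], groups, weights)
            for k in range(1, order + 1)]
    yd = weighted_demean(y, groups, weights)
    xtx = [[sum(w * a * b for w, a, b in zip(weights, ci, cj)) for cj in cols]
           for ci in cols]
    xty = [sum(w * a * b for w, a, b in zip(weights, ci, yd)) for ci in cols]
    return solve(xtx, xty)


# Synthetic, noise-free data: a hires rate that depends on the change in
# unemployment through a known quartic plus a state-specific level.
random.seed(0)
true_beta = [-0.020, 0.010, 0.005, -0.003]
state_effect = {"A": 0.030, "B": 0.045, "C": 0.020}
rows = []
for _ in range(300):
    state = random.choice(sorted(state_effect))
    d_unemp = random.uniform(-1.0, 1.0)
    employment = random.uniform(1.0, 500.0)
    rate = state_effect[state] + sum(
        b * d_unemp ** k for k, b in enumerate(true_beta, start=1))
    rows.append((rate, d_unemp, employment, state))
y, x, w, g = (list(col) for col in zip(*rows))
beta_hat = fit_polynomial_fe(y, x, w, g)
print([round(b, 4) for b in beta_hat])
```

Because the synthetic data contain no noise, the fit recovers the assumed coefficients, which serves only as a check that the within transformation removes the state levels. With real microdata one would normally use an established regression library rather than hand-rolled linear algebra.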


Fig. 2.11 Establishment hires and their relation to changes in local unemployment

Source: Author’s estimation of establishment hiring rates on a fourth-order polynomial of the change in the state unemployment rate using JOLTS establishment microdata and LAUS unemployment estimates. State and establishment fixed effects are used where noted. See text for details.

As before, I perform separate regressions for the unconditional relation, the relation with state effects removed, and the relation with establishment effects removed. Figures 2.11, 2.12, and 2.13 plot the results for hires, quits, and layoffs, respectively. Figure 2.11 shows that establishments hire less when the local unemployment rate is rising. The relation is nonlinear, with hires changing the most during large decreases in unemployment. Figure 2.12 shows that quits also decrease as unemployment rises. This relationship is also nonlinear, with quits changing the most during large increases in unemployment. In figure 2.13, layoffs increase with increases in local unemployment. The relationship is close to linear. This establishment-level evidence parallels the patterns in the aggregate evidence. Controlling for state or establishment fixed effects does not alter these results.

2.5 Conclusions and Further Research Potential

The JOLTS data provide a wealth of labor market information at both the aggregate and establishment level. The data are the most comprehensive data source for vacancies in the United States, and have the timeliest, most frequent, and most direct measures of worker turnover. While its time series is still relatively short, the JOLTS already presents rich new evidence

Fig. 2.12 Establishment quits and their relation to changes in local unemployment

Source: Author’s estimation of establishment quit rates on a fourth-order polynomial of the change in the state unemployment rate using JOLTS establishment microdata and LAUS unemployment estimates. State and establishment fixed effects are used where noted. See text for details.

Fig. 2.13 Establishment layoffs and their relation to changes in local unemployment

Source: Author’s estimation of establishment layoff rates on a fourth-order polynomial of the change in the state unemployment rate using JOLTS establishment microdata and LAUS unemployment estimates. State and establishment fixed effects are used where noted. See text for details.


on the time-series and cross-sectional patterns of these statistics. Vacancies, hires, and quits all exhibit persistent, procyclical behavior between 2001 and 2005, while layoffs exhibit an episodic, countercyclical pattern. Vacancies also exhibit a cyclical relation to unemployment consistent with the Beveridge Curve.

The micro-level estimates provide several new insights into the behavior of vacancies and worker flows. Establishment-level vacancy postings are negatively related to local unemployment rates, suggesting that the Beveridge Curve relation holds even at the micro level. This result holds even though many establishments (even the ones that change their employment) often do not post vacancies. Expanding establishments have high hiring rates while contracting establishments have high separation rates. Establishments with no change exhibit a steady pattern of turnover, but have the lowest worker flow rates. The evidence suggests nonlinear, nonmonotonic relations of hires and separations to establishment growth. Finally, the evidence suggests that hires are strongly related to changes in local unemployment rates, falling nonlinearly with increases in unemployment. Quits also fall with increases in the local unemployment rate, while layoffs rise with these increases.

These findings barely scratch the surface of what the JOLTS data can say about the labor market. I highlight three areas where the aggregate estimates and microdata can aid labor market research. The first is how firms use vacancies to attract workers. Earlier works, such as Abraham (1987) and Blanchard and Diamond (1989, 1990), study vacancies and their relation to unemployment using estimates from the Help Wanted Index. The JOLTS vacancy data have a major advantage over this index (and others like it) in that they are reported directly by establishments. This provides a representative, tangible measure of job openings and allows micro-level studies of vacancy posting behavior similar to previous work by Holzer (1994) and current work by Davis, Faberman, and Haltiwanger (2007) and Faberman and Nagypál (2007). Evidence in this chapter already suggests that the micro patterns of firms that post vacancies may differ from existing theories of their behavior.

The second area of potential research deals with separations and job loss. The JOLTS data can provide a better understanding of separations since they differentiate between quits and layoffs. This is important for macroeconomic analyses of employment adjustment, since quits are procyclical, while layoffs are countercyclical. The distinction between quits and layoffs and its importance for labor market movements is highlighted by the models of Akerlof, Rose, and Yellen (1988) and McLaughlin (1991), and the importance of this distinction is evident in the recent debate on whether recessions are hiring-driven, as argued by Hall (2005b) and Shimer (2007b), or job-loss driven, which was the conventional wisdom.

A final area of potential research deals with worker turnover more broadly. The aggregate national, regional, and industry estimates already


present many new findings. Future work with these and the micro-level estimates can build on the earlier work of Anderson and Meyer (1994), Burgess, Lane, and Stevens (2000), and others. The existence of vacancy, employment, and worker flow data reported by each establishment allows a micro-level study of their joint behavior that was previously impossible, but is essential for evaluating theories of labor market search and the matching of workers to firms. Overall, the JOLTS data provide many opportunities to increase our understanding of labor market dynamics.

References Abraham, K., and M. Wachter. 1987. Help wanted advertising, job vacancies, and unemployment. Brookings Papers on Economic Activity, Issue no. 1:207–43. Akerlof, G. A., A. K. Rose, and J. L. Yellen. 1988. Job switching and job satisfaction in the U.S. labor market. Brookings Papers on Economic Activity, Issue no. 2:495–594. Anderson, P., and B. R. Meyer. 1994. The extent and consequences of job turnover. Brookings Papers on Economic Activity, Microeconomics: 177–249. Blanchard, O. J., and P. Diamond. 1989. The Beveridge curve. Brookings Papers on Economic Activity, Issue no. 2:1–60. ———. 1990. The cyclical behavior of the gross flows of U.S. workers. Brookings Papers on Economic Activity, Issue no. 2:85–143. Burgess, S., J. I. Lane, and D. Stevens, 2000. Job flows, worker flows, and churning. Journal of Labor Economics 18 (3): 473–502. Clark, K. A. 2004. The job openings and labor turnover survey: What initial data show. Monthly Labor Review 127 (11): 14–23. Clark, K. A., and R. Hyson. 2001. New tools for labor market analysis: JOLTS. Monthly Labor Review 124 (12): 32–37. Crankshaw, M., and G. Stamas. 2000. Sample design in the job openings and labor turnover survey. 2000 Proceedings of the Annual Statistical Association. CDROM. Alexandria, VA: American Statistical Association. Davis, S. J., R. J. Faberman, and J. C. Haltiwanger. 2006. The flow approach to labor markets: New evidence and micro-macro links. Journal of Economic Perspectives 20 (3): 3–24. ———. 2007. The Establishment-level behavior of vacancies and hiring. Working Paper. Davis, S. J., and J. C. Haltiwanger. 1990. Gross job creation and destruction: Microeconomic evidence and macroeconomic implications. NBER Macroeconomics Annual 5:123–68. ———. 1992. Gross job creation, gross job destruction and employment reallocation. Quarterly Journal of Economics 107 (3): 819–63. ———. 1998. Measuring gross worker and job flows. In Labor Statistics Measurement Issues, ed. J. Haltiwanger, M. E. 
Manser, and R. Topel, 79–119. Chicago: The University of Chicago Press. ———. 1999. Gross job flows. In Handbook of labor economics, volume 3, ed. Orley Ashenfelter and David Card, 2711–2805. Amsterdam: Elsevier. Dunne, T., M. J. Roberts, and L. Samuelson. 1989a. Plant turnover and gross em-


ployment flows in the U.S. manufacturing sector. Journal of Labor Economics 7 (1): 48–71. ———. 1989b. The growth and failure of U.S. manufacturing plants. Quarterly Journal of Economics 104 (4): 671–98. Faberman, R. J. 2005. Analyzing the JOLTS hires and separations data. 2005 Proceedings of the Annual Statistical Association. CD-ROM. Alexandria, VA: American Statistical Association. Faberman, R. J., and É. Nagypál. 2007. The effect of quits on worker recruitment: Theory and evidence. Working Paper. Fallick, B., and C. A. Fleischman. 2004. Employer-to-employer flows in the U.S. labor market: The complete picture of gross worker flows. Board of Governors of the Federal Reserve System (U.S.), Finance and Economics Discussion Series, paper no. 2004-34. Hall, R. E. 2005a. Employment fluctuations with equilibrium wage stickiness. American Economics Review 95 (1): 50–65. ———. 2005b. Job loss, job finding, and unemployment in the U.S. economy over the past fifty years. In 2005 NBER Macroeconomics Annual, ed. Mark Gertler and Kenneth Rogoff, 101–37. Cambridge, MA: National Bureau of Economic Research. Holzer, H. J. 1994. Job vacancy rates in the firm: An empirical analysis. Economica 61 (1): 17–36. McLaughlin, K. J. 1991. A theory of quits and layoffs with efficient turnover. Journal of Political Economy 99 (1): 1–29. Mortensen, D. T., and C. A. Pissarides. 1994. Job creation and job destruction and the theory of unemployment. Review of Economic Studies 61 (3): 397–415. ———. 1999. New developments in models of search in the labor market. In Handbook of Labor Economics, Volume 3, ed. Orley Ashenfelter and David Card, 2567–628. Amsterdam: Elsevier. Nagypál, E. 2005. On the extent of job-to-job transitions. Northwestern University, Working Paper. Pissarides, C. 1985. Short-run equilibrium dynamics of unemployment, vacancies, and real wages. American Economic Review 75 (4): 676–90. Rogerson, R., R. Shimer, and R. Wright. 2005. 
Search-theoretic models of the labor market: A survey. Journal of Economic Literature 43 (4): 959–88. Shimer, R. 2007a. Reassessing the ins and outs of unemployment. NBER Working Paper no. 13421. Cambridge, MA: National Bureau of Economic Research, September. Shimer, R. 2007b. The cyclical behavior of equilibrium unemployment and vacancies. American Economic Review 95 (1): 25–49. Spletzer, J. R., R. J. Faberman, A. Sadeghi, D. M. Talan, and R. L. Clayton. 2004. Business employment dynamics: New data on gross job gains and losses. Monthly Labor Review 127 (4): 29–42. Valetta, R. 2005. Why has the U.S. Beveridge Curve shifted back? New evidence using regional data. Federal Reserve Bank of San Francisco Working Paper no. 2005-25. Wohlford, J., M. A. Phillips, R. Clayton, and G. Werking. 2003. Reconciling labor turnover and employment statistics. 2003 Proceedings of the Annual Statistical Association. CD-ROM. Alexandria, VA: American Statistical Association.

3 What Can We Learn About Firm Recruitment from the Job Openings and Labor Turnover Survey? Éva Nagypál

The Job Openings and Labor Turnover Survey (JOLTS) quickly captured the attention of macroeconomists studying labor markets after the survey’s launch in December 2000. The enthusiasm of macro-labor economists about JOLTS is easy to understand: job openings (more commonly referred to as vacancies) play a crucial role in the equilibrium models of unemployment that were developed in the 1980s and 1990s. These models (following the pioneering work of Diamond [1981, 1982a, 1982b], Mortensen [1982a, 1982b], and Pissarides [1984, 1985]) have proved very fruitful in analyzing a wide range of aggregate labor-market issues: the existence of unemployment as an equilibrium phenomenon, the ongoing high rate of worker reallocation observed in labor markets, and the effect of policies that influence the operation of labor markets. Data on vacancies comparable to the series available in JOLTS had never been collected previously in the United States. Moreover, not only did JOLTS provide a much-needed superior measure of vacancies, it did so at a time when research on models emphasizing the role of vacancies was very active. In addition, the JOLTS series had the unintended fortunate timing of beginning just as the long expansion of the 1990s was coming to an end. Capturing the state of the labor market just before the start of the 2001 recession thus allowed JOLTS to be informative about cyclical variation despite its relatively short time span. In light of these facts, it is hard to overstate the enthusiasm of the macro-labor research community in response to the availability of the JOLTS. Faberman’s chapter in this volume (chapter 2) is an excellent overview of this new data source and should be on the reading list of anyone wishing to work with the JOLTS data.

Éva Nagypál is an assistant professor of economics at Northwestern University.


Despite the enthusiasm that the launch of JOLTS has created, this new data source has not yet been closely scrutinized to determine how it can be used to validate prevailing theories of recruitment. My chapter’s intent is to push the discourse on JOLTS in this direction. I start by reviewing some methodological and conceptual issues that arise when using JOLTS data. In particular, I first discuss the issue of labor turnover measurement and the problem of missing separations in the JOLTS data. I then discuss how the JOLTS definition of vacancies relates to the definition of vacancies used in theoretical models, and highlight how possible discrepancies between the definitions need to be taken into account when doing empirical work using JOLTS data. In the second part of the chapter, I use the publicly available JOLTS data to study empirically the widely used theoretical construct of the matching function. This allows me to demonstrate one of the many ways that the JOLTS data can serve to test existing theories of labor-market dynamics and provide new evidence to inform the development of these models in new directions. Throughout, the concepts and definitions that I use are equivalent to those used in Faberman’s chapter, though I limit myself to using the publicly available aggregate and industry data.

3.1 Consistency of JOLTS Turnover Data

A distinct advantage of JOLTS is that it directly measures gross worker flows from the employer perspective (i.e., hires and separations) as opposed to simply measuring net employment change at establishments. Thus, the JOLTS gives a richer picture than available from other data sources about the margins that firms use to adjust their level of employment.
There is, of course, a tight relationship between hires, separations, and net employment change at the level of an establishment, since, by definition,

$$\Delta e_{jt} = e_{j,t+1} - e_{jt} = h_{jt} - s_{jt},$$

where $e_{jt}$ is the level of employment at establishment $j$ at the beginning of period $t$, and $h_{jt}$ and $s_{jt}$ are the number of hires and separations at establishment $j$ during period $t$. Summing over all establishments in some set $J$ (for example, the set of all nonfarm establishments, or the set of establishments in a particular industry) gives two alternative measures of employment growth over period $t$:

$$\Delta e^{1}_{Jt} = \sum_{j \in J} \Delta e_{jt} = \sum_{j \in J} e_{j,t+1} - \sum_{j \in J} e_{jt} = e_{J,t+1} - e_{Jt} \tag{1}$$

$$\Delta e^{2}_{Jt} = \sum_{j \in J} h_{jt} - \sum_{j \in J} s_{jt} = h_{Jt} - s_{Jt}, \tag{2}$$

where $e_{Jt}$ is the level of employment across all establishments in $J$ at the beginning of period $t$, and $h_{Jt}$ and $s_{Jt}$ are the total number of hires and separations at all establishments in $J$ during period $t$. Equation (1) gives a way to measure aggregate employment growth using employment data, the best measure of which, for the same universe of establishments as the one covered by JOLTS, is given by the Current Employment Statistics (CES) survey. This measure of employment growth can then be compared with the aggregate employment growth calculated using equation (2) based on labor turnover data in JOLTS, giving a way to assess the consistency of the JOLTS turnover data. To the extent that the JOLTS and the CES cover the same universe of establishments and the JOLTS weighting scheme is explicitly adjusted to match the CES level of employment, the correspondence between the two measures of employment growth should be very close. Beyond sampling error, there is only one reason that the correspondence between the two measures of employment growth cannot be expected to hold month by month: the difference in reference periods. The JOLTS turnover data refer to the period between the first day of the month and the last day of the month, while the CES measures employment during the pay period that includes the twelfth of the month. Calculating employment growth over horizons longer than a month, however, should diminish both the effect of any sampling error and the effect of the difference in the reference period. Figure 3.1 plots aggregate employment growth from December 2000 onwards calculated from the CES data and from the JOLTS data using equations (1) and (2). According to the CES data, total employment declined by 59,000 workers in the United States between December 2000 and December 2004, which is in line with the poor employment performance of the U.S. economy during and following the 2001 recession. At the same time, according to the JOLTS data, the number of employed grew by 4.64 million during the same period, representing over 3.5 percent of total employment. This is a large discrepancy.
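As a rough illustration of the consistency check just described, the sketch below computes employment growth both from a series of employment levels, as in equation (1), and from hires minus separations, as in equation (2), and then sums the gap between them. The function names and all numbers are invented for illustration; they are not CES or JOLTS figures, and the sketch ignores the reference-period difference discussed above.

```python
from typing import List, Sequence


def growth_from_levels(employment: Sequence[float]) -> List[float]:
    """Equation (1): change in the employment level from one period to the next."""
    return [later - earlier for earlier, later in zip(employment, employment[1:])]


def growth_from_flows(hires: Sequence[float], separations: Sequence[float]) -> List[float]:
    """Equation (2): hires minus separations within each period."""
    if len(hires) != len(separations):
        raise ValueError("hires and separations must cover the same periods")
    return [h - s for h, s in zip(hires, separations)]


def cumulative_discrepancy(
    employment: Sequence[float],
    hires: Sequence[float],
    separations: Sequence[float],
) -> float:
    """Flow-based growth minus level-based growth, summed over all periods."""
    level_growth = growth_from_levels(employment)
    flow_growth = growth_from_flows(hires, separations)
    if len(level_growth) != len(flow_growth):
        raise ValueError("need one more employment level than flow periods")
    return sum(flow_growth) - sum(level_growth)


# Hypothetical data in thousands: four employment levels, three periods of flows.
employment = [1000.0, 1004.0, 1001.0, 1003.0]
hires = [40.0, 35.0, 38.0]
separations = [34.0, 36.0, 35.0]

print(cumulative_discrepancy(employment, hires, separations))
```

A positive value means that the turnover data imply more employment growth than the level data, which is the direction of the discrepancy reported in the text.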
To the extent that (a) the CES is a much larger survey that is designed explicitly to determine the level of employment in the United States and (b) the stock of employment is easier to measure than the flow into and out of employment, one can attribute all the discrepancy between the two measures of employment growth to measurement problems in the JOLTS turnover data. This discrepancy has been identified earlier by Wohlford et al. (2003). In fact, as a result of internal studies by BLS staff that uncovered the same discrepancy, there have been some changes in 2002 in the way the JOLTS data were collected, with the survey instrument redesigned for schools and temporary help agencies. These changes have reduced the size of the above discrepancy, but have not eliminated it. To show this, figure 3.2 plots aggregate employment growth for four year-long periods based on the CES and the JOLTS data. The overstatement of employment growth by JOLTS was the largest early in the survey, between December 2000 and December 2001 (2.29 million), but it

Fig. 3.1 Aggregate employment growth in the JOLTS and in the CES data since December 2000

Fig. 3.2 Aggregate employment growth in the JOLTS and in the CES data since the beginning of the year for each year between 2001 and 2004


remained positive in all subsequent years; it was 0.57 million between December 2001 and December 2002, 1.00 million between December 2002 and December 2003, and 0.84 million between December 2003 and December 2004. Moreover, the aggregate annual employment growth discrepancy of 0.7 percent for 2003–2004 masks substantial industry variation in the annual employment growth discrepancy (measured as $\frac{1}{2}\sum_{t=\text{Jan 2003}}^{\text{Dec 2004}} (\Delta e^{2}_{it} - \Delta e^{1}_{it})$ for industry $i$), which is plotted on the vertical axis of figure 3.3. As can be seen for 2003–2004, the annual overstatement of employment growth by JOLTS varies from a high of 2.58 percent in the federal government to a low of –3.13 percent in construction. This large industry variation implies that the mismeasurement of labor turnover in the JOLTS is a larger problem than it seems at first from the aggregate data. There is reason to believe that the discrepancy in the JOLTS arises in large part from the mismeasurement of the separation rate. To show this, I calculated for each two-digit North American Industry Classification System (NAICS) industry the average JOLTS separation rate for the period January 2003–December 2004 and the average separation rate from the Current Population Survey (CPS) for the same period.¹ On average, the separation rate calculated from the CPS is 1.9 times as large as the separation rate calculated from JOLTS. This is due both to the understatement of separations in JOLTS and to the overstatement of separations in the CPS due to the well-known classification problem (Nagypál 2006). There is large cross-industry variation in the ratio of the JOLTS to the CPS separation rate, however, ranging from the JOLTS separation rate being a third of the CPS separation rate in education to two-thirds in mining.
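The CPS-based separation rate used in this comparison is the share of workers employed in the first of two matched months who either changed employers or left employment by the second month. A minimal sketch of that ratio follows; the record layout and the transition labels are hypothetical, and the sketch ignores survey weights and the matching of individuals across months.

```python
from collections import Counter
from typing import Iterable

# Hypothetical labels for workers who were employed in the first month.
STAYER = "stayer"
EMPLOYER_TO_EMPLOYER = "employer_to_employer"
TO_NONEMPLOYMENT = "to_nonemployment"


def cps_separation_rate(transitions: Iterable[str]) -> float:
    """Separations (job changers plus leavers) over first-month employment."""
    counts = Counter(transitions)
    employed_first_month = sum(counts.values())
    if employed_first_month == 0:
        raise ValueError("no employed workers in the first month")
    separations = counts[EMPLOYER_TO_EMPLOYER] + counts[TO_NONEMPLOYMENT]
    return separations / employed_first_month


# Invented example: 94 stayers, 3 job changers, 3 leavers out of 100 workers.
sample = [STAYER] * 94 + [EMPLOYER_TO_EMPLOYER] * 3 + [TO_NONEMPLOYMENT] * 3
rate = cps_separation_rate(sample)
print(f"{rate:.3f}")
```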
Moreover, it is exactly the industries that have a very low measured JOLTS separation rate compared to the CPS separation rate that have the largest overstatement of their employment growth in the JOLTS hires and separation data. This can be seen from figure 3.3, where I plot the average annual employment growth discrepancy for the period January 2003–December 2004 between the JOLTS and the CES against the ratio of the JOLTS separation rate to the CPS separation rate for each industry. This evidence is suggestive that the understatement of the separation rate is a key reason that the JOLTS data overstate employment growth in the U.S. economy. Further examination of the JOLTS employment growth discrepancy across industries also reveals that a relevant characteristic of industries that is correlated with the size of this discrepancy is the average level of education in the industry. Figure 3.4 plots the average years of education in each two-digit NAICS industry, calculated using CPS data from 2003–2004, against the average annual employment growth discrepancy for the period January 2003 to December 2004 between the JOLTS and the CES. Clearly, this figure implies that the overstatement of employment growth is a larger problem for more educated workers, a pattern that is worthy of further investigation and that could inform future revisions of JOLTS data collection.

Fig. 3.3 Annual employment growth discrepancy between the JOLTS and the CES plotted against the JOLTS separation rate as a fraction of CPS separation rate by industry

1. The CPS started using the NAICS industry classification of the JOLTS after January 2003. The separation rate in the CPS can be derived by matching the Basic Monthly Survey across two consecutive months and calculating the ratio of the sum of employer-to-employer and employment-to-nonemployment transitions between the two months to the number of employed workers during the first month.

To assess the impact of the employment growth discrepancy between the JOLTS and the CES on the measurement of labor turnover, I use a simple procedure to adjust hires and separations for this discrepancy by industry according to

$$\tilde{h}_{it} = h_{it} + \max(0, \Delta e^{1}_{it} - \Delta e^{2}_{it})$$

$$\tilde{s}_{it} = s_{it} + \max(0, \Delta e^{2}_{it} - \Delta e^{1}_{it}),$$

where $h_{it}$ ($s_{it}$) and $\tilde{h}_{it}$ ($\tilde{s}_{it}$) are the measured and adjusted number of hires (separations) in industry $i$ in month $t$, respectively. To do this adjustment, I estimate the employment growth for month $t$ for industry $i$ in the CES by extrapolating the employment numbers for the pay period containing the twelfth of the month. To the extent that this adjustment merely requires that employment growth numbers match up industry-by-industry at the two-digit level as opposed to establishment-by-establishment, this procedure underadjusts the hires and separations numbers, thus giving a lower


Fig. 3.4 Annual employment growth discrepancy between the JOLTS and the CES plotted against the average level of education by industry

bound on the true hiring and separation rate.² This procedure results in an adjusted aggregate hiring rate of 3.62 percent as opposed to the measured hiring rate of 3.31 percent, and in an adjusted aggregate separation rate of 3.62 percent as opposed to the measured separation rate of 3.23 percent, a significant change.

3.2 What Do the JOLTS Job Openings Measure?

Beyond giving a more detailed view of labor turnover, a distinct advantage of JOLTS is that it provides information on the number of job openings for a representative sample of U.S. establishments, thereby giving a much more direct measure of vacancy creation in the U.S. economy than was previously available (primarily through the use of the Help Wanted Advertising Index). Of course, to develop a measure of job openings, the BLS had to construct an appropriate empirical definition. Faberman reviews this definition in chapter 2. Here, I would like to discuss the impact of two choices in the construction of this empirical definition: the choice to measure the stock of vacancies at a point in time as opposed to their flow during a period, and the choice to include only vacancies for positions that can start within thirty days.

2. At the same time, given that the employment growth number is estimated using extrapolation and could contain errors, this procedure could possibly overadjust the hires and separations numbers.
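The hires and separations adjustment introduced at the end of section 3.1, to which note 2 refers, can be sketched for a single industry and month as follows. The function name and the numbers are hypothetical. The sketch only shows the mechanics: whichever flow is too small relative to the CES employment growth is raised, so that adjusted hires minus adjusted separations equals the CES figure.

```python
from typing import Tuple


def adjust_flows(hires: float, separations: float, ces_growth: float) -> Tuple[float, float]:
    """Raise hires or separations so that their difference matches CES growth.

    ces_growth plays the role of the level-based measure (equation 1);
    hires - separations is the flow-based measure (equation 2).
    """
    jolts_growth = hires - separations
    adjusted_hires = hires + max(0.0, ces_growth - jolts_growth)
    adjusted_separations = separations + max(0.0, jolts_growth - ces_growth)
    return adjusted_hires, adjusted_separations


# Invented example in thousands: JOLTS implies growth of 20, the CES only 5,
# so 15 separations are treated as missing and added back.
adj_hires, adj_separations = adjust_flows(hires=120.0, separations=100.0, ces_growth=5.0)
print(adj_hires, adj_separations)
```

Only one of the two flows changes in any given month, which is why the procedure can never lower measured hires or separations.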


To focus the discussion, consider the following simple continuous-time model of vacancy creation, where time is measured in months. Assume that a firm wishes to hire someone to start working at some known future date $t_s$. Due to search frictions in the relevant labor market, appropriate candidates are not always immediately available for hire; rather, they arrive to the firm at random times if the firm has a vacancy open. In particular, assume that if the firm has a vacancy open, suitable candidates arrive at Poisson rate $\lambda$, which (approximately) means that during a short period of length $\Delta$, the probability that a suitable candidate shows up is $\lambda\Delta$. Assume that hiring a candidate at time $t_e < t_s$ has a cost of $c_e (t_s - t_e)$ to the firm. Such a cost could arise due to having to incur some expenses to keep the candidate available between the time he or she is offered the position at time $t_e$ and the time he or she starts working at time $t_s$. Assume that hiring a candidate at time $t_d > t_s$ has a cost of $c_d (t_d - t_s)$ to the firm. Such a cost could arise due to forgone profits from starting the position late. Finally, assume that the firm chooses the time to open a vacancy to minimize the expected cost of hiring too early or too late compared to time $t_s$. Under these assumptions, one can show³ that the firm will optimally open the vacancy at time $t_v = t_s - l$, where $l$, the lead time to open a vacancy, is given by

$$l(\lambda, r_d) = \frac{\log(1 + r_d)}{\lambda}.$$
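The lead-time result can be checked numerically. The sketch below is not the author's derivation (which the chapter does not print). It assumes that the time until the first suitable candidate arrives is exponentially distributed with rate λ, as the Poisson assumption implies, that the firm hires the first arrival, and that the relative cost of delay is the ratio of the late-hiring cost to the early-hiring cost, $r_d = c_d / c_e$. Under those assumptions it writes out the expected cost of a given lead time and compares a brute-force grid minimum with the closed-form expression. All function names and parameter values are illustrative.

```python
import math


def expected_cost(lead: float, lam: float, c_early: float, c_delay: float) -> float:
    """Expected cost of opening the vacancy `lead` months before the start date.

    The waiting time X for the first candidate is exponential with rate lam.
    Arriving early costs c_early per month; arriving late costs c_delay per month.
    """
    still_open = math.exp(-lam * lead)             # P(X > lead)
    expected_early = lead - (1.0 - still_open) / lam   # E[max(lead - X, 0)]
    expected_late = still_open / lam                   # E[max(X - lead, 0)]
    return c_early * expected_early + c_delay * expected_late


def closed_form_lead(lam: float, r_delay: float) -> float:
    """Lead time log(1 + r_d) / lambda from the text."""
    return math.log(1.0 + r_delay) / lam


def grid_search_lead(lam: float, c_early: float, c_delay: float,
                     upper: float = 20.0, steps: int = 20000) -> float:
    """Lead time on an evenly spaced grid over [0, upper] with the lowest expected cost."""
    best = min(range(steps + 1),
               key=lambda k: expected_cost(k * upper / steps, lam, c_early, c_delay))
    return best * upper / steps


lam, c_early, c_delay = 2.0, 1.0, 3.0      # illustrative values only
numeric = grid_search_lead(lam, c_early, c_delay)
analytic = closed_form_lead(lam, c_delay / c_early)
print(round(numeric, 3), round(analytic, 3))
```

With these values the grid minimum and the formula agree to within the grid spacing, and lowering `lam` or raising `c_delay` lengthens the lead time, as the text argues.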

Here $r_d = c_d / c_e$ is the relative cost of delay. This simple model has the intuitive implication that the harder it is to find a suitable candidate (i.e., the lower is $\lambda$) and the higher is the cost of delay relative to the cost of early hiring, the earlier will the firm decide to open a vacancy relative to the time of the intended start of the job. To see why this simple model is useful to think about the measurement of job openings in the JOLTS, assume that at each point in time firms wish to hire a fixed measure of workers. Then one can calculate the probability that a vacancy that is open at some point during the month $[t_o - 1, t_o]$ is observed at time $t_o$ (without any restrictions on when the position starts) to be

$$P_u(\lambda) = \frac{1}{1 + \lambda}.$$

The solid line in figure 3.5 plots this probability of observing a vacancy as a function of $\lambda$. The interpretation of this probability is simple: jobs with a higher arrival rate of suitable applicants have a vacancy open for a shorter period of time, hence these vacancies have a lower probability of being observed given a fixed frequency of observation. This is a well-known issue in duration models: whenever duration events are sampled using stock sampling (as in the JOLTS), events of short duration are less likely to be sampled. One can use statistical methods developed in duration analysis to address this issue (see Lancaster 1990) and reconstruct the flow of vacancies from the stock data. The dependence of the probability of observation on the rate of arrival of suitable applicants has at least two important implications. First, differences in the probability of observing vacancies due to differences in the arrival rate of suitable applicants could be one explanation for why the vacancy-to-hires ratio varies substantially across industries, from a ratio of 0.30 in construction to a ratio of 1.48 in health. It is possible that the number of new vacancies opened per new hire is the same in these industries, and what is different is how long the average vacancy in the industry is open due to the relative ease with which a vacancy in construction can be filled and the relative difficulty with which a vacancy in health can be filled. This interpretation of the data is supported by the strong positive correlation between the vacancy-to-hires ratio and the average education of workers across industries, shown in figure 3.6. Second, to the extent that there is systematic variation in $\lambda$ over the business cycle, with recessions being times when openings are easier to fill and hence $\lambda$ is higher, the probability of observing a vacancy is procyclical in the above simple model, implying that the cyclical variation in the stock of vacancies overstates the variation in the flow of vacancies.

Fig. 3.5 Probability of observing a vacancy as a function of the arrival rate of candidates and of the relative cost of delay in the simple model of vacancy creation
Note: $P_u$ is the unrestricted probability while $P_r$ is the probability restricted to include only vacancies where the position is available within one month.

3. Derivations of all the results shown are available upon request.

The above simple model also helps us understand a second potential problem with the JOLTS measurement of vacancies. Recall that the JOLTS definition of a vacancy requires that work could start within one month of the day of measurement. This means that vacancies that are opened with long lead times (either because $\lambda$ is low or because the relative cost of delay in hiring is high) are not counted in the JOLTS definition of job openings. In particular, one can show that the probability that a vacancy that is open at some point during the month $[t_o - 1, t_o]$ is observed at time $t_o$, given that only vacancies where the position is available within a month (i.e., where $t_s \le t_o + 1$) are counted, is

$$P_r(\lambda, r_d) = \begin{cases} \dfrac{1}{1 + \lambda} \, \dfrac{e^{\lambda}}{1 + r_d} & \text{if } \lambda < \log(1 + r_d) \\[1.5ex] \dfrac{1}{1 + \lambda} & \text{if } \lambda \ge \log(1 + r_d). \end{cases}$$

The two dashed lines in figure 3.5 plot this probability of observation as a function of $\lambda$ for two different values of the relative cost of delay, a low value of $r_d = 1$ and a higher value of $r_d = 3$. Under the JOLTS definition, this simple model implies that jobs with a higher relative cost of delay and where suitable workers arrive less frequently are less likely to be observed and counted compared to the case where all vacancies are counted irrespective of the time the position is available. The reason for this is simple: for jobs with a higher relative cost of delay and where suitable workers arrive less frequently, it is optimal to open vacancies with a long lead time and, as a consequence, workers are often hired for these jobs long before they start working for the employer. Vacancies for such jobs (for example, those for academics) are systematically undermeasured using the JOLTS definition. This undermeasurement could go some way toward explaining why the education industry in figure 3.6 lies much below the regression line. The statistical tools to address this measurement issue are less readily available than the tools to use in the case of stock sampling. In my opinion, the best way to address this measurement issue would be to acquire additional data on vacancies where work is expected to start further into the future than one month.

3.3 Using JOLTS to Study the Matching Function

The JOLTS data on vacancies allow for the empirical examination of the theoretical construct of a matching function using more direct measures of vacancies and new hires than previously available. In equilibrium models


Fig. 3.6 Vacancy-to-hires ratio plotted against the average level of education by industry

of unemployment, the matching function is a theoretical construct that is used to describe how workers and firms meet in a frictional labor market. In particular, it posits that the flow of new matches between workers and firms is a function of the number of workers looking for employment and the number of vacancies that are opened by firms. Assuming that only unemployed workers look for employment (a commonly maintained assumption), the matching function posits that

$$m_t = m(v_{t-1}, u_{t-1}),$$

where $m_t$ is the number of new matches created during period $t$, and $v_{t-1}$ and $u_{t-1}$ are the number of vacancies and unemployed workers looking to form employment relationships at the end of period $t - 1$. The number of new matches created can be measured using the hires data in JOLTS (i.e., $m_t = h_t$), so the JOLTS data provide two of the three data series necessary to estimate an aggregate matching function. Assuming a log-linear functional form for the matching function and an additive error term gives the empirical specification

$$\log h_t = c + \alpha_v \log v_{t-1} + \alpha_u \log u_{t-1} + \varepsilon_t. \tag{3}$$
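To make the log-linear specification concrete, the sketch below estimates it by ordinary least squares on made-up data. It regresses log hires in each period on a constant and on the previous period's log vacancies and log unemployment. Everything here is an illustrative assumption: the series are synthetic and noise-free, so the regression simply recovers the elasticities used to generate them, and the small linear solver stands in for a statistics library only to keep the example self-contained.

```python
import math
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_matching_function(hires: Sequence[float], vacancies: Sequence[float],
                          unemployed: Sequence[float]) -> List[float]:
    """OLS of log h_t on [1, log v_{t-1}, log u_{t-1}]; returns [c, elast_v, elast_u]."""
    rows = [[1.0, math.log(v), math.log(u)]
            for v, u in zip(vacancies[:-1], unemployed[:-1])]
    y = [math.log(h) for h in hires[1:]]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic monthly series (arbitrary units), generated with elasticities 0.6 and 0.4.
periods = 24
vacancies = [3.0 + 0.5 * math.sin(t) for t in range(periods)]
unemployed = [7.0 + math.cos(0.7 * t) for t in range(periods)]
hires = [1.0] + [0.9 * vacancies[t - 1] ** 0.6 * unemployed[t - 1] ** 0.4
                 for t in range(1, periods)]

const, elast_v, elast_u = fit_matching_function(hires, vacancies, unemployed)
print(round(elast_v, 3), round(elast_u, 3), round(elast_v + elast_u, 3))
```

The sum of the two estimated elasticities is the quantity that matters for a constant-returns-to-scale test. With real data the estimates come with standard errors, which this sketch does not compute.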

In terms of empirical implementation, there are several issues that need to be addressed. First, should one use seasonally adjusted or unadjusted data? Second, should the matching function be estimated using aggregate or industry-level data? Both of these questions turn out to be empirically relevant.


Table 3.1 Matching function estimation results using seasonally adjusted and unadjusted data

Dependent variable        log h_t          log h_t
log v_{t–1}               0.668 (0.180)    0.531 (0.158)
log u_{t–1}               0.378 (0.198)    0.185 (0.177)
Seasonally adjusted       Yes              No
Month dummies             No               Yes
Number of observations    47               47
R²                        0.579            0.958

To show this, I first use the seasonally adjusted JOLTS data and the seasonally adjusted number of unemployed from the CPS and estimate equation (3) by ordinary least squares (OLS). Table 3.1 reports estimation results for this empirical specification using data from December 2000 to November 2004. Under this specification, the hypothesis that the matching function has constant returns to scale cannot be rejected, and the elasticity of the matching function with respect to vacancies is estimated to be 0.67. This estimate of the elasticity is substantially larger than the matching function elasticity of 0.3 to 0.5 derived by Petrongolo and Pissarides (2001), though given the small sample size, the standard errors on the estimates are rather large. The second column of table 3.1 reports estimation results when seasonally unadjusted data are used and the intercept $c_m$ is allowed to vary with the month $m$. Now, the coefficients on both vacancies and unemployment are lower, though the hypothesis that the matching function has constant returns to scale still cannot be rejected given the small sample size. Even with the small sample size, however, one can reject the hypothesis that the scale parameter $c_m$ is the same for all months $m$ even at the 99 percent level of confidence. Figure 3.7 plots the estimate of $e^{c_m}$, which can be thought of as an estimate of matching efficiency, for each month $m$. There is a clear seasonal pattern in this matching efficiency, with the summer months representing a time when the same number of inputs into the matching function produce a significantly higher number of new hires. This strong seasonal pattern can also be clearly seen in the raw data for hires and vacancies plotted in figure 3.8, which shows that the number of hires is much more volatile over the year than the number of vacancies. There are two ways to interpret these findings. First, it is possible that there is seasonal variation in the process of matching. This could be due to the nature of employment relationships created over the seasonal cycle, with more temporary jobs filled by young workers being created over the summer, for example. Second, it is possible that there is consistent seasonal mismeasurement of vacancies over the seasonal cycle, due to respondents’ interpretation of job openings as referring to openings for permanent employment relationships.

Fig. 3.7 Estimated matching efficiency for different months of the year

Fig. 3.8 Aggregate vacancies and hires between December 2000 and December 2004 (not seasonally adjusted)

Next, I estimate industry matching functions using the empirical specification

$$\log h_{it} = \kappa_i + \mu_m + \alpha_{vi} \log v_{i,t-1} + \alpha_{ui} \log u_{i,t-1} + \varepsilon_{it}, \tag{4}$$


where $\kappa_i$ are industry and $\mu_m$ are month scale parameters.⁴ I measure $u_{i,t-1}$ by the number of unemployed workers whose last employment was in industry $i$.⁵ Even with the limited amount of data available, estimating this specification allows one to decisively reject the hypothesis that the elasticity of the matching function is the same across industries (i.e., $\alpha_{v1} = \alpha_{v2} = \dots = \alpha_{v18}$ and $\alpha_{u1} = \alpha_{u2} = \dots = \alpha_{u18}$), and the hypothesis that the matching efficiency is the same across industries (i.e., $\kappa_1 = \kappa_2 = \dots = \kappa_{18}$), thereby rejecting the hypothesis that the matching function is stable across industries.⁶ Again, there are two ways to interpret these findings. First, it is possible that there is variation in the process of matching across industries due to the different characteristics of jobs and workers in these industries. Second, it is possible that the measurement issues that I discussed above systematically affect the measurement of vacancies and hires across industries. In any event, the lack of similarity in the matching function across industries raises the question of whether there exists an aggregate matching function at all, as assumed in theoretical studies.

3.4 Conclusion

The Job Openings and Labor Turnover Survey (JOLTS) contains important new information that is useful to test existing theories of vacancy creation and to provide new insights into the process of matching in the labor market. In this volume, chapter 2 by Faberman is an excellent introduction to the data available in the JOLTS for anyone wishing to do research using these data. In this chapter, I have focused on several measurement issues that researchers using the JOLTS will have to confront and suggested ways that one might use the JOLTS data to further our understanding of labor market dynamics.

4. Since the CPS started using the NAICS industry classification of the JOLTS only after January 2003, this equation is estimated using data from January 2003 to November 2004. Given the small number of observations, it is not possible to separately estimate the month scale parameter for each industry.
5. This implicitly abstracts from industry mobility, which is a strong assumption, but without it, it is not clear what the second input of an industry matching function should be.
6. Estimation results are available upon request.

References

Diamond, P. 1981. Mobility costs, frictional unemployment, and efficiency. Journal of Political Economy 89 (4): 798–812.
———. 1982a. Aggregate demand management in search equilibrium. Journal of Political Economy 90 (5): 881–94.
———. 1982b. Wage determination and efficiency in search equilibrium. Review of Economic Studies 49 (2): 217–27.
Lancaster, T. 1990. The econometric analysis of transition data: Econometric society monographs. Cambridge: Cambridge University Press.
Mortensen, D. T. 1982a. The matching process as a noncooperative bargaining game. In The Economics of Information and Uncertainty, ed. J. J. McCall, 233–54. Chicago: University of Chicago Press.
———. 1982b. Property rights and efficiency in mating, racing, and related games. American Economic Review 72:968–79.
Nagypál, E. 2008. Worker reallocation over the business cycle: The importance of employer-to-employer transitions. Northwestern University Unpublished Manuscript.
Petrongolo, B., and C. Pissarides. 2001. Looking into the black box: A survey of the matching function. Journal of Economic Literature 39:390–431.
Pissarides, C. A. 1984. Search intensity, job advertising, and efficiency. Journal of Labor Economics 2 (1): 128–43.
———. 1985. Short-run equilibrium dynamics of unemployment, vacancies, and real wages. American Economic Review 75 (4): 676–90.
Wohlford, J., M. A. Phillips, R. Clayton, and G. Werking. 2003. Reconciling labor turnover and employment statistics. Proceedings of the Annual Statistical Association. CD-ROM.

4 Business Employment Dynamics

Richard L. Clayton and James R. Spletzer

Richard L. Clayton is the Division Chief of the Division of Administrative Statistics and Labor Turnover at the Bureau of Labor Statistics. James R. Spletzer is the Senior Research Economist in the Employment Research and Program Development Staff at the Bureau of Labor Statistics. We thank Ken Troske for his discussant comments at the April 2005 NBER-CRIW Producer Dynamics conference.

4.1 Introduction

One of the most watched economic indicators in the United States is the monthly change in nonfarm payroll employment released by the Bureau of Labor Statistics (BLS). This statistic measures the net change in the number of jobs from one month to the next. But when we think about how employment grows or declines, we realize that some establishments have opened, some establishments have expanded, some establishments have contracted, and some establishments have closed. In this chapter, we describe the new gross job gains and gross job loss statistics from the BLS Business Employment Dynamics program. These statistics not only measure the large gross job flows that underlie the substantially smaller net employment changes, but also enhance our understanding of producer dynamics across various stages of the business cycle.

The development of the BLS Business Employment Dynamics data was motivated in large part by research in the academic community. The creation of longitudinal establishment datasets at the U.S. Census Bureau during the past several decades led to influential publications by Dunne, Roberts, and Samuelson (1988, 1989a, 1989b), Davis and Haltiwanger (1990, 1992), and Davis, Haltiwanger, and Schuh (1996). From this literature, we have learned that there is a large amount of establishment-level employment volatility not evident at the aggregate level, and the gross job flow


statistics have fascinating business cycle properties. Yet despite all that we have learned about the labor market from this literature, the empirical analysis in these works was restricted to data from the manufacturing sector, and the call for more comprehensive and more timely data always resonates. The second generation of analysis using longitudinal microdata from the States’ Unemployment Insurance Systems illustrates how gross job flows in manufacturing are not representative of the entire U.S. economy (see Anderson and Meyer 1994; Foote 1998; Burgess, Lane, and Stevens 2000; and Spletzer 2000). The research resulting from the creation of these longitudinal establishment data sets has not only stimulated the review and updating of existing labor market theories, but has also stimulated the U.S. statistical agencies to develop their administrative data sets in such a way so as to produce longitudinal job flow statistics. This chapter begins with a definition of gross job gains and gross job losses, followed by a description of the source data used by the BLS to generate these statistics. Because the quality of longitudinal statistics computed from administrative cross-sectional microdata depends crucially on the longitudinal linkage algorithm, we pay particular attention in this chapter to describing our record linkage methodology. We then present highlights from the new BLS Business Employment Dynamics data series; these data show that in the first quarter of 2005, the number of gross job gains from opening and expanding establishments was 7.6 million, and the number of gross job losses from closing and contracting establishments was 7.3 million. The new BLS Business Employment Dynamics data also show that the 2001 recession was characterized by a temporary spike in gross job losses accompanied by a substantial and persistent decline in gross job gains. 
In this chapter we introduce a new seasonally adjusted time series of the distribution of quarterly gross job flows. This new time series is motivated by several interesting questions about gross job flows over the business cycle. For example, did the temporary spike in gross job losses during the 2001 recession occur at a few establishments with large declines, or at many establishments with small declines? And did the substantial fall in gross job gains during the 2001 recession occur at a few establishments cutting back significantly on hiring, or many establishments not adding a few new positions? Our new time series shows that the relatively few establishments with large gross job gains and large gross job losses were the drivers of the 2001 recession.

4.2 The Business Employment Dynamics Program at BLS

4.2.1 Concepts and Definitions

The employment statistics that are published by the Bureau of Labor Statistics are invaluable for policymakers, researchers, and the business community. The BLS report on the monthly net change in employment


affects stock market movements and interest rate decisions considerably. Yet this single macroeconomic statistic is the net result of millions of decisions by millions of business establishments in the U.S. economy changing their employment levels. Each decision reflects the business-specific economic conditions that face managers every day: supply, demand, labor availability, market share goals, investments in research and development, and so on. While the aggregate net employment change statistic identifies the overall growth or decline of the labor market, it does not summarize the underlying heterogeneity of the many establishments opening and expanding, or the many establishments contracting or closing.

The definitions of gross job gains and gross job losses are easily derived from the definition of net employment growth. Notationally, let $E_{e,t}$ denote the employment of establishment $e$ in quarter $t$. Net employment growth in quarter $t$ is defined as the change in aggregate employment from one quarter to the next:

$$
\text{Net Employment Growth}(t) = \sum_{\text{all establishments}} E_{e,t} \;-\; \sum_{\text{all establishments}} E_{e,t-1}. \tag{1}
$$

Noting that establishments can be classified based upon their employment dynamics from one quarter to the next, this equation for net employment growth can be manipulated as:

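$$
\begin{aligned}
\text{Net Employment Growth}(t)
&= \sum_{\text{all establishments}} E_{e,t} \;-\; \sum_{\text{all establishments}} E_{e,t-1} \\
&= \sum_{\text{all establishments}} (E_{e,t} - E_{e,t-1}) \\
&= \sum_{\text{establishments increasing employment}} (E_{e,t} - E_{e,t-1})
 \;+\; \sum_{\text{establishments decreasing employment}} (E_{e,t} - E_{e,t-1}) \\
&\quad + \sum_{\text{establishments with no change in employment}} (E_{e,t} - E_{e,t-1}) \\
&= \sum_{\text{opening establishments}} (E_{e,t} - 0)
 \;+\; \sum_{\text{expanding establishments}} (E_{e,t} - E_{e,t-1}) \\
&\quad + \sum_{\text{contracting establishments}} (E_{e,t} - E_{e,t-1})
 \;+\; \sum_{\text{closing establishments}} (0 - E_{e,t-1}).
\end{aligned}
\tag{2}
$$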


Note that the quarterly employment change for the set of establishments that do not change their level of employment from one quarter to the next is zero, and this term drops out of the final version of equation (2). In the Business Employment Dynamics data, there are 3.2 million establishments with positive employment that do not change their employment between the fourth quarter of 2004 and the first quarter of 2005.

The definitions for gross job gains and gross job losses fall immediately out of the previous equation. Gross job gains are the sum of all employment increases at opening and expanding establishments:

$$
\text{Gross Job Gains}(t) = \sum_{\text{opening establishments}} (E_{e,t} - 0) \;+\; \sum_{\text{expanding establishments}} (E_{e,t} - E_{e,t-1}). \tag{3}
$$

Gross job losses are the sum of all employment losses at contracting and closing establishments:

$$
\text{Gross Job Losses}(t) = \sum_{\text{contracting establishments}} (E_{e,t} - E_{e,t-1}) \;+\; \sum_{\text{closing establishments}} (0 - E_{e,t-1}). \tag{4}
$$
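To make equations (2) through (4) concrete, the following is a minimal Python sketch, not the BLS production method. It assumes a hypothetical data layout in which each quarter is a dictionary from an establishment identifier to its employment, and a missing identifier means the establishment is not in the database that quarter. The function names `classify` and `gross_flows` are illustrative only. Following equation (4) as written, gross job losses come out as a negative number here, whereas the published tables report their magnitude.

```python
from typing import Dict, Optional


def classify(prev: Optional[int], curr: Optional[int]) -> str:
    """Classify an establishment from its employment in two consecutive quarters.

    None means the establishment is absent from the database in that quarter,
    which is treated the same as zero employment.
    """
    p = prev or 0
    c = curr or 0
    if p == 0 and c > 0:
        return "opening"
    if p > 0 and c == 0:
        return "closing"
    if c > p:
        return "expanding"      # positive level rising to a higher level
    if c < p:
        return "contracting"    # falling to a lower but still positive level
    return "no change"


def gross_flows(prev: Dict[str, int], curr: Dict[str, int]) -> dict:
    """Sum establishment-level employment changes by category (equations 2-4)."""
    by_type = {"opening": 0, "expanding": 0, "contracting": 0,
               "closing": 0, "no change": 0}
    for est in set(prev) | set(curr):
        p, c = prev.get(est), curr.get(est)
        by_type[classify(p, c)] += (c or 0) - (p or 0)
    gains = by_type["opening"] + by_type["expanding"]       # equation (3)
    losses = by_type["contracting"] + by_type["closing"]    # equation (4), <= 0
    return {"gains": gains, "losses": losses,
            "net": gains + losses, "by_type": by_type}


# Made-up example: five establishments observed over two quarters.
prev_q = {"A": 50, "B": 20, "C": 5, "D": 8}
curr_q = {"A": 60, "B": 15, "C": 0, "D": 8, "E": 3}
flows = gross_flows(prev_q, curr_q)
print(flows["gains"], flows["losses"], flows["net"])  # 13 -10 3
```

In the example, the net change of 3 equals the difference in total employment between the two quarters (86 minus 83), as equation (1) requires.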

An expanding establishment is defined as a continuous unit that increases its employment from a positive level in the previous quarter to a higher level in the current quarter, and a contracting establishment is a continuous unit that decreases its employment from the previous quarter to a lower positive level in the current quarter. An opening establishment is one that has positive employment in the current quarter, and either had zero employment or was not in the database the previous quarter. A closing establishment is one that had positive employment in the previous quarter, and has either zero employment or is not in the database the current quarter.

Because it is not possible to define business deaths on a contemporaneous basis, the definitions of establishment openings and closings used in the BLS Business Employment Dynamics program are conceptually different from the more familiar definitions of establishment births and deaths. In the State Unemployment Insurance (UI) systems, businesses are allowed to and often do report zero employment for several quarters after they have effectively closed. This undoubtedly occurs when a business owner temporarily shuts down but anticipates starting up the business again when economic conditions improve. By reporting zero employment and wages on the quarterly contributions form, the business owner can keep their UI account active. This results in many observed business closings, but which of these closings will start up again and which will die will not be observed for several more quarters.

It is important to note that gross job gains and gross job loss statistics measure the sum of establishment-level net employment changes, and do not measure the flow of workers into and out of the establishment. For example, if an establishment increases employment from fifty workers to sixty workers, these ten additional jobs are classified as gross job gains. This addition of ten jobs during the quarter might have occurred through ten new hires, or through twenty new hires net of ten separations. Counts of hires and separations are published monthly by the Job Openings and Labor Turnover Survey (JOLTS) program at the BLS. Both Clark (2004) and Faberman (chapter 2, this volume) present a thorough description of the conceptual foundations and the empirical estimates from the JOLTS program.

4.2.2 Source Data

The quarterly BLS Business Employment Dynamics data series is constructed from microdata originating from the Quarterly Census of Employment and Wages (QCEW), also known as the ES-202 program. A complete description of the underlying source data and the data flows can be found in the longer conference version of this chapter (Clayton and Spletzer 2005) and in the April 2004 Monthly Labor Review (Spletzer et al. 2004); the following is a bare-bones description of the source data. All employers subject to state Unemployment Insurance (UI) laws are required to submit quarterly contribution reports detailing their monthly employment and quarterly wages to the State Employment Security Agencies. The raw UI data require substantial edit and review. In addition, the BLS directs the states to conduct two supplemental surveys that are necessary to yield accurate data at the local level. The first is the Annual Refiling Survey (ARS), where nearly two million businesses each year are contacted to obtain or update business name, addresses, industry codes, and related contact information. The second is the Multiple Worksite Report (MWR), which collects employment and wages for each establishment in multiunit firms within the state.
The MWR covers about 110,000 businesses (1.4 percent of all businesses, 16 percent of all establishments, and 39 percent of employment) each quarter, allowing the accurate distribution of employment and wages to the correct county and industry. Without these two additions to the UI data, the resulting QCEW economic information would not be accurate at the industry level or at the MSA, county, or city level. In addition, state QCEW staffs review and reconcile complex cases including mergers and acquisitions where correctly determining and linking predecessors and successors is critical to the accuracy of the QCEW and the Business Employment Dynamics data. After the microdata are augmented and thoroughly edited by the State Labor Market Information staff, the states submit these data and other business identification information to the Bureau of Labor Statistics as part of the federal-state cooperative QCEW program. The data gathered in the QCEW program are a comprehensive and accurate source of employment and wages, and provide a virtual census (98 percent) of employees on


nonfarm payrolls. In the first quarter of 2005, the QCEW statistics show an employment level of 129.8 million, with 8.5 million establishments in the U.S. economy. The BLS publishes the Business Employment Dynamics data approximately seven-and-a-half months after the end of the quarter.

4.2.3 Longitudinal Linkages

The quarterly gross job gains and gross job loss statistics created in the BLS Business Employment Dynamics program are tabulated by linking establishments across quarters; establishments are then classified as opening, expanding, contracting, closing, or not changing their employment level. The accuracy of the Business Employment Dynamics statistics depends on two primary factors: the quality of the establishment-level microdata being reported by businesses to the states, and the record linkage methodology used by the BLS to link establishments across quarters.

Following establishments across time using administrative UI microdata is a complex and challenging exercise. Creating the Business Employment Dynamics data series requires a thorough understanding of how businesses operate and how they file their UI tax forms. The manner in which businesses report administrative changes and ownership changes can result in establishments changing UI identifiers even though no economic changes occurred. Failing to identify and link such noneconomic changes would result in an overstatement of establishment openings and closings, and thus an overstatement of gross job gains and gross job losses.

The BLS has developed a multistep process to accurately link business establishment microdata over time. This linkage process consists of four steps: two distinct administrative matches, a probability-based weighted match, and an analyst intervention match. The linkage process is based on the unique establishment identifier maintained by the states. This identifier is composed of two pieces: the UI number and the reporting unit number.
The UI number refers to the taxpaying entity within the state. The reporting unit number refers to establishments within the firm. Although the reporting unit number is not used in the administration of the UI system, it is assigned by the state using information collected from the Multiple Worksite Reports. The first step in the Business Employment Dynamics record linkage methodology is to link establishments that maintain the same establishment identifier across quarters. This step identifies almost all of the establishments linked as continuous across quarters. This is followed by a match using predecessor and successor information. Predecessors and successors refer to establishments that are continuous across quarters, yet the establishment identifier changes as a result of a change in ownership or a change in the reporting configuration of a multi-establishment company. The vast majority of predecessor and successor linkages are businesses buying another business (the assumption of liability for UI taxes must be reported to


the state); other predecessor and successor linkages are identified by the State Labor Market Information Staff. The third step in the linkage process, conducted by the BLS, is a probability-based weighted match process. This probability-based weighted match uses information such as establishment name, street address, and telephone number to link, as continuous, a closing establishment in the previous quarter with an opening establishment in the current quarter. The theoretical foundation for the BLS record linkage methodology is based on the work of Ivan P. Fellegi and Alan B. Sunter, and is more fully explained in Robertson et al. (1997). The final step in the matching process is an analyst review and possible manual linkage of selected large unmatched records. Although this analyst review and manual linkage is very resource intensive, it is crucial for the quality of the detailed industry and geography statistics.

The BLS has undertaken many detailed reviews and analyses of the quality of its longitudinal linkage algorithm, and continues to conduct research to explore the sources and consequences of any additional valid establishment links. Furthermore, as part of the annual cooperative agreement between BLS and the states, the BLS is now requiring that the states examine and attempt to explain any unlinked records with employment above a certain threshold; this review of opening and closing records by state analysts before they are transmitted to the BLS will certainly increase the quality of the Business Employment Dynamics data.

4.3 The Business Employment Dynamics Data

The basic products from the new BLS Business Employment Dynamics program are statistics measuring quarterly gross job gains and gross job losses. The gross job gains can be decomposed into the gains from both expansions and openings, and the gross job losses can be decomposed into the losses from both contractions and closings.
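Returning to the record linkage of section 4.2.3, the first three steps can be sketched as a cascade in which each step works only on the records that earlier steps left unlinked. The Python below is an illustration under stated assumptions, not the BLS implementation. The `Record` layout, the `successors` mapping, the threshold, and the agreement probabilities in `FIELD_PROBS` are all hypothetical. The scoring uses the standard Fellegi-Sunter form, in which a field that agrees adds log2(m/u) and a field that disagrees adds log2((1-m)/(1-u)). Records still unlinked at the end stand in for the pool from which analysts would select cases for manual review.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (UI number, reporting unit number)


@dataclass(frozen=True)
class Record:
    ui: str
    unit: str
    name: str
    address: str
    phone: str

    @property
    def key(self) -> Key:
        return (self.ui, self.unit)


# Hypothetical (m, u) pairs, NOT BLS values:
# m = P(field agrees | same establishment), u = P(field agrees | different).
FIELD_PROBS = {"name": (0.95, 0.01), "address": (0.90, 0.02), "phone": (0.85, 0.001)}


def match_weight(a: Record, b: Record) -> float:
    """Fellegi-Sunter style score: sum of log-likelihood ratios over fields."""
    weight = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        if getattr(a, field) == getattr(b, field):
            weight += math.log2(m / u)
        else:
            weight += math.log2((1 - m) / (1 - u))
    return weight


def link_quarters(prev: List[Record], curr: List[Record],
                  successors: Dict[Key, Key], threshold: float = 8.0):
    prev_by_key = {r.key: r for r in prev}
    curr_by_key = {r.key: r for r in curr}
    links: Dict[Key, Tuple[Key, str]] = {}

    # Step 1: same establishment identifier in both quarters.
    for key in prev_by_key:
        if key in curr_by_key:
            links[key] = (key, "same identifier")

    # Step 2: reported predecessor/successor relationships.
    taken = {new for new, _ in links.values()}
    for old, new in successors.items():
        if (old in prev_by_key and new in curr_by_key
                and old not in links and new not in taken):
            links[old] = (new, "predecessor/successor")
            taken.add(new)

    # Step 3: weighted match of remaining closings against remaining openings.
    openers = [k for k in curr_by_key if k not in taken]
    for old in [k for k in prev_by_key if k not in links]:
        best = max(openers, default=None,
                   key=lambda k: match_weight(prev_by_key[old], curr_by_key[k]))
        if best is not None and match_weight(prev_by_key[old], curr_by_key[best]) >= threshold:
            links[old] = (best, "weighted match")
            openers.remove(best)

    # Step 4 (analyst review) is manual; return what is left for that review.
    closers = [k for k in prev_by_key if k not in links]
    return links, closers, openers
```

The design point is that a missed link turns one continuous establishment into a spurious closing plus a spurious opening, so each additional step can only reduce measured gross job gains and losses.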
The Business Employment Dynamics program also publishes the establishment counts underlying the employment gains and losses. All these statistics are available from the BLS website (http://www.bls.gov/bdm) as both levels and percents, and seasonally adjusted or unadjusted. The time series of historical statistics starts in the third quarter of 1992. The following summary of the data is a shortened version of what can be found in the longer conference version of this chapter (Clayton and Spletzer 2005) and in the April 2004 Monthly Labor Review (Spletzer et al. 2004).

4.3.1 Point-in-Time Results

The seasonally adjusted gross job gains and gross job loss statistics for the first quarter of 2005 are presented in table 4.1 (data for the first quarter of 2005 were the most recent available data when we submitted this article for publication in January 2006). We see that the economy gained 325,000

Table 4.1  Gross job gains and job losses, March 2005 (a)

Net Change, Employment                  325
Gross Job Gains
  Total                               7,635
  Expanding Establishments            6,171
  Opening Establishments              1,464
Gross Job Losses
  Total                               7,310
  Contracting Establishments          5,852
  Closing Establishments              1,458

(a) Seasonally adjusted quarterly data, in thousands.

net new jobs (seasonally adjusted) between December 2004 and March 2005. This growth in employment is the net result of two components: the gross job gains of 7.635 million jobs and the gross job losses of 7.310 million jobs. The gross job gains and gross job loss statistics are substantially larger than the net employment change. Gross job gains come from both expanding and opening establishments. In table 4.1, we see that employment in expanding establishments grew by 6.171 million jobs and employment in opening establishments grew by 1.464 million jobs. These statistics indicate that expanding establishments account for 81 percent of quarterly gross job gains, whereas opening establishments account for 19 percent of quarterly gross job gains. With regard to gross job losses, employment in contracting establishments declined by 5.852 million jobs, and closing establishments accounted for the loss of 1.458 million jobs. Contracting establishments account for 80 percent of quarterly gross job losses, whereas closing establishments account for 20 percent of quarterly gross job losses. Expanding and contracting establishments account for most jobs gained and most jobs lost when measured on a quarterly frequency. An important component of the Business Employment Dynamics data series is the establishment counts underlying the gross job gains and gross job losses. These establishment counts for the first quarter of 2005, on a seasonally adjusted basis, are reported in table 4.2. There were 1.506 million expanding establishments and 1.504 million contracting establishments during the first quarter of 2005. There were 345,000 establishments opening during the quarter, and 347,000 establishments closing during the quarter. The difference between the number of opening and closing establishments (–2,000) is the net change in the number of active establishments during the quarter. 
By revealing the tremendous amount of churning underlying the net growth rates, the Business Employment Dynamics data enhance the labor market statistics currently available from the Bureau of Labor Statistics.
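The relationships among the figures quoted from tables 4.1 and 4.2 can be checked with a few lines of arithmetic. The short Python sketch below uses only the numbers reported in the text (in thousands); the variable names are illustrative.

```python
# Table 4.1: employment flows, first quarter of 2005 (thousands of jobs).
gains = {"expanding": 6171, "opening": 1464}
losses = {"contracting": 5852, "closing": 1458}

total_gains = sum(gains.values())      # 7,635
total_losses = sum(losses.values())    # 7,310
net_change = total_gains - total_losses


def share(part: int, total: int) -> int:
    """Percentage share, rounded to the nearest whole percent."""
    return round(100 * part / total)


gain_shares = {k: share(v, total_gains) for k, v in gains.items()}
loss_shares = {k: share(v, total_losses) for k, v in losses.items()}

# Table 4.2: establishment counts (thousands of establishments).
gaining = {"expanding": 1506, "opening": 345}
losing = {"contracting": 1504, "closing": 347}
net_establishments = gaining["opening"] - losing["closing"]

print(net_change)          # 325
print(gain_shares)         # {'expanding': 81, 'opening': 19}
print(loss_shares)         # {'contracting': 80, 'closing': 20}
print(net_establishments)  # -2
```

Note that the net change in active establishments depends only on openings and closings, because expanding and contracting establishments are active in both quarters.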

Table 4.2  Number of establishments, by direction of employment change, March 2005 (a)

Net Change, Establishments               –2
Establishments Gaining Jobs
  Total                               1,851
  Expanding Establishments            1,506
  Opening Establishments                345
Establishments Losing Jobs
  Total                               1,851
  Contracting Establishments          1,504
  Closing Establishments                347

(a) Seasonally adjusted quarterly data, in thousands.

The traditional measure of net employment change produced by the BLS indicates that employment grew by 325,000 jobs during the first quarter of 2005 (seasonally adjusted). The gross job gains and gross job loss statistics indicate that this net employment gain is the result of 6.171 million jobs added at 1.506 million expanding establishments, 1.464 million jobs added at 345,000 opening establishments, 5.852 million jobs lost at 1.504 million contracting establishments, and 1.458 million jobs lost at 347,000 closing establishments. These gross job flows that underlie the net employment growth statistic demonstrate that there are a sizable number of jobs and establishments that appear and disappear in the short time frame of three months. These statistics are calculated without additional data collection efforts or additional respondent burden.

4.3.2 Time-Series Results: Business Cycle Analysis

The business cycle, to a large degree, is defined by the growth of employment (or lack thereof). The new BLS Business Employment Dynamics data will enable researchers to analyze the extent to which economic recessions and expansions are characterized by changes in business expansions and openings, by changes in business contractions and closings, or by a combination of the two.

The seasonally adjusted time series of quarterly net employment growth is shown in figure 4.1. The recent recession, which was dated by the National Bureau of Economic Research (NBER) as occurring from March 2001 to November 2001, is clearly evident in this chart. Prior to the recession, between the third quarter of 1992 and the fourth quarter of 2000, net employment growth had been positive every quarter, averaging 637,000 net new jobs per quarter. But during the recession, as seen in figure 4.1, net employment growth was negative for all quarters of 2001, with a low of 1.380 million net jobs lost in the third quarter of 2001. The seasonally adjusted gross job gains and gross job loss statistics are

Fig. 4.1 Quarterly net employment growth (seasonally adjusted, in thousands)

plotted in figure 4.2. The difference between the gross job gains and the gross job losses in figure 4.2 is the familiar net employment change depicted in figure 4.1. The most recent business cycle is evident in figure 4.2. Between 1992 and 1999, both the gross job gains and the gross job loss series were climbing at relatively constant rates. The gross job gains started to decline in early 2000, and then dropped substantially in 2001. After a peak of 9.144 million gross job gains in the fourth quarter of 1999, the gross job gains fell to 7.749 million jobs in the third quarter of 2001. The gross job losses continued to increase through 2001, rising from 8.354 million gross jobs lost in the fourth quarter of 2000 to a high of 9.129 million gross jobs lost in the third quarter of 2001. Thus, the declining net employment growth during the first three quarters of 2001 can be attributed to both falling gross job gains and rising gross job losses. As the official NBER-dated recession ended in late 2001, the gross job losses significantly declined and by early 2002 had returned to a level comparable to its prerecessionary level in early 2000. The same cannot be said for the gross job gains. Following the recession, the gross job gains statistic has remained in the range of 7.4 to 8.1 million jobs gained each quarter, which is substantially lower than its prerecessionary levels (the gross job gains in calendar year 2000 averaged 8.8 million jobs per quarter). The gross job gains started to increase in late 2003. There has been positive net employment growth since the third quarter of 2003, as this recent increase in gross job gains has been accompanied by a gross job loss series that steadily declined through 2003 and remained relatively constant through 2004.
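As a consistency check on the figures quoted above, the third-quarter 2001 trough in net employment growth follows directly from the two gross series. Using the values reported in the text, in millions of jobs:

$$
\underbrace{7.749}_{\text{gross job gains, 2001 Q3}} \;-\; \underbrace{9.129}_{\text{gross job losses, 2001 Q3}} \;=\; -1.380,
$$

which matches the low of 1.380 million net jobs lost cited in the discussion of figure 4.1.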

Fig. 4.2 Quarterly gross job gains and losses (seasonally adjusted, in thousands)

The seasonally adjusted time series of gross job gains at expanding and opening establishments, and of gross job losses at contracting and closing establishments, are presented in figure 4.3. Immediately obvious is the previously stated observation that, in any given quarter, expanding and contracting establishments account for roughly 80 percent of gross jobs gained and gross jobs lost, respectively. Also obvious in figure 4.3 is that the business cycle is most evident in the expanding and contracting establishments. The difference between the gross job gains due to expansions and the gross job losses due to contractions mirrors the overall difference between the gross job gains and the gross job losses. The difference between the gross job gains due to openings and the gross job losses due to closings does exhibit some business cycle properties, but this difference is quite small relative to the difference between expansions and contractions.

4.3.3 Additional Research Results

In addition to the basic results just described, the BLS has also released several other data products from the Business Employment Dynamics program. Statistics for major industry sectors were released in May 2004, statistics by firm size class were released in December 2005 (Butani et al. [2006] discuss and empirically analyze the interesting methodological issues underlying longitudinal size class statistics), and statistics by state were released in August 2007. There have also been several recent research papers using the longitudinal establishment microdata from the Business Employment Dynamics program: Pinkston and Spletzer (2004) present

Fig. 4.3 Quarterly gross job gains and losses (seasonally adjusted, in thousands)

annual tabulations of gross job gains and gross job losses, Knaup (2005) and Knaup and Piazza (2007) present survival statistics of business births, Sadeghi (2008) computes establishment birth and death statistics, Butani, Werking, and Kapani (2005) analyze how net employment growth differs in single-establishment employers versus multi-establishment firms, Clayton and Mousa (2004) describe linking the Business Employment Dynamics data with state wage records, Hyson and Spletzer (2002) analyze the employment and wage dynamics associated with mass layoffs, Brown and Spletzer (2005) analyze the employment and wage dynamics of businesses involved in offshoring, and Faberman (2004) creates quarterly gross job gains and gross job loss statistics for the 1990 to 1991 recession.

4.3.4 Comparison to Other Data

We have been asked many times how the Business Employment Dynamics data compare to gross job flow statistics from other datasets. This is a difficult question to answer precisely due to differences in time periods, differences in industry sectors, differences in reporting frequency, and differences in definitions. We are aware of two research papers that have attempted to compare the Business Employment Dynamics data to the manufacturing statistics in the heavily cited work of Davis, Haltiwanger, and Schuh (1996). Pinkston and Spletzer (2004) compute annual gross job gains and losses statistics for the manufacturing sector, and conclude that the Business Employment Dynamics statistics are broadly similar to those of Davis, Haltiwanger, and Schuh. Faberman (2004) plots the quarterly


Business Employment Dynamics manufacturing statistics on the same chart as the 1972 to 1993 quarterly statistics from Davis, Haltiwanger, and Schuh, and concludes that the data are relatively comparable.

There is also interest in how the Business Employment Dynamics data compare to the data from the Job Openings and Labor Turnover Survey (JOLTS). The JOLTS data are from a sample of approximately 16,000 U.S. business establishments collected by the BLS. The JOLTS program publishes monthly data on hires, separations (quits, layoffs and discharges, and other separations), and job openings. These data are meant to serve as demand-side indicators of labor shortages at the national level. Further information about the JOLTS and some research using the JOLTS can be found in Clark and Hyson (2001), Clark (2004), Faberman (chapter 2, this volume), and Nagypál (chapter 3, this volume).

Several previous authors have compared the JOLTS hires and separations data to the gross job gains and gross job losses data from the Business Employment Dynamics. Davis, Faberman, and Haltiwanger (2006) characterize the relationship of hires, separations, quits, and layoffs to the employer-level gross job gains and gross job loss statistics. In table 4.1 of their article, Davis, Faberman, and Haltiwanger report average job and worker flow rates for the U.S. economy measured at various frequencies using the JOLTS and the Business Employment Dynamics data. Boon et al. (2008) compare the concepts and the data from the JOLTS, the Business Employment Dynamics, and the CPS gross flows. In charts 7 and 8 of their article, Boon et al. compare the time series movements of the JOLTS and the Business Employment Dynamics data.

4.4 The Distribution of Gross Job Gains and Gross Job Losses

4.4.1 Concepts and Definitions

The Business Employment Dynamics data have given us several interesting facts about producer dynamics during and immediately following the 2001 recession.
As seen in figure 4.2 of this chapter, the recent business cycle is characterized by a large temporary spike in gross job losses accompanied by a substantial and persistent decline in gross job gains. In this section of the chapter, we present seasonally adjusted time series of the distribution of gross job gains and gross job losses underlying the BLS Business Employment Dynamics statistics. Distribution statistics will allow us to analyze (a) whether the temporary spike in gross job losses occurred at a few establishments with large declines, or at many establishments with small declines, and (b) whether the decline in gross job gains occurred at a few establishments cutting back significantly on hiring or at many establishments not adding a few new positions. Recall from equation (2) earlier in this chapter that the net employment


growth in any given quarter can be written as the sum of gross job gains from establishments increasing employment and the sum of gross job losses from establishments decreasing employment:

$$
\text{Net Employment Growth}(t) = \sum_{\text{establishments increasing employment}} (E_{e,t} - E_{e,t-1}) \;+\; \sum_{\text{establishments decreasing employment}} (E_{e,t} - E_{e,t-1}). \tag{2}
$$

This equation can be rewritten as:

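$$
\begin{aligned}
\text{Net Employment Growth}(t)
&= \sum_{x \ge 1} \; \sum_{\text{establishments increasing employment by } x \text{ jobs}} (E_{e,t} - E_{e,t-1}) \\
&\quad + \sum_{x \ge 1} \; \sum_{\text{establishments decreasing employment by } x \text{ jobs}} (E_{e,t} - E_{e,t-1}).
\end{aligned}
\tag{5}
$$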

In equation (5) we have decomposed both gross job gains and gross job losses into an empirical distribution defined by the number of jobs gained or lost. For practical purposes, it is infeasible to calculate and report statistics for every possible level of net employment change x in equation (5). We have calculated gross job gains and gross job losses for establishments gaining or losing {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11–14, 15–19, 20–24, 25–29, 30–39, 40–49, 50–74, 75–99, 100+} jobs. However, for the graphical analysis we wish to present, nineteen series are too many, and we have aggregated further. We have chosen to present statistics for the following intervals of gross job gains and gross job losses: {1–3, 4–19, 20+}. In the fourth quarter of 2004, 16 percent of employment is in establishments that do not change their employment level, 33 percent of employment is in establishments that change their employment level by 1 to 3 jobs, 30 percent of employment is in establishments that change their employment level by 4 to 19 jobs, and 21 percent of employment is in establishments that change their employment level by 20 or more jobs. We have looked extensively at other possible aggregations and have determined that the main conclusions we present in this section are not sensitive to the particular aggregation we have chosen.

To be precise, we have decomposed gross job gains in quarter t as:

$$\sum_{\substack{\text{establishments}\\ \text{increasing employment}}} (E_{e,t} - E_{e,t-1}) = \sum_{\substack{\text{establishments increasing}\\ \text{employment by 1–3 jobs}}} (E_{e,t} - E_{e,t-1}) + \sum_{\substack{\text{establishments increasing}\\ \text{employment by 4–19 jobs}}} (E_{e,t} - E_{e,t-1}) + \sum_{\substack{\text{establishments increasing}\\ \text{employment by 20+ jobs}}} (E_{e,t} - E_{e,t-1}). \tag{6}$$

Business Employment Dynamics


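Similarly, we have decomposed gross job losses in quarter t as:

$$\sum_{\substack{\text{establishments}\\ \text{decreasing employment}}} (E_{e,t} - E_{e,t-1}) = \sum_{\substack{\text{establishments decreasing}\\ \text{employment by 1–3 jobs}}} (E_{e,t} - E_{e,t-1}) + \sum_{\substack{\text{establishments decreasing}\\ \text{employment by 4–19 jobs}}} (E_{e,t} - E_{e,t-1}) + \sum_{\substack{\text{establishments decreasing}\\ \text{employment by 20+ jobs}}} (E_{e,t} - E_{e,t-1}). \tag{7}$$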
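The decomposition of gross job gains and gross job losses by size of change can be illustrated with a short Python sketch. This is only a minimal illustration under stated assumptions: the function names, the bin table, and the toy establishment data are hypothetical and do not describe the BLS production system, and losses are reported here as positive job counts.

```python
from collections import defaultdict

# Size-of-change bins used for the charts in this section: 1-3, 4-19, 20+ jobs.
# (Hypothetical representation; None means "no upper bound".)
BINS = [(1, 3, "1-3"), (4, 19, "4-19"), (20, None, "20+")]


def bin_label(change):
    """Return the bin label for a nonzero establishment-level employment change."""
    size = abs(change)
    for low, high, label in BINS:
        if size >= low and (high is None or size <= high):
            return label
    raise ValueError("a change of zero does not belong to any bin")


def decompose(establishments):
    """Split gross job gains and gross job losses by size of change.

    `establishments` is an iterable of (employment in t-1, employment in t)
    pairs. Returns (gains, losses): two dicts mapping a bin label to the
    total number of jobs gained or lost (losses as positive numbers).
    """
    gains = defaultdict(int)
    losses = defaultdict(int)
    for prev, curr in establishments:
        change = curr - prev
        if change > 0:
            gains[bin_label(change)] += change
        elif change < 0:
            losses[bin_label(change)] += -change
        # Establishments with no change contribute to neither total.
    return dict(gains), dict(losses)


# Hypothetical toy data: (employment in t-1, employment in t).
toy = [(10, 12), (5, 5), (100, 130), (8, 3), (50, 49), (40, 15)]
gains, losses = decompose(toy)

# Net employment growth is gross job gains minus gross job losses.
net = sum(gains.values()) - sum(losses.values())
print(gains, losses, net)
```

In this toy example the binned gains and losses add back up to the simple difference in total employment between the two quarters, which is the accounting identity that equations (2) and (5) express.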

The issue of whether to present our distribution statistics in levels or in rates deserves mention. Much of the existing literature has used rates; for example, figure 2.2 of Davis, Haltiwanger, and Schuh (1996) reports the distribution of job creation rates and job destruction rates for intervals spanning 5 percentage points. We have chosen to use levels because we are concerned about the interpretation of rates for small establishments. Based upon analysis of the QCEW microdata, most establishments in the United States are small: 61 percent of establishments have fewer than five employees, and 88 percent of establishments have fewer than twenty employees. The comparable statistics for the employment distribution are as follows: 7 percent of employment is in establishments with fewer than five employees, and 26 percent of employment is in establishments with fewer than twenty employees. When calculating percentages using average employment in the denominator, as is standard, a small establishment with fewer than five employees that grows or declines by one job has a percentage change (in absolute value) of between 22 and 200 percent (growth from four to five employees is 1/4.5, or about 22 percent, and growth from zero to one employee is 1/0.5, or 200 percent), whereas a large establishment with more than 500 employees that grows or declines by one job has a percentage change (in absolute value) of less than 0.2 percent. Because we are interested in decomposing the time series variation of net employment growth based upon the distribution of establishment-level changes, the use of levels as expressed in equation (5) strikes us as most appropriate for our first pass through the microdata. Research that calculates rates rather than levels, and that conditions on the size of the establishment to make rates comparable across establishments, is in progress.

4.4.2 Empirical Results

In the top panel of figure 4.4, we present the establishment counts for establishments gaining or losing 1 to 3 jobs, 4 to 19 jobs, and 20 or more jobs.
The bottom panel of figure 4.4 reports the net number of establishments gaining 1 to 3 jobs, 4 to 19 jobs, and 20 or more jobs, where the net is calculated as the number of establishments gaining minus the number of establishments losing a given number of jobs. In the fourth quarter of 2004, there were 1.456 million establishments (seasonally adjusted) that gained 1 to 3 jobs, and 1.400 million establishments that lost 1 to 3 jobs. This indicates that 56 thousand more establishments were gaining 1 to 3 jobs than


Fig. 4.4 Quarterly gross job gains and losses, establishment counts (seasonally adjusted)

were losing 1 to 3 jobs; this 56 thousand figure is plotted in the bottom panel of figure 4.4. There were 381 thousand establishments gaining 4 to 19 jobs, and 344 thousand establishments losing 4 to 19 jobs. There were 56 thousand establishments gaining 20 or more jobs, and 50 thousand establishments losing 20 or more jobs. The establishment counts in figure 4.4 clearly show business cycle properties. Looking at the bottom panel of figure 4.4, the net number of establishments gaining 1 to 3 jobs falls from 87 thousand in the fourth quarter of


1999 to negative 69 thousand in the third quarter of 2001. The net number of establishments gaining 20 or more jobs also falls from 8 thousand in the fourth quarter of 1999 to negative 11 thousand in the third quarter of 2001.

Fig. 4.5 Quarterly gross job gains and losses (seasonally adjusted)

The statistics in figure 4.5 show the employment gains and losses associated with the establishments gaining or losing 1 to 3 jobs, 4 to 19 jobs, and 20 or more jobs. The ordering of the series in figure 4.5 is the opposite of that in figure 4.4. In the top panel of figure 4.5, we see that the 1.5 million establishments gaining 1 to 3 jobs contributed 2.2 million jobs to the gross job gains in the fourth quarter of 2004. The 381 thousand establishments growing by 4 to 19 jobs contributed 2.8 million jobs to the count of gross job gains in the fourth quarter of 2004 (the average growth of these job-gaining establishments is 7.3 jobs), and the 56 thousand establishments growing by 20 or more jobs added 3.1 million new jobs to the economy (an average growth of 56 jobs per establishment).

The key graph is in the bottom panel of figure 4.5. Between the third quarter of 1992 and the fourth quarter of 1999, establishments gaining or losing 1 to 3 jobs created an average of 99 thousand net new jobs per quarter. During this same time period, establishments gaining or losing 4 to 19 jobs created an average of 228 thousand net new jobs per quarter, and establishments gaining or losing 20 or more jobs created an average of 331 thousand jobs per quarter. These three statistics sum to the average net employment growth of 657 thousand per quarter during the 1990s (the three series in the bottom panel of figure 4.5 sum to the series graphed in figure 4.1).

The 2001 recession is clearly evident in both the top and bottom panels of figure 4.5. Establishments that were gaining or losing 1 to 3 jobs lost a net of 110 thousand jobs during the third quarter of 2001, establishments that were gaining or losing 4 to 19 jobs lost a net of 325 thousand jobs in that quarter, and establishments that were gaining or losing 20 or more jobs lost a net of 758 thousand jobs in the third quarter of 2001. These statistics indicate that 64 percent of the net job losses in the most severe recessionary quarter are attributable to the relatively few establishments gaining or losing 20 or more jobs. To return to the motivating question, this new seasonally adjusted time series of quarterly distribution statistics illustrates where the temporary spike in gross job losses occurred in the 2001 recession.
The spike in gross job losses did not occur because many establishments had small declines in employment, but rather because a relatively small number of establishments had large declines. Similarly, the substantial and persistent fall in gross job gains during and following the 2001 recession did not occur because many establishments did not add a few positions; rather, this fall can be attributed to a relatively small number of establishments cutting back significantly on their hiring. The analysis we have presented in this section is quite simple. There are many empirical extensions that could be done. As mentioned above in the discussion of levels versus rates, it would be interesting to know whether the establishments that are adding or losing twenty or more jobs are relatively small establishments with a large percentage change in employment, or whether they are large establishments with a relatively small percentage change in employment. Furthermore, the statistics we have presented are quarterly; annual distribution statistics would enable us to analyze whether the large (twenty or more) establishment-level gains or losses in a quarter are one-time changes within a year, or whether they are one incremental


step towards even larger gains or losses within the year. We hope that the distribution statistics and the simple analysis presented in this section will spur additional empirical and theoretical work about producer dynamics and the causes and consequences of employment growth over the business cycle.

4.4.3 Sectoral Detail

The editors of this conference volume have asked us to present some sectoral detail. The statistics in figure 4.6 show the distribution of employment gains and losses for the manufacturing sector, and the statistics in figure 4.7 show the distribution of employment gains and losses for the services sector. The basic results for these two sectors mimic the analysis we presented for the national statistics. During the 1990s, establishments with large gains or losses in employment are the biggest contributors to the gross job gains and gross job losses. During the 2001 recession, the employment losses are most evident for the establishments with the largest gains and losses.

Fig. 4.6 Quarterly gross job gains and losses, manufacturing (seasonally adjusted)

Fig. 4.7 Quarterly gross job gains and losses, services (seasonally adjusted)

4.5 Conclusion

Our goals in this chapter were threefold: to describe the BLS Business Employment Dynamics program, to summarize the data from this program and how it has informed us about the U.S. labor market, and to present a new seasonally adjusted time series of the distribution of quarterly gross job gains and gross job losses. The first two objectives are described in the text, and are not summarized here.

This chapter releases for the first time a seasonally adjusted time series of the distribution of quarterly gross job gains and gross job losses for the entire U.S. economy. This new data series is motivated by the earlier work of Davis and Haltiwanger (1990, 1992), Davis, Haltiwanger, and Schuh (1996), and Spletzer (2000). We have learned from these earlier studies that gross job gains and gross job losses are concentrated at establishments with large percentage changes in employment. We mimic this finding with the Business Employment Dynamics data: in the fourth quarter of 2004, we find that 39 percent of all gross job gains are contributed by just 3 percent of establishments that gain twenty or more jobs, and 38 percent of all gross job losses are contributed by just 3 percent of establishments that lose twenty or more jobs. Our seasonally adjusted time series shows that these relatively few establishments with large gross job gains and large gross job losses are the drivers of the 2001 business cycle.

The Business Employment Dynamics data are now routinely cited in the economic, statistical, and policy communities, as well as in the popular press. This high level of attention by the user community reinforces our belief that the relatively new BLS Business Employment Dynamics data are a major contributor to our understanding of producer dynamics in the U.S. economy. We do not find this surprising: the data are timely, high quality, high frequency, and historically consistent. Finally, we note that the BLS was able to create the Business Employment Dynamics data with no new data collection efforts and with no additional respondent burden.
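Several of the shares quoted in this chapter can be checked against the rounded figures reported in section 4.4.2. The short Python sketch below is only an arithmetic check, not part of the authors' analysis; the variable and function names are hypothetical, and because the inputs are the rounded values printed in the text, the results agree with the published percentages only after rounding.

```python
# Establishment counts for the fourth quarter of 2004, in thousands,
# as quoted in section 4.4.2 (rounded values from the text).
gaining = {"1-3": 1456, "4-19": 381, "20+": 56}
losing = {"1-3": 1400, "4-19": 344, "20+": 50}

# Net job losses in the third quarter of 2001 by size class, in thousands,
# as quoted in section 4.4.2.
net_losses_2001q3 = {"1-3": 110, "4-19": 325, "20+": 758}


def share(counts, key):
    """Return the fraction of the total accounted for by one size class."""
    return counts[key] / sum(counts.values())


# Share of job-gaining and job-losing establishments that change by 20+ jobs.
gain_estab_share = share(gaining, "20+")
loss_estab_share = share(losing, "20+")

# Share of 2001:Q3 net job losses at establishments changing by 20+ jobs.
loss_jobs_share = share(net_losses_2001q3, "20+")

print(f"{gain_estab_share:.1%} of gaining establishments gained 20+ jobs")
print(f"{loss_estab_share:.1%} of losing establishments lost 20+ jobs")
print(f"{loss_jobs_share:.1%} of 2001:Q3 net losses came from the 20+ class")
```

Rounded to whole percentages, these reproduce the "3 percent of establishments" and "64 percent of the net job losses" figures cited above.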

References

Anderson, P. M., and B. D. Meyer. 1994. The extent and consequences of job turnover. Brookings Papers on Economic Activity, Vol. 1994:177–236.

Boon, Z., C. M. Carson, R. J. Faberman, and R. E. Ilg. 2008. Studying the labor market with BLS labor dynamics data. Monthly Labor Review 131 (2): 3–16.

Brown, S., and J. Spletzer. 2005. Labor market dynamics associated with the movement of work overseas. Paper presented at the Organization for Economic Cooperation and Development (OECD) conference on the Globalisation of Production. November.

Burgess, S., J. Lane, and D. Stevens. 2000. Job flows, worker flows, and churning. Journal of Labor Economics 18 (3): 473–502.

Butani, S. J., R. L. Clayton, V. Kapani, J. R. Spletzer, D. M. Talan, and G. S. Werking Jr. 2006. Business employment dynamics: Tabulations by employer size. Monthly Labor Review 129 (2): 3–22.

Butani, S., G. Werking, and V. Kapani. 2005. Employment dynamics of individual companies versus multicorporations. Monthly Labor Review 128 (12): 3–15.

Clark, K. A. 2004. The job openings and labor turnover survey: What initial data show. Monthly Labor Review 127 (11): 14–23.

Clark, K. A., and R. Hyson. 2001. New tools for labor market analysis: JOLTS. Monthly Labor Review 124 (12): 32–37.

Clayton, R. L., and J. A. Mousa. 2004. Measuring labor dynamics: The next generation in labor market information. Monthly Labor Review 127 (5): 3–8.

Clayton, R. L., and J. R. Spletzer. 2005. Business employment dynamics. Paper presented at the NBER-CRIW conference on Producer Dynamics. April. Online at http://www.nber.org/confer/2005/CRIWs05/clayton.pdf

Davis, S. J., R. J. Faberman, and J. Haltiwanger. 2006. The flow approach to labor markets: New data sources and micro-macro links. Journal of Economic Perspectives 20 (3): 3–26.

Davis, S. J., and J. C. Haltiwanger. 1990. Gross job creation and destruction: Microeconomic evidence and macroeconomic implications. NBER Macroeconomics Annual 5:123–68.

Davis, S. J., and J. C. Haltiwanger. 1992. Gross job creation, gross job destruction, and employment reallocation. Quarterly Journal of Economics 107 (3): 819–63.

Davis, S. J., J. C. Haltiwanger, and S. Schuh. 1996. Job creation and destruction. Cambridge, MA: The MIT Press.

Dunne, T., M. J. Roberts, and L. Samuelson. 1988. Patterns of firm entry and exit in U.S. manufacturing industries. RAND Journal of Economics 19 (4): 495–515.

Dunne, T., M. J. Roberts, and L. Samuelson. 1989a. Plant turnover and gross employment flows in the U.S. manufacturing sector. Journal of Labor Economics 7 (1): 48–71.

Dunne, T., M. J. Roberts, and L. Samuelson. 1989b. The growth and failure of U.S. manufacturing plants. Quarterly Journal of Economics 104 (4): 671–98.

Faberman, R. J. 2004. Gross job flows over the past two business cycles: Not all “recoveries” are created equal. Bureau of Labor Statistics Working Paper no. 372. Available at http://www.bls.gov/ore/pdf/ec040020.pdf

Foote, C. L. 1998. Trend employment growth and the bunching of job creation and destruction. Quarterly Journal of Economics 113 (3): 809–34.

Hyson, R. T., and J. R. Spletzer. 2002. Large-scale layoffs, employment dynamics, and firm survival. Paper presented at the Society of Labor Economists annual conference. May.

Knaup, A. E. 2005. Survival and longevity in the Business Employment Dynamics data. Monthly Labor Review 128 (5): 50–56.

Knaup, A. E., and M. C. Piazza. 2007. Business employment dynamics data: Survival and longevity, II. Monthly Labor Review 130 (9): 3–10.

Pinkston, J. C., and J. R. Spletzer. 2004. Annual measures of gross job gains and gross job losses. Monthly Labor Review 127 (11): 3–13.

Robertson, K., L. Huff, G. Mikkelson, T. Pivetz, and A. Winkler. 1999. Improvements in record linkage processes for the Bureau of Labor Statistics’ Business Establishment list. In Record Linkage Techniques—1997: Proceedings of an International Workshop and Exposition, 212–221. Washington, D.C.: National Academy Press.

Sadeghi, A. 2008. Measuring births and deaths in business employment dynamics data series. Monthly Labor Review, forthcoming.

Spletzer, J. R. 2000. The contribution of establishment births and deaths to employment growth. Journal of Business and Economic Statistics 18 (1): 113–26.

Spletzer, J. R., R. J. Faberman, A. Sadeghi, D. M. Talan, and R. L. Clayton. 2004. Business employment dynamics: New data on gross job gains and losses. Monthly Labor Review 127 (4): 29–42.

5 The LEHD Infrastructure Files and the Creation of the Quarterly Workforce Indicators

John M. Abowd, Bryce E. Stephens, Lars Vilhuber, Fredrik Andersson, Kevin L. McKinney, Marc Roemer, and Simon Woodcock

5.1 Introduction

Since 2003, the U.S. Census Bureau has published a new and innovative statistical series: the Quarterly Workforce Indicators (QWI). Compiled from administrative records data collected by a large number of states for both jobs and firms, and enhanced with information integrated from other data sources at the Census Bureau, these statistics offer unprecedented detail on the local dynamics of labor markets. Despite the fine geographic and industry detail, the confidentiality of the underlying micro-data is maintained by the application of new, state-of-the-art protection methods.

The underlying data infrastructure was designed by the Longitudinal Employer-Household Dynamics (LEHD) Program at the Census Bureau (Abowd, Haltiwanger, and Lane 2004). The Census Bureau collaborates with its state partners, the suppliers of critical administrative records from the state unemployment insurance programs, through the Local Employment Dynamics (LED) cooperative federal-state program. Although the QWI are the flagship statistical product published from the LEHD Infrastructure Files, the latter have found a much more widespread application. The infrastructure constitutes an encompassing and almost universal data source for individuals and firms of all forty-six currently participating states.1 When complete, the LEHD Infrastructure Files will be the first nationally comprehensive statistical product developed from a universe that covers jobs (a statutory employment relation between an individual and employer), as distinct from ones that cover households (e.g., the Decennial Census of Population and Housing) or establishments (e.g., the Economic Censuses or the Quarterly Census of Employment and Wages [QCEW]).

In this chapter, we describe the primary input data underlying the LEHD Infrastructure Files, the methods by which the Infrastructure Files are compiled, and how these files are integrated to create the Quarterly Workforce Indicators. We also provide details about the statistical models used to improve the basic administrative data, and describe enhancements and limitations imposed by both data and legal constraints. Many of the infrastructure and derivative micro-data files are now available within the Research Data Centers of the U.S. Census Bureau, and we indicate these files during the discussion.

John M. Abowd is the Edmund Ezra Day Professor of Industrial and Labor Relations at Cornell University, and a research associate of the National Bureau of Economic Research. Bryce E. Stephens is a senior consultant with the economics consulting firm Bates White. Lars Vilhuber is a senior research associate at the Cornell Institute for Social and Economic Research, and a senior research associate in the Longitudinal Employer-Household Dynamics program at the U.S. Census Bureau. Fredrik Andersson is a senior research associate of the Cornell Institute for Social and Economic Research (CISER), and a research fellow of the Longitudinal Employer-Household Dynamics Program (LEHD) of the U.S. Bureau of the Census. Kevin L. McKinney is an economist in the Longitudinal Employer-Household Dynamics program at the U.S. Census Bureau, and an administrator of the California Census Research Data Center. Marc Roemer is a mathematical statistician at the U.S. Census Bureau and independent researcher. Simon Woodcock is an assistant professor of economics at Simon Fraser University, and a consultant for the Cornell Institute for Social and Economic Research (CISER).

The authors acknowledge the substantial contributions of the staff and senior research fellows of the U.S. Census Bureau’s Longitudinal Employer-Household Dynamics (LEHD) Program. We thank participants of the 2005 CRIW “Conference on Producer Dynamics: New Evidence from Micro Data” and an anonymous discussant for their comments, and Mark Roberts, Tim Dunne, and Brad Jensen for their valuable input during the editorial process. This document is based in part on a presentation first given at the NBER Summer Institute Conference on Personnel Economics, 2002, by John Abowd, Paul Lengermann, and Lars Vilhuber. It replaces LEHD Technical Paper TP-2002-05-rev1 (Longitudinal Employer-Household Dynamics Program 2002).

This research is a part of the U.S. Census Bureau’s Longitudinal Employer-Household Dynamics Program (LEHD), which is partially supported by the National Science Foundation Grant SES-9978093 to Cornell University (Cornell Institute for Social and Economic Research), the National Institute on Aging Grant R01 AG018854, and the Alfred P. Sloan Foundation. This research is also partially supported by the National Science Foundation Information Technologies Research Grant SES-0427889, which provides financial resources to the Census Research Data Centers. This document reports the results of research and analysis undertaken by U.S. Census Bureau staff. It has undergone a Census Bureau review more limited in scope than that given to official Census Bureau publications. This document is released to inform interested parties of ongoing research and to encourage discussion of work in progress. The views expressed herein are attributable only to the authors and do not represent the views of the U.S. Census Bureau, its program sponsors, Cornell University, or data providers. Some or all of the data used in this paper are confidential data from the LEHD Program. The U.S. Census Bureau supports external researchers’ use of these data through the Research Data Centers (see www.ces.census.gov). For other questions regarding the data, please contact Jeremy S. Wu, Program Manager, U.S. Census Bureau, LEHD Program, Data Integration Division, Room 6H136C, 4600 Silver Hill Rd., Suitland, MD 20233, USA. http://lehd.did.census.gov

1. The number of participating states still increases regularly as new Memoranda of Understanding are signed and new states begin shipping data. As of January 15, 2008, there are 46 states with signed MOUs, and 43 states with public use data available at http://lehd.did.census.gov/

LEHD Infrastructure Files and Creating Quarterly Workforce Indicators


The QWI use a bewildering array of data sources—administrative records, demographic surveys and censuses, and economic surveys and censuses. The Census Bureau receives Unemployment Insurance (UI) wage records and ES-202 (QCEW) establishment records from each state participating in the LED federal/state partnership. The Census Bureau then uses these products to integrate information about the individuals (place of residence, sex, birth date, place of birth, race, education) with information about the employer (place of work, industry, employment, sales). Not all of the integration methods are exact one-to-one matches based on stable identifiers. In some cases, statistical matching techniques are used, and in other cases critical linking values are imputed. Throughout the process, critical imputations are done multiple times, improving the accuracy of the final estimates and permitting an assessment of the additional variability due to the imputations. Data integration is a two-way street. Not only do the Census Bureau’s surveys and censuses improve the detail on the administrative files, allowing the creation of new statistical products without any increase in respondent burden, but also as a part of its Title 13 mission, the Census Bureau uses the integrated files to improve its other demographic and economic products. The demographic products that have been improved include the Current Population Survey, the Survey of Income and Program Participation, and the American Community Survey. In addition, the LEHD Infrastructure Files are used for research to improve the Census Bureau’s Business Register, which is the sampling frame for all its economic data and the initial contact frame for the Economic Censuses. We give an overview of the different raw data inputs and how they are treated and adjusted in section 5.2. 
In a system that focuses on the dynamics at the individual, establishment, and firm level, proper identification of the entities is important, and we briefly highlight the steps undertaken to edit and verify the identifiers. A more detailed analysis of the longitudinal editing of individual record identifiers using probabilistic record linking has been published elsewhere (Abowd and Vilhuber 2005). The raw data are then aggregated and standardized into a series of component files, which we call the “Infrastructure Files,” as described in section 5.3. Finally, sections 5.4 and 5.5 illustrate how these Infrastructure Files are brought together to create the QWI. It will soon become clear to the reader that the level of detail potentially available with these statistics requires special attention to the confidentiality of the micro data supplied by the underlying entities. How their identities and data are protected is described in section 5.6. Many of the files described in this chapter are accessible in either a public-use or restricted-access version. A brief description of these files with pointers to more detailed documentation is provided in section 5.8. Section 5.9 concludes and provides a glimpse at the ongoing research into improving the infrastructure files.
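The introduction above notes that critical imputations are done multiple times, which permits an assessment of the additional variability due to the imputations. This part of the chapter does not give the combining formulas, so the Python sketch below simply assumes the standard multiple-imputation combining rules (the average of the per-implicate estimates, plus a total variance built from the within-implicate and between-implicate variances). The function name and the toy numbers are hypothetical and are not taken from the LEHD system.

```python
from statistics import mean, variance


def combine_implicates(estimates, variances):
    """Combine estimates computed on m completed (imputed) data sets.

    Assumed combining rules (standard multiple-imputation formulas):
      point estimate   = average of the m estimates
      within variance  = average of the m sampling variances
      between variance = sample variance of the m estimates (divisor m - 1)
      total variance   = within + (1 + 1/m) * between
    Returns (point_estimate, within, between, total).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two implicates with matching variances")
    point_estimate = mean(estimates)
    within = mean(variances)
    between = variance(estimates)
    total = within + (1 + 1 / m) * between
    return point_estimate, within, between, total


# Hypothetical toy input: one statistic estimated on three implicates,
# each with its own sampling variance.
q_bar, within, between, total = combine_implicates(
    [100.0, 104.0, 96.0], [4.0, 4.0, 4.0]
)
print(q_bar, within, between, total)
```

The between-implicate term is what captures the additional variability due to the imputations: if every implicate produced the same estimate, it would be zero and the total variance would equal the ordinary sampling variance.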


We should note that this chapter has far too few authors. Over the years, many individuals have contributed to the effort documented in this chapter. As far as we are aware, in addition to the authors of this chapter, the following individuals, who are or were part of the LEHD Program and other parts of the Census Bureau, contributed to the design, implementation, and dissemination of the Infrastructure Files and the Quarterly Workforce Indicators. We thank Romain Aeberhardt, Charlotte Andersson, Matt Armstrong, B.K. Atrostic, Sasan Bakhtiari, Nancy Bates, Gary Benedetto, Melissa Bjelland, Lisa Blumerman, Holly Brown, Bahattin Buyuksahin, Barry Bye, John Carpenter, Nick Carroll, Pinky Chandra, Hyowook Chiang, Karen Conneely, Rob Creecy, Anja Decressin, Pat Doyle, Lisa Dragoset, Robert Dy, Colleen Flannery, Matt Freedman, Kaj Gittings, Cheryl Grim, Matt Graham, Matt Harlin, Sam Hawala, Sam Highsmith, Tomeka Hill, Rich Kihlthau, Charlene Leggieri, Paul Lengermann, Cyr Linonis, Cindy Ma, Jennifer Marks, Kristin McCue, Erika McEntarfer, John Messier, Harry Meyers, Jeronimo Mulato, Dawn Nelson, Nicole Nestoriak, Sally Obenski, Robert Pedace, Barry Plotch, Ron Prevost, George Putnam, Bryan Ricchetti, Kristin Sandusky, Lou Schwarz, David Stevens, Martha Stinson, Cynthia Taeuber, Jan Tin, Dennis Vaughn, Pete Welbrock, Greg Weyland, Karen Wheeless, Bill Winkler, and Laura Zayatz. In addition, continuing guidance was provided by Census Bureau executive staff and Senior Research Fellows, including Chet Bowie, Cynthia Clark, Gerald Gates, Nancy Gordon, John Haltiwanger, Hermann Habermann, Ron Jarmin, Brad Jensen, Frederick Knickerbocker, Julia Lane, Tom Mesenbourg, Paula Schneider, Rick Swartz, John Thompson, Dan Weinberg, and Jeremy Wu. 
5.2 Input Files

The LEHD Infrastructure File system is, fundamentally, a job-based frame designed to represent the universe of individual-employer pairs covered by state unemployment insurance system reporting requirements.2 Thus, the underlying data are wage records extracted from Unemployment Insurance (UI) administrative files from each LED partner state. In addition to the UI wage records, LED partner states also deliver an extract of the file reported to the Bureau of Labor Statistics’ Quarterly Census of Employment and Wages (QCEW, formerly known as ES-202). These data are received by LEHD on a quarterly basis, with historical time series extending back to the early 1990s for many states.

2. The frame is intended to be comprehensive for legal employment relations and self-employment. Current development efforts include the addition of federal employment via records provided by the Office of Personnel Management and the addition of self-employment via records constructed from the Employer and Nonemployer Business Registers.


5.2.1 Wage Records: UI

Wage records correspond to the report of an individual’s UI-covered earnings by an employing entity, identified by a state UI account number (recoded to the State Employer Identification Number [SEIN] in the LEHD system). An individual’s UI wage record is retained in the processing if at least one employer reports earnings of at least one dollar for that individual during the quarter. Thus, an in-scope job must produce at least one dollar of UI-covered earnings during a given quarter in the LEHD universe. Maximum earnings reported are defined in a specific state’s unemployment insurance system, and observed top-coding varies across states and over time. A record is completed with information on the individual’s Social Security Number (later replaced with the Protected Identification Key [PIK] within the LEHD system), first name, last name, and middle initial. A few states include additional information: the firm’s reporting unit or establishment (recoded to SEINUNIT in the LEHD system), available for Minnesota, and a crucial component to the Unit-to-Worker imputation described later; weeks worked, available for some years in Florida; hours worked, available for Washington state.

Current UI wage records are reported for the quarter that ended approximately six months prior to the reporting date at Census (the first day of the calendar quarter). Wage records are also reported for the quarter that the state considers final in the sense that revisions to its administrative UI wage record database after that date are relatively rare. This quarter typically ends nine months prior to the reporting date. Historical UI wage records were assembled by the partner states from their administrative record backup systems.

5.2.2 Employer Reports: ES-202

The employer reports are based on information from each state’s Department of Employment Security.
The data are collected as part of the Covered Employment and Wages (CEW) program, also known as the ES-202 program, which is jointly administered by the U.S. Bureau of Labor Statistics (BLS) and the Employment Security Agencies in a federal-state partnership. This cooperative program between the states and the federal government collects employment, payroll, economic activity, and physical location information from employers covered by state unemployment insurance programs and from employers subject to the reporting requirements of the ES-202 system. The employer and workplace reports from this system are the same as the data reported to the BLS as part of the Quarterly Census of Employment and Wages (QCEW), but are referred to in the LEHD system by their old acronym, ES-202. The universe for these data is a reporting unit, which is the QCEW establishment, that is, the place


where the employees actually perform their work. Most employers have one establishment (single-units), but most employment is with employers who have multiple establishments (multi-units). One report per establishment per quarter is filed.3 The information contained in the ES-202 reports has increased substantially over the years. Employers report wages subject to statutory payroll taxes on this form, together with some other information. Common to all years, and critical to LEHD processing, are information on the employer’s identity (the SEIN), the reporting unit’s identity (SEINUNIT), ownership information, employment on the twelfth of each month covered by the quarter, and total wages paid over the course of the quarter. Additional information pertains to industry classifications (initially Standard Industrial Classification [SIC] and later, North American Industry Classification System [NAICS]). Other information includes the federal Employer Identification Number (EIN), and geography both at an aggregated civil level (county or Metropolitan Statistical Area [MSA]) and at a detailed level (physical location street address and mailing address). A recent expansion of the standard report’s record layout has increased the informational content substantially. 5.2.3 Administrative Demographic Information: PCF and CPR The UI and ES-202 files are the core data files describing the economic activity of individuals, jobs, and employers. Although these files contain a tremendous amount of detail on the economic activity, they contain little or no demographic information on the individuals. 
Demographic information comes from two administrative data sources, the Person Characteristics File (PCF) and the Composite Person Record (CPR), compiled by the Planning, Research, and Evaluation Division at the Census Bureau.4 The PCF contains information on sex, date of birth, place of birth, citizenship, and race, most of which is extracted from the Social Security Administration’s Numident file, the database containing application information for Social Security Numbers (SSN) sorted in SSN order. The CPR information contains annual place of residence data compiled from the Statistical Administrative Records System (StARS).

3. These data are also used to compile the Covered Employment and Wages (CEW) and Business Employment Dynamics (BED) data at the BLS.
4. This Division has now been reconstituted as part of the Data Integration Division in the Demographic Programs Directorate at the Census Bureau.

5.2.4 Demographic Product Integration

As part of the integration of individual and household demographic information, the LEHD system uses the fact that many individuals were part of respondent households in the Survey of Income and Program Participation (SIPP) or the March Current Population Survey (CPS). Identifier information from the 1984, 1990–1993, and 1996 SIPP panels, as well as from the March Demographic Supplement to the CPS from 1983 forward, has been integrated into the system. See the discussion of the Individual Characteristics File system in section 5.3.2.

5.2.5 Economic Censuses and Annual Surveys Integration

The LEHD Infrastructure Files include a crosswalk between the SEIN/SEINUNIT and the federal Employer Identification Number (EIN). This crosswalk can be used to integrate data from the 1987, 1992, 1997, and 2002 economic censuses, all annual surveys of manufacturing, service, trade, transportation, and communication industries and selected, approved fields from the Census Bureau’s Employer and Nonemployer Business Registers. The integration is used for research to improve the economic activity and geocoding information in both the Infrastructure Files and the Business Registers. The integration of these data is based upon exact EIN matches, supplemented with statistical matching to recover establishments. See the discussion of the Business Register Bridge in section 5.8.2.

5.2.6 Identifiers and Their Longitudinal Consistency

Both the wage records and employer reports are administrative data: comprehensive, but sometimes less than perfect. Spurious changes in the entity identifiers (Social Security Number for individuals, SEIN/SEINUNIT for employers and establishments) used for longitudinal matching can have a significant impact on most economic uses of the data. This section discusses the procedures implemented in the LEHD Infrastructure Files to detect, edit, and manage these identifiers.

Scope of Data and Identifiers

In the LEHD system, a person is identified initially by Social Security Number, and later by the Protected Identification Key (PIK). This identifier is national in scope, and individuals can be tracked across all states and time periods.
Not all individuals are in-scope at all times. To be included in the wage record database, an individual’s job must be covered by the reporting requirements of the state’s unemployment insurance system. The prime exclusions are agriculture and some parts of the public sector, particularly federal, military, and postal works. Coverage varies across states and time, although on average, 96 percent of all private-sector jobs are covered. The BLS Handbook of Methods (Bureau of Labor Statistics 1997a) describes UI coverage as “broad and basically comparable from state to state,” and claims “over 96 percent of total wage and salary civilian jobs” were covered in 1994. Stevens (2007) provides a survey of coverage for a subset of the current participant states in the LEHD system.

An employer is identified primarily by its state UI account number (recoded to SEIN). A single legal employer might have multiple SEINs, but regardless of its operations in other states a legal employer has a different unemployment insurance account in each state in which it has statutory employees. In particular, the QWI are based exclusively on SEIN-based entities and their associated establishments. Since the SEIN is specific to a state, the QWI do not account for simultaneous activity of individuals across state lines, but within the same multi-state employer. Such activity appears as distinct jobs in the universe. Time-consistency is also not guaranteed, since the UI account number associated with an employer can also change (see later discussions).

Although the QWI are based on SEIN/SEINUNIT establishments, this restriction does not apply to the Infrastructure Files themselves. Using the federal EIN, reported on the ES-202 extract and stored on the Employer Characteristics File (ECF) and the Business Register Bridge (BRB), research links to the Census Employer and Nonemployer Business Registers (BR) permit analyses that map entities from the QCEW universe to the Census establishment universe even when the employer-entity operates across state lines. (See section 5.8.2 for more information on the Business Register Bridge.)

Error Correction of Person Identifiers

Coding errors in the SSN can occur for a variety of reasons. A survey of fifty-three state employment security agencies in the United States over the 1996–1997 time period found that most errors are due to coding errors by employers, but that when errors were attributable to state agencies, data entry was the culprit (Bureau of Labor Statistics 1997b). The report noted that 38 percent of all records were entered by key entry, while another 11 percent were read in by optical character readers (OCRs). Optical character readers and magnetic media tend to be less prone to errors.
Errors can be random digit coding errors that do not persist, typically generated when data are transferred from one format (paper) to another (digital), or they can be persistent, typically occurring when a firm’s payroll system contains an erroneous SSN. While the latter is harder to identify and to correct, the LEHD system uses statistical matching techniques, primarily probabilistic record linking, to correct for spurious and nonpersistent coding errors. The incidence of errors and the success rate of the error correction methods differ widely by state. In particular, they depend critically on the quality of the available individual name information on the wage records.

Abowd and Vilhuber (2005) describe and analyze the LEHD SSN editing process as it was applied to data provided by the state of California. The process verified over half a billion records for that state and is now routinely applied to all states in the LEHD Infrastructure Files. The number of records that are recoded is slightly less than 10 percent of the total number of unique individuals appearing in the original data, and only a little more than 0.5 percent of all wage records. The authors estimate that the true error rate in the data is higher, in part due to the conservative setup of the process. Over 800,000 job history interruptions in the original data are eliminated, representing 0.9 percent of all jobs, but 11 percent of all interrupted jobs. Despite the small number of records that are found to be miscoded, the impact on flow statistics can be large. Accessions in the uncorrected data are overestimated by 2 percent, and recalls are biased upwards by nearly 6 percent. Payroll for accessions and separations is biased upward by up to 7 percent.

The wage record editing occurs prior to the construction of any of the Infrastructure Files for two reasons. First, the wage record edit process requires access to the original Social Security Numbers as well as to the names on the wage records, both of which, because they are covered by the Privacy Act, are replaced by the Protected Identification Key (PIK) early in the processing of wage records. The PIK is used for all individual data integration. The original SSN and the individual’s name are not part of the LEHD Infrastructure Files. Second, because the identifier changes underlying the wage record edit are deemed spurious, and because individuals have no economic reason at all to change Social Security Numbers, there is little ambiguity about the applicability of the edit. This is different from the editing of employer identifiers, as shown in section 5.3.

The Census Bureau designed the PIK as a replacement for the Privacy Act-protected SSN. The PIK itself is a random number related to the SSN solely through a one-to-one correspondence table that is stored and maintained by the Census Bureau on a computing system that is isolated from all LEHD systems and from most other systems at the Census Bureau.
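The idea of a random key tied to the SSN only through a one-to-one correspondence table can be illustrated with a toy sketch. This is not the Census Bureau's procedure: the function names, the key length, the record layout, and the placeholder SSN values are all hypothetical, and the only property shown is that each distinct SSN receives one unique random key that carries no information about the SSN itself.

```python
import secrets


def build_key_crosswalk(ssns, key_bytes=8):
    """Map each distinct SSN to a unique random key (a toy stand-in for a PIK)."""
    crosswalk = {}
    used_keys = set()
    for ssn in ssns:
        if ssn in crosswalk:
            continue  # same person seen again: keep the key already assigned
        key = secrets.token_hex(key_bytes)
        while key in used_keys:  # redraw on the (very unlikely) collision
            key = secrets.token_hex(key_bytes)
        used_keys.add(key)
        crosswalk[ssn] = key
    return crosswalk


def strip_ssn(records, crosswalk):
    """Return copies of the wage records carrying the key instead of the SSN."""
    keyed = []
    for record in records:
        row = {k: v for k, v in record.items() if k != "ssn"}
        row["pik"] = crosswalk[record["ssn"]]
        keyed.append(row)
    return keyed


# Made-up wage records; the SSN values are deliberately invalid placeholders.
wage_records = [
    {"ssn": "000-00-0001", "sein": "A", "quarter": "1997Q1", "earnings": 5200},
    {"ssn": "000-00-0002", "sein": "A", "quarter": "1997Q1", "earnings": 6100},
    {"ssn": "000-00-0001", "sein": "B", "quarter": "1997Q2", "earnings": 4800},
]
crosswalk = build_key_crosswalk(r["ssn"] for r in wage_records)
keyed_records = strip_ssn(wage_records, crosswalk)
```

In this sketch the crosswalk is the only object that links the two identifiers, which is why the text stresses that it is kept on an isolated system.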
To avoid any commingling of SSN-laden data with PIK-laden data, which might compromise the protection afforded by the PIK, the wage record editing process takes place in a secure computing area distinct from the rest of the LEHD processing. Correcting for Changes in Firm Identifiers Firms in the QCEW system are identified by a UI account number assigned by the state. As with all employer identifiers, an account number can change over time for a number of reasons, not all of which are due to economically meaningful changes. State administrative units take great care to follow the legal entities in their system, but account numbers may nevertheless change for reasons which economists may not consider legitimate economic reasons. For instance, a change in ownership of a firm without any change in economic activity may lead to a change in the account number. Often, but not always, such a change is noted in the successor/predecessor fields of the ES-202 record. Other times, without changes in ownership, employees migrate en masse from one UI account to another. In this


case, one might make a reasonable inference that there were continuous economic operations.

Because changes in the employer identifiers are correlated with some elements of economic choice, albeit imperfectly, these identifiers are managed in the LEHD Infrastructure File system. Because the system is designed to operate from regular reports of the administrative record systems in the partner states, the original employer identifiers must be retained in all files in the system. The LEHD system then builds a database of entity demographics that traces the formal successor/predecessor relations among these identifiers. In addition, entity-level summary inferences about undocumented successor/predecessor relations, which are based on worker flow statistical analysis, are also stored in this entity demography database. An auxiliary file, the Successor-Predecessor File (SPF), is created from the entity demographic histories and used to selectively apply successor/predecessor edits to the input files for the QWI. Handling the entity identifiers in this manner allows the LEHD system to receive and integrate updates of input data from partner states (because these share common entity identifiers) and to purge statistical analyses of the spurious changes due to noneconomic changes in the entity demography over time. Benedetto et al. (2007) provide more detail on the development of the SPF and its validity. The SPF is described in more detail later in this chapter.

5.3 Infrastructure Files

This section describes the creation of the core Infrastructure Files from the raw input files. These files form the core of the integrated system that supports the job-based statistical frame that LEHD created. Each Infrastructure File is integrated into the system with longitudinally consistent identifiers that satisfy fundamental database rules, allowing them to be used as unique record keys.
Thus, the core Infrastructure File system can be used to create valid statistical views of data for jobs, individuals, employers, or establishments. The system is programmed entirely in SAS and all files are maintained in SAS format with SAS indices.

The raw input files (quarterly UI wage records and ES-202 reports) are first standardized.5 The UI wage record files are edited for longitudinal identifier consistency, and the SSN is then replaced by the PIK. The ES-202 files are standardized, but no identifier or longitudinal edits are performed at this stage. Thus, the raw input files with only the edits noted here are preserved for future research. Beyond these standardizing steps, no further processing of the raw files occurs. Instead, all the editing and imputation are done in the process of building the Infrastructure Files.

5. The ES-202 files, in particular, have been received in a bewildering array of physical file layouts and formats, reflecting the wide diversity in computer systems installed in state agencies.

The LEHD system builds the Infrastructure Files from the standardized input files augmented by a large number of additional Census-internal demographic and economic surveys and censuses. The Employment History File (EHF) provides a full time series of earnings at all within-state jobs for all quarters covered by the LEHD system and provided by the state.6 It also provides activity calendars at a job, SEINUNIT, and SEIN level. The Individual Characteristics File (ICF) provides time-invariant personal characteristics and some address information.7 The Employer Characteristics File (ECF) provides a complete database of firm and establishment characteristics, most of which are time-varying. The ECF includes a subset of the data available on the Geocoded Address List (GAL), which contains geocodes for the block-level Census geography and latitude/longitude coordinates for the physical location addresses from a large set of administrative and survey data, including address information in the ES-202 input files. We will describe each of these files in detail in this section.

5.3.1 Employment History File: EHF

The Employment History File (EHF) is designed to store the complete in-state work history for each individual that appears in the UI wage records. The EHF for each state contains one record for each employee-employer combination (in other words, a job) in that state in each year. Both annual and quarterly earnings variables are available in the EHF. Individuals who never have strictly positive earnings at their employing SEIN (a theoretical possibility) in a given year do not have a record in the EHF for that year. The EHF data are restructured into a file containing one observation per job (PIK-SEIN combination), with all quarterly earnings and activity information available on that record.
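The restructuring from one record per job per year to one record per job can be sketched as follows. This is a minimal illustration, assuming a hypothetical row layout (`pik`, `sein`, `year`, and a list of four quarterly earnings); the production system is written in SAS and its actual variable names are not given in the text. The activity calendar simply marks quarters with strictly positive earnings.

```python
from collections import defaultdict

# Hypothetical EHF-style rows: one per job (PIK, SEIN) per year,
# each with four quarterly earnings amounts.
ehf_rows = [
    {"pik": "P1", "sein": "S1", "year": 1997, "earn": [1000.0, 1200.0, 0.0, 0.0]},
    {"pik": "P1", "sein": "S1", "year": 1998, "earn": [0.0, 900.0, 950.0, 0.0]},
    {"pik": "P1", "sein": "S2", "year": 1997, "earn": [0.0, 0.0, 800.0, 850.0]},
]


def build_job_histories(rows, first_year, last_year):
    """Collapse job-year rows into one earnings series per job (PIK, SEIN)."""
    n_quarters = 4 * (last_year - first_year + 1)
    histories = defaultdict(lambda: [0.0] * n_quarters)
    for row in rows:
        offset = 4 * (row["year"] - first_year)  # position of that year's Q1
        job = (row["pik"], row["sein"])
        for quarter, earnings in enumerate(row["earn"]):
            histories[job][offset + quarter] = earnings
    return dict(histories)


def activity_calendar(earnings):
    """A job is active in a quarter when earnings are strictly positive."""
    return [amount > 0 for amount in earnings]


job_histories = build_job_histories(ehf_rows, 1997, 1998)
calendar_s1 = activity_calendar(job_histories[("P1", "S1")])
```

Years with no record for a job are left at zero earnings, which matches the rule that a job-year without strictly positive earnings has no EHF record.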
The restructured file is called the Person History File (PHF).8 An active job within a quarter, the primary job-level economic activity measure, is defined as having strictly positive quarterly earnings for the individual-employer pair that defines the job. A similar time series, based on observed activity (positive employment) in the ES-202 records, is computed at the SEINUNIT level (UNIT History File, UHF) and the SEIN level (SEIN History File, SHF).

6. The earliest data accepted by the LEHD system are 1990, quarter 1. Most states provided data beginning some time in the early 1990s. All partner states provide data beginning in 1997, quarter 1. Current input raw data files are delivered six months after the close of the quarter. The QWI data are produced within three months of the receipt of the raw input files from the unemployment insurance system. The LEHD system maintains all of the data reported by a partner state (or nationally for the national files). The QWI system uses as much of these data as possible.
7. A longitudinal enhancement of the ICF, which updates residential address information annually and contains some data from the 2000 Census of Population and Housing, is under development.
8. It should be noted that the actual file structure is at the PIK-SEIN-SEINUNIT-YEAR level for the EHF, and at the PIK-SEIN-SEINUNIT level for the PHF. Although only one state (Minnesota) has nonzero values for SEINUNIT, this allows the file structure to be homogeneous across states.

At this stage of the data processing the first major integrated quality control checks occur. The system performs a quarter-by-quarter comparison of the earnings and employment information from the UI wage records (beginning-of-quarter employment, defined in the appendix, and total quarterly payroll) and ES-202 records (month one employment and total quarterly payroll). Large discrepancies in any quarter are highlighted and the problematic input files are passed to an expert analyst for study. Discrepancies that have already been investigated and that will, therefore, be automatically corrected in the subsequent processing of a state’s data are allowed to pass. Other discrepancies are investigated by the analyst. The analyst’s function is to find the cause of the discrepancy and take one of three courses of action:

• Arrange for corrected data from the state supplier.
• Develop an edit that can be applied to correct the problem.
• Flag the data as problematic so that they are not used in the QWI estimation system.

The first two actions result in a continuation of the Infrastructure File processing and no change in the QWI estimation period. The third action results in continuation of the Infrastructure File processing and either the suppression of a state’s QWI data until the problem can be corrected or a shortening of the time period over which QWI data are produced for that state. Often, a state-supplied corrected data file is imported into the LEHD system. Equally often, a state-specific edit is built into the data processing. Each time the state’s data are reprocessed, this edit is invoked. Unfortunately, not all data discrepancies can be resolved. Then, the third action occurs. In particular, the state’s archival historical UI wage record and ES-202 data are sometimes permanently damaged or defective. In these cases, the data have been lost or permanently corrupted. The quality control during the EHF processing identifies the state and quarter when such problems occur.
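A quarter-by-quarter comparison of this kind can be sketched as below. The text does not say what counts as a "large" discrepancy, so the 10 percent relative tolerance, the function name, the data layout, and the example figures are all assumptions made for illustration. Quarters already investigated are passed through without being flagged again.

```python
def flag_discrepancies(ui, es202, tolerance=0.10, known_issues=frozenset()):
    """Return the quarters where UI and ES-202 totals differ by more than tolerance.

    ui and es202 map a quarter label to (employment, payroll). The tolerance
    is a hypothetical relative threshold, measured against the ES-202 value.
    """
    flagged = []
    for quarter in sorted(set(ui) & set(es202)):
        if quarter in known_issues:
            continue  # already investigated; corrected later in processing
        for ui_value, es_value in zip(ui[quarter], es202[quarter]):
            base = max(abs(es_value), 1.0)  # avoid division by zero
            if abs(ui_value - es_value) / base > tolerance:
                flagged.append(quarter)
                break
    return flagged


# Made-up state totals: (employment, total quarterly payroll).
ui_totals = {
    "1997Q1": (1000, 8.00e6),
    "1997Q2": (1010, 8.10e6),
    "1997Q3": (640, 5.00e6),
    "1997Q4": (1020, 8.30e6),
}
es202_totals = {
    "1997Q1": (1005, 8.05e6),
    "1997Q2": (1000, 8.00e6),
    "1997Q3": (1015, 8.20e6),
    "1997Q4": (700, 8.25e6),
}

for_review = flag_discrepancies(ui_totals, es202_totals, known_issues={"1997Q4"})
all_flags = flag_discrepancies(ui_totals, es202_totals)
```

Each flagged quarter would then go to the analyst, who chooses among the three courses of action listed above.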
In the current Infrastructure File system, such data are not used for the QWI estimation but may be used by analysts for specific research projects. In the course of such research projects, the analyst often develops a statistical method for improving the defective data. These improvements are then ported into the Infrastructure File system.9

5.3.2 Individual Characteristics File: ICF

The Individual Characteristics File (ICF) for each state contains one record for every person who is ever employed in that state over the time period spanned by the state’s unemployment insurance records.

9. For example, research on wage dynamics associated with estimates of firm-level human capital use has produced a statistical missing data edit for the UI wage records that detects missing wage records and imputes them by drawing from an appropriate posterior predictive distribution. The statistical models that detect and correct this problem will be imported into a future version of the EHF Infrastructure File.


The ICF is constructed in the following manner. First, the universe of individuals is defined by compiling the list of unique PIKs from the EHF. Basic demographic information from the PCF is merged using the PIK, and records without a valid match are flagged. PIK-survey identifier crosswalks link the CPS and SIPP ID variables into the ICF. Sex and age information from the CPS is used to complement and verify the PCF-provided information.

Age and Sex Imputation

Approximately 3 percent of the PIKs found in the UI wage records do not link to the PCF. Multiple imputation methods are used to impute date of birth and sex for these individuals. To impute sex, the probability of being male is estimated using a state-specific logit model:

$$P(\text{male}) = f(X_{is}\beta_s) \tag{1}$$

where $X_{is}$ contains a full set of yearly log earnings and squared log earnings, and a full set of employment indicators covering the time period spanned by the state’s records, for each individual $i$ with strictly positive earnings within state $s$ and non-missing PCF sex. The state-specific $\hat{\beta}_s$, as estimated from equation (1), is then used to predict the probability of being male for individuals with missing sex within state $s$, and sex is assigned as

$$\text{male if } X_{is}\hat{\beta}_s \ge \mu_l \tag{2}$$

where $\mu_l \sim U[0, 1]$ is one of $l = 1, \ldots, 10$ independent draws from the distribution. Here the left-hand side is to be read as the predicted probability of being male implied by equation (1). Thus, each individual with missing sex is assigned ten independent missing data implicates, all of which are used in the QWI processing.10

The imputation of date of birth is done in a similar fashion using a multinomial logit to predict the probability of being in one of eight birth date decades and then assigning a birth date within decade based on this probability and the distribution of birth dates within the decade. Again, ten implicates are imputed for birth date. If an individual is missing sex or birth date in the PCF, but not in the CPS, then the CPS values are used, not the imputed values. Before the imputation model for date of birth is implemented, basic editing of the date of birth variable eliminates obvious coding errors, such as a negative age at the time when UI earnings are first reported for the individual. In those relatively rare cases where the date of birth information is deemed unrealistic, birth date is set to missing and imputed based on the model described previously.

10. Note that this imputation does not account for estimation error in $\hat{\beta}$. This was one of the first missing data imputations developed at LEHD. At the time, techniques for sampling from the posterior predictive distribution of a binary outcome where the likelihood function is based on a logistic regression were not feasible on the LEHD computer system. Since only three percent of the observations in the ICF are subject to this missing data edit, it was implemented as described in the text. A longitudinal, enhanced ICF is under development (see section 5.9). All missing data imputations in the new ICF will be performed by sampling from an appropriate posterior predictive distribution. This will properly account for estimation error.

Place of Residence Imputation

Place of residence information on the ICF is derived from the StARS (Statistical Administrative Records System), which for the vast majority of the individuals found in the UI wage records contains information on the place of residence down to the exact geographical coordinates. However, in less than ten percent of all cases the geography information is incomplete or missing. Because the QWI estimation relies on completed place of residence information, which is a critical conditioning variable in the unit-to-worker (U2W) imputation model (see section 5.4.2), all missing residential addresses are imputed.

County of residence is imputed based on a categorical model of the data that is a fully saturated contingency table. Separately for each state, unique combinations of categories of sex, age, race, income, and county of work are used to form $i = 1, \ldots, I$ populations. For each sample $i$, the probability of residing in a particular county as of 1999, $\pi_{ij}$, is estimated by the sample proportion, $p_{ij} = n_{ij}/n_i$, where $j = 1, \ldots, J$ indexes all the counties in the state plus an extra category for out-of-state residents. County of residence is then imputed based on

$$\text{county} = j \text{ if } P_{i,j-1} \le u_k < P_{ij} \tag{3}$$

where $P_i$ is the CDF corresponding to $p_i$ for the $i$th population and $u_k \sim U[0, 1]$ is one of $k = 1, \ldots, 10$ independent draws for each individual belonging to the $i$th population.11 In its current version, no geography below the county level is imputed and in those cases where exact geographical coordinates are incomplete the centroid of the finest geographical area is used. Thus, in cases where no geography information is available this amounts to the centroid of the imputed county. Geographical coordinates are not assigned to individuals whose county of residence has been imputed to be out-of-state.

11. The longitudinal, enhanced ICF that is under development augments the model in the text with a Dirichlet prior distribution for the $P_{ij}$. The imputations are then made by sampling from the posterior predictive distribution, which is also Dirichlet.

Education Imputation

The imputation model for education relies on a statistical match between the Decennial Census 1990 and LEHD data. The probability of belonging to one of thirteen education categories is estimated using 1990 Decennial data, conditional on characteristics that are common to both Decennial and LEHD data, using a state-specific logit model (equation (4)).
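The assignment rules in equations (2) and (3), and the analogous rule for education in equation (5), all compare uniform draws with predicted or cumulative probabilities. The sketch below illustrates that mechanism only. It assumes the predicted probability of being male and the county cell counts are already available (the logit estimation is not shown), and the function names, county labels, and counts are hypothetical.

```python
import random


def impute_sex(p_male, n_implicates=10, rng=random):
    """Equation (2)-style draw: male when the predicted probability >= a uniform draw."""
    return ["M" if p_male >= rng.random() else "F" for _ in range(n_implicates)]


def impute_county(counts, n_implicates=10, rng=random):
    """Equation (3)-style draw: inverse-CDF sampling from proportions n_ij / n_i."""
    counties = sorted(counts)
    total = sum(counts.values())
    cdf = []
    running = 0.0
    for county in counties:
        running += counts[county] / total
        cdf.append(running)
    cdf[-1] = 1.0  # guard against floating-point shortfall in the last cell
    implicates = []
    for _ in range(n_implicates):
        u = rng.random()  # uniform on [0, 1)
        for county, upper in zip(counties, cdf):
            if u < upper:  # first cell whose cumulative probability exceeds u
                implicates.append(county)
                break
    return implicates


# Hypothetical cell: residence counts for one sex/age/race/income/work-county group.
cell_counts = {"001": 60, "003": 30, "out_of_state": 10}
rng = random.Random(0)  # seeded only to make the sketch reproducible
county_implicates = impute_county(cell_counts, rng=rng)
sex_implicates = impute_sex(0.55, rng=rng)
```

Each individual receives ten implicates, so downstream statistics can be computed once per implicate and then combined.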

$$P(\text{educat}) = f(Z_{is}\gamma_s) \tag{4}$$

where $Z_{is}$ contains age categories, earnings categories, and industry dummies for individuals age fourteen and older in the 1990 Census Long Form residing in the state being estimated, and who reported strictly positive wage earnings. The education category is imputed based on

$$\text{educat} = j \text{ if } cp_{j-1} \le \mu_l < cp_j \tag{5}$$

where $cp_j = Z_{is}\hat{\gamma}_s$ and $\mu_l \sim U[0, 1]$ is one of $l = 11, \ldots, 20$ independent draws, and $i \in \text{EHF}$.12

12. In the longitudinally enhanced ICF that is under development, this imputation is replaced by a probabilistic record link to Census 2000 long form data. Approximately one person in six acquires directly reported educational attainment as of 2000. The remaining individuals get 10 multiple imputations from a Dirichlet/Multinomial posterior predictive distribution.

5.3.3 The Geocoded Address List: GAL

The Geocoded Address List (GAL) is a file system containing the unique commercial and residential addresses in a state geocoded to the Census block and latitude/longitude coordinates. The file encompasses addresses from the state ES-202 data, the Census Bureau’s Employer Business Register (BR), the Census Bureau’s Master Address File (MAF), the American Community Survey Place of Work file (ACS-POW), the American Housing Survey (AHS) and others. Addresses from these source files are processed by geocoding software (Group1’s Code1), address standardizers (Ascential/Vality), and record-matching software (Ascential/Vality) for unduplication. The remaining processing is done in SAS and the final files are in SAS format.

The final output file system consists of the address list and a crosswalk for each processed file-year. The GAL contains each unique address, identified by a GAL identifier called GALID, its geocodes, a flag for each file-year in which it appears, data quality indicators, and data processing information, including the release date of the Geographic Reference File (GRF). The GAL Crosswalk contains the ID of each input entity and the ID of its address (GALID).

Geographic Codes and Their Sources

A geocode on the GAL is constructed as the concatenation of FIPS (Federal Information Processing Standard) state, county and Census tract:

FIPS-state (2) || FIPS-county (3) || Census-tract (6)

This geocode uniquely identifies the Census tract in the United States. The tract is the lowest level of geography recommended for analysis. The Census block within the tract is also available on the GAL, but the uncertainties in block-coding make some block-level analyses unreliable. Geocoding to the block allows the addition of all the higher-level geocodes associated with the addresses. Latitude and longitude coordinates are also included in the file.13

13. An enhanced geocoding system was developed for the newer LEHD product called OnTheMap, which published to the block level. These enhancements are being integrated into an enhanced version of the GAL, which will be used for both QWI and OnTheMap.

Block Coding. Block coding is achieved by a combination of geocoding software (Group1’s Code1), a match to the MAF, or an imputation based on addresses within the tract. Table 5.1 describes the typical distribution of geocode sources. In all states processed to date, except California, no address required the D method. That is, almost every tract where an address lacks a block code contains commercial, residential, and mixed-use addresses.

Table 5.1  Sources of geocodes on GAL

Value     Typical percent   Meaning
C         12.20             Code1, or the address matches an address for which Code1 supplied the block code
M         81.86             The MAF, the address is a MAF address or matches a MAF address
E         0.00              The MAF, the street address is exactly the same as a MAF address in the same tract
W         0.03              The MAF, the street address is between 2 MAF addresses on the same block face
O         1.23              Imputed using the distribution of commercial addresses in the tract
S         1.17              Imputed using the distribution of residential addresses in the tract
I         0.01              Imputed using the distribution of mixed-use addresses in the tract
D         0.00              Imputed using the distribution of all addresses in the tract
missing   3.50              Block code is missing
Total     100.00

The Census Bureau splits blocks to accommodate changes in political boundaries. Most commonly, these are place boundaries (a place is a city, village, or similar municipality). The resulting block parts are identified by 2 suffixes, each taking a value from A to Z. The GAL assigns the block part directly from the MAF, or by using the one whose internal point is closest to the address by the straight-line distance.

The GAL also provides the following components of the geocodes as separate variables, for convenience: Federal Information Processing Standards (FIPS) code (5 digits), FIPS state code (the first 2 digits of the FIPS code), FIPS county code within state (the rightmost 3 digits of the FIPS code), and Census tract code (a tract within the county, a 6-digit code). Higher-level geographic codes originate from the Block Map File (BMF).
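The tract-level geocode and its components can be illustrated with a short sketch. The field widths (2, 3, and 6 characters) come from the text; the function names, the dictionary keys, and the example tract value are hypothetical. Codes are handled as strings, so leading zeros are preserved.

```python
def make_geocode(fips_state, fips_county, tract):
    """Concatenate FIPS state (2), FIPS county (3), and Census tract (6)."""
    parts = (("state", fips_state, 2), ("county", fips_county, 3), ("tract", tract, 6))
    for name, value, width in parts:
        if len(value) != width or not value.isdigit():
            raise ValueError(f"{name} code must be {width} digits, got {value!r}")
    return fips_state + fips_county + tract


def split_geocode(geocode):
    """Recover the component codes from an 11-character tract geocode."""
    if len(geocode) != 11 or not geocode.isdigit():
        raise ValueError(f"expected an 11-digit geocode, got {geocode!r}")
    return {
        "fips": geocode[:5],          # state + county
        "fips_state": geocode[:2],    # first 2 digits of the FIPS code
        "fips_county": geocode[2:5],  # rightmost 3 digits of the FIPS code
        "tract": geocode[5:],         # 6-digit tract within the county
    }


# Illustrative values only: state 06, county 037, and a made-up tract number.
geocode = make_geocode("06", "037", "207400")
components = split_geocode(geocode)
```

Keeping the codes as character strings avoids losing leading zeros, as in state code 06.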


The BMF is an extract of the GRF-C (Geographic Reference File-Codes). All geocodes are character variables. Federal Information Processing Standards (FIPS) codes are unique within the United States; Census codes are not. Table 5.2 lists the available higher-level geocodes.

Geographic Coordinates. The geographic coordinates of each address are available as latitude and longitude with 6 implied decimals. The coordinates are not always as accurate as 6 decimal places imply. An indicator of their quality is provided. Table 5.3 provides the typical distribution of codes, which range from 1 (highest quality) to 9 (lowest quality). Variables indicating the source of the geographic coordinates (Block internal point, geocoding software, MAF, or otherwise derived) are also available. Most coordinates are provided by either commercial geocoding software or the MAF.

Finally, a set of flags also indicates, for each year and source file, whether an address appears on that file. For example, the flag variable b1997 equals 1 if the address is on the 1997 BR; otherwise it equals 0. As another example, if a state partner supplies 1991 ES-202 data with no address information, then e1991 will be 0 for all addresses. In a typical GAL year, between 3 and 6 percent of addresses are present on that year’s ES-202 files, between 4 and 10 percent are present on a specific BR year file, and between 80 and 90 percent are present on the MAF. Less than one percent of addresses are found on the ACS-POW and AHS data, because these are sample surveys. Note that this distribution indicates where the GAL found a geocoded address, not the percentage of addresses that could be geocoded.

Table 5.2    Higher-level geocodes on GAL

Variable     Description
a_fipsmcd    5-digit FIPS Minor Civil Division (a division of a county)
a_mcd        3-digit Census Minor Civil Division (a division of a county)
a_fipspl     5-digit FIPS Place
a_place      4-digit Census Place
a_msapmsa    Metropolitan Statistical Area (4) / Primary Metropolitan Statistical Area (4)
a_wib        6-digit Workforce Investment Board area

Table 5.3    Quality of geographic coordinates

Value    Typical percent    Meaning
1              80.15        Rooftop or MAF (most accurate)
2               1.59        ZIP4 or block face, block face is certain
3              10.12        Block group is certain
4               4.65        Tract is certain
9               3.50        Coordinates are missing
Total         100.00
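As an illustration of how coordinates with 6 implied decimals and the quality codes of Table 5.3 might be handled, consider the following minimal sketch. It assumes, hypothetically, that each coordinate is stored as a signed string of digits with no decimal point; the function names and the default quality threshold are illustrative and do not come from the GAL documentation.

```python
from decimal import Decimal

# Quality codes and meanings as listed in Table 5.3.
QUALITY_MEANING = {
    1: "Rooftop or MAF (most accurate)",
    2: "ZIP4 or block face, block face is certain",
    3: "Block group is certain",
    4: "Tract is certain",
    9: "Coordinates are missing",
}


def decode_coordinate(raw: str) -> Decimal:
    """Convert a coordinate stored with 6 implied decimals into degrees."""
    # scaleb(-6) shifts the decimal point six places to the left, exactly.
    return Decimal(raw.strip()).scaleb(-6)


def has_usable_coordinates(quality: int, worst_acceptable: int = 3) -> bool:
    """True if the quality code is at least as good as the chosen threshold."""
    return quality in QUALITY_MEANING and quality <= worst_acceptable


latitude = decode_coordinate("38897700")
longitude = decode_coordinate("-77036500")
```

Because lower codes mean higher quality, an analysis that needs block-group accuracy would keep codes 1 through 3 and drop codes 4 and 9.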


J. Abowd et al.

Accessing the GAL: The GAL Crosswalks

The GAL crosswalks allow data users to extract geographic and address information about any entity whose address went into the GAL. Each crosswalk contains the identifiers of the entity, its GALID, and sometimes flags. To attach geocodes, coordinates, or address information to an entity, users first merge the GAL crosswalk to the GAL by GALID, keeping only the GAL observations that correspond to the required entities on the crosswalk. They then merge the resulting file to the entities of interest using the entity identifiers. An entity whose address was not processed (because it is out of state or lacks address information) will have blank GAL data. Table 5.4 lists the entity identifiers by data set or survey.

5.3.4 The Employer Characteristics File: ECF

The Employer Characteristics File (ECF), which is actually a file system, consolidates most employer and establishment-level information (size, location, industry, etc.) into two files. The employer SEIN-level file contains one record for every year-quarter in which a SEIN is present in either the ES-202 or the UI wage records, with more detailed information available for the establishments of multi-unit SEINs in the SEINUNIT-level file. The SEIN file is built up from the SEINUNIT file and contains no additional information, but is an easier and more efficient way to access SEIN-level summary data.

A number of inputs are used to build the ECF. The primary input is the ES-202 data. Unemployment Insurance (UI) wage record summary data are used to supplement information from the ES-202; in particular, SEIN-level employment (beginning of quarter; see the appendix for definitions) and quarterly payroll measures are built from the wage records. UI wage record data are also used to supplement published BLS county-level employment data, which are used to construct weights for use in the QWI processing. Geocoded address information

Table 5.4    GAL crosswalk entity identifiers

Dataset    Entity identifier variables
AHS        control and year
ES-202     sein, seinunit, year, and quarter
ACS-POW    acsfileseq, cmid, seq, and pnum
BR         cfn, year, and singmult
MAF        mafid and year

Notes:
ES-202: e_flag = p for physical addresses, e_flag = m for mailing addresses as source of address info.
BR: singmult indicates whether the entity resides in the single-unit (su) or the multi-unit (mu) data set. b_flag = P if physical address, b_flag = M for mailing address.
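The two-step crosswalk merge can be sketched as follows, using the ES-202 entity identifiers from Table 5.4. This is only an illustration under stated assumptions: records are represented as plain dictionaries, and the field names `galid`, `county`, and `emp` as well as the function `attach_gal` are hypothetical stand-ins rather than actual GAL variable names.

```python
def attach_gal(entities, crosswalk, gal, id_vars):
    """Attach GAL fields to entity records through a GAL crosswalk.

    entities:  list of dicts carrying the entity identifier variables
    crosswalk: list of dicts carrying the identifiers plus 'galid'
    gal:       list of dicts keyed by 'galid' with geocodes and coordinates
    id_vars:   names of the entity identifier variables
    """
    gal_by_id = {row["galid"]: row for row in gal}

    # Step 1: merge the crosswalk to the GAL by GALID, keeping only the
    # GAL observations that belong to entities on the crosswalk.
    geo_by_entity = {}
    for row in crosswalk:
        key = tuple(row[v] for v in id_vars)
        geo_by_entity[key] = gal_by_id.get(row["galid"], {})

    # Step 2: merge the result to the entities of interest by identifier.
    merged = []
    for entity in entities:
        key = tuple(entity[v] for v in id_vars)
        # Entities whose address was never processed receive no GAL fields.
        merged.append({**geo_by_entity.get(key, {}), **entity})
    return merged


ES202_IDS = ("sein", "seinunit", "year", "quarter")
gal = [{"galid": "A1", "county": "001"}]
crosswalk = [{"sein": "S1", "seinunit": "00001", "year": 1997,
              "quarter": 1, "galid": "A1"}]
entities = [
    {"sein": "S1", "seinunit": "00001", "year": 1997, "quarter": 1, "emp": 10},
    {"sein": "S2", "seinunit": "00001", "year": 1997, "quarter": 1, "emp": 5},
]
result = attach_gal(entities, crosswalk, gal, ES202_IDS)
```

In this toy example the second entity has no crosswalk entry, so it comes back without geography, which mirrors the blank GAL data described for unprocessed addresses.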


from the GAL file contributes latitude-longitude coordinates of most establishments, as well as updated Workforce Investment Board (WIB) area and MSA information. The state-provided extracts from the BLS Longitudinal Database (LDB) and LEHD-developed imputation mechanisms are used to backfill NAICS information for periods in which NAICS was not collected. Finally, the QWI disclosure avoidance mechanism is initiated in the ECF.

We will describe basic methods for constructing the ECF in the next section. Details of the NAICS imputation algorithm are described in the section titled “NAICS Codes on the ECF”. The entire disclosure-proofing mechanism is described in section 5.6.

Constructing the ECF

ECF processing starts by integrating yearly summary files for each SEIN and SEINUNIT in the ES-202 data files. General and state-specific consistency checks are then performed. The county, NAICS, SIC, and federal EIN data are checked for invalid values. The industry code edit goes beyond a simple validity check. If a four-digit SIC code or NAICS industry code (six-digit) is present, but is not valid, then the industry code undergoes a conditional missing data imputation based on the first two and three (SIC) or three, four, and five (NAICS) digits.14 All other invalid or missing industry codes are subjected to the longitudinal edit and missing data imputation described in the following paragraphs.

14. NAICS 1997 codes are updated to NAICS 2002, and the NAICS 2002 codes are then used for the imputation. The same procedure is later used for LDB data.

Based on the EHF, SEIN-level quarterly employment (beginning of quarter) and payroll totals are computed. Unemployment Insurance (UI) wage record data are used as an imputation source for either payroll or employment in the following situations:

• If ES-202 month one employment is missing, but ES-202 payroll is reported, then UI wage record beginning-of-quarter employment is used.
• If ES-202 month one employment is zero, then UI employment is not used, since this may be a correct report of zero employment for an existing SEIN. The situation may arise when bonuses or benefits were retroactively paid, even though no employees were actively employed.
• If ES-202 quarterly payroll is zero and ES-202 employment is positive, then UI wage record quarterly payroll is used.
• If ES-202 quarterly payroll and employment are both zero or both missing, then UI wage record quarterly payroll and beginning-of-quarter employment are used.

The ES-202 data contain a master record for multi-unit SEINs, which is removed after preserving information not available in the establishment records. Various inconsistencies in the record structure are also handled at


this stage of the processing. For a single-unit SEIN, which has two records (master and establishment), information from the master records is used to impute missing data items directly for the establishment record. For a multi-unit SEIN, a flat prior is used in the allocation process; missing establishment data are imputed, assuming that each establishment has an equal share of unallocated employment and payroll. A subsequent longitudinal edit reexamines this allocation and improves it if there is historical information that is better than the equal-size assumption. The allocation process implemented above (master to establishments) does not incorporate any information on the structure of the SEIN. To improve on this, SEINs that are missing establishment structure for some periods—but reported a valid multi-unit structure in other periods—are inspected. The absence of information on establishment structure typically occurs when a SEIN record is missing due to a data processing error. A SEIN with a valid multi-unit structure in a previous period is a candidate for structure imputation. The employer’s establishment structure is then imputed using the last available record with a multi-unit structure. Payroll and employment are allocated appropriately. From this point on, the employer’s establishment structure (number of establishments per SEIN) is defined for all periods. Geocoded data from the GAL are incorporated to obtain geographic information on all establishments. Once the multi-unit structure has been edited and the geocoding data have been integrated, the ECF records undergo a longitudinal edit. Geographic data, industry codes (SIC and NAICS), and EIN data from quarters with valid data are used to fill missing data in other quarters for the same establishment (SEINUNIT). If at least one industry variable among the several sources (SIC, NAICS1997, NAICS2002, NAICS 2007, LDB) has valid data, it is used to impute missing values in other fields. 
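The equal-share (flat prior) allocation for multi-unit SEINs described above can be sketched as follows. The function name and the dictionary representation are hypothetical, and clamping the unallocated amount at zero when the establishments report more than the master record is an assumption made for the illustration, not a documented rule.

```python
def allocate_equal_shares(master_total, reported):
    """Fill missing establishment values with equal shares of the remainder.

    master_total: SEIN-level total (employment or payroll) from the master record
    reported:     dict mapping establishment id -> reported value, or None if missing
    """
    missing = [unit for unit, value in reported.items() if value is None]
    if not missing:
        return dict(reported)

    allocated = sum(value for value in reported.values() if value is not None)
    # Assumption: never allocate a negative amount.
    unallocated = max(master_total - allocated, 0)
    share = unallocated / len(missing)
    return {unit: (share if value is None else value)
            for unit, value in reported.items()}


completed = allocate_equal_shares(100, {"00001": 40, "00002": None, "00003": None})
```

Here 60 of the 100 employees on the master record are unallocated, so each of the two establishments with missing data receives 30. The subsequent longitudinal edit would replace such equal shares whenever historical proportions are available.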
Geography, if still missing, is imputed conditional on industry, if available. Counties with larger employment in a SEINUNIT’s industry have a higher probability of being selected. All missing data imputations are single draws from posterior predictive distributions that are multinomial based on an improper uniform Dirichlet prior. The imputation probabilities are the ratio of employment in each possible value to total employment in the support of the distribution.15

15. The posterior predictive distribution is multinomial because the employment proportions are derived from the population of employing establishments in the quarter, which is assumed to be nonrandom. Only a single imputation is performed because the unit-to-worker missing data model imputes 10 establishments to each job in a multi-unit SEIN. Multiple imputation of the missing data in those establishments would have meant that 100 implicates would have to be processed for each multi-unit job. This processing requirement was deemed impractical for the current QWI system.

For SEINs, the (employment and establishment-weighted) modal values of county, industry codes, ownership codes, and EIN are calculated for


each SEIN and year-quarter. The SEIN-level records with missing data are filled in with data from the closest time period with valid data. At this point, if a SEIN mode variable has a missing value, then no information was ever available for that SEIN.

Additional attention is devoted to industry codes, which are critical for QWI processing. Missing SIC and NAICS codes are randomly imputed with probability proportional to the statewide share of employment in each four-digit SIC code or five-digit NAICS code, so codes with a larger share of employment have a higher probability of selection. If an industry code is imputed, this is done once for each SEIN, and the imputed code remains constant across time. These industry codes are then propagated to all SEINUNITs as well.

With most data items complete, provisional weights are calculated. These weights are discussed in the section on QWI processing (section 5.5). The disclosure avoidance noise-infusion factors are also prepared at the SEIN and SEINUNIT level and added to the ECF at this point. Disclosure avoidance methods are discussed in detail in section 5.6.

Imputations in the ECF

All employer or establishment data items used in the QWI processing, when missing, are imputed. These items include employer-establishment structure, employment, payroll, geography, industry, ownership, and EIN. This subsection describes these imputations, which are of two types:

• Longitudinal edits: data from another period closest in time to the period with missing data are copied into the missing data items.
• Probabilistic imputation: missing data are imputed by sampling from a posterior predictive multinomial distribution based on a uniform Dirichlet prior, conditional on as much sample information as possible. The analyst is responsible for developing the likelihood component of the posterior predictive distribution.

The employer-establishment structure refers to the structure of establishments within single-unit and multi-unit employers.
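The probabilistic imputation type can be illustrated with a minimal sketch in which a missing value is drawn once, with probability equal to each candidate value's share of total employment in the support. The function names and the example industry codes and counts are hypothetical; the sketch shows only the employment-proportional draw and does not reproduce the conditioning used in production.

```python
import random


def imputation_probabilities(employment):
    """Ratio of employment in each possible value to total employment."""
    support = {value: emp for value, emp in employment.items() if emp > 0}
    total = sum(support.values())
    if total == 0:
        raise ValueError("no value with positive employment to draw from")
    return {value: emp / total for value, emp in support.items()}


def impute_single_draw(employment, rng):
    """Draw one value with probability proportional to its employment."""
    probabilities = imputation_probabilities(employment)
    values = list(probabilities)
    weights = [probabilities[v] for v in values]
    return rng.choices(values, weights=weights, k=1)[0]


# Hypothetical statewide employment by industry code.
statewide = {"5812": 30, "7011": 10, "9999": 0}
rng = random.Random(2009)
imputed_code = impute_single_draw(statewide, rng)
```

Values with zero employment fall outside the support and can never be drawn, while the code with three times the employment is three times as likely to be selected.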
In the ECF, the SEIN master record summarizes the information from all establishments. This record is either based on the comparable record in the raw ES-202 data (input directly or aggregated from the establishment records), or imputed by calculating summary information on beginning-of-quarter employment and total quarterly payroll directly from all UI wage records in a given quarter that come from the indicated SEIN (in the case where the SEIN does not have a record in the raw ES-202 data for that quarter). In either case, a SEIN master record is always available for every SEIN that exists in a given quarter in either the ES-202 or UI wage record data for that quarter. However, the establishment structure of this SEIN may be missing in a given quarter; that is, the SEINUNITs associated with the SEIN for this quarter are not input directly from the ES-202 data. In this case, the establishment structure is imputed by a longitudinal edit that looks for the


nearest quarter in which the establishment structure is not missing and copies this structure to the quarter with the missing structure. Then, the missing SEINUNIT employment and payroll are imputed from the SEIN master record by proportionally allocating the current quarter SEIN-level values to the SEINUNITs based on the proportions of the same variables in the donor quarter’s establishments. Only longitudinal edits are used in this process. If no donor quarter can be found, then the establishment structure is assumed to be single unit and a single SEINUNIT record is built from the SEIN master record. At this point, the employer-establishment structure is available for all SEINs, and all missing employment and payroll data have been imputed for every SEIN and SEINUNIT that exists in a state’s complete ECF. The ECF records are then geocoded from the GAL, as described in section 5.3.3. Hence, the missing geocode items are completed before the remainder of the missing data in the ECF are imputed. The geography subprocess of the ECF combines information about the entity history with the geocoding information from the GAL. Geocodes in the GAL are determined exclusively by contemporaneous address information, but contain information on the quality of the geocode information—whether a geocode reflects a rooftop geocode, a block, a block group, a tract, or only a county. The ECF geography subprocess takes this information, and applies a longitudinal edit, conditional on the SEINUNIT not changing locations. The inference of a geographical move for a SEINUNIT occurs whenever the geocode delivered by the GAL is different for two different time periods in a way that is not due to variations in the quality of geography coding. For example, a rooftop and a block group geocode will always necessarily have different geocodes. However, if the block groups corresponding to each entity differ, then the system assumes that the entity has physically moved. 
If the two SEINUNITs have been geocoded to the same block group, the difference in geocodes is considered a change in geography quality, not a move. Finally, the GALID associated with the best quality geography is copied to all quarters within the nonmove time period for that SEINUNIT. SEINUNITs with missing geography are excluded from the longitudinal edit. These units are assigned geography by a probabilistic imputation based on employment shares across counties given SIC (if the industry for the SEINUNIT is available), or by unconditional employment shares across counties (if it is not available). Each SEINUNIT with missing geography is assigned a pseudo-GALID reflecting the imputed county’s centroid. Additional geographic information (MSA or Core Based Statistical Area [CBSA], and WIB area) is attached to the ECF based on the GALID or pseudo-GALID. At this point all SEINUNIT-level records have completed geocoding. When the records are returned from geocoding, missing industry codes,


ownership, and EIN are imputed by longitudinal edit, if possible. The values of SIC, NAICS, ownership, and EIN are copied from the nearest nonmissing quarter. No further editing occurs for ownership or EIN. Industry codes that are still missing after the longitudinal edit are imputed with probabilistic methods based on the empirical distribution conditional on the same unit’s other observed industry data items. For instance, if SIC is missing, but NAICS1997 is available, the relative observed distribution of SIC-NAICS1997 pairs is used to impute the missing data item. If all previous imputation mechanisms fail, SIC is imputed unconditionally based on the observed distribution of within-state employment across SIC industries. Once SIC is assigned, the previous conditional imputation mechanisms are again used to impute other industry data items.

Geocoding and industry coding are supplied for the SEIN-level record based on the following edit. The unweighted and employment-weighted modal values across SEINUNITs from the same SEIN are computed for WIB, MSA/CBSA, state, county, best sub-county geography, ownership, SIC, NAICS, and EIN. All SEIN-level records are assigned both modal values (weighted and unweighted), and a researcher or analyst may choose the appropriate value.16

16. The employment-weighted modal values on the SEIN-level record are used in the QWI processing only when the unit-to-worker imputation described in the next section fails to impute a SEINUNIT to a job history.

NAICS Codes on the ECF

Enhanced NAICS variables on the ECF can be differentiated by the sources and coding systems used in their creation. There are two sources of data (the ES-202 and the BLS-created LDB) and three coding systems for NAICS (NAICS1997, NAICS2002, and NAICS2007). Every NAICS variable uses at least one source and one coding system. The ESO (ES-202-only) and FNL (final) variables are of primary importance to the user community. The ESO variables use information from the ES-202 exclusively and ignore any information that may be available on the LDB. In section 5.7.2 we provide an analysis of why this may be preferred. The FNL variables incorporate information from both the ES-202 and the LDB, with the LDB being the primary source. The QWI uses FNL variables for its NAICS statistics. Neither ESO nor FNL variables contain missing values.

NAICS algorithm precedence ordering. Four basic sources of industry information are available on the ECF: NAICS and NAICS_AUX as well as SIC from ES-202 records, and the LDB-sourced NAICS_LDB codes. The NAICS, NAICS_AUX, and NAICS_LDB, when missing (no valid 6-digit industry code), are imputed based on the following algorithm. The SIC is


filled similarly. Depending on the imputation used, a miss variable is defined, which is used in building the ESO and FNL variables.

1. Valid 6-digit industry code (miss = 0).
2. Imputed code based on first 3, 4, or 5 digits when no valid 6-digit code is available in another period (miss = 0).
3. Imputed code based on contemporaneous SIC if SIC changed prior to 2000 (miss = 1.5).
4. Valid 6-digit code from another period (miss = 2).
5. Valid code from another source (for example, if NAICS1997 is missing, NAICS2002 or SIC may be available) (miss = 3).
6. Use employment-weighted SEIN modal value (miss = 5 if contemporaneous modal value, miss = 7 if the modal value stems from another time period).
7. Unconditional impute (miss = 6 if only the SEIN-level modal value is imputed unconditionally, miss = 11 if the SEIN-level value was unconditionally imputed and propagated to all SEINUNITs).

ESO and FNL Variables. The ESO and FNL variables are made up of combinations of the various sources of industry information. The ESO variable uses the NAICS and NAICS_AUX variables as input. Information from the variable with the lowest miss value is preferred, although in case of a tie the NAICS_AUX value is used. The FNL variable uses the ESO and LDB variables. Information from the variable with the lowest miss value is preferred, although in case of a tie the NAICS_LDB value is used.

5.4 Completing the Missing Job-Level Data

The Infrastructure Files contain most of the information necessary to compute the QWI. However, there are two important sources of missing job-level data that must be addressed before those estimates can be formed using substate levels of geography and detailed levels of industry: spurious employer-level identifier changes and missing establishment-level geography and economic activity data. We discuss the edits and imputations associated with these problems in this section. Fundamentally, the QWI are based on the job-level employment histories.
Dynamic inconsistencies in these histories that are caused by individual identifier breaks are handled by the wage record edit described previously. Dynamic inconsistencies in these histories that are caused by employer or establishment identifier breaks that are not due to real economic activity are handled by creating the Successor-Predecessor File from the entity demographics, then extracting information from this file to suppress spurious employment and job flows. We describe this process in section 5.4.1.


Job histories that are part of a multi-unit SEIN do not contain the establishment (SEINUNIT) associated with the job, except for the state of Minnesota. This means that establishment-level characteristics (geography and industry, in particular) are missing data for job histories that relate to multi-units. The missing data imputation associated with this problem is discussed in section 5.4.2.

5.4.1 Connecting Firms Intertemporally: The Successor-Predecessor File (SPF)

The firm identifier used in all of LEHD’s files is a state-specific account number from that state’s unemployment insurance accounting system, used, in particular, to administer the tax and benefits of the UI system. These account numbers, recoded and augmented by a state identifier, become the entity identifier called the SEIN in the Infrastructure File system. The SEINs can, and do, change for a number of reasons, including a change in legal form or a merger. Typically, the separation of a worker from an employer is identified by a change in the SEIN on that worker’s UI wage records. If an employer changes SEINs, but makes no other changes, the worker would appear to have left the original firm even though his or her employment status remains unchanged from an economic viewpoint. These spurious apparent employer changes are known to induce biases in both employment and job flow statistics. For example, a simple change in account numbers would lead to the observation of a firm closing even though all workers remain employed.

To identify such events, the Successor-Predecessor File (SPF) tracks large worker movements between SEINs. Benedetto et al. (2007) used the SPF for an early analysis, in one particular state, of the impact of such an exercise. The SPF provides a variety of link characteristics, based on the number of workers leaving a SEIN, in both absolute and relative terms, and the number of workers entering a SEIN, again in absolute and relative terms.
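The kind of link characteristics described above can be sketched from worker-level records in two consecutive quarters. This is a simplified illustration, not the SPF algorithm: it assumes one SEIN per worker per quarter, measures relative flows against the predecessor's workforce in the earlier quarter and the successor's workforce in the later quarter, and uses hypothetical function and field names.

```python
from collections import Counter


def link_characteristics(previous_quarter, next_quarter):
    """Summarize worker flows between SEINs across two quarters.

    previous_quarter, next_quarter: dicts mapping worker id -> SEIN
    Returns a dict keyed by (predecessor SEIN, successor SEIN).
    """
    flows = Counter()
    for worker, predecessor in previous_quarter.items():
        successor = next_quarter.get(worker)
        if successor is not None and successor != predecessor:
            flows[(predecessor, successor)] += 1

    predecessor_size = Counter(previous_quarter.values())
    successor_size = Counter(next_quarter.values())
    return {
        (pred, succ): {
            "movers": count,  # absolute flow
            "share_of_predecessor": count / predecessor_size[pred],
            "share_of_successor": count / successor_size[succ],
        }
        for (pred, succ), count in flows.items()
    }


before = {"w1": "A", "w2": "A", "w3": "A", "w4": "A", "w5": "A"}
after = {"w1": "B", "w2": "B", "w3": "B", "w4": "B", "w5": "A"}
links = link_characteristics(before, after)
```

In this toy example four of the five workers at SEIN A reappear at SEIN B, which has no other workers, so the link from A to B is strong in both relative measures.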
For the QWI, only the strongest links are used to filter out spurious employer identifier changes. If 80 percent of the workers at one SEIN (the predecessor) are observed to move to a single successor, and that successor absorbs 80 percent of its employees from a single predecessor, then all flows between those two account numbers are filtered out and treated as if they had never existed. This is accomplished by coding in the QWI processing, not by changing any of the information in the infrastructure files.17

17. A more extensive evaluation of the impact of the SPF on the aggregate QWI statistics is currently under way.

Of importance to the unit-to-worker imputation (described in section 5.4.2) is a similar measure, computed within a SEIN. For most states, and employers within states, the breakout of units into SEINUNITs is at the discretion of the employer, and the employer may decide to change such a


breakout. The SPF, by following groups of workers as they move between SEINUNITs, identifies spurious intra-SEIN flows, which are then ignored when doing the unit-to-worker imputation for multi-unit job histories.

5.4.2 Allocating Workers to Workplaces: Unit-to-Worker Imputation (U2W)

Early versions of the QWI (then called the Employment Dynamics Estimates [EDE]) were computed only at the SEIN level, with employment allocated to a single location per SEIN. This approach was driven by the absence of workplace information on almost all state-provided wage records. Only the state of Minnesota requires the identification of a worker’s workplace (SEINUNIT) on its UI wage records.

A primary objective of the QWI is to provide employment, job and worker flows, and wage measures at a very detailed level of geography (place-of-work) and industry. The structure of the administrative data received by LEHD from state partners, however, poses a challenge to achieving this goal. The QWI measures are primarily based on the processing of UI wage records that report, with the exception of Minnesota, only the legal employer (SEIN) of the workers. The ES-202 micro-data, however, consist of establishment-level records that provide the geographic and industry detail needed to produce the QWI. For employers operating only one establishment within a state, the assignment of establishment-level characteristics to UI wage records is straightforward because there is no distinction between the employer and the establishment. However, approximately 30 to 40 percent of state-level employment is concentrated in employers that operate more than one establishment in that state. For these multi-unit employers, the SEIN on workers’ wage records identifies the legal employer in the ES-202 data, but not the employing establishment (place-of-work). Thus, establishment-level characteristics (geography and industry, in particular) are missing data for these multi-unit job histories.
In order to impute establishment-level characteristics to job histories of multi-unit employers, a nonignorable missing data model with multiple imputation was developed. The model imputes establishment-of-employment using two key characteristics available in the LEHD Infrastructure Files: (a) distance between place-of-work and place-of-residence and (b) the distribution of employment across establishments of multi-unit employers. The distance to work model is estimated using data from Minnesota, where both the SEIN and SEINUNIT identifiers appear on a UI wage record. Then, the posterior distribution of the parameters from this estimation, combined with the actual SEIN and SEINUNIT employment histories from the ES-202 data, are used for multiple imputation of the SEINUNIT associated with workers in a given SEIN in the data from states other than


Minnesota.18 Emerging from this process is an output file, called the Unit-to-Worker (U2W) file, containing ten imputed establishments for each worker of a multi-unit employer. These implicates are then used in the downstream processing of the QWI.

18. The actual SEINUNIT coded on the UI wage records is used for Minnesota, and would be used for any other state that provided such data. Note that there are occasional, and rare, discrepancies between the unit structure on the Minnesota wage records and the unit structure on the Minnesota ES-202 data for the same quarter. These discrepancies are resolved during the initial processing of the Minnesota data in its state-specific read-in procedures.

The U2W process relies on information from each of the four Infrastructure Files (ECF, GAL, EHF, and ICF) as well as the auxiliary SPF file. Within the ECF, the universe of multi-unit employers is identified. For these employers, the ECF also provides establishment-level employment, date-of-birth, and geocodes (which are acquired from the GAL). The SPF contains information on predecessor relationships, which may lead to the revision of date-of-birth implied by the ECF. Finally, job histories in the EHF in conjunction with place-of-residence information stored in the ICF provide the necessary worker information needed to estimate and apply the imputation model.

A Probability Model for Employment Location

Definitions. Let $i = 1, \ldots, I$ index workers, $j = 1, \ldots, J$ index employers (SEINs), and $t = 1, \ldots, T$ index time (quarters). Let $R_{jt}$ denote the number of active establishments at employer $j$ in quarter $t$, let $R = \max_{j,t} R_{jt}$, and let $r = 1, \ldots, R$ index establishments. Note that the index $r$ is nested within $j$. Let $N_{jrt}$ denote the quarter $t$ employment of establishment $r$ in employer $j$. Finally, if worker $i$ was employed at employer $j$ in $t$, denote by $y_{ijt}$ the establishment at which the worker was employed.

Let $\mathcal{J}^{t}$ denote the set of employers active in quarter $t$, let $\mathcal{I}^{jt}$ denote the set of individuals employed at employer $j$ in quarter $t$, let $\mathcal{R}^{jt}$ denote the set of active ($N_{jrt} > 0$) establishments at employer $j$ in $t$, and let $\mathcal{R}_{i}^{jt} \subset \mathcal{R}^{jt}$ denote the set of active establishments that are feasible for worker $i$. Feasibility is defined as follows: an establishment $r \in \mathcal{R}_{i}^{jt}$ if $N_{jrs} > 0$ for every quarter $s$ that $i$ was employed at $j$.

The probability model. Let $p_{ijrt} = \Pr(y_{ijt} = r)$. At the core of the model is the probability statement:

$$p_{ijrt} = \frac{e^{\alpha_{jrt} + x_{ijrt}\beta}}{\sum_{s \in \mathcal{R}_{i}^{jt}} e^{\alpha_{jst} + x_{ijst}\beta}} \tag{6}$$

where $\alpha_{jrt}$ is an establishment- and quarter-specific effect, $x_{ijrt}$ is a time-varying vector of characteristics of the worker and establishment, and $\beta$ measures the effect of characteristics on the probability of being employed at a particular establishment. In the current implementation, $x_{ijrt}$ is a linear spline in the (great-circle) distance between worker $i$’s residence and the physical location of establishment $r$. The spline has knots at 25, 50, and 100 miles.

Using equation (6), the following likelihood is defined

$$p(y \mid \alpha, \beta, x) = \prod_{t=1}^{T} \prod_{j \in \mathcal{J}^{t}} \prod_{i \in \mathcal{I}^{jt}} \prod_{r \in \mathcal{R}_{i}^{jt}} (p_{ijrt})^{d_{ijrt}} \tag{7}$$

where

$$d_{ijrt} = \begin{cases} 1 & \text{if } y_{ijt} = r \\ 0 & \text{otherwise} \end{cases} \tag{8}$$

and where $y$ is the appropriately dimensioned vector of the outcome variables $y_{ijt}$, $\alpha$ is the appropriately dimensioned vector of the $\alpha_{jrt}$, and $x$ is the appropriately dimensioned matrix of characteristics $x_{ijrt}$. For $\alpha_{jrt}$, a hierarchical Bayesian model based on employment counts $N_{jrt}$ is specified.

The object of interest is the joint posterior distribution of $\alpha$ and $\beta$. A uniform prior on $\beta$, $p(\beta) \propto 1$, is assumed. The characterization of $p(\alpha, \beta \mid x, y, N)$ is based on the factorization

$$p(\alpha, \beta \mid x, y, N) = p(\alpha \mid N)\, p(\beta \mid \alpha, x, y) \propto p(\alpha \mid N)\, p(\beta)\, p(y \mid \alpha, \beta, x) \propto p(\alpha \mid N)\, p(y \mid \alpha, \beta, x). \tag{9}$$

Thus, the joint posterior (9) is completely characterized by the posterior of $\alpha$ and the likelihood of $y$ in (7). Note that (7) and (9) assume that the employment counts $N$ affect employment location $y$ only through the parameters $\alpha$.

Estimation. The joint posterior $p(\alpha, \beta \mid x, y, N)$ is approximated at the posterior mode. In particular, we estimate the posterior mode of $p(\beta \mid \alpha, x, y)$ evaluated at the posterior mode of $\alpha$. From these we compute the posterior modal values of the $\alpha_{jrt}$, then maximize the log posterior density

$$\log p(\beta \mid \alpha, x, y) \propto \sum_{t=1}^{T} \sum_{j \in \mathcal{J}^{t}} \sum_{i \in \mathcal{I}^{jt}} \sum_{r \in \mathcal{R}_{i}^{jt}} d_{ijrt} \left[ \alpha_{jrt} + x_{ijrt}\beta - \log\left( \sum_{s \in \mathcal{R}_{i}^{jt}} e^{\alpha_{jst} + x_{ijst}\beta} \right) \right] \tag{10}$$

which is evaluated at the posterior modal values of the jrt, using a modified Newton-Raphson method. The mode-finding exercise is based on the gradient and Hessian of (10). In practice, (10) is estimated for three employer-employment size classes: 1 to 100 employees, 101 to 500 employees, and greater than 500 employees, using data for Minnesota. Imputing Place of Work After estimating the probability model using Minnesota data, the posterior distribution of the estimated parameters is combined with the entityspecific posterior distribution of the parameters in the imputation pro-

LEHD Infrastructure Files and Creating Quarterly Workforce Indicators

177

cess for other states. A brief outline of the imputation method, as it relates to the probability model previously discussed, is provided in this section. Emphasis is placed on not only the imputation process itself, but also the preparation of input data. Sketch of the imputation method. Ignoring temporal considerations, 10 implicates are generated as follows. First, using the posterior mean and variance of estimated from the Minnesota data, we take 10 draws of from the normal approximation (at the mode) to p(| , x, y). Next, using ES-202 employment counts for the establishments, we compute 10 values of jt based on the hierarchical model for these parameters. Note that these are draws from the exact posterior distribution of the jrt. The drawn values of and are used to draw 10 imputed values of place of work from the asymptotic approximation to the posterior predictive distribution (11)

p( y˜|x, y) ∫ ∫ p(y˜| , , x, y)p( |N)p(| , x, y) d d.
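The three steps of the imputation sketch can be illustrated in a few lines of Python. This is a minimal sketch under stated assumptions, not the LEHD production code: the function names, the single distance covariate, and all numeric values are hypothetical, and a Dirichlet draw of employment shares is used only as a stand-in for the hierarchical model of the establishment effects. The choice probabilities assume the conditional-logit form implied by (10).

```python
import numpy as np


def choice_probabilities(alpha, x, beta):
    """Conditional-logit probabilities over the candidate establishments of one job spell.

    alpha: shape (R,) establishment effects; x: shape (R, K) covariates;
    beta: shape (K,) coefficients.
    """
    v = alpha + x @ beta
    v = v - v.max()  # subtract the maximum so the exponentials cannot overflow
    p = np.exp(v)
    return p / p.sum()


def draw_implicates(alpha_draws, beta_draws, x, rng):
    """Draw one establishment index per (alpha, beta) parameter draw."""
    picks = []
    for alpha, beta in zip(alpha_draws, beta_draws):
        p = choice_probabilities(alpha, x, beta)
        picks.append(int(rng.choice(len(p), p=p)))
    return picks


rng = np.random.default_rng(0)
n_implicates = 10

# Hypothetical inputs: three candidate establishments, one distance covariate.
x = np.array([[5.0], [20.0], [60.0]])
employment = np.array([120.0, 40.0, 15.0])

# Step 1: draw beta from a normal approximation at the mode (values are made up).
beta_mode = np.array([-0.05])
beta_cov = np.array([[1e-4]])
beta_draws = rng.multivariate_normal(beta_mode, beta_cov, size=n_implicates)

# Step 2: draw alpha given employment counts. A Dirichlet draw of employment
# shares is an ASSUMED stand-in for the hierarchical model, not the LEHD model.
alpha_draws = np.log(rng.dirichlet(employment, size=n_implicates))

# Step 3: draw one place of work per implicate.
implicates = draw_implicates(alpha_draws, beta_draws, x, rng)
```

Each entry of `implicates` is the index of the candidate establishment imputed in that implicate, so a job spell ends up with 10 possibly repeated establishment identifiers.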

Implementation

Establishment data. Using state-level micro-data, the set of employers (SEINs) that ever operate more than one establishment in a given quarter is identified; these SEINs represent the set of ever-multi-unit employers defined above as the set $\mathcal{J}_t$. For each of these employers, its establishment-level records are identified. For each establishment, latitude and longitude coordinates, parent employer (SEIN) employment, and ES-202 month-one employment19 for the entire history of the establishment are retained. Those establishments with positive month-one employment in a given quarter characterize $\mathcal{R}_{jt}$, the set of all active establishments. An establishment birth date is identified and, in most cases, is the first quarter in the ES-202 time series in which the establishment has positive month-one employment. For some employers, predecessor relationships are identified in the SPF; in those instances, the establishment date of birth is adjusted to coincide with that of the predecessor.

Worker data. The EHF provides the earnings histories for employees of the ever-multi-unit employers. For each in-scope job (a worker-employer pair), one observation is generated for the end of each job spell, where a job spell is defined as a continuum of quarters of positive earnings for a worker at a particular employer during which there are no more than three consecutive periods of nonpositive earnings.20 The start date of the job history is identified as the first quarter of positive earnings; the end date is the last date of positive earnings.21 These job spells characterize the set $\mathcal{I}_{jt}$.

19. In rare instances where no ES-202 employment is available, an alternative employment measure based on UI wage record counts may be used.

20. A new hire is defined in the QWI as a worker who accedes to a firm in the current period but was not employed by the same firm in any of the 4 previous periods. A new job spell is created if, for example, a worker leaves a firm for more than 4 quarters and is subsequently reemployed by the same firm.

Candidates. Once the universe of establishments and workers is identified, data are combined and a priori restrictions and feasibility assumptions are imposed. For each quarter of the date series, the history of every job spell that ends in that quarter is compared to the history of every active (in terms of ES-202 first-month employment) establishment of the employing employer (SEIN). The start date of the job spell is compared to the birth date of each establishment. Establishments that were born after the start of a job spell are immediately discarded from the set of candidate establishments. The remaining establishments constitute the set $\mathcal{R}_{ijt} \subset \mathcal{R}_{jt}$ for a job spell (worker) at a given employer.22 Given the structure of the pairing of job spells with candidate establishments, it is clear that within-job-spell changes of establishment are ruled out. An establishment is imputed once for each job spell,23 thereby creating no spurious labor market transitions.

Imputation and output data. Once the input data are organized, a set of 10 imputed establishment identifiers is generated for each job spell ending in every quarter for which both ES-202 and UI wage records exist. For each quarter, implicate, and size class, $s = 1, 2, 3$, the parameters on the linear spline in distance between place-of-work and place-of-residence, $\hat{\beta}^s$, are sampled from the normal approximation of the posterior predictive distribution of $\beta^s$ conditional on Minnesota (MN)

$$p(\beta^s \mid \alpha^{MN}, x^{MN}, y^{MN}). \tag{12}$$

The draws from this distribution vary across implicates, but not across time, employers, and individuals. Next, for each employer $j$ at time $t$, a set of $\hat{\alpha}_{jrt}$ is drawn from

$$p(\alpha^{ST} \mid N^{ST}) \tag{13}$$

which are based on the ES-202 month-one employment totals ($N_{jrt}$) for all candidate establishments $r_{jt} \subset \mathcal{R}_{jt}$ at employer $j$ within the state (ST) being processed. The initial draws of $\hat{\alpha}_{jrt}$ from this distribution vary across time and employers but not across job spells. Combining (12) and (13) yields

$$p(\alpha^{ST} \mid N^{ST})\, p(\beta^s \mid \alpha^{MN}, x^{MN}, y^{MN}) \approx p(\alpha^{ST} \mid N^{ST})\, p(\beta^s \mid \alpha^{ST}, x^{ST}, y^{ST}) \propto p(\alpha^{ST}, \beta^{ST} \mid x^{ST}, y^{ST}, N^{ST}) \tag{14}$$

an approximation of the joint posterior distribution of $\alpha$ and $\beta^s$ (9) conditional on data from the state being processed.

21. By definition, an end-date for a job spell is not assigned in cases where a quarter of positive earnings at a firm is succeeded by 4 or fewer quarters of nonemployment and subsequent reemployment by the same firm.

22. The sample of UI wage and QCEW data chosen for processing of the QWI is such that the start and end dates are the same. Birth and death dates of establishments are, more precisely, the dates associated with the beginning and ending of employment activity observed in the data. The same is true for the dates assigned to the job spells.

23. More specifically, an establishment is imputed to a job spell only once within each implicate.

The draws $\hat{\beta}^s$ and $\hat{\alpha}_{jrt}$, in conjunction with the establishment, employer, and job spell data, are used to construct the $p_{ijrt}$ in (1) for all candidate establishments $r \in \mathcal{R}_{ijt}$. For each job spell and candidate establishment combination, the $\hat{\beta}^s$ are applied to the calculated distance between place-of-residence (of the worker holding the job spell) and the location of the establishment, where the choice of $\hat{\beta}^s$ depends on the size class of the establishment's parent employer. For each combination an $\hat{\alpha}_{jrt}$ is drawn, which is based primarily on the size (in terms of employment) of the establishment relative to other active establishments at the parent employer. In conjunction, these determine the conditional probability $p_{ijrt}$ of a candidate establishment's assignment to a given job spell. Finally, an establishment of employment is drawn from this distribution of probabilities.

The imputation process yields a data file containing a set of 10 imputed establishment identifiers for each job spell. In a very small set of cases, the model fails to impute an establishment to a job spell. This is often due to unanticipated idiosyncrasies in the underlying administrative data. Across states, the proportion of these failures relative to successful imputations is well under 0.5 percent. For these job spells, a dummy establishment identifier is assigned and, in downstream processing, the employment-weighted modal employer-level characteristics are used.

5.5 Forming Aggregated Estimates: QWI

5.5.1 What are the QWI?

The Quarterly Workforce Indicators (QWI) provide detailed local estimates of a variety of employment and earnings indicators.
Employment, earnings, gross job creation and destruction, and worker turnover are available at different levels of geography, including the county, Workforce Investment Area, and Core Based Statistical Area.24 At each level of geography, the QWI are available by detailed industry (SIC and NAICS), sex, and age of workers. As of January 2008, QWI for forty-three states had been published, three additional states were in prerelease analysis, and a total of forty-six states, including the District of Columbia, had signed Memorandums of Understanding (MOUs). The program was still expanding with the goal of national coverage.

24. The original QWI release used Metropolitan Statistical Areas. The older MSA definitions were replaced with CBSA definitions in 2005.

5.5.2 Computing the Estimates

The establishment of the LEHD Infrastructure Files was driven in large part, although not exclusively, by the needs of the QWI computations. Completed and representative job-level data, with worker and workplace characteristics, are the primary input for the QWI. The ICF (section 5.3.2) and the ECF (section 5.3.4) draw on a large number of data sources, and use a set of editing and imputation procedures described previously, to provide a detailed picture of each economic actor. The ECF also provides the input data for the weighting, which is explained in more detail in section 5.5.3. The wage record edit (section 5.2.6) and the SPF (section 5.4.1) apply longitudinal edits and probabilistic matching rules to improve the longitudinal linking of entities. The U2W (section 5.4.2) completes the picture by multiply imputing an employing establishment to each job reported by the multi-unit employers. Figure 5.1 provides a graphical overview of how these data sources are used in QWI processing.

These data are then combined and aggregated to compute the QWI statistics. The aggregation is a five-step process:

1. A job (a unique PIK-SEIN-SEINUNIT combination) is identified, and the job's complete activity history (when the worker had positive earnings at the SEIN-SEINUNIT, and when the worker did not have positive earnings) is recorded. Note that for job histories associated with multi-unit SEINs, there are 10 implicate SEINUNITs (possibly nonunique) for each job, and these implicates each get a weight of 0.1 in the downstream processing.25

2. Job-level variables are computed as a set of indicators. The computation of each of these variables is described in detail in section 2.2 of the appendix.

3. Job-level variables are aggregated to the establishment level (SEINUNIT), using appropriate implicate weights.
The aggregation is done using formulae described in section 2.3 of the appendix. For many variables, aggregation to the establishment level is achieved by summing the job-level variables (beginning-of-period employment, end-of-period employment, accessions, new hires, recalls, separations, full-quarter employment, full-quarter accessions, full-quarter new hires, total earnings of full-quarter employees, total earnings of full-quarter accessions, and total earnings of full-quarter new hires). Some aggregate flow variables are computed using the beginning- and end-of-quarter employment estimates for that workplace. Examples are net job flows (see equation (A43) in appendix section 2), average employment (A44), job creations (A46), and job destructions (A48). The file created in this step, internally known as the Unit Flow File (UFF_B), is also available in the RDC system (see section 5.8.2 for details).

4. The variables necessary for applying the QWI disclosure avoidance algorithm (SEINUNIT-specific noise infusion called “fuzz factors”) are attached, and the establishment-level file is summed to the desired level of geographic and demographic detail, using the noise-infused values. Some flow variables are computed directly from other aggregated variables (see appendix section 2.5). An undistorted version of all aggregates is also created. All aggregations use weights (see section 5.5.3).

5. The tables created in the previous step are processed by the disclosure avoidance procedure (see section 5.6), using a comparison with the undistorted version of each indicator and appropriate cell counts. If necessary, items in some cells are suppressed, and noisy estimates are flagged as such.

25. In the underlying frame, a job is a PIK-SEIN pair. For single-unit employers, this is equivalent to a PIK-SEIN-SEINUNIT triple. For multi-unit employers within a single state, the original pair is completed to a triple by the unit-to-worker multiple imputation.

Fig. 5.1 Overview of LEHD data flow

5.5.3 Weighting in the QWI

The QWI are estimates formed from weighted sums where the weights have been controlled to state-level QCEW statistics for all private employers as published by the BLS. The control is approximate, however, because the weights are calculated from the unfuzzed beginning-of-quarter employment data whereas the publication estimates are based on the weighted sums of the noise-infused data. When building the ECF, weights are computed such that the measured beginning-of-quarter UI employment of in-scope units, when properly weighted, is equal to the published QCEW statewide employment in the first month of the quarter for all private employers. A preliminary weight is computed as part of the ECF processing. An adjustment factor that accounts for system-wide missing data imputation and other edits is computed in the downstream (UFF_B) processing.
This adjustment factor is computed for all private establishments. The final weight is computed in the UFF_B processing to control the product of the initial weight and the adjustment factor to the state total for all private employment in that quarter's QCEW data. The same overall adjustment factor that was calculated for all private establishments is used to produce the final weights for all the establishments' QWI estimates. Selection, editing, longitudinal linking, and disclosure avoidance procedures in the micro data used to build the QWI all change the in-scope units' data somewhat, causing the preliminary and final weights to disagree. When the final weight is used for all published QWI statistics, the difference between the published QCEW statistic and the appropriate statistic in the QWI system is less than 0.5 percent.
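The two-stage control to the QCEW total described in this section can be sketched as follows. This is an illustration under simplifying assumptions, not the production procedure: it uses a single statewide ratio weight, the function name `ratio_weight` is hypothetical, and every number is made up.

```python
def ratio_weight(employment, control_total):
    """Single weight that scales summed employment to a control total."""
    measured = sum(employment)
    if measured <= 0:
        raise ValueError("measured employment must be positive")
    return control_total / measured


qcew_total = 1_000.0  # hypothetical published QCEW month-one employment

# Preliminary weight from beginning-of-quarter employment on the ECF (made-up data).
ecf_b = [400.0, 300.0, 250.0]
preliminary_weight = ratio_weight(ecf_b, qcew_total)

# Later edits and imputations change the in-scope data seen in UFF_B processing.
uff_b = [410.0, 295.0, 250.0, 20.0]
adjustment = ratio_weight([preliminary_weight * b for b in uff_b], qcew_total)

# Final weight: product of the preliminary weight and the adjustment factor.
final_weight = preliminary_weight * adjustment
weighted_total = sum(final_weight * b for b in uff_b)
```

In this toy setting the weighted unfuzzed total matches the control exactly. In the published series the match is only approximate because, as noted above, the publication estimates are sums of noise-infused values.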


5.6 Disclosure Avoidance Procedures for the QWI

The disclosure avoidance procedures for the QWI consist of a set of methods used to protect the confidentiality of the identity and attributes of the individuals and businesses that form the underlying data in the system. In the QWI system, disclosure avoidance is required to protect the information about individuals and businesses that contribute to the UI wage records, the ES-202 quarterly reports, and the Census Bureau demographic data that have been integrated with these sources. The QWI disclosure avoidance mechanism is described and analyzed in more detail in Abowd et al. (2006); we present an overview here.

5.6.1 Three Layers of Confidentiality Protection

There are three layers of confidentiality protection and disclosure avoidance in the QWI system. The first layer occurs when job-level estimates (computed from the EHF) are aggregated to the establishment level. The QWI system infuses specially constructed noise into the estimates of all of the workplace-level measures. We describe the noise-infusion process in more detail in section 5.6.2. After this noise infusion, the distorted micro data item is used as the source for all published QWI.

A second layer of confidentiality protection occurs when the workplace-level measures are aggregated to higher levels (e.g., substate geography and industry detail). The data from many individuals and establishments are combined into a (relatively) few estimates using a dynamic weight that controls the state-level beginning-of-quarter employment for all private employers to match the first-month-in-quarter employment as tabulated from the QCEW.
The weighting procedure introduces an additional difference between the confidential data item and the released data item, and in combination with the noise infusion, the published data are moved away from the value contained in the underlying micro data, contributing to the protection of the confidentiality of the micro data.

Third, some of the aggregate estimates turn out to be based on fewer than three persons or establishments. These estimates are suppressed and a flag is set to indicate suppression. Suppression is only used when the combination of noise infusion and weighting may not distort the publication data with a high enough probability to meet the criteria laid out above. Estimates such as employment are subject to suppression. Continuous dollar measures like payroll are not.

All published estimates are influenced by the noise that was infused in the first layer of the protection system. When the distortion exceeds certain limits, the estimates are still published, but flagged as substantially distorted. Each observation in any one of the published QWI tables thus has an associated flag that describes its disclosure status. Table 5.5 lists all possible flags in the published QWI tables.


Table 5.5  Disclosure flags in the QWI

Flag   Explanation
–2     No data available in this category for this quarter
–1     Data not available to compute this estimate
0      Zero employment estimated or zero estimated denominator in a ratio, zero released
1      OK, distorted value released
5      Value suppressed because it does not meet U.S. Census Bureau publication standards
9      Data significantly distorted, distorted value released
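A user of the published tables typically needs to decide, for each cell, whether a value was actually released. The following sketch encodes the flags of table 5.5 as a lookup; the names `QWI_FLAGS`, `RELEASED_FLAGS`, and `usable_value` are hypothetical helpers for illustration and are not part of any Census Bureau software.

```python
# Flag descriptions copied from table 5.5.
QWI_FLAGS = {
    -2: "No data available in this category for this quarter",
    -1: "Data not available to compute this estimate",
    0: "Zero employment estimated or zero estimated denominator in a ratio, zero released",
    1: "OK, distorted value released",
    5: "Value suppressed because it does not meet U.S. Census Bureau publication standards",
    9: "Data significantly distorted, distorted value released",
}

# Flags whose description says that a value is released.
RELEASED_FLAGS = {0, 1, 9}


def usable_value(value, flag):
    """Return the value if its flag indicates a released figure, otherwise None."""
    if flag not in QWI_FLAGS:
        raise ValueError(f"unknown QWI disclosure flag: {flag}")
    return value if flag in RELEASED_FLAGS else None


# Hypothetical cells: (published value, flag).
cells = [(1520, 1), (0, 0), (None, 5), (37, 9), (None, -1)]
released = [usable_value(v, f) for v, f in cells]
```

An analyst who wants to treat heavily distorted cells separately could additionally split on flag 9.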

5.6.2 Details of the QWI Noise Infusion Process

The noise infused into the QWI data is designed to have three very important properties. First, every data item is distorted by some minimum amount. Second, for a given workplace, the data are always distorted in the same direction (increased or decreased) by the same percentage amount in every period, and in every revision of the QWI series. Third, the statistical properties of this distortion are such that when the estimates are aggregated, the effects of the distortion cancel out for the vast majority of the estimates, preserving both cross-sectional and time series analytical validity. We describe below the algorithms by which the above goals are achieved. A statistical analysis providing evidence of the third goal is provided in section 5.7.2.

Disclosure Avoidance Using Noise Infusion Factors

To implement the multiplicative noise model in section 5.6, a random fuzz factor $\delta_j$ is drawn for each establishment $j$ according to the following process:

$$p(\delta_j) = \begin{cases} \dfrac{b - \delta}{(b - a)^2}, & \delta \in [a, b] \\[1ex] \dfrac{b + \delta - 2}{(b - a)^2}, & \delta \in [2 - b, 2 - a] \\[1ex] 0, & \text{otherwise} \end{cases}$$

$$F(\delta_j) = \begin{cases} 0, & \delta < 2 - b \\[1ex] \dfrac{(\delta + b - 2)^2}{2(b - a)^2}, & \delta \in [2 - b, 2 - a] \\[1ex] 0.5, & \delta \in (2 - a, a) \\[1ex] 0.5 + \dfrac{(b - a)^2 - (b - \delta)^2}{2(b - a)^2}, & \delta \in [a, b] \\[1ex] 1, & \delta > b \end{cases}$$

where $a = 1 + c/100$ and $b = 1 + d/100$ are constants chosen such that the true value is distorted by a minimum of $c$ percent and a maximum of $d$ percent (the exact numbers are confidential). Note that $1 < a < b < 2$. This produces a random noise factor centered around 1 with distortion of at least $c$ and at most $d$ percent. Figure 5.2 depicts such a distribution.

Fig. 5.2 Distribution of fuzz factors

A fuzz factor is drawn for each employer and for each of the establishments associated with that employer. Although fuzz factors vary across establishments of the same employer, the fuzz factors attached to all establishments of the same employer are drawn from the same (upper or lower) tail of the fuzz factor distribution. Thus, if the fuzz factor associated with a particular employer (SEIN) is less than unity, then all that employer's establishments (SEINUNITs) will also have fuzz factors less than unity. It is also important to point out that a fuzz factor is attached to each SEIN and SEINUNIT only once and retained for all time periods after the initial assignment.

Applying the Fuzz Factors to Estimates

Although all estimates are distorted based on the multiplicative noise model, the exact implementation depends on the type of estimate that is computed. For completeness we show all the relevant formulas here, referring the reader to Abowd, Stephens, and Vilhuber (2006) for details. In all cases, the micro data noise infusion occurs at the level of an establishment estimate. However, for QWI involving ratios and changes, the basic fuzzed and unfuzzed values are combined at the publication level of aggregation to produce the released estimates. In what follows, distorted values are distinguished from their undistorted counterparts by an asterisk; that is, the true (unfuzzed) value of beginning-of-quarter employment is $B$, and its noise-infused (fuzzed) counterpart is $B^*$.

Fuzzing of estimates of employment. The fuzz factor $\delta_j$ is used to fuzz all estimates of employment totals by scaling the true establishment-level statistic according to the formula:

$$X^*_{jt} = \delta_j X_{jt} \tag{15}$$
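The fuzz-factor distribution and the scaling in equation (15) can be sketched by inverting the CDF $F$ on each of its two ramps. This is a minimal illustration under stated assumptions, not the Census Bureau implementation: the function names are hypothetical, and the bounds $c = 1$ and $d = 5$ percent are invented because the true values are confidential.

```python
import math
import random


def fuzz_quantile(u, a, b):
    """Inverse of the fuzz-factor CDF F for u in [0, 1], with 1 < a < b < 2."""
    width = b - a
    if u < 0.5:
        # Lower ramp on [2 - b, 2 - a]: F = (delta + b - 2)^2 / (2 (b - a)^2)
        return 2.0 - b + width * math.sqrt(2.0 * u)
    # Upper ramp on [a, b]: F = 0.5 + ((b - a)^2 - (b - delta)^2) / (2 (b - a)^2)
    return b - width * math.sqrt(2.0 - 2.0 * u)


def draw_fuzz_factor(a, b, rng, upper=None):
    """Draw a fuzz factor; if `upper` is given, restrict the draw to that tail."""
    u = rng.random()
    if upper is not None:
        # Map the uniform draw into the half of [0, 1) that belongs to the tail.
        u = 0.5 * u + (0.5 if upper else 0.0)
    return fuzz_quantile(u, a, b)


# Hypothetical bounds: c = 1 and d = 5 percent (the real values are confidential).
a, b = 1.01, 1.05
rng = random.Random(0)

employer_factor = draw_fuzz_factor(a, b, rng)
same_tail = employer_factor > 1.0
# Establishments of one employer share the employer's tail of the distribution.
unit_factors = [draw_fuzz_factor(a, b, rng, upper=same_tail) for _ in range(3)]

# Equation (15): scale each true establishment total by its own fuzz factor.
true_b = [120, 40, 15]  # hypothetical beginning-of-quarter employment
fuzzed_b = [delta * x for delta, x in zip(unit_factors, true_b)]
```

In production the factors are drawn once per SEIN and SEINUNIT and then stored, so a sketch like this would run only at the initial assignment rather than every quarter.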

Here, $X_{jt}$ is an establishment-level employment estimate: B, E, M, F, A, S, H, R, FA, FS, and FH. All variable definitions are provided in section 2 of the appendix.

Fuzzing of averages of magnitude estimates where the denominator is an employment estimate. Ratios of magnitude estimates to employment estimates are protected by using fuzzed numerators and unfuzzed denominators according to the formula:

$$ZY^*_{jt} = \frac{Y^*_{jt}}{B(Y)_{jt}} = \delta_j \frac{Y_{jt}}{B(Y)_{jt}}$$

where $ZY_{jt}$ is a ratio of a magnitude estimate, $Y_{jt}$ (dollars or quarters), and $B(Y)_{jt}$ is an estimate of employment. The ratio has the interpretation of an average in most cases. The variables protected according to this method are: ZW2, ZW3, ZWFH, ZWA, ZWS, ZNA, ZNH, ZNR, and ZNS. The relevant values of $Y_{jt}$ and $B(Y)_{jt}$ are shown in the establishment-level statistics in the previous equation. In the actual QWI processing, the numerator and denominator of these confidentiality-protected ratios are tabulated separately for each publication category (ownership × state × substate-geography × industry × age group × sex). Then, the publication ratio is computed when the public-use release files are created.

Fuzzing of differences of counts and magnitudes. Fuzzed net job flow (JF) is computed at the aggregate level for the $k$ = (ownership × state × substate-geography × industry × age group × sex) cell as the product of the aggregated, unfuzzed rate of growth of net jobs and the aggregated fuzzed employment:

$$JF^*_{kt} = G_{kt} \times \bar{E}^*_{kt} = JF_{kt} \times \frac{\bar{E}^*_{kt}}{\bar{E}_{kt}}.$$

This method of fuzzing net job flow will consistently estimate net job flow because it takes the product of two consistent estimators. The formulas for fuzzing gross job creation (JC) and job destruction (JD) are similar:

$$JC^*_{kt} = JCR_{kt} \times \bar{E}^*_{kt} = JC_{kt} \times \frac{\bar{E}^*_{kt}}{\bar{E}_{kt}}$$

and

$$JD^*_{kt} = JDR_{kt} \times \bar{E}^*_{kt} = JD_{kt} \times \frac{\bar{E}^*_{kt}}{\bar{E}_{kt}}.$$

The same method was used to protect estimates of wage changes for different employment estimates. The unfuzzed estimated total changes were divided by the unfuzzed denominators and then multiplied by the ratio of the fuzzed denominator to the unfuzzed denominator, as in the formula:

$$ZWY^*_{kt} = \frac{WY_{kt}}{Y_{kt}} \times \frac{Y^*_{kt}}{Y_{kt}}$$

where, again, $Y$ denotes a particular employment estimate, $WY$ denotes the estimated change in wages for that employment estimate, and $ZWY^*$ is the confidentiality-protected estimate of the ratio. This method is used for ZWA, ZWS, ZWFA, and ZWFS.

The ratio FT involves three QWI that are also in the release file. In order to protect the ratio of the fuzzed to unfuzzed estimate of full-quarter employment, the release value of FT is protected by the formula:

$$FT^*_{kt} = \frac{(FA^*_{kt} + FS^*_{kt})/2}{F_{kt}} \times \frac{F^*_{kt}}{F_{kt}}$$

In the actual QWI processing, the numerator and denominator of these confidentiality-protected changes and ratios are tabulated separately for each publication category (ownership × state × substate-geography × industry × age group × sex). Then, the publication change or ratio is computed when the public-use release files are created.

5.7 Analysis of the QWI Files

In this section, we provide some basic analysis highlighting the usefulness of the QWI as time series data on local labor market conditions and measuring the impact of the various corrections that are applied to the series.

5.7.1 Basic Trends of Some Variables

The QWI are uniquely positioned to provide a picture of a dynamic workforce at a highly disaggregated level with both demographic and economic detail. In this section, we consider three variables, and provide examples of analyses that can be easily produced with the QWI. We consider employment (more precisely, beginning-of-quarter employment), job creation, and recalls. We have picked the states of Illinois and Montana to illustrate the analyses.
Figures 5.3 and 5.4 show the basic data trends for the three variables, stated in thousands of workers, for both sexes combined and separately. In general, all three time series show considerable seasonality, but job creations and recalls are considerably more variable. However, when looking at the time series by sex, there appears to be less volatility in job creations and recalls for women than for men. Figures 5.5 and 5.6 restate these series as the percentage of women in the total for each variable. In Illinois, the percentage of jobs created that are filled by recalls is significantly lower for women (46.2 percent) than it is for men (53 percent), and persistently so over time (fig. 5.3), although there is strong seasonality in this pattern as well. A similar pattern, although not quite as stable, emerges in Montana (fig. 5.3). Thus, it would seem that although women participate as much as men in job creation, they are more likely to have found a new job than to have been recalled to an old job. Of course, this is a very simple analysis. A further breakdown by industry (also feasible using the public-use QWI) might reveal that the lower recall rate of women is a phenomenon specific to certain industries that employ a higher fraction of women for other reasons. However, it is an example that serves to highlight the utility of the demographic, geographic, and industry breakdown that is possible with the QWI.

Fig. 5.3 Basic data trends, Illinois

Fig. 5.4 Basic data trends, Montana

Fig. 5.5 Proportion of women, select variables, Illinois

Fig. 5.6 Proportion of women, select variables, Montana

Disaggregating statistics by geography is one of the more common strategies for policy analysts, and several data sources are available to perform such an analysis. With the QWI, geographic analysis can be extended to distinct demographic groups. In figure 5.7, the geographic distribution of job creation is plotted for young workers nineteen to twenty-one years of age, by counties in Illinois. A policy analyst could perform such an analysis for eight age groups and both sexes, using the complete QWI. Note that net job creation is computed with both the numerator and the denominator computed for workers aged nineteen to twenty-one, so it is not simply a decomposition of an aggregate job creation statistic; it is computed from the ground up using data only on workers between nineteen and twenty-one years of age.

Fig. 5.7 Job creation for young workers, by county, Illinois

5.7.2 Importance of LEHD Adjustments to Raw Data

There are numerous data edits, corrections, and imputations performed in the processing of the QWI data series. We summarize the effect of some of these adjustments here.

Choosing Between LDB and LEHD Coding of NAICS Variables

As noted in section 5.3.4, the ECF provides enhanced NAICS variables that expand on the information available on the ES-202 files. Information is imputed based on all available industry information, and backcoded to time periods that precede the introduction and widespread implementation of NAICS coding on ES-202 data. The creation of the enhanced NAICS variables was described in section 5.3.4. In this section, we present a summary of research done on a comparison of the ESO (ES-202 only) and FNL (final) NAICS codes on the Illinois ECF.

The imputation algorithm used by the BLS to create the LDB stably backfills NAICS codes once it has imputed a code for a later year; that is, once an establishment has received a backcoded NAICS, that code is used for all prior years of data for the establishment. The LEHD algorithm allows the backcoded NAICS to change if the contemporaneously coded SIC changes. Thus, we expect the two backcodes to have different statistical properties for historical NAICS-based QWI. Although some of the SIC changes over time may be spurious, a SEINUNIT's SIC code history contains valuable information that we have attempted to preserve in the LEHD imputation algorithm.

Overall, the effect of the different approaches is relatively small, since very few SEINUNITs change industry, in particular relative to the proportion of SEINUNITs that change geography. The LDB-sourced NAICS variable is used for about 85 percent of the records for Illinois; the rest are filled with information from the ES-202. It is unclear why only 85 percent of ES-202 records are in the LDB. The results weighted by employment are about the same, suggesting that activity was not a criterion for being included on the LDB.

First, and not surprisingly, in later years and quarters (1999 and later), when NAICS is actively coded by the states, the ESO and FNL codes look almost identical when available. Second, there is little variation in the LDB NAICS codes over time compared with SIC. Among all of the active SEIN-SEINUNITs over the period covered by the Illinois data, only slightly more than 8 percent experience at least one SIC change, compared with about 1.5 percent on the LDB. Almost all NAICS code changes occur after 1999. While this is not entirely unexpected, it is something to keep in mind when comparing NAICS FNL versus SIC or NAICS ESO employment totals. Many of these changes in industry appear to be real and are not captured on the LDB.

As we go back in time, a larger portion of employment can be found in NAICS FNL codes that are different from what one would expect given the SIC code on the ECF. For example, in 1990 about 13 percent of employment is in a NAICS FNL code that is different from what we would expect based on the SIC. By 2001, the proportion of employment that is in a NAICS code outside of the set of possible values predicted by the SIC-NAICS crosswalk falls to 3 percent. The ES-202 based NAICS variable does a better job tracking SIC, since more SIC information is used in putting it together.
The main source of the discrepancy is entities that experience a change in their SIC code prior to 2000. The LDB appears to ignore this change, while the ES-202 based NAICS variable uses an SIC-based imputation for these SEINUNITs. The result is a series that exhibits similar patterns of change over time as SIC, while still preserving the value added in the NAICS codes for entities that did not experience a change. Users should keep in mind that for early years (before 1997) some of the NAICS industries have yet to come into existence. The prevalence of this problem has not yet been investigated.

Correcting for Coding Errors in Personal Identifiers

Abowd and Vilhuber (2005) describe and analyze the method used at LEHD to identify coding errors in the person identifier (Social Security Number [SSN]), and provide an analysis of the impact that correcting for such errors has on statistics generated from the corrected and the uncorrected data for one state (California). A simplified version of the same analysis is used as a quality assurance method during the wage record edit, and the results are similar for other states, but vary with length of available data and with prior state processing of name and SSN fields.

For California, the process verified over half a billion records. Slightly less than 10 percent of the total number of unique individuals appear in the original data, and only a little more than 0.5 percent of all wage records require some corrective measures, which is considered conservative relative to other analyses done (see Abowd and Vilhuber [2005] for further references).

Table 5.6 presents patterns of job histories for uncorrected and corrected data. The unit of observation is a worker-employer match (a job), potentially interrupted. For each such observation, the longest interruption is tabulated if there is one. If no interruption was observed during the worker's tenure with the employer, then the type of continuous job spell is tabulated. By definition, the absence of a hole implies continuous tenure, but that spell may have been ongoing in the first (left-truncated) or last (right-truncated) quarter of the data, or in both (entire period). If the spell was continuous, with both the beginning and the end of the job spell observed within the data, then the default code of C is assigned.

Table 5.6  Wage record edit: Comparing job histories before and after editing process

                               Original data             Edited data               Change
Pattern in job history      Frequency  Percent (%)    Frequency  Percent (%)    Frequency  Percent (%)

Noncontinuous, length of longest interruption
1 quarter                   5,315,869      5.50       4,710,673      4.87        –605,196    –11.38
2 quarters                  2,357,942      2.44       2,359,374      2.44           1,432      0.06
3 quarters                  1,764,701      1.83       1,755,814      1.82          –8,887     –0.50
4 quarters                    750,910      0.78         747,707      0.77          –3,203     –0.42
5 quarters                    532,174      0.55         529,777      0.55          –2,397     –0.45
6 quarters                    466,301      0.48         463,878      0.48          –2,423     –0.51
7 quarters                    430,549      0.45         429,179      0.44          –1,370     –0.31
8 quarters                    241,573      0.25         240,214      0.25          –1,359     –0.56
9 or more quarters          1,172,039      1.21       1,163,420      1.20          –8,619     –0.73

Continuous
C Continuous               59,990,419     62.08      60,311,626     62.37         321,207      0.53
F Entire period             1,735,340      1.80       1,807,775      1.87          72,435      4.17
L Left-truncated            9,871,084     10.22      10,032,149     10.37         161,065      1.63
R Right-truncated          12,001,245     12.42      12,144,959     12.56         143,714      1.19

Total                      96,630,146    100.00      96,696,545    100.00          66,399      0.06

Notes: From table 6, Abowd and Vilhuber (2005). For definitions of job history patterns, see text.
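The classification rule behind table 5.6 can be sketched in a few lines. The following is a minimal illustration, not the LEHD production code: the function name, the integer quarter indices, and the ten-quarter panel are assumptions made for the example. It tabulates the longest interruption when there is one, and otherwise assigns C, F, L, or R as described in the text.

```python
def classify_job_history(active_quarters, first_quarter, last_quarter):
    """Classify one worker-employer match in the style of table 5.6.

    active_quarters: integer quarter indices in which a wage record exists.
    first_quarter, last_quarter: first and last quarter covered by the data.
    Returns "1".."8" or "9+" (length of the longest interruption) for
    noncontinuous jobs, or one of "C", "F", "L", "R" for continuous jobs.
    """
    quarters = sorted(set(active_quarters))
    if not quarters:
        raise ValueError("a job needs at least one wage record")

    # Length of every gap between consecutive observed quarters.
    gaps = [later - earlier - 1
            for earlier, later in zip(quarters, quarters[1:])
            if later - earlier > 1]
    if gaps:
        longest = max(gaps)
        return "9+" if longest >= 9 else str(longest)

    left_truncated = quarters[0] == first_quarter
    right_truncated = quarters[-1] == last_quarter
    if left_truncated and right_truncated:
        return "F"  # ongoing over the entire period
    if left_truncated:
        return "L"
    if right_truncated:
        return "R"
    return "C"


# Hypothetical job observed in quarters 2-6 of a 10-quarter panel; the
# quarter-4 record carries a miscoded SSN in the unedited data.
original = classify_job_history([2, 3, 5, 6], 1, 10)
edited = classify_job_history([2, 3, 4, 5, 6], 1, 10)
print(original, edited)
```

In this hypothetical case the unedited history is tabulated as a one-quarter interruption ("1"), while the edited history becomes a continuous spell ("C"), which is the kind of reclassification that the Change columns of table 5.6 summarize.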

Over 800,000 job history interruptions in the original data are eliminated by the corrections, representing 0.9 percent of all jobs, but 11 percent of all interrupted jobs (table 5.6). Despite the small number of records found to be miscoded, the impact on flow statistics can be large. Accessions in the uncorrected data are overestimated by 2 percent, and recalls are biased upward by nearly 6 percent. On the other hand, and as expected, overall payroll (W1) is not biased, but payroll for accessions (WA) and separations (WS) is biased upward by up to 7 percent (table 5.7).

Identification of Successor-Predecessor Links of Firms and Establishments

As noted in section 5.4.1, care is taken to track firms and establishments over time by following worker movements between firms. These corrections should have little or no impact on the time series of pure stock

Table 5.7  Distribution of percentage bias in aggregate QWI statistics
All age groups, both sexes, SEIN-level micro data

Variable (bias)  Unit       Mean (%)   Std (%)            N   P10 (%)  P50 (%)  P90 (%)

A                Firm          2.17      13.98   11,755,355
                 County        1.56       1.01        2,006     0.62     1.42     2.64
                 Industry      1.97       2.29          374     0.51     1.47     3.40
B                Firm         –0.74       6.14   20,717,508
                 County       –0.46       0.31        1,947    –0.75    –0.45    –0.25
                 Industry     –0.31       0.31          363    –0.59    –0.34    –0.14
F                Firm         –1.23       8.05   18,454,708
                 County       –0.78       0.36        1,888    –1.21    –0.74    –0.43
                 Industry     –0.53       0.31          352    –0.90    –0.53    –0.24
R                Firm          4.71      26.86    3,242,186
                 County        5.26       3.61        1,888     1.70     4.59     9.18
                 Industry      5.95       3.49          352     1.93     5.46    10.29
S                Firm          2.31      14.29   11,161,916
                 County        1.66       1.11        1,947     0.67     1.46     2.72
                 Industry      2.01       2.08          363     0.63     1.53     3.41
W1               Firm         –0.01       4.96   23,229,843
                 County       –0.01       0.15        2,006    –0.05    –0.02     0.00
                 Industry      0.04       0.35          374    –0.04    –0.02     0.08
WA               Firm         15.57    1111.78   11,755,355
                 County        4.92       3.34        2,006     1.89     4.38     8.44
                 Industry      3.95       4.94          374     0.77     3.35     6.79
WS               Firm         18.77    1094.50   11,161,916
                 County        4.87       3.17        1,947     2.02     4.31     8.06
                 Industry      3.64       4.48          363     1.00     3.18     5.71

Note: From table 9, Abowd and Vilhuber (2005). There are 23,232,068 firm-quarter cells, 2,006 county-quarter cells, and 374 industry-quarter cells. Percentiles for firm-quarter cells are all zero and not reported for simplification.

measures (total wage bill W1), but should influence a number of flow measures. In particular, separations (S) and accessions (A) will be reduced when between-firm (successor-predecessor) links are identified.

A small experiment was run using the standard processing stream for the QWI for a single state. Transitions associated with observed successor-predecessor flows as identified by the SPF, which are normally suppressed, were left intact. In other words, the SPF was removed from the processing stream. Comparing the resultant (unreleased) QWI with published QWI from the same time period provides an estimate of the bias due to firm links that unadjusted QWI would otherwise have. The suppression of flows due to successor-predecessor links also affects B, beginning-of-quarter employment, which in turn is used to weight the QWI (section 5.5.3). Thus, all statistics will be affected, either directly through the statistic itself, or indirectly through a change in the weights.

Analysis performed on Montana reveals that earnings and separations are 4 percent lower if successor-predecessor transitions are filtered out. Beginning-of-period employment estimates are 0.4 percent lower. For more results, consult Benedetto et al. (2007), who have used the successor-predecessor flows in the analysis of the firm.

Analytical Validity of the Unit-to-Worker Imputation

This subsection presents some results of the assessment of the analytical validity of the unit-to-worker imputation process (section 5.4.2). Percentiles of the distribution of the bias induced by the imputation process are presented for five QWI measures: beginning-of-quarter employment (B), full-quarter employment (F), accessions (A), separations (S), and total payroll (W1), each at two levels of industry aggregation. A complete evaluation of the validity of the unit-to-worker imputation process is provided in Stephens (2006, chapter on Imputation of Place-of-Work in the Quarterly Workforce Indicators).
To assess the analytic validity of the imputation process, two sets of QWI measures for the 1994:1 to 2003:4 time period were generated using the Minnesota data. The first set, True, is produced using the establishment of work reported on the Minnesota UI wage records. The second set, Imputed, is generated treating the establishment of work as unknown; thus, Imputed is generated using the same imputation process that is applied to other states in the QWI system. Measures for both sets were tabulated using data for all establishments in Minnesota and produced for two levels of industry aggregation (SIC Division and two-digit SIC) as well as by county, sex, and age. For each measure, the discrepancies between values X, prior to the application of multiplicative noise factors, contained in Imputed and True for each interior quarter × industry × county × age × sex cell are calculated as:

$$\text{bias} = \frac{X_{\text{Imputed}} - X_{\text{True}}}{X_{\text{True}}}.$$

Table 5.8 presents percentiles of both the weighted and unweighted distribution of the bias statistic for five QWI measures across interior data cells for SIC Division and two-digit SIC industry aggregations. For all distributions, the median discrepancy never exceeds 0.005 in absolute value, suggesting that on average the bias induced by the imputation of place of work is relatively small. For all bias statistics, the weighted distribution is tighter than the unweighted distribution, illustrating that the bias is less severe in data cells with higher levels of employment. This is expected: fewer establishments and workers contribute data to cells with relatively low levels of employment, and the fewer establishments and workers contribute data, the more detectable are outliers that emerge from the imputation process.

Also expected is the relative tightness of the distributions of the bias when comparing across levels of industry aggregation. The SIC Division level distributions of the bias are tighter than the two-digit SIC distributions, as more establishments and workers contribute data to each SIC Division level cell. The tightening of the distributions is clear when examining the 90-10 differential. For B at the SIC Division level, for example, the spread between the 90th and 10th percentiles falls by 0.19 when the distribution of the bias is weighted. It is also clear that the spread between the 90th and 10th percentiles is smaller for the SIC Division level than for the two-digit SIC level of aggregation.

Time-Series Properties of Disclosure Avoidance System

The disclosure avoidance algorithm described in section 5.6 has the dual goals of preserving confidentiality and maintaining a high level of analytical validity of the public-use data.
This section draws on Abowd, Stephens, and Vilhuber (2006), who provide an in-depth analysis of the extent of disclosure protection and the degree to which analytical validity is maintained. The analysis presented in this subsection focuses on the time series properties of the published QWI, after noise infusion and suppressions. Abowd, Stephens, and Vilhuber (2006) also show the cross-sectional unbiasedness of the published data. In each case, data from two states (Illinois and Maryland) were used. The unit of analysis is an interior substate geography × industry × age × sex cell kt. Substate geography in all cases is a county, whereas the industry classification is SIC. Analytical validity is obtained when the data display no bias and the additional dispersion due to the confidentiality protection system can be quantified so that statistical inferences can be adjusted to accommodate it. To analyze the impact on the time series properties of the distorted data,

Table 5.8  Distribution of proportional bias in unit-to-worker imputation

Bias = (XImputed – XTrue) / XTrue

              Beginning-of-period     Full-quarter
              employment              employment              Accessions              Separations             Total payroll
Percentile    Unweighted  Weighted    Unweighted  Weighted    Unweighted  Weighted    Unweighted  Weighted    Unweighted  Weighted

SIC Division
90              0.21484    0.09165      0.21274    0.09021      0.27966    0.14072      0.26821    0.13265      0.28185    0.11456
75              0.07426    0.03697      0.07287    0.03709      0.09597    0.04312      0.09332    0.04043      0.09184    0.04159
50              0.00002    0.00415      0.00000    0.00482      0.00001   –0.00449      0.00000   –0.00414      0.00262    0.00405
25             –0.03475   –0.02232     –0.03696   –0.02159     –0.01610   –0.04274     –0.02105   –0.04015     –0.02998   –0.02219
10             –0.15197   –0.08075     –0.15879   –0.08020     –0.17476   –0.11827     –0.18000   –0.10717     –0.16044   –0.08178
P90–P10         0.36680    0.17239      0.37152    0.17040      0.45441    0.25898      0.44820    0.23981      0.44229    0.19633

2-Digit SIC
90              0.25999    0.13802      0.25257    0.13494      0.30875    0.17390      0.29985    0.17117      0.36038    0.16118
75              0.04425    0.03965      0.04218    0.04002      0.05200    0.03683      0.04938    0.03622      0.05296    0.04267
50             –0.00004   –0.00010     –0.00004   –0.00005     –0.00004   –0.00111     –0.00004   –0.00075     –0.00002   –0.00004
25             –0.00232   –0.04169     –0.00232   –0.04061     –0.00075   –0.07000     –0.00075   –0.06709     –0.00127   –0.04034
10             –0.17605   –0.13951     –0.18858   –0.13865     –0.20977   –0.20784     –0.22191   –0.19383     –0.16911   –0.14427
P90–P10         0.43603    0.27752      0.44114    0.27359      0.51851    0.38173      0.52175    0.36500      0.52948    0.30544

Notes: Estimated using Minnesota data. A statistic XTrue is calculated using job histories coded to the SEIN/SEINUNIT level; a statistic XImputed is calculated using job histories coded to the SEIN level with imputed SEINUNITs. See text for further details.
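The bias statistic summarized in table 5.8, and the difference between its weighted and unweighted percentiles, can be illustrated with a small sketch. The cell values below are invented, the choice of XTrue as the weight is an assumption (the text does not state the weight used), and the percentile rule is a simple step-function definition rather than the one used to produce the table.

```python
def proportional_bias(x_imputed, x_true):
    """Bias = (X_Imputed - X_True) / X_True for a single cell."""
    return (x_imputed - x_true) / x_true


def weighted_percentile(values, weights, p):
    """Smallest value whose cumulative weight share reaches p (0 < p <= 1)."""
    pairs = sorted(zip(values, weights))
    threshold = p * sum(w for _, w in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= threshold:
            return value
    return pairs[-1][0]


# Hypothetical cells: (X_True, X_Imputed). Small cells have large relative errors.
cells = [(10, 13), (20, 18), (40, 40), (1000, 1005), (2000, 1990)]
biases = [proportional_bias(imputed, true) for true, imputed in cells]

equal_weights = [1] * len(cells)            # unweighted distribution
employment_weights = [true for true, _ in cells]  # assumed weight: X_True

unweighted_spread = (weighted_percentile(biases, equal_weights, 0.9)
                     - weighted_percentile(biases, equal_weights, 0.1))
weighted_spread = (weighted_percentile(biases, employment_weights, 0.9)
                   - weighted_percentile(biases, employment_weights, 0.1))
print(unweighted_spread, weighted_spread)
```

With these made-up cells the 90-10 spread shrinks once large cells receive more weight, which mirrors the pattern reported in table 5.8.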

Table 5.9  Distribution of the error in first order serial correlation of QWI, Δr = r – r*

              Beginning-of-quarter                                   Full-quarter
Percentile    employment            Accessions     Separations      employment      Net job flows

IL County × SIC Division
01             –0.085495            –0.092455      –0.098770        –0.079205        –0.008447
05             –0.047704            –0.046665      –0.045208        –0.046830        –0.004959
10             –0.034558            –0.031767      –0.032898        –0.033607        –0.003186
25             –0.015317            –0.014197      –0.015077        –0.015533        –0.001189
50             –0.000512            –0.000997      –0.000707        –0.001000        –0.000049
75              0.013438             0.011536       0.012457         0.011670         0.000861
90              0.030963             0.027037       0.028835         0.027970         0.002489
95              0.044796             0.037906       0.041862         0.040096         0.004801
99              0.080282             0.079122       0.083824         0.077419         0.007537

MD County × SIC Division
01             –0.065342            –0.072899      –0.072959        –0.058021        –0.009081
05             –0.035974            –0.036995      –0.040314        –0.030985        –0.004540
10             –0.024174            –0.027689      –0.028577        –0.021361        –0.002823
25             –0.010393            –0.013686      –0.012505        –0.009401        –0.001243
50              0.000230            –0.000542       0.000797         0.000279        –0.000025
75              0.011382             0.012628       0.013034         0.009429         0.001045
90              0.025160             0.026325       0.025272         0.022027         0.002799
95              0.035176             0.034114       0.034999         0.030152         0.004321
99              0.060042             0.056477       0.055043         0.049213         0.009208

Notes: Estimated from undistorted (r) and published data (r*). Unit of observation is a county × SIC division × age-group × sex cell for all private employment, interior cells only. For more details, see text and Abowd, Stephens, and Vilhuber (2006).

we estimated an AR(1) for the time series associated with each cell kt, using county-level data for all counties in each state. Two AR(1) coefficients are estimated for each cell time series. The first order serial correlation coefficient computed using undistorted data is denoted by r. The estimate computed using the distorted data is denoted by r*. For each cell, the error Δr = r – r* is computed. Table 5.9 shows the distribution of the errors Δr across SIC-division × county cells, for B, A, S, F, and JF when comparing raw (confidential) data to published data, which excludes suppressed data items. The table shows that the time series properties of all variables analyzed remain largely unaffected by the distortion. The maximum bias (as measured by the median of this distribution) is never greater than 0.001. The error distribution is tight; the semi-interquartile range of the distortion for B in Maryland is 0.010, which is less than the precision with which estimated serial correlation coefficients are normally displayed. The maximum semi-interquartile range for any variable in either of the two states is 0.012.[26]

26. Abowd, Stephens, and Vilhuber (2006) also report that the maximum semi-interquartile range for SIC2-based variables is 0.0241, and for SIC3-based variables, 0.0244.
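The comparison of r and r* can be sketched for a single cell as follows. This is an illustration under stated assumptions, not the procedure of Abowd, Stephens, and Vilhuber (2006): the AR(1) coefficient is estimated by a simple least-squares regression of each quarter on the previous one, the cell is assumed to aggregate two establishments, each establishment's multiplicative noise factor is assumed fixed over time, and all numbers are hypothetical.

```python
def ar1_coefficient(series):
    """Least-squares slope of x_t on x_(t-1), with an intercept."""
    lagged, current = series[:-1], series[1:]
    n = len(current)
    mean_lag = sum(lagged) / n
    mean_cur = sum(current) / n
    covariance = sum((a - mean_lag) * (b - mean_cur)
                     for a, b in zip(lagged, current))
    variance = sum((a - mean_lag) ** 2 for a in lagged)
    return covariance / variance


# Hypothetical quarterly employment at two establishments in one cell.
establishment_1 = [100, 104, 103, 108, 110, 109, 115, 118]
establishment_2 = [50, 48, 52, 51, 55, 53, 57, 60]

# Hypothetical time-invariant multiplicative noise factors.
factor_1, factor_2 = 1.07, 0.94

undistorted = [a + b for a, b in zip(establishment_1, establishment_2)]
distorted = [factor_1 * a + factor_2 * b
             for a, b in zip(establishment_1, establishment_2)]

r = ar1_coefficient(undistorted)
r_star = ar1_coefficient(distorted)
delta_r = r - r_star
print(round(r, 4), round(r_star, 4), round(delta_r, 4))
```

Under these assumptions a cell made up of a single establishment would have Δr equal to zero up to rounding, because rescaling a series by a constant leaves its serial correlation unchanged; any error arises only from mixing establishments with different factors.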

Although the overall spread of the distribution is slightly higher when considering two-digit SIC × county and three-digit SIC × county cells, which are sparser than the SIC-division × county cells, the general results hold in these cases as well (Abowd, Stephens, and Vilhuber 2006, tables 7 and 8). Abowd, Stephens, and Vilhuber (2006) thus conclude that the time series properties of the QWI data are unbiased with very little additional noise, which is, in general, economically meaningless.

5.8 Public-Use and Restricted-Access Files

In this section, we briefly describe the public-use release files and those files available at Census Research Data Centers. We focus on how these files differ from the corresponding internal files discussed in the rest of the chapter.

5.8.1 Public-Use Files

Three public-use products, fully or partially based on QWI data, are currently available on a regular basis: the QWI distribution files, the Older Worker Reports, and OnTheMap.[27] A subset of eight variables from the full public-use release is available at QWI Online (http://lehd.did.census.gov/). Additional variables are used in other applications accessible from the same Census Bureau web site. The complete set of QWI public-use variables is available from the Cornell Virtual Research Data Center (VirtualRDC) as of January 2008. The VirtualRDC is partially funded by grants from the National Science Foundation (NSF) and the National Institute on Aging. Computing resources to manipulate complete QWI are also available on the VirtualRDC for qualified researchers (http://www.vrdc.cornell.edu/). Other distribution options for the full QWI data may be available when this volume appears. Up-to-date information on all access options is posted at http://lehd.did.census.gov/.

The public-use QWI data differ from the Census-internal version because the public-use version has been subjected to the disclosure avoidance methods described in section 5.6.
In order to preserve the integrity of these disclosure avoidance algorithms, all special tabulations released from the QWI must follow the same procedures.

27. The Older Worker Reports are based entirely on the QWI public use files. OnTheMap uses the QWI micro data to produce a QWI report for the user-defined geographic area.

5.8.2 Restricted-Access Files

A larger set of files is available within the protected environment provided by the Census Research Data Centers (RDCs). The only information removed from RDC versions of QWI and LEHD Infrastructure files relative to their internal-use counterparts is the information specifically used to do confidentiality protection of the QWI: the fuzz factors and the fuzzed data items. All RDC-accessible LEHD files can be used for research purposes by submitting a research proposal to the Center for Economic Studies (CES) at the U.S. Census Bureau.[28]

ECF

The version of the ECF available in the RDC environment is called the LEHD-ECF in the CES RDC documentation. It is identical to the one described in section 5.3.4 except for the removal of the QWI fuzz factors.

Unit Flow Files: Establishment-Level QWI Data

The SEINUNIT-level input files to the final aggregation step of the QWI, internally known as UFF_B, are available in the RDC environment under the name LEHD-QWI. These files are identical to the final establishment-level flow files documented in section 5.5 except that they contain only the unfuzzed raw aggregates.

Establishment Crosswalks: Business Register Bridge

The Census Bureau maintains lists of establishments to develop the frames for economic censuses and surveys. These lists are called the Employer and Nonemployer Business Registers (BR). The research version of the Employer BR is maintained by CES, which produces a new set of files annually. The BR contains very reliable information on business identifiers, business organizational structure, and business location. Unfortunately, the establishment identification system for the Business Register differs from the LEHD establishment identifier (SEIN/SEINUNIT). As a consequence, there is no single best way to form linkages between these data sources. The LEHD Business Register Bridge (LEHD-BRB) that is available in the RDC network provides several ways to integrate the economic censuses and surveys with LEHD-provided data. The choice of linking strategy is left to researchers, who must determine the best definition of an entity on both sides of the linkage, considering data sources and the stated research objectives.
Available identifiers on the LEHD-BRB that are common to both the LEHD Infrastructure Files and the BR are the EIN, geographic information, and four-digit SIC. These variables may be used to construct pseudo-establishments that are aggregates of SEIN/SEINUNIT establishments at different levels of aggregation. These identifiers can also be linked to sets of ALPHA/CFN establishments on the BR and other Census economic data products. A more detailed guide is available on the CES or LEHD web site (Chiang, Sandusky, and Vilhuber 2005).

28. Available at http://www.ces.census.gov

Household and Establishment Geocoding: GAL

The GAL (Geocoded Address List), described in section 5.3.3, is available in the RDC environment under the reference LEHD-GAL. Access to the GAL is predicated on the project having permission to use business or residential address information from other RDC-available source files. Once that permission has been properly established, the researcher is granted access to the GAL for the purpose of obtaining a consistent set of geocodes.

Wage Decomposition Data: Human Capital Files

These files will contain employer-level distributions of human capital measures as initially developed in Abowd, Lengermann, and McKinney (2002). They are expected to become available in 2009.

Remaining LEHD Infrastructure Files

The remaining LEHD Infrastructure Files outlined in this chapter are available in the RDC environment as of the time of publication of this volume. In general, unless explicitly mentioned above, these files are provided to researchers as-is, and are subject to the same Title 13 use restrictions as all other data on the RDC network. The LEHD Infrastructure data are also subject to usage restrictions in the MOU that governs the Census Bureau and state participation in the LED partnership. The most important of those restrictions is the one that requires the written consent of the state’s signatory official on the MOU before state-specific results based on the LEHD Infrastructure Files may be released. Results may be released from analyses performed on multiple states. For up-to-date details, researchers should contact CES directly.

5.9 Concluding Remarks

5.9.1 Future Projects

This section describes some of the ongoing efforts to improve the LEHD Infrastructure Files.

Planned Improvements to the ICF

Currently, researchers at LEHD are developing an enhanced, longitudinal version of the ICF, internally named ICF version 4 because the current system was the third version of most of the infrastructure files.
The improved ICF is the first national LEHD Infrastructure File system. Individuals appearing in any state, including those that have not yet joined the LED federal/state partnership, have their ICF data on a set of annual records. Additional data sources will be integrated with this enhanced version of

the ICF using direct links. The statistical link to the 1990 Decennial Census will be replaced by a direct link to the 2000 Decennial Census, and additional links to the ACS will be incorporated. The existing education imputation will greatly benefit from this enhancement. The additional links, as well as improved links to currently integrated data, will also allow for additional time-invariant characteristics to be incorporated and completed, including information on race and ethnicity and additional time-varying characteristics such as Temporary Assistance for Needy Families (TANF) recipiency. Longitudinal residence information will be appended to the ICF based on the information available from the StARS. Where appropriate, residence will be imputed based on a change in residence imputation model and Bayesian methods for imputing geography at the block level, replacing the current residential address missing data imputation model. In fact, all imputation models will be based on the most up-to-date imputation engines developed at LEHD.

Planned Improvements to the EHF

The UI wage records in several states suffer from defects in the historical records. These defects can be detected automatically when they produce a big enough fluctuation in certain flow statistics, typically beginning-of-period employment as compared to total flow employment. Algorithms have been developed to detect the probable existence of missing wage records using the posterior predictive distribution of employment histories given the available data and an informative prior on certain patterns. Once detected, the missing wage records are imputed, again using appropriate Bayesian methods. The same imputation engines are also being used to impute top-coded UI wages. These improvements are in the testing stage and should be implemented within the next year.

Planned Improvements to the ECF

Two major enhancements to the ECF are in development.
The first is a probabilistic record link to the Census Bureau’s Business Register in order to improve the physical addresses on the ECF. This enhancement is currently in the testing phase. The second major enhancement, which impacts not just the ECF, is the expansion of coverage to include entities so far not covered by the LEHD Infrastructure.

Integration of Data from Missing Parts of the Universe

Nonemployer data. The job universe currently used by all LEHD Infrastructure Files is legal employment with an employer that has mandatory reporting to the state UI wage record system. Nonemployer businesses are out-of-scope for this universe but are of intrinsic interest in the economic analysis of sources of labor income. In addition, the income to the sole proprietor of an employer business is of interest as a source of labor income. The LEHD Program and CES are collaborating in developing enhancements to the Business Register to account for nonemployer income sources and to better track sole proprietor employers. The nonemployer enhancements will also affect the LEHD Infrastructure Files because the information on the identity of the nonemployer, the identity of the nonemployer business, and the income from the nonemployer business provides a job record for this activity, which can then be integrated with the EHF, ICF, and ECF file systems.

Federal government employment. The LEHD program has completed an MOU with the Office of Personnel Management (OPM) in the federal government to obtain historical and ongoing information from the OPM databases that permits construction of LEHD Infrastructure File system records that correspond to job histories for federal employees in the EHF and employer-establishment records in the ECF. Records already exist for these individuals in the new ICF.

Creation of Public-Use Synthetic Data

As a part of a National Science Foundation Information Technology Research grant (SES-0427889), awarded to a consortium of Census Research Data Centers, researchers at LEHD and other parts of Census are collaborating with social scientists and statisticians working in the RDCs to create and validate synthetic micro data from the LEHD Infrastructure Files. Such synthetic micro data will be confidentiality protected so that they may be released for public use. They will also be inference valid, permitting the estimation of some statistical models with results comparable to those obtained on the confidential micro data.

The First Twenty-First Century Statistical System

The goal of the development of the Quarterly Workforce Indicators was to create a twenty-first century statistical system.
Without increasing respondent burden, the LEHD infrastructure permits the creation of extremely detailed estimates that, for the first time in the United States, provide integrated demographic and economic information about the local labor market. The same techniques will work for other areas of interest: transportation dynamics and welfare-to-work dynamics, to name just two examples. The two essential features of twenty-first century statistical systems will be their heavy reliance on existing data instruments (surveys, censuses, and administrative records that are already in production) and their extensive use of data-intensive statistical modeling to enhance and summarize this information. In these regards, we think the LEHD infrastructure and the QWI system are worthy pioneers.

Appendix

Definitions of Fundamental LEHD Concepts

A.1 Fundamental Concepts

A.1.1 Dates

The QWI are a quarterly data system with calendar year timing. We use the notation yyyy:q to refer to a year and quarter combination. For example, 1999:4 refers to the fourth quarter of 1999, which includes the months October, November, and December.

A.1.2 Employer

An employer in the QWI system consists of a single Unemployment Insurance (UI) account in a given state’s UI wage reporting system. For statistical purposes, the QWI system creates an employer identifier called a State Employer Identification Number (SEIN) recoded from the UI account number and information about the state (FIPS code). Thus, within the QWI system, the SEIN is a unique identifier within and across states but the entity to which it refers is a UI account.

A.1.3 Establishment

For a given employer in the QWI system, a SEIN, each physical location within the state is assigned a unit number, called the SEINUNIT. This SEINUNIT is recoded from the reporting unit in the ES-202 files supplied by the states. All QWI statistics are produced by aggregating statistics calculated at the establishment level. Single-unit SEINs are UI accounts associated with a single reporting unit in the state. Thus, single-unit SEINs have only one associated SEINUNIT in every quarter. Multi-unit SEINs have two or more SEINUNITs associated for some quarters. Since the UI wage records are not coded down to the SEINUNIT, SEINUNITs are multiply imputed as described in section 5.4.2 on the unit-to-worker imputation. A feature of this imputation system is that it does not permit SEINUNIT to SEINUNIT movements within the same SEIN. Thus, for multi-unit SEINs, the following definitions produce the same flow estimates at the SEIN level whether the definition is applied to the SEIN or the SEINUNIT.

A.1.4 Employee

Individual employees are identified by their Social Security Numbers (SSN) on the UI wage records that provide the input to the QWI. To protect the privacy of the SSN and the individual’s name, a different branch of the Census Bureau removes the name and replaces the SSN with an internal Census identifier called a Protected Identification Key (PIK).

A.1.5 Job

The QWI system definition of a job is the association of an individual (PIK) with an establishment (SEINUNIT) in a given year and quarter. The QWI system stores the entire history of every job that an individual holds. Estimates are based on the following definitions, which formalize how the QWI system estimates the start of a job (accession), employment status (beginning- and end-of-quarter employment), continuous employment (full-quarter employment), the end of a job (separation), and average earnings for different groups.

A.1.6 Unemployment Insurance Wage Records (the QWI System Universe)

The Quarterly Workforce Indicators are built upon concepts that begin with the report of an individual’s UI-covered earnings by an employing entity (SEIN). An individual’s UI wage record enters the QWI system if at least one employer reports earnings of at least one dollar for that individual (PIK) during the quarter. Thus, the job must produce at least one dollar of UI-covered earnings during a given quarter to count in the QWI system. The presence of this valid UI wage record in the QWI system triggers the beginning of calculations that estimate whether that individual was employed at the beginning of the quarter, at the end of the quarter, and continuously throughout the quarter. These designations are discussed later. Once these point-in-time employment measures have been estimated for the individual, further analysis of the individual’s wage records results in estimates of full-quarter employment, accessions, separations (point-in-time and full-quarter), job creations and destructions, and a variety of full-quarter average earnings measures.

A.1.7 Employment at a Point in Time

Employment is estimated at two points in time during the quarter, corresponding to the first and last calendar days. An individual is defined as employed at the beginning of the quarter when that individual has valid UI wage records for the current quarter and the preceding quarter. Both records must apply to the same employer (SEIN). An individual is defined as employed at the end of the quarter when that individual has valid UI wage records for the current quarter and the subsequent quarter. Again, both records must show the same employer. The QWI system uses beginning- and end-of-quarter employment as the basis for constructing worker and job flows. In addition, these measures are used to check the external consistency of the data, since a variety of employment estimates are available as point-in-time measures.

Many federal statistics are based upon estimates of employment as of the twelfth day of particular months. The Census Bureau uses March 12 as the reference date for employment measures contained in its Business Register and on the Economic Censuses and Surveys. The BLS “Covered Employment and Wages (CEW)” series, which is based on the QCEW (formerly ES-202) data, uses the twelfth of each month as the reference date for employment. The QWI system cannot use exactly the same reference date as these other systems because UI wage reports do not specify additional detail regarding the timing of these payments. LEHD research has shown that the point-in-time definitions used to estimate beginning- and end-of-quarter employment track the CEW month-one employment estimates well at the level of an employer (SEIN).

For single-unit SEINs, there is no difference between an employer-based definition and an establishment-based definition of point-in-time employment. For multi-unit SEINs, the unit-to-worker imputation model assumes that unit-to-unit transitions within the same SEIN cannot occur. Therefore, point-in-time employment defined at either the SEIN or SEINUNIT level produces the same result.

A.1.8 Employment for a Full Quarter

The concept of full-quarter employment estimates individuals who are likely to have been continuously employed throughout the quarter at a given employer. An individual is defined as full-quarter employed if that individual has valid UI wage records in the current quarter, the preceding quarter, and the subsequent quarter at the same employer (SEIN). That is, in terms of the point-in-time definitions, if the individual is employed at the same employer at both the beginning and end of the quarter, then the individual is considered full-quarter employed in the QWI system. Consider the following example. Suppose that an individual has valid UI wage records at employer A in 1999:2, 1999:3, and 1999:4. This individual does not have a valid UI wage record at employer A in 1999:1 or 2000:1. Then, according to the previous definitions, the individual is employed at the end of 1999:2, the beginning and end of 1999:3, and the beginning of 1999:4 at employer A. The QWI system treats this individual as a full-quarter employee in 1999:3 but not in 1999:2 or 1999:4. Full-quarter status is not defined for either the first or last quarter of available data.

A.1.9 Point-in-Time Estimates of Accession and Separation

An accession occurs in the QWI system when it encounters the first valid UI wage record for a job (an individual [PIK]-employer [SEIN] pair). Accessions are not defined for the first quarter of available data from a given state. The QWI definition of an accession can be interpreted as an estimate of the number of new employees added to the payroll of the employer (SEIN) during the quarter. The individuals who acceded to a particular employer were not employed by that employer during the previous quarter, but received at least one dollar of UI-covered earnings during the quarter of accession.

J. Abowd et al.

A separation occurs in the current quarter of the QWI system when it encounters no valid UI wage record for an individual-employer pair in the subsequent quarter. This definition of separation can be interpreted as an estimate of the number of employees who left the employer during the current quarter. These individuals received UI-covered earnings during the current quarter but did not receive any UI-covered earnings in the next quarter from this employer. Separations are not defined for the last quarter of available data.

A.1.10 Accession and Separation from Full-Quarter Employment

Full-quarter employment is not a point-in-time concept. Full-quarter accession refers to the quarter in which an individual first attains full-quarter employment status at a given employer. Full-quarter separation occurs in the last full quarter that an individual worked for a given employer. As previously noted, full-quarter employment refers to an estimate of the number of employees who were employed at a given employer during the entire quarter. An accession to full-quarter employment, then, involves two additional conditions that are not relevant for ordinary accessions. First, the individual (PIK) must still be employed at the end of the quarter at the same employer (SEIN) for which the ordinary accession is defined. At this point (the end of the quarter where the accession occurred and the beginning of the next quarter) the individual has acceded to continuing-quarter status. An accession to continuing-quarter status means that the individual acceded in the current quarter and is end-of-quarter employed. Next, the QWI system must check for the possibility that the individual becomes a full-quarter employee in the subsequent quarter. An accession to full-quarter status occurs if the individual acceded in the previous quarter, and is employed at both the beginning and end of the current quarter. Consider the following example. An individual’s first valid UI wage record with employer A occurs in 1999:2. Thus, the individual acceded in 1999:2. The same individual has a valid wage record with employer A in 1999:3. The QWI system treats this individual as end-of-quarter employed in 1999:2 and beginning-of-quarter employed in 1999:3. Thus, the individual acceded to continuing-quarter status in 1999:2. If the individual also has a valid UI wage record at employer A in 1999:4, then the individual is full-quarter employed in 1999:3.
Since 1999:3 is the first quarter of full-quarter employment, the QWI system considers this individual an accession to full-quarter employment in 1999:3.

Full-quarter separation works much the same way. One must be careful about the timing, however. If an individual separates in the current quarter, then the QWI system looks at the preceding quarter to determine if the individual was employed at the beginning of the current quarter. An individual who separates in a quarter in which that person was employed at the beginning of the quarter is a separation from continuing-quarter status in the current quarter. Finally, the QWI system checks to see if the individual was a full-quarter employee in the preceding quarter. An individual who was a full-quarter employee in the previous quarter is treated as a full-quarter separation in the quarter in which that person actually separates. Note, therefore, that the definition of full-quarter separation preserves the timing of the actual separation (current quarter) but restricts the estimate to those individuals who had full-quarter status in the preceding quarter. For example, suppose that an individual separates from employer A in 1999:3. This means that the individual had a valid UI wage record at employer A in 1999:3 but did not have a valid UI wage record at employer A in 1999:4. The separation is dated 1999:3. Suppose that the individual had a valid UI wage record at employer A in 1999:2. Then, a separation from continuing-quarter status occurred in 1999:3. Finally, suppose that this individual had a valid UI wage record at employer A in 1999:1. Then, this individual was a full-quarter employee at employer A in 1999:2. The QWI system records a full-quarter separation in 1999:3.

A.1.11 Point-in-Time Estimates of New Hires and Recalls

The QWI system refines the concept of accession into two subcategories: new hires and recalls. In order to do this, the QWI system looks at a full year of wage record history prior to the quarter in which an accession occurs. If there are no valid wage records for this job (PIK-SEIN) during the four quarters preceding an accession, then the accession is called a new hire; otherwise, the accession is called a recall. Thus, new hires and recalls sum to accessions. For example, suppose that individual 1 accedes to employer A in 1999:3. Recall that this means that there is a valid UI wage record for individual 1 at employer A in 1999:3 but not in 1999:2. If there are also no valid UI wage records for individual 1 at employer A for 1999:1, 1998:4, and 1998:3, then the QWI system designates this accession as a new hire of individual 1 by employer A in 1999:3. Consider a second example in which individual 2 accedes to employer B in 2000:2. Once again, the accession implies that there is not a valid wage record for individual 2 at employer B in 2000:1. If there is a valid wage record for individual 2 at employer B in 1999:4, 1999:3, or 1999:2, then the QWI system designates the accession of individual 2 to employer B as a recall in 2000:2. New hire and recall data, because they depend upon having four quarters of historical data, only become available one year after the data required to estimate accessions become available.

A.1.12 New Hires and Recalls to and from Full-Quarter Employment

Accessions to full-quarter status can also be decomposed into new hires and recalls. The QWI system accomplishes this decomposition by classifying all accessions to full-quarter status who were classified as new hires in the previous quarter as new hires to full-quarter status in the current quarter. Otherwise, the accession to full-quarter status is classified as a recall to full-quarter status. For example, if individual 1 accedes to full-quarter status at employer A in 1999:4, then, according to the previous definitions, individual 1 acceded to employer A in 1999:3 and reached full-quarter status in 1999:4. Suppose that the accession to employer A in 1999:3 was classified as a new hire; then, the accession to full-quarter status in 1999:4 is classified as a full-quarter new hire. For another example, consider individual 2, who accedes to full-quarter status at employer B in 2000:3. Suppose that the accession of individual 2 to employer B in 2000:2, which is implied by the full-quarter accession in 2000:3, was classified by the QWI system as a recall in 2000:2; then, the accession of individual 2 to full-quarter status at employer B in 2000:3 is classified as a recall to full-quarter status.

A.1.13 Job Creations and Destructions

Job creations and destructions are defined at the employer (SEIN) level and not at the job (PIK-SEIN) level. For single-unit employers, there is never more than one SEINUNIT per quarter, so the definition at the employer level and the definition at the establishment level are equivalent. For multi-unit employers, the QWI system performs the calculations at the establishment level (SEINUNIT); however, the statistical model for imputing establishment described in section 5.4.2 does not permit establishment-to-establishment flows. Hence, although the statistics are estimated at the establishment level, the sum of job creations and destructions at a given employer in a given quarter across all establishments active that quarter is exactly equal to the measures of job creations and destructions that would have been estimated by using employer-level inputs (SEIN) directly. To construct an estimate of job creations and destructions, the QWI system totals beginning and ending employment for each quarter for every employer in the UI wage record universe; that is, for every employer that has at least one valid UI wage record during the quarter. The QWI system actually uses the Davis et al. (1996) formulas for job creation and destruction (see definitions in appendix A.2). Here, we use a simplified definition. If end-of-quarter employment is greater than beginning-of-quarter employment, then the employer has created jobs. The QWI system sets job creations in this case equal to end-of-quarter employment less beginning-of-quarter employment. The estimate of job destructions in this case is zero. On the other hand, if beginning-of-quarter employment exceeds end-of-quarter employment, then this employer has destroyed jobs. The QWI system computes job destructions in this case as beginning-of-quarter employment less end-of-quarter employment. The QWI system sets job creations to zero in this case. Notice that either job creations are positive or job destructions are positive, but not both.
Job creations and job destructions can simultaneously be zero if beginning-of-quarter employment equals end-of-quarter employment. There is an important subtlety regarding job creations and destructions when they are computed for different sex and age groups within the same employer. There can be creation and destruction of jobs for certain demographic groups within the employer without job creation or job destruction occurring overall. That is, jobs can be created for some demographic groups and destroyed for others even at enterprises that have no change in employment as a whole. Here is a simple example. Suppose employer A has 250 employees at the beginning of 2000:3 and 280 employees at the end of 2000:3. Therefore, employer A has 30 job creations and zero job destructions in 2000:3. Now suppose that of the 250 employees, 100 are men and 150 are women at the beginning of 2000:3. At the end of the quarter suppose that there are 135 men and 145 women. Then, job creations for men are 35 and job destructions for men are 0 in 2000:3. For women in 2000:3 job creations are 0 and job destructions are 5. Notice that the sum of job creations for the employer by sex (35 + 0) is not equal to job creations for the employer as a whole (30) and that the sum of job destructions by sex (0 + 5) is not equal to job destructions for the employer as a whole (0).

A.1.14 Net Job Flows

Net job flows are also only defined at the level of an employer (SEIN). Once again, the QWI system computes these statistics at the establishment level but does not allow establishment-to-establishment flows. Hence, the estimates for a given employer (SEIN) are the sum of the estimates for that employer’s establishments (SEINUNIT) that are active in the given quarter. Net job flows are the difference between job creations and job destructions. Thus, net job flows are always equal to end-of-quarter employment less beginning-of-quarter employment. If we return to the example in the description of job creations and destructions, employer A has 250 employees at the beginning of 2000:3 and 280 employees at the end of 2000:3. Net job flows are 30 (job creations less job destructions, or end-of-quarter employment less beginning-of-quarter employment). Suppose, once again, that employment of men goes from 100 to 135 from the beginning to the end of 2000:3 and employment of women goes from 150 to 145. Notice that net job flows for men (35) plus net job flows for women (–5) equals net job flows for the employer as a whole (30). Net job flows are additive across demographic groups even though gross job flows (creations and destructions) are not. Some useful relations among the worker and job flows include:
• Net job flows = job creations – job destructions
• Net job flows = end-of-quarter employment – beginning-of-quarter employment
• Net job flows = accessions – separations

These relations hold for every demographic group and for the employer as a whole. Additional identities are shown in the second section of the appendix.

A.1.15 Full-Quarter Job Creations, Job Destructions, and Net Job Flows

The QWI system applies the same job flow concepts to full-quarter employment to generate estimates of full-quarter job creations, full-quarter job destructions, and full-quarter net job flows. Full-quarter employment in the current quarter is compared to full-quarter employment in the preceding quarter. If full-quarter employment has increased between the preceding quarter and the current quarter, then full-quarter job creations are equal to full-quarter employment in the current quarter less full-quarter employment in the preceding quarter. In this case full-quarter job destructions are zero. If full-quarter employment has decreased between the previous and current quarters, then full-quarter job destructions are equal to full-quarter employment in the preceding quarter minus full-quarter employment in the current quarter. In this case, full-quarter job creations are zero. Full-quarter net job flows equal full-quarter job creations minus full-quarter job destructions. The same identities that hold for the regular job flow concepts hold for the full-quarter concepts.

A.1.16 Average Earnings of End-of-Period Employees

The average earnings of end-of-period employees is estimated by first totaling the UI wage records for all individuals who are end-of-period employees at a given employer in a given quarter. Then, the total is divided by the number of end-of-period employees for that employer and quarter.

A.1.17 Average Earnings of Full-Quarter Employees

Measuring earnings using UI wage records in the QWI system presents some interesting challenges. The earnings of end-of-quarter employees who are not present at the beginning of the quarter are the earnings of accessions during the quarter. The QWI system does not provide any information about how much of the quarter such individuals worked. The range of possibilities goes from one day to every day of the quarter. Hence, estimates of the average earnings of such individuals may not be comparable from quarter to quarter unless one assumes that the average accession works the same share of the quarter regardless of other conditions in the economy. Similarly, the earnings of beginning-of-quarter workers who are not present at the end of the quarter represent the earnings of separations. These present the same comparison problems as the average earnings of accessions; namely, it is difficult to model the number of weeks worked during the quarter. If we consider only those individuals employed at the employer in a given quarter who were neither accessions nor separations during that quarter, we are left, exactly, with the full-quarter employees.

The QWI system measures the average earnings of full-quarter employees by summing the earnings on the UI wage records of all individuals at a given employer who have full-quarter status in a given quarter, then dividing by the number of full-quarter employees. For example, suppose that in 2000:2 employer A has ten full-quarter employees and that their total earnings are $300,000. Then, the average earnings of the full-quarter employees at A in 2000:2 is $30,000. Suppose, also, that six of these employees are men and that their total earnings are $150,000. So, the average earnings of full-quarter male employees is $25,000 in 2000:2 and the average earnings of female full-quarter employees is $37,500 (= $150,000/4).

A.1.18 Average Earnings of Full-Quarter Accessions

As discussed previously, a full-quarter accession is an individual who acceded in the preceding quarter and achieved full-quarter status in the current quarter. The QWI system measures the average earnings of full-quarter accessions in a given quarter by summing the UI wage record earnings of all full-quarter accessions during the quarter and dividing by the number of full-quarter accessions in that quarter.

A.1.19 Average Earnings of Full-Quarter New Hires

Full-quarter new hires are accessions to full-quarter status who were also new hires in the preceding quarter. The average earnings of full-quarter new hires are measured as the sum of UI wage records for a given employer for all full-quarter new hires in a given quarter divided by the number of full-quarter new hires in that quarter.

A.1.20 Average Earnings of Full-Quarter Separations

Full-quarter separations are individuals who separate during the current quarter who were full-quarter employees in the previous quarter. The QWI system measures the average earnings of full-quarter separations by summing the earnings for all individuals who have full-quarter status in the current quarter and who separate in the subsequent quarter. This total is then divided by full-quarter separations in the subsequent quarter. Thus, the average earnings of full-quarter separations are the average earnings of full-quarter employees in the current quarter who separated in the next quarter. Note the dating of this variable.

A.1.21 Average Periods of Nonemployment for Accessions, New Hires, and Recalls

As noted previously, an accession occurs when a job starts; that is, on the first occurrence of a SEIN-PIK pair following the first quarter of available data. When the QWI system detects an accession, it measures the number of quarters (up to a maximum of four) that the individual spent nonemployed in the state prior to the accession. The QWI system estimates the number of quarters spent nonemployed by looking for all other jobs held by the individual at any employer in the state in the preceding quarters up to a maximum of four. If the QWI system does not find any other valid UI wage records in a quarter preceding the accession, it augments the count of nonemployed quarters for the individual who acceded, up to a maximum of four. Total quarters of nonemployment for all accessions is divided by accessions to estimate average periods of nonemployment for accessions. Here is a detailed example. Suppose individual 1 and individual 2 accede to employer A in 2000:1. In 1999:4, individual 1 does not work for any other employers in the state. In 1999:1 through 1999:3 individual 1 worked for employer B. Individual 1 had one quarter of nonemployment preceding the accession to employer A in 2000:1. Individual 2 has no valid UI wage records for 1999:1 through 1999:4. Individual 2 has four quarters of nonemployment preceding the accession to employer A in 2000:1. The accessions to employer A in 2000:1 had an average of 2.5 quarters of nonemployment in the state prior to accession. Average periods of nonemployment for new hires and recalls are estimated using exactly analogous formulas except that the measures are estimated separately for accessions who are also new hires as compared with accessions who are recalls.

A.1.22 Average Number of Periods of Nonemployment for Separations

Analogous to the average number of periods of nonemployment for accessions prior to the accession, the QWI system measures the average number of periods of nonemployment in the state for individuals who separated in the current quarter, up to a maximum of four. When the QWI system detects a separation, it looks forward for up to four quarters to find valid UI wage records for the individual who separated among other employers in the state. Each quarter that it fails to detect any such jobs is counted as a period of nonemployment, up to a maximum of four. The average number of periods of nonemployment is estimated by dividing the total number of periods of nonemployment for separations in the current quarter by the number of separations in the quarter.

A.1.23 Average Changes in Total Earnings for Accessions and Separations

The QWI system measures the change in total earnings for individuals who accede or separate in a given quarter. For an individual accession in a given quarter, the QWI system computes total earnings from all valid wage records for all of the individual’s employers in the preceding quarter. The system then computes the total earnings for the same individual for all valid wage records and all employers in the current quarter. The acceding individual’s change in earnings is the difference between the current quarter earnings from all employers and the preceding quarter earnings from all employers. The average change in earnings for all accessions is the total change in earnings for all accessions divided by the number of accessions. The QWI system computes the average change in earnings for separations in an analogous manner. The system computes total earnings from all employers for the separating individual in the current quarter and subtracts total earnings from all employers in the subsequent quarter. The average change in earnings for all separations is the total change in earnings for all separations divided by the number of separations. Here is an example for the average change in earnings of accessions. Suppose individual 1 accedes to employer A in 2000:3. Earnings for individual 1 at employer A in 2000:3 are $8,000. Individual 1 also worked for employer B in 2000:2 and 2000:3. Individual 1’s earnings at employer B were $7,000 and $3,000 in 2000:2 and 2000:3, respectively. Individual 1’s change in total earnings between 2000:2 and 2000:3 was $4,000 (= $8,000 + $3,000 – $7,000). Individual 2 also acceded to employer A in 2000:3. Individual 2 earned $9,000 from employer A in 2000:3. Individual 2 had no other employers during 2000:2 or 2000:3. Individual 2’s change in total earnings is $9,000. The average change in earnings for all of employer A’s accessions is $6,500 (= [$4,000 + $9,000]/2), the average change in total earnings for individuals 1 and 2.

A.2 Definitions of Job Flow, Worker Flow, and Earnings Statistics

A.2.1 Overview and Basic Data Processing Conventions

For internal processing the variable t refers to the sequential quarter. The variable t runs from qmin to qmax, regardless of the state being processed. The quarters are numbered sequentially from 1 (1985:1) to the latest available quarter. These values are qmin = 1 (1985:1) and qmax = 88 (2006:4), as of December 2007. For publication, presentation, and internal data files, all dates are presented as (year:quarter) pairs (e.g., 1990:1 for first quarter 1990). The variable qfirst refers to the first available sequential quarter of data for a state (e.g., qfirst = 21 for Illinois). The variable qlast refers to the last available sequential quarter of data for a state (e.g., qlast = 88 for Illinois). Unless otherwise specified a variable is defined for qfirst ≤ t ≤ qlast. Statistics are produced for both sexes combined, as well as separately, for all age groups, ages fourteen to eighteen, nineteen to twenty-one, twenty-two to twenty-four, twenty-five to thirty-four, thirty-five to forty-four, forty-five to fifty-four, fifty-five to sixty-four, sixty-five and over, and all combinations of these age groups and sexes. An individual’s age is measured as of the last day of the quarter.

A.2.2 Individual Concepts

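Flow employment ($m$): for $\mathit{qfirst} \le t \le \mathit{qlast}$, individual $i$ employed (matched to a job) at some time during period $t$ at establishment $j$

$$m_{ijt} = \begin{cases} 1, & \text{if } i \text{ has positive earnings at establishment } j \text{ during quarter } t \\ 0, & \text{otherwise.} \end{cases} \tag{A1}$$

Beginning-of-quarter employment ($b$): for $\mathit{qfirst} < t$, individual $i$ employed at the beginning of $t$ (and the end of $t - 1$)

$$b_{ijt} = \begin{cases} 1, & \text{if } m_{ijt-1} = m_{ijt} = 1 \\ 0, & \text{otherwise.} \end{cases} \tag{A2}$$

End-of-quarter employment ($e$): for $t < \mathit{qlast}$, individual $i$ employed at $j$ at the end of $t$ (and the beginning of $t + 1$)

$$e_{ijt} = \begin{cases} 1, & \text{if } m_{ijt} = m_{ijt+1} = 1 \\ 0, & \text{otherwise.} \end{cases} \tag{A3}$$

Accessions ($a1$): for $\mathit{qfirst} < t$, individual $i$ acceded to $j$ during $t$

$$a1_{ijt} = \begin{cases} 1, & \text{if } m_{ijt-1} = 0 \text{ and } m_{ijt} = 1 \\ 0, & \text{otherwise.} \end{cases} \tag{A4}$$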

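Separations ($s1$): for $t < \mathit{qlast}$, individual $i$ separated from $j$ during $t$

$$s1_{ijt} = \begin{cases} 1, & \text{if } m_{ijt} = 1 \text{ and } m_{ijt+1} = 0 \\ 0, & \text{otherwise.} \end{cases} \tag{A5}$$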

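Full-quarter employment ($f$): for $\mathit{qfirst} < t < \mathit{qlast}$, individual $i$ was employed at $j$ at the beginning and end of quarter $t$ (full-quarter job)

$$f_{ijt} = \begin{cases} 1, & \text{if } m_{ijt-1} = 1 \text{ and } m_{ijt} = 1 \text{ and } m_{ijt+1} = 1 \\ 0, & \text{otherwise.} \end{cases} \tag{A6}$$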
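As a rough illustration, definitions (A1) through (A6) can be sketched in Python for a single job (one individual-establishment pair). Representing a job as a set of sequential quarter numbers, and the helper names `seq` and `indicators`, are assumptions made for this sketch and are not part of the QWI system. The sketch also ignores the boundary conditions at qfirst and qlast.

```python
def seq(year, quarter):
    """Sequential quarter number, with 1 = 1985:1 as in section A.2.1."""
    return (year - 1985) * 4 + quarter


def indicators(m, t):
    """Indicators (A1)-(A6) for one job in quarter t.

    m is the set of sequential quarters in which the job has positive earnings.
    Quarters t - 1 and t + 1 are assumed to lie inside the available data.
    """
    prev, cur, nxt = (t - 1) in m, t in m, (t + 1) in m
    return {
        "m": int(cur),                   # (A1) flow employment
        "b": int(prev and cur),          # (A2) beginning-of-quarter employment
        "e": int(cur and nxt),           # (A3) end-of-quarter employment
        "a1": int(cur and not prev),     # (A4) accession
        "s1": int(cur and not nxt),      # (A5) separation
        "f": int(prev and cur and nxt),  # (A6) full-quarter employment
    }


# Example from section A.1.8: wage records at employer A in 1999:2-1999:4 only.
job = {seq(1999, 2), seq(1999, 3), seq(1999, 4)}
for year, quarter in [(1999, 2), (1999, 3), (1999, 4)]:
    print(f"{year}:{quarter}", indicators(job, seq(year, quarter)))
```

In this sketch only 1999:3 has f = 1, which agrees with the worked example in section A.1.8.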

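New hires ($h1$): for $\mathit{qfirst} + 3 < t$, individual $i$ was newly hired at $j$ during period $t$

$$h1_{ijt} = \begin{cases} 1, & \text{if } m_{ijt-4} = 0 \text{ and } m_{ijt-3} = 0 \text{ and } m_{ijt-2} = 0 \text{ and } m_{ijt-1} = 0 \text{ and } m_{ijt} = 1 \\ 0, & \text{otherwise.} \end{cases} \tag{A7}$$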

Recalls ($r1$): for $\mathit{qfirst} + 3 < t$, individual $i$ was recalled from layoff at $j$ during period $t$

$$r1_{ijt} = \begin{cases} 1, & \text{if } m_{ijt-1} = 0 \text{ and } m_{ijt} = 1 \text{ and } h1_{ijt} = 0 \\ 0, & \text{otherwise.} \end{cases} \tag{A8}$$
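The split of accessions into new hires (A7) and recalls (A8) can be sketched as follows, using the two examples from section A.1.11. The function name `classify_accession` and the set-of-quarters representation are assumptions of this sketch. It also assumes that four quarters of history are available before t.

```python
def seq(year, quarter):
    """Sequential quarter number, with 1 = 1985:1 as in section A.2.1."""
    return (year - 1985) * 4 + quarter


def classify_accession(m, t):
    """Classify quarter t for one job as 'new hire', 'recall', or None.

    m is the set of sequential quarters with positive earnings for the job.
    """
    if t not in m or (t - 1) in m:
        return None  # no accession in t, so neither (A7) nor (A8) applies
    worked_in_prior_year = any((t - k) in m for k in range(1, 5))
    return "recall" if worked_in_prior_year else "new hire"


# Individual 1 at employer A: no wage records in the four quarters before 1999:3.
individual_1 = {seq(1999, 3)}
# Individual 2 at employer B: a record in 1999:4, none in 2000:1, back in 2000:2.
individual_2 = {seq(1999, 4), seq(2000, 2)}

print(classify_accession(individual_1, seq(1999, 3)))  # new hire
print(classify_accession(individual_2, seq(2000, 2)))  # recall
```

Because every accession falls into exactly one of the two branches, new hires and recalls sum to accessions, as the text states.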

Accessions to consecutive-quarter status ($a2$): for $\mathit{qfirst} < t < \mathit{qlast}$, individual $i$ transited from accession to consecutive-quarter status at $j$ at the end of $t$ and the beginning of $t + 1$ (accession in $t$ and still employed at the end of the quarter)

$$a2_{ijt} = \begin{cases} 1, & \text{if } a1_{ijt} = 1 \text{ and } m_{ijt+1} = 1 \\ 0, & \text{otherwise.} \end{cases} \tag{A9}$$

Accessions to full-quarter status ($a3$): for $\mathit{qfirst} + 1 < t < \mathit{qlast}$, individual $i$ transited from consecutive-quarter to full-quarter status at $j$ during period $t$ (accession in $t - 1$ and employed for the full quarter in $t$)

$$a3_{ijt} = \begin{cases} 1, & \text{if } a2_{ijt-1} = 1 \text{ and } m_{ijt+1} = 1 \\ 0, & \text{otherwise.} \end{cases} \tag{A10}$$
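The chain from accession (A4) to consecutive-quarter status (A9) to full-quarter status (A10) can be traced with a short sketch based on the example in section A.1.10. The function names mirror the symbols in the equations; the set-of-quarters representation is an assumption of this sketch, and boundary quarters are not handled.

```python
def seq(year, quarter):
    """Sequential quarter number, with 1 = 1985:1 as in section A.2.1."""
    return (year - 1985) * 4 + quarter


def a1(m, t):
    """(A4): accession in t."""
    return int(t in m and (t - 1) not in m)


def a2(m, t):
    """(A9): accession in t and still employed at the end of t."""
    return int(a1(m, t) == 1 and (t + 1) in m)


def a3(m, t):
    """(A10): consecutive-quarter accession in t - 1, employed at the end of t."""
    return int(a2(m, t - 1) == 1 and (t + 1) in m)


# First wage record with employer A in 1999:2, then records in 1999:3 and 1999:4.
job = {seq(1999, 2), seq(1999, 3), seq(1999, 4)}
print(a1(job, seq(1999, 2)), a2(job, seq(1999, 2)), a3(job, seq(1999, 3)))  # 1 1 1
```

The accession is dated 1999:2, while the accession to full-quarter status is dated 1999:3, one quarter later.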

New hires to consecutive-quarter status ($h2$): for $\mathit{qfirst} + 3 < t < \mathit{qlast}$, individual $i$ transited from newly hired to consecutive-quarter hired status at $j$ at the end of $t$ and the beginning of $t + 1$ (hired in $t$ and still employed at the end of the quarter)

$$h2_{ijt} = \begin{cases} 1, & \text{if } h1_{ijt} = 1 \text{ and } m_{ijt+1} = 1 \\ 0, & \text{otherwise.} \end{cases} \tag{A11}$$

New hires to full-quarter status ($h3$): for $\mathit{qfirst} + 4 < t < \mathit{qlast}$, individual $i$ transited from consecutive-quarter hired to full-quarter hired status at $j$ during period $t$ (hired in $t - 1$ and full-quarter employed in $t$)

$$h3_{ijt} = \begin{cases} 1, & \text{if } h2_{ijt-1} = 1 \text{ and } m_{ijt+1} = 1 \\ 0, & \text{otherwise.} \end{cases} \tag{A12}$$

Recalls to consecutive-quarter status ($r2$): for $\mathit{qfirst} + 3 < t < \mathit{qlast}$, individual $i$ transited from recalled to consecutive-quarter recalled status at $j$ at the end of $t$ and beginning of $t + 1$ (recalled in $t$ and still employed at the end of the quarter)

$$r2_{ijt} = \begin{cases} 1, & \text{if } r1_{ijt} = 1 \text{ and } m_{ijt+1} = 1 \\ 0, & \text{otherwise.} \end{cases} \tag{A13}$$

Recalls to full-quarter status ($r3$): for $\mathit{qfirst} + 4 < t < \mathit{qlast}$, individual $i$ transited from consecutive-quarter recalled to full-quarter recalled status at $j$ during period $t$ (recalled in $t - 1$ and full-quarter employed in $t$)

$$r3_{ijt} = \begin{cases} 1, & \text{if } r2_{ijt-1} = 1 \text{ and } m_{ijt+1} = 1 \\ 0, & \text{otherwise.} \end{cases} \tag{A14}$$

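Separations from consecutive-quarter status ($s2$): for $\mathit{qfirst} < t < \mathit{qlast}$, individual $i$ separated from $j$ during $t$ with consecutive-quarter status at the start of $t$

$$s2_{ijt} = \begin{cases} 1, & \text{if } s1_{ijt} = 1 \text{ and } m_{ijt-1} = 1 \\ 0, & \text{otherwise.} \end{cases} \tag{A15}$$

Separations from full-quarter status ($s3$): for $\mathit{qfirst} + 1 < t < \mathit{qlast}$, individual $i$ separated from $j$ during $t$ with full-quarter status during $t - 1$

$$s3_{ijt} = \begin{cases} 1, & \text{if } s2_{ijt} = 1 \text{ and } m_{ijt-2} = 1 \\ 0, & \text{otherwise.} \end{cases} \tag{A16}$$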
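The separation definitions (A5), (A15), and (A16) look backward from the quarter of separation rather than forward from the accession. A minimal sketch, using the separation example from section A.1.10 and the same assumed set-of-quarters representation, with boundary quarters not handled:

```python
def seq(year, quarter):
    """Sequential quarter number, with 1 = 1985:1 as in section A.2.1."""
    return (year - 1985) * 4 + quarter


def s1(m, t):
    """(A5): earnings in t but none in t + 1."""
    return int(t in m and (t + 1) not in m)


def s2(m, t):
    """(A15): separation in t by someone employed at the beginning of t."""
    return int(s1(m, t) == 1 and (t - 1) in m)


def s3(m, t):
    """(A16): separation in t by someone who was full-quarter employed in t - 1."""
    return int(s2(m, t) == 1 and (t - 2) in m)


# Wage records at employer A in 1999:1-1999:3 and none in 1999:4.
long_job = {seq(1999, 1), seq(1999, 2), seq(1999, 3)}
# Same separation date, but the job only started in 1999:2.
short_job = {seq(1999, 2), seq(1999, 3)}

t = seq(1999, 3)
print(s1(long_job, t), s2(long_job, t), s3(long_job, t))     # 1 1 1
print(s1(short_job, t), s2(short_job, t), s3(short_job, t))  # 1 1 0
```

Both jobs record a separation dated 1999:3, but only the longer job counts as a full-quarter separation, because only it had full-quarter status in 1999:2.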

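Total earnings during the quarter ($w1$): for $\mathit{qfirst} \le t \le \mathit{qlast}$, earnings of individual $i$ at establishment $j$ during period $t$

$$w1_{ijt} = \sum \text{all UI-covered earnings by } i \text{ at } j \text{ during } t. \tag{A17}$$

Earnings of end-of-period employees ($w2$): for $\mathit{qfirst} \le t < \mathit{qlast}$, earnings of individual $i$ at establishment $j$ during period $t$

$$w2_{ijt} = \begin{cases} w1_{ijt}, & \text{if } e_{ijt} = 1 \\ \text{undefined}, & \text{otherwise.} \end{cases} \tag{A18}$$

Earnings of full-quarter individual ($w3$): for $\mathit{qfirst} < t < \mathit{qlast}$, earnings of individual $i$ at establishment $j$ during period $t$

$$w3_{ijt} = \begin{cases} w1_{ijt}, & \text{if } f_{ijt} = 1 \\ \text{undefined}, & \text{otherwise.} \end{cases} \tag{A19}$$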
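To connect (A19) with the averages described in section A.1.17, the sketch below reproduces that section's example for employer A in 2000:2. The text gives only group totals, so the per-person amounts here are hypothetical values chosen to match those totals, and the extra employee without full-quarter status is also hypothetical. The helper name `average_earnings` is an assumption of this sketch.

```python
def average_earnings(earnings):
    """Mean of the defined earnings values; None if no one qualifies."""
    return sum(earnings) / len(earnings) if earnings else None


# (w1, f, sex) per employee at employer A in 2000:2. Amounts are hypothetical
# and chosen so that six men and four women each total $150,000.
employees = (
    [(25_000, 1, "M")] * 6
    + [(37_500, 1, "F")] * 4
    + [(4_000, 0, "F")]  # not full-quarter employed, so w3 is undefined
)

# (A19): w3 equals w1 only for employees with f = 1.
w3_all = [w1 for w1, f, _ in employees if f == 1]
w3_men = [w1 for w1, f, sex in employees if f == 1 and sex == "M"]
w3_women = [w1 for w1, f, sex in employees if f == 1 and sex == "F"]

print(average_earnings(w3_all))    # 30000.0
print(average_earnings(w3_men))    # 25000.0
print(average_earnings(w3_women))  # 37500.0
```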

Total earnings at all employers ($w1_{\bullet}$): for $\mathit{qfirst} \le t \le \mathit{qlast}$, total earnings of individual $i$ during period $t$

$$w1_{i \bullet t} = \sum_{j \text{ employs } i \text{ during } t} w1_{ijt}. \tag{A20}$$

Total earnings at all employers of end-of-period employees ($w2_{\bullet}$): for $\mathit{qfirst} \le t < \mathit{qlast}$, total earnings of individual $i$ during period $t$

$$w2_{i \bullet t} = \begin{cases} w1_{i \bullet t}, & \text{if } e_{ijt} = 1 \\ \text{undefined}, & \text{otherwise.} \end{cases} \tag{A21}$$

Total earnings at all employers of full-quarter employees ($w3_{\bullet}$): for $\mathit{qfirst} < t < \mathit{qlast}$, total earnings of individual $i$ during period $t$

$$w3_{i \bullet t} = \begin{cases} w1_{i \bullet t}, & \text{if } f_{ijt} = 1 \\ \text{undefined}, & \text{otherwise.} \end{cases} \tag{A22}$$

Change in total earnings at all employers ($\Delta w1_{\bullet}$): for $\mathit{qfirst} < t \le \mathit{qlast}$, change in total earnings of individual $i$ between periods $t - 1$ and $t$

$$\Delta w1_{i \bullet t} = w1_{i \bullet t} - w1_{i \bullet t-1}. \tag{A23}$$
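Equations (A20) and (A23) can be checked against the earnings-change example in section A.1.23. In this sketch the dictionary keyed by (individual, employer, quarter) and the function names are assumptions made for illustration; a quarter with no wage record contributes zero earnings.

```python
def seq(year, quarter):
    """Sequential quarter number, with 1 = 1985:1 as in section A.2.1."""
    return (year - 1985) * 4 + quarter


def total_earnings(w1, i, t):
    """(A20): earnings of individual i summed over all employers in quarter t."""
    return sum(amount for (person, _, quarter), amount in w1.items()
               if person == i and quarter == t)


def change_in_total_earnings(w1, i, t):
    """(A23): total earnings in t minus total earnings in t - 1."""
    return total_earnings(w1, i, t) - total_earnings(w1, i, t - 1)


q2, q3 = seq(2000, 2), seq(2000, 3)
w1 = {
    (1, "A", q3): 8_000,  # individual 1 accedes to employer A in 2000:3
    (1, "B", q2): 7_000,
    (1, "B", q3): 3_000,
    (2, "A", q3): 9_000,  # individual 2 has no other employer
}

changes = [change_in_total_earnings(w1, i, q3) for i in (1, 2)]
print(changes, sum(changes) / len(changes))  # [4000, 9000] 6500.0
```

The average of the two changes is the $6,500 reported for employer A's accessions in the worked example.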

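Earnings of accessions ($wa1$): for $\mathit{qfirst} < t \le \mathit{qlast}$, earnings of individual $i$ at employer $j$ during period $t$

$$wa1_{ijt} = \begin{cases} w1_{ijt}, & \text{if } a1_{ijt} = 1 \\ \text{undefined}, & \text{otherwise.} \end{cases} \tag{A24}$$

Earnings of consecutive-quarter accessions ($wa2$): for $\mathit{qfirst} < t < \mathit{qlast}$, earnings of individual $i$ at employer $j$ during period $t$

$$wa2_{ijt} = \begin{cases} w1_{ijt}, & \text{if } a2_{ijt} = 1 \\ \text{undefined}, & \text{otherwise.} \end{cases} \tag{A25}$$

Earnings of full-quarter accessions ($wa3$): for $\mathit{qfirst} + 1 < t < \mathit{qlast}$, earnings of individual $i$ at employer $j$ during period $t$

$$wa3_{ijt} = \begin{cases} w1_{ijt}, & \text{if } a3_{ijt} = 1 \\ \text{undefined}, & \text{otherwise.} \end{cases} \tag{A26}$$

Earnings of full-quarter new hires ($wh3$): for $\mathit{qfirst} + 4 < t < \mathit{qlast}$, earnings of individual $i$ at employer $j$ during period $t$

$$wh3_{ijt} = \begin{cases} w1_{ijt}, & \text{if } h3_{ijt} = 1 \\ \text{undefined}, & \text{otherwise.} \end{cases} \tag{A27}$$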

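Total earnings change for accessions ($\Delta wa1$): for $\mathit{qfirst} + 1 < t \le \mathit{qlast}$, earnings change of individual $i$ at employer $j$ during period $t$

$$\Delta wa1_{ijt} = \begin{cases} \Delta w1_{i \bullet t}, & \text{if } a1_{ijt} = 1 \\ \text{undefined}, & \text{otherwise.} \end{cases} \tag{A28}$$

Total earnings change for full-quarter accessions ($\Delta wa3$): for $\mathit{qfirst} + 2 < t < \mathit{qlast}$, earnings change of individual $i$ at employer $j$ during period $t$

$$\Delta wa3_{ijt} = \begin{cases} \Delta w1_{i \bullet t}, & \text{if } a3_{ijt} = 1 \\ \text{undefined}, & \text{otherwise.} \end{cases} \tag{A29}$$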

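Earnings of separations from establishment ($ws1$): for $t < \mathit{qlast}$, earnings of individual $i$ separated from $j$ during $t$

$$ws1_{ijt} = \begin{cases} w1_{ijt}, & \text{if } s1_{ijt} = 1 \\ \text{undefined}, & \text{otherwise.} \end{cases} \tag{A30}$$

Earnings of full-quarter separations ($ws3$): for $\mathit{qfirst} + 1 < t < \mathit{qlast}$, individual $i$ separated from $j$ during $t + 1$ with full-quarter status during $t$

$$ws3_{ijt} = \begin{cases} w1_{ijt}, & \text{if } s3_{ijt+1} = 1 \\ \text{undefined}, & \text{otherwise.} \end{cases} \tag{A31}$$

Total earnings change for separations ($\Delta ws1$): for $t < \mathit{qlast}$, earnings change in period $t + 1$ of individual $i$ separated from $j$ during $t$

$$\Delta ws1_{ijt} = \begin{cases} \Delta w1_{i \bullet t+1}, & \text{if } s1_{ijt} = 1 \\ \text{undefined}, & \text{otherwise.} \end{cases} \tag{A32}$$

Total earnings change for full-quarter separations ($\Delta ws3$): for $t < \mathit{qlast}$, earnings change in period $t + 1$ of individual $i$ full-quarter separated from $j$ during $t$, last full-quarter employment was $t - 1$

$$\Delta ws3_{ijt} = \begin{cases} \Delta w1_{i \bullet t+1}, & \text{if } s3_{ijt} = 1 \\ \text{undefined}, & \text{otherwise.} \end{cases} \tag{A33}$$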

Periods of nonemployment prior to an accession ($na$): for $qfirst + 3 < t$, periods of nonemployment during the previous four quarters by $i$ prior to an accession at establishment $j$ during $t$

$$na_{ijt} = \begin{cases} \sum_{1 \le s \le 4} n_{it-s}, & \text{if } a1_{ijt} = 1 \\ \text{undefined}, & \text{otherwise,} \end{cases} \tag{A34}$$

where $n_{it} = 1$ if $m_{ijt} = 0 \;\forall j$.

Periods of nonemployment prior to a new hire ($nh$): for $qfirst + 3 < t$, periods of nonemployment during the previous four quarters by $i$ prior to a new hire at establishment $j$ during $t$

$$nh_{ijt} = \begin{cases} \sum_{1 \le s \le 4} n_{it-s}, & \text{if } h1_{ijt} = 1 \\ \text{undefined}, & \text{otherwise.} \end{cases} \tag{A35}$$

Periods of nonemployment prior to a recall ($nr$): for $qfirst + 3 < t$, periods of nonemployment during the previous four quarters by $i$ prior to a recall at establishment $j$ during $t$

$$nr_{ijt} = \begin{cases} \sum_{1 \le s \le 4} n_{it-s}, & \text{if } r1_{ijt} = 1 \\ \text{undefined}, & \text{otherwise.} \end{cases} \tag{A36}$$
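The backward-looking counts in (A34) to (A36) can be sketched in Python. This is a minimal illustration, not the production QWI code. It assumes that quarters are consecutive integers and that a worker's history is a hypothetical dict mapping each quarter to the set of employers with positive earnings (so that $m_{ijt} = 1$ when $j$ is in that set). The function name, the data layout, and the use of `None` for "undefined" are all assumptions made for the example.

```python
from typing import Dict, Optional, Set


def nonemployment_prior(history: Dict[int, Set[str]], t: int, qfirst: int,
                        event: bool) -> Optional[int]:
    """Count the quarters t-1..t-4 in which the worker had no employer.

    `event` plays the role of a1_ijt, h1_ijt, or r1_ijt. The result is
    None ("undefined") when the event did not occur or when the
    four-quarter look-back would reach before the first available quarter.
    Quarters absent from `history` are treated as nonemployment.
    """
    if not event or t <= qfirst + 3:
        return None
    # n_{i,t-s} = 1 when m_{ij,t-s} = 0 for every employer j
    return sum(1 for s in range(1, 5) if not history.get(t - s))


# Hypothetical worker: employed at "A" in quarters 1-2, nonemployed in
# quarters 3-4, then an accession at "B" in quarter 5.
history = {1: {"A"}, 2: {"A"}, 3: set(), 4: set(), 5: {"B"}}
na = nonemployment_prior(history, t=5, qfirst=0, event=True)
```

The same function serves all three statistics; only the indicator passed as `event` changes.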

Periods of nonemployment following a separation ($ns$): for $t < qlast - 3$, periods of nonemployment during the next four quarters by individual $i$ separated from establishment $j$ during $t$

$$ns_{ijt} = \begin{cases} \sum_{1 \le s \le 4} n_{it+s}, & \text{if } s1_{ijt} = 1 \\ \text{undefined}, & \text{otherwise.} \end{cases} \tag{A37}$$

A.2.3 Establishment Concepts

For statistic $xc_{ijt}$ denote the sum over $i$ during period $t$ as $xc_{\bullet jt}$. For example, beginning-of-period employment for firm $j$ is written as:

$$b_{\bullet jt} = \sum_i b_{ijt}. \tag{A38}$$

All individual statistics generate establishment totals according to the formula above. The key establishment statistic is the average employment growth rate for establishment $j$, the components of which are defined here.

Beginning-of-period employment (number of jobs):

$$B_{jt} = b_{\bullet jt}. \tag{A39}$$

End-of-period employment (number of jobs):

$$E_{jt} = e_{\bullet jt}. \tag{A40}$$

Employment any time during the period (number of jobs):

$$M_{jt} = m_{\bullet jt}. \tag{A41}$$

Full-quarter employment:

$$F_{jt} = f_{\bullet jt}. \tag{A42}$$

Net job flows (change in employment): for establishment $j$ during period $t$

$$JF_{jt} = E_{jt} - B_{jt}. \tag{A43}$$

Average employment: for establishment $j$ between periods $t-1$ and $t$

$$\bar{E}_{jt} = \frac{B_{jt} + E_{jt}}{2}. \tag{A44}$$

Average employment growth rate: for establishment $j$ between periods $t-1$ and $t$

$$G_{jt} = \frac{JF_{jt}}{\bar{E}_{jt}}. \tag{A45}$$

Job creation: for establishment $j$ between periods $t-1$ and $t$

$$JC_{jt} = \bar{E}_{jt} \times \max(0, G_{jt}). \tag{A46}$$

Average job creation rate: for establishment $j$ between periods $t-1$ and $t$

$$JCR_{jt} = \frac{JC_{jt}}{\bar{E}_{jt}}. \tag{A47}$$

Job destruction: for establishment $j$ between periods $t-1$ and $t$

$$JD_{jt} = \bar{E}_{jt} \times \operatorname{abs}(\min(0, G_{jt})). \tag{A48}$$

Average job destruction rate: for establishment $j$ between periods $t-1$ and $t$

$$JDR_{jt} = \frac{JD_{jt}}{\bar{E}_{jt}}. \tag{A49}$$

Net change in full-quarter employment: for establishment $j$ during period $t$

$$FJF_{jt} = F_{jt} - F_{jt-1}. \tag{A50}$$

Average full-quarter employment: for establishment $j$ during period $t$

$$\bar{F}_{jt} = \frac{F_{jt-1} + F_{jt}}{2}. \tag{A51}$$

Average full-quarter employment growth rate: for establishment $j$ between $t-1$ and $t$

$$FG_{jt} = \frac{FJF_{jt}}{\bar{F}_{jt}}. \tag{A52}$$

Full-quarter job creations: for establishment $j$ between $t-1$ and $t$

$$FJC_{jt} = \bar{F}_{jt} \times \max(0, FG_{jt}). \tag{A53}$$

Average full-quarter job creation rate: for establishment $j$ between $t-1$ and $t$

$$FJCR_{jt} = \frac{FJC_{jt}}{\bar{F}_{jt}}. \tag{A54}$$

Full-quarter job destruction: for establishment $j$ between $t-1$ and $t$

$$FJD_{jt} = \bar{F}_{jt} \times \operatorname{abs}(\min(0, FG_{jt})). \tag{A55}$$

Average full-quarter job destruction rate: for establishment $j$ between $t-1$ and $t$

$$FJDR_{jt} = \frac{FJD_{jt}}{\bar{F}_{jt}}. \tag{A56}$$
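The chain of definitions from net job flows to the creation and destruction rates, (A43) to (A49), can be traced with a short Python sketch. The function and field names are hypothetical, and the treatment of an establishment with zero employment at both dates (all flows set to zero) is an assumption of the example rather than something the text specifies.

```python
from dataclasses import dataclass


@dataclass
class JobFlows:
    jf: float    # net job flow, (A43)
    ebar: float  # average employment, (A44)
    g: float     # average employment growth rate, (A45)
    jc: float    # job creation, (A46)
    jcr: float   # job creation rate, (A47)
    jd: float    # job destruction, (A48)
    jdr: float   # job destruction rate, (A49)


def job_flows(b: float, e: float) -> JobFlows:
    """Compute (A43)-(A49) from beginning- and end-of-period employment."""
    jf = e - b
    ebar = (b + e) / 2
    if ebar == 0:
        # Assumption of this sketch: no employment at either date, no flows.
        return JobFlows(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
    g = jf / ebar
    jc = ebar * max(0.0, g)
    jd = ebar * abs(min(0.0, g))
    return JobFlows(jf, ebar, g, jc, jc / ebar, jd, jd / ebar)


grow = job_flows(b=8, e=12)    # expanding establishment
shrink = job_flows(b=10, e=6)  # contracting establishment
birth = job_flows(b=0, e=5)    # establishment with no beginning-of-period jobs
```

Because the denominator is average rather than beginning-of-period employment, the growth rate stays finite for the entering establishment. Passing $F_{jt-1}$ and $F_{jt}$ in place of $B_{jt}$ and $E_{jt}$ gives the full-quarter analogues (A50) to (A56).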

Accessions: for establishment $j$ during $t$

$$A_{jt} = a1_{\bullet jt}. \tag{A57}$$

Average accession rate: for establishment $j$ during $t$

$$AR_{jt} = \frac{A_{jt}}{\bar{E}_{jt}}. \tag{A58}$$

Separations: for establishment $j$ during $t$

$$S_{jt} = s1_{\bullet jt}. \tag{A59}$$

Average separation rate: for establishment $j$ during $t$

$$SR_{jt} = \frac{S_{jt}}{\bar{E}_{jt}}. \tag{A60}$$

New hires: for establishment $j$ during $t$

$$H_{jt} = h1_{\bullet jt}. \tag{A61}$$

Full-quarter new hires: for establishment $j$ during $t$

$$H3_{jt} = h3_{\bullet jt}. \tag{A62}$$

Recalls: for establishment $j$ during $t$

$$R_{jt} = r1_{\bullet jt}. \tag{A63}$$

Flow into full-quarter employment: for establishment $j$ during $t$

$$FA_{jt} = a3_{\bullet jt}. \tag{A64}$$

New hires into full-quarter employment: for establishment $j$ during $t$

$$FH_{jt} = h3_{\bullet jt}. \tag{A65}$$

Average rate of flow into full-quarter employment: for establishment $j$ during $t$

$$FAR_{jt} = \frac{FA_{jt}}{\bar{F}_{jt}}. \tag{A66}$$

Flow out of full-quarter employment: for establishment $j$ during $t$

$$FS_{jt} = s3_{\bullet jt}. \tag{A67}$$

Average rate of flow out of full-quarter employment: for establishment $j$ during $t$

$$FSR_{jt} = \frac{FS_{jt}}{\bar{F}_{jt}}. \tag{A68}$$

Flow into consecutive quarter employment: for establishment $j$ during $t$

$$CA_{jt} = a2_{\bullet jt}. \tag{A69}$$

Flow out of consecutive quarter employment: for establishment $j$ during $t$

$$CS_{jt} = s2_{\bullet jt}. \tag{A70}$$

Total payroll of all employees:

$$W1_{jt} = w1_{\bullet jt}. \tag{A71}$$

Total payroll of end-of-period employees:

$$W2_{jt} = w2_{\bullet jt}. \tag{A72}$$

Total payroll of full-quarter employees:

$$W3_{jt} = w3_{\bullet jt}. \tag{A73}$$

Total payroll of accessions:

$$WA_{jt} = wa1_{\bullet jt}. \tag{A74}$$

Change in total earnings for accessions:

$$\Delta WA_{jt} = \sum_{i \in \{J(i,t) = j\}} \Delta wa1_{ijt}. \tag{A75}$$

Total payroll of transits to consecutive-quarter status:

$$WCA_{jt} = wa2_{\bullet jt}. \tag{A76}$$

Total payroll of transits to full-quarter status:

$$WFA_{jt} = wa3_{\bullet jt}. \tag{A77}$$

Total payroll of new hires to full-quarter status:

$$WFH_{jt} = wh3_{\bullet jt}. \tag{A78}$$

Change in total earnings for transits to full-quarter status:

$$\Delta WFA_{jt} = \sum_{i \in \{J(i,t) = j\}} \Delta wa3_{ijt}. \tag{A79}$$

Total periods of nonemployment for accessions:

$$NA_{jt} = na_{\bullet jt}. \tag{A80}$$

Total periods of nonemployment for new hires (last four quarters):

$$NH_{jt} = nh_{\bullet jt}. \tag{A81}$$

Total periods of nonemployment for recalls (last four quarters):

$$NR_{jt} = nr_{\bullet jt}. \tag{A82}$$

Total earnings of separations:

$$WS_{jt} = ws1_{\bullet jt}. \tag{A83}$$

Total change in total earnings for separations:

$$\Delta WS_{jt} = \sum_{i \in \{J(i,t) = j\}} \Delta ws1_{ijt}. \tag{A84}$$

Total earnings of separations from full-quarter status (most recent full quarter):

$$WFS_{jt} = ws3_{\bullet jt}. \tag{A85}$$

Total change in total earnings for full-quarter separations:

$$\Delta WFS_{jt} = \sum_{i \in \{J(i,t) = j\}} \Delta ws3_{ijt}. \tag{A86}$$

Total periods of nonemployment for separations:

$$NS_{jt} = ns_{\bullet jt}. \tag{A87}$$

Average earnings of end-of-period employees:

$$ZW2_{jt} = \frac{W2_{jt}}{E_{jt}}. \tag{A88}$$

Average earnings of full-quarter employees:

$$ZW3_{jt} = \frac{W3_{jt}}{F_{jt}}. \tag{A89}$$

Average earnings of accessions:

$$ZWA_{jt} = \frac{WA_{jt}}{A_{jt}}. \tag{A90}$$

Average change in total earnings for accessions:

$$Z\Delta WA_{jt} = \frac{\Delta WA_{jt}}{A_{jt}}. \tag{A91}$$

Average earnings of transits to full-quarter status:

$$ZWFA_{jt} = \frac{WFA_{jt}}{FA_{jt}}. \tag{A92}$$

Average earnings of new hires to full-quarter status:

$$ZWFH_{jt} = \frac{WFH_{jt}}{FH_{jt}}. \tag{A93}$$

Average change in total earnings for transits to full-quarter status:

$$Z\Delta WFA_{jt} = \frac{\Delta WFA_{jt}}{FA_{jt}}. \tag{A94}$$

Average periods of nonemployment for accessions:

$$ZNA_{jt} = \frac{NA_{jt}}{A_{jt}}. \tag{A95}$$

Average periods of nonemployment for new hires (last four quarters):

$$ZNH_{jt} = \frac{NH_{jt}}{H_{jt}}. \tag{A96}$$

Average periods of nonemployment for recalls (last four quarters):

$$ZNR_{jt} = \frac{NR_{jt}}{R_{jt}}. \tag{A97}$$

Average earnings of separations:

$$ZWS_{jt} = \frac{WS_{jt}}{S_{jt}}. \tag{A98}$$

Average change in total earnings for separations:

$$Z\Delta WS_{jt} = \frac{\Delta WS_{jt}}{S_{jt}}. \tag{A99}$$

Average earnings of separations from full-quarter status (most recent full quarter):

$$ZWFS_{jt-1} = \frac{WFS_{jt-1}}{FS_{jt}}. \tag{A100}$$

Average change in total earnings for full-quarter separations:

$$Z\Delta WFS_{jt} = \frac{\Delta WFS_{jt}}{FS_{jt}}. \tag{A101}$$

Average periods of nonemployment for separations:

$$ZNS_{jt} = \frac{NS_{jt}}{S_{jt}}. \tag{A102}$$
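Each average in (A88) to (A102) is an establishment total divided by the matching count. A minimal Python sketch follows. The helper name and the numbers are hypothetical, and returning `None` when the count is zero is an assumption of the example, since the definitions above do not say how empty cells are handled.

```python
from typing import Optional


def cell_average(total: float, count: float) -> Optional[float]:
    """Ratio of an establishment total to its count, e.g. ZW2 = W2 / E.

    Returns None when the count is zero (an assumed convention).
    """
    return total / count if count else None


# Hypothetical establishment-quarter totals
W2, E = 90_000.0, 12  # payroll and number of end-of-period employees
WA, A = 0.0, 0        # no accessions in this quarter
zw2 = cell_average(W2, E)
zwa = cell_average(WA, A)
```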

End-of-period employment (number of workers): [Aggregate concept not related to a business]

$$N_t = n_{\bullet t}. \tag{A103}$$

A.2.4 Identities

The identities stated below hold at the establishment level for every age group and sex subcategory. These identities are preserved in the QWI processing.

Definition 1: Employment at beginning of period $t$ equals end of period $t-1$: $B_{jt} = E_{jt-1}$.

Definition 2: Evolution of end-of-period employment: $E_{jt} = B_{jt} + A_{jt} - S_{jt}$.

Definition 3: Evolution of average employment: $\bar{E}_{jt} = B_{jt} + (A_{jt} - S_{jt})/2$.

Definition 4: Job flow identity: $JF_{jt} = JC_{jt} - JD_{jt}$.

Definition 5: Creation-destruction identity: $E_{jt} = B_{jt} + JC_{jt} - JD_{jt}$.

Definition 6: Creation-destruction/accession-separation identity: $A_{jt} - S_{jt} = JC_{jt} - JD_{jt}$.

Definition 7: Evolution of full-quarter employment: $F_{jt} = F_{jt-1} + FA_{jt} - FS_{jt}$.

Definition 8: Full-quarter creation-destruction identity: $F_{jt} = F_{jt-1} + FJC_{jt} - FJD_{jt}$.

Definition 9: Full-quarter job flow identity: $FJF_{jt} = FJC_{jt} - FJD_{jt}$.

Definition 10: Full-quarter creation-destruction/accession-separation identity: $FA_{jt} - FS_{jt} = FJC_{jt} - FJD_{jt}$.

Definition 11: Employment growth rate identity: $G_{jt} = JCR_{jt} - JDR_{jt}$.

Definition 12: Creation-destruction/accession-separation rate identity: $JCR_{jt} - JDR_{jt} = AR_{jt} - SR_{jt}$.

Definition 13: Full-quarter employment growth rate identity: $FG_{jt} = FJCR_{jt} - FJDR_{jt}$.

Definition 14: Full-quarter creation-destruction/accession-separation rate identity: $FJCR_{jt} - FJDR_{jt} = FAR_{jt} - FSR_{jt}$.

Definition 15: Total payroll identity: $W1_{jt} = W2_{jt} + WS_{jt}$.

Definition 16: Payroll identity for consecutive-quarter employees: W2jt W1jt WCAjt WSjt.

Definition 17: Full-quarter payroll identity: $W3_{jt} = W2_{jt} - WCA_{jt}$.

Definition 18: New hires/recalls identity: $A_{jt} = H_{jt} + R_{jt}$.

Definition 19: Periods of nonemployment identity: $NA_{jt} = NH_{jt} + NR_{jt}$.

Definition 20: Worker-jobs in period $t$ are the sum of accessions and beginning-of-period employment: $M_{jt} = A_{jt} + B_{jt}$.

Definition 21: Worker-jobs in period $t$ are the sum of accessions to consecutive quarter status, separations, and full-quarter workers: $M_{jt} = CA_{jt} + S_{jt} + F_{jt}$.

Definition 22: Consecutive quarter accessions in period $t-1$ are the sum of consecutive quarter separations in period $t$ and full quarter accessions in period $t$: CAjt1 CSjt FAjt FSjt.

A.2.5 Aggregation of Job Flows

The aggregation of job flows is performed using growth rates to facilitate confidentiality protection. The rate of growth of $JF$ for establishment $j$ during period $t$ is estimated by:

$$G_{jt} = \frac{JF_{jt}}{\bar{E}_{jt}}. \tag{A104}$$

For an arbitrary aggregate $k$ = (ownership × state × substate-geography × industry × age group × sex) cell, we have:

$$G_{kt} = \frac{\sum_{j \in \{K(j) = k\}} \bar{E}_{jt} \times G_{jt}}{\bar{E}_{kt}}, \tag{A105}$$

where the function $K(j)$ indicates the classification associated with firm $j$. We calculate the aggregate net job flow as

$$JF_{kt} = \sum_{j \in \{K(j) = k\}} JF_{jt}. \tag{A106}$$

Substitution yields

$$JF_{kt} = \sum_{j} \left(\bar{E}_{jt} \times G_{jt}\right) = G_{kt} \times \bar{E}_{kt}, \tag{A107}$$

so the aggregate job flow, as computed, is equivalent to the aggregate growth rate times aggregate employment. Gross job creation/destruction aggregates are formed from the job creation and destruction rates by analogous formulas substituting $JC$ or $JD$, as appropriate, for $JF$ (see Davis et al. 1996, p. 189, for details).

A.2.6 Measurement of Employment Churning

The QWI measure employment churning (also called turnover) using the ratio formula:

$$FT_{kt} = \frac{(FA_{kt} + FS_{kt})/2}{F_{kt}} \tag{A108}$$

for an arbitrary aggregate $k$ = (ownership × state × substate-geography × industry × age group × sex) cell. In the actual production of the QWI, the three components of this ratio are computed as separate estimates and are released.
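The aggregation in (A104) to (A107) and the turnover ratio in (A108) can be illustrated with a small Python sketch. The record layout, cell labels, and function names are hypothetical. The sketch also assumes that aggregate average employment $\bar{E}_{kt}$ is the sum of the establishment averages within the cell, and that establishments with no employment at either date contribute nothing; neither convention is spelled out above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One record per establishment-quarter: (cell k, B_jt, E_jt). Hypothetical layout.
Record = Tuple[str, float, float]


def aggregate_growth(records: Iterable[Record]) -> Dict[str, Tuple[float, float]]:
    """Return {k: (G_kt, JF_kt)} from employment-weighted growth rates (A105)."""
    weighted = defaultdict(float)  # running sum of Ebar_jt * G_jt, i.e. JF_kt
    ebar_k = defaultdict(float)    # running sum of Ebar_jt (assumed Ebar_kt)
    for k, b, e in records:
        ebar = (b + e) / 2
        if ebar == 0:
            continue  # no employment at either date: nothing to add
        g = (e - b) / ebar
        weighted[k] += ebar * g
        ebar_k[k] += ebar
    return {k: (weighted[k] / ebar_k[k], weighted[k]) for k in ebar_k}


def turnover(fa: float, fs: float, f: float) -> float:
    """Churning ratio (A108): FT = ((FA + FS) / 2) / F."""
    return ((fa + fs) / 2) / f


records = [("retail", 8, 12), ("retail", 10, 6), ("mfg", 20, 30)]
result = aggregate_growth(records)
ft = turnover(fa=30, fs=20, f=250)
```

In the hypothetical retail cell one establishment adds four jobs and another sheds four, so the net flow and the aggregate growth rate are both zero even though gross creation and destruction are not.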

References

Abowd, J. M., J. C. Haltiwanger, and J. I. Lane. 2004. Integrated longitudinal employee-employer data for the United States. American Economic Review 94 (2): 224–29.

Abowd, J. M., P. A. Lengermann, and K. L. McKinney. 2002. The measurement of human capital in the U.S. economy. Technical Paper TP-2002-09. Longitudinal Employer-Household Dynamics (LEHD), U.S. Census Bureau.

Abowd, J. M., B. E. Stephens, and L. Vilhuber. 2006. Confidentiality protection in the Census Bureau’s Quarterly Workforce Indicators. Technical Paper TP-2006-02. Longitudinal Employer-Household Dynamics (LEHD), U.S. Census Bureau.

Abowd, J. M., and L. Vilhuber. 2005. The sensitivity of economic statistics to coding errors in personal identifiers. Journal of Business and Economic Statistics 23 (2): 133–52.

Benedetto, G., J. Haltiwanger, J. Lane, and K. McKinney. 2007. Using worker flows in the analysis of the firm. Journal of Business and Economic Statistics 25 (3): 299–313.

Bureau of Labor Statistics. 1997a. BLS Handbook of Methods. U.S. Bureau of Labor Statistics, Division of Information Services, Washington, D.C. Available at http://www.bls.gov/opub/hom/

Bureau of Labor Statistics. 1997b. Quality improvement project: Unemployment insurance wage records. Report of the U.S. Department of Labor.

Chiang, H., K. Sandusky, and L. Vilhuber. 2005. Longitudinal Employer-Household Dynamics (LEHD) Business Register Bridge technical documentation. Internal Document IP-LEHD-BRB. LEHD, U.S. Census Bureau.

Davis, S. J., J. C. Haltiwanger, and S. Schuh. 1996. Job creation and destruction. Cambridge, MA: The MIT Press.

Longitudinal Employer-Household Dynamics Program. 2002. The Longitudinal Employer-Household Dynamics program: Employment Dynamics Estimates Project version 2.2 and 2.3. Technical Paper TP-2002-05-rev1. LEHD, U.S. Census Bureau.

Stephens, B. 2006. Firms, wage dispersion, and compensation policy: Assessment and implications. Ph.D. diss. University of Maryland, College Park, Maryland.

Stevens, D. W. 2007. Employment that is not covered by state unemployment insurance laws. Technical Paper TP-2007-04. LEHD, U.S. Census Bureau.

Comment

Katharine G. Abraham

Katharine G. Abraham is a professor in the Joint Program in Survey Methodology and a faculty associate of the Maryland Population Research Center at the University of Maryland, and a research associate of the National Bureau of Economic Research.

This chapter describes in considerable detail the sources and methods used to construct the data files that underlie the new Quarterly Workforce Indicators (QWI) produced by the U.S. Census Bureau. This innovative program draws on a wide variety of data sources to produce county-level estimates of earnings, employment, and job flows, disaggregated by industry, age of worker, and sex of worker. The resulting estimates already have proven to be of considerable interest to local planners and policymakers, and it is easy to imagine additional uses for them. The chapter should be a valuable resource for users of the QWI data as well as for researchers who may be interested in working with the underlying data files.

Unavoidably, given the ambitious nature of the exercise undertaken and the limitations of the underlying source data, development of the QWI has confronted a variety of data problems. The QWI files draw heavily on administrative records, including unemployment insurance (UI) wage records, employer reports to state employment security agencies, and the Census Bureau’s Person Characteristics File based on the Social Security Administration’s Numident file, none of which were developed for statistical purposes. Other information is drawn from large national surveys that have better statistical properties but cover only a fraction of the population. Much of the chapter is devoted to explaining the methods currently used to address the various shortcomings of the underlying source data, as well as improvements in those methods planned for the future. My comments review briefly some key issues that the QWI developers have had to confront.

• Miscoding of individual identifiers. If not corrected, miscoding of individual identifiers will lead to overstatement of worker flows, misrepresentation of workers’ earnings trajectories, and misstatement of the earnings of both departing and newly hired workers. A 1997 study of UI wage records conducted by the Bureau of Labor Statistics found that approximately 7.8 percent of individual Social Security numbers were miscoded (U.S. Bureau of Labor Statistics 1997). Abowd and Vilhuber (2005) describe a clever automated method for identifying and correcting miscodes that may occur in the middle of an ongoing spell of employment, but this method cannot capture coding mistakes that are caught by reporters and permanently corrected, or coding mistakes that are never caught. By design, the Abowd and Vilhuber procedure is conservative, producing recodes for only about 0.5 percent of wage records. While they are somewhat dated, the larger figures from the BLS study suggest that there may be a substantial amount of miscoding in individual identifiers that the Abowd and Vilhuber procedure does not capture. Further research will be needed to determine the severity of individual identifier miscoding, and what it implies for various potential uses of the QWI and associated data files.

• Failure to identify continuing firms or establishments with new identification numbers. Similar to the problems associated with miscoding of individual identifiers, treating continuing establishments as new businesses leads to overstatement of business births and business deaths, as well as to overstatement of worker flows. This is perhaps the most-studied of all of the various potential problems with the QWI source data, and the techniques employed to identify establishment matches in business register data have improved a great deal over the past ten years. A clever recent innovation pioneered in the course of developing the QWI is the use of information on flows of groups of workers across establishments to identify cases in which a firm that appears to be a new birth is really a reincarnation of an old firm. While there undoubtedly are remaining cases in which continuing businesses are not identified as such, this has to be a less serious problem than it would have been even a few years ago.

• Missing information on individual characteristics. In the QWI, missing information on individual characteristics is filled in using multiple imputation techniques. Information on individuals’ age and gender is derived from Social Security records and is missing for just 3 percent of QWI records. Place of residence is missing for about 10 percent of records. The only individual-level information on education presently available for use in building the QWI files is that derived from the Survey of Income and Program Participation (SIPP) and Current Population Survey (CPS), meaning that education is missing and must be imputed for most records. This is done based on the relationship of education to age, earnings, and industry in the 1990 Census. The very high rates of imputation for education cannot help but make users of these data uneasy. The planned incorporation of direct information on education for the approximately one-sixth of the population that completed the 2000 Census Long Form will be a positive step, but the share of people for whom education must be imputed will remain large.

• Missing information on employer characteristics. Employer-provided information contained in the business register files is used to assign NAICS codes and a geographic location to establishments, as well as to characterize the structure of the firms to which these establishments belong. Though specific percentages are not cited, a significant number of imputations must be performed to produce a complete data file (see Konigsberg et al. 2005 for a discussion of allocations and imputations in the Quarterly Census of Employment and Wages, which is based on the same employer characteristic source data as the QWI). The best imputations likely are those that can be based on records for the same establishment from other time periods; such information, however, is not always available. As with the data for individuals, the use of imputed information on employer characteristics may be a problem for analytical uses of the data.

• Missing information on the specific establishment in which each worker is employed. When a firm consists of just one physical establishment, there is no difficulty in determining where a person employed by that firm works; in cases where the firm has more than one establishment, however, the assignment of individual workers to specific establishments generally is not reported. Only in Minnesota do the UI wage records indicate which establishment of a multiple-establishment firm employs which workers. As described in the chapter, the data for Minnesota are used to develop a model for probabilistically assigning workers to specific worksites within their firm that is then applied to the information available for other states. Whether a model fit using Minnesota data can reasonably be applied to other locations is, of course, very much an open question. One of the most intriguing uses of the QWI data files is to analyze the geography of economic development, looking, for example, at where people live, where they work, and the patterns of travel between those locations. Errors in the assignment of workers to establishments could be especially problematic for this sort of analysis.

In addition to these data quality issues, the chapter also notes current limitations in the scope of the QWI data set. Two in particular seem important. First, it is not presently possible to track workers who move from one state to another. Second, the self-employed are presently excluded from the QWI universe. Depending on the question one was interested in answering, both of these exclusions could be substantively important. If, for example, significant numbers of displaced workers move into self-employment, using the QWI data to study the earnings consequences of job loss could produce misleading conclusions. The chapter indicates that work is underway to address these current limitations of the QWI.

A final point to note is that noise is added to the QWI records to protect the confidentiality of the underlying information. The designers of the process used to fuzz the QWI data pay attention to preserving their statistical properties, and the chapter suggests that the analytic validity of the files should not be adversely affected. This can be asserted confidently, however, only with respect to the examination of relationships that were anticipated in the design of the fuzzing process.

The preceding comments are in no way intended to be critical of the authors or to disparage the work that has been done to produce the Quarterly Workforce Indicators. As a practical matter, there is no real alternative to the use of administrative statistics to produce local labor market information at the level of detail contained in the QWI. Further, though they are sometimes discussed in a way that suggests they can be taken as truth, survey data also suffer from a variety of sampling and nonsampling errors. These are seldom as well documented as the potential errors in the QWI described in the chapter, but that does not mean they do not exist.

Still, it is important to recognize and remember that a good deal of the information that underlies the Quarterly Workforce Indicators is imputed rather than measured directly. In some cases, this will not matter very much; in other cases, the use of imputed data could lead to results that are misleading. Given the complexity of the process used to construct the indicators, it is rather difficult to know what degree of confidence to place in the picture they paint. Documenting the methods used to construct the data is an important first step and one the authors are to be commended for having taken. Further work will be required to develop a fuller understanding of the quality properties of the QWI estimates and data files, and of their suitability for different analytic purposes.

References

Abowd, J. M., and L. Vilhuber. 2005. The sensitivity of economic statistics to coding errors in personal identifiers. Journal of Business and Economic Statistics 23 (2): 133–52.

Konigsberg, S., M. Piazza, D. Talon, and R. Clayton. 2005. Quarterly Census of Employment and Wages (QCEW) Business Register Metrics. Paper presented at the Joint Statistical Meetings, August. Minneapolis, Minnesota.

U.S. Bureau of Labor Statistics. 1997. Quality improvement project: Unemployment insurance wage records. Unpublished report, Washington, D.C.

6 The Role of Retail Chains National, Regional, and Industry Results Ronald S. Jarmin, Shawn D. Klimek, and Javier Miranda

6.1 Introduction The U.S. retail trade sector has undergone dramatic change in recent decades. The share of U.S. civilian employment associated with retail trade has increased from 12.6 percent in 1958 to 16.4 percent in 2000, and retail employment has more than doubled. In addition to this growth, the sector has been affected in important ways by changes in technology and societal trends such as suburbanization and changes in consumer preferences. The structure of retail markets, affected by all these forces, has been continuously evolving. A major feature of this evolution has been the growth of large national retail chains. This has been coupled with a dramatic decrease in the share of retail activity accounted for by small single location or mom-and-pop stores. In 1948, single location retail firms accounted for 70.4 percent of retail sales, but only 60.2 percent by 1967 (U.S. Census Bureau 1971). By 1997, this share had fallen further to 39 percent. In 1948, large retail firms with more than 100 establishments accounted for 12.3 percent of retail sales, but this number grew to 18.6 percent in 1967 (U.S.

Ronald S. Jarmin is chief economist and chief, Center for Economic Studies (CES) at the U.S. Census Bureau. Shawn D. Klimek is a senior economist at the Center for Economic Studies (CES) at the U.S. Census Bureau. Javier Miranda is an economist at the Center for Economic Studies (CES) at the U.S. Census Bureau. This chapter was written by Census Bureau staff. It has undergone a more limited review than official Census Bureau publications. Any views, findings, or opinions expressed in this chapter are those of the authors and do not necessarily reflect those of the Census Bureau. We would like to thank Jeff Campbell, Tim Dunne, Mark Roberts, Brad Jensen, Emek Basker, and participants at the NBER-Conference on Research in Income and Wealth (CRIW) and the Business Data Linking Conference in Cardiff, Wales for useful comments. Any remaining errors are solely the responsibility of the authors.

237

238

Ronald S. Jarmin, Shawn D. Klimek, and Javier Miranda

Census Bureau 1971). By 1997, these large retail firms account for 36.9 percent of all retail sales. Many observers have noted the dramatic changes in the structure of retail markets. Among the more important changes is the rise of big box national retail chains, such as Wal-Mart. However, the figures cited above indicate that the trend away from mom-and-pops towards national chains has been underway since long before the advent of the big box stores. The trend also predates the wide scale adoption of information technology by retailers. Rather, the rise of technologically sophisticated national retail chains like Wal-Mart, Toys-R-Us, and Home Depot is simply part of the larger trend—underway for some time—towards larger scale retail firms. What is clear is that the dynamics of the changes during the post-World War II era in the retail sector are not well documented. This is due, in part, to a lack of comprehensive firm level longitudinal data that would allow researchers to describe and analyze the structure of retail markets. In this chapter, we use a recently constructed Census Bureau data set, the Longitudinal Business Database (LBD), to examine local retail markets over the 1976 to 2000 period. We believe these are the best data available to study trends across the entire U.S. retail sector over a long time period. These data are not perfect, however, and we discuss several remaining data gaps and measurement issues. The chapter proceeds as follows. In section 6.2 we summarize some of the trends that have characterized the retail sector in the United States over the last several decades. We discuss data and measurement issues in section 6.3. We provide some basic but informative descriptions of different types of firms in national and regional retail markets in section 6.4 and offer conclusions and discuss future research in section 6.5. 6.2 Trends in the U.S. Retail Sector Like the rest of the U.S. 
economy, the retail trade sector has been undergoing significant structural changes in recent decades. However, since everyone is a consumer and interacts with businesses in the retail sector regularly, these changes have not come without controversy. The trend away from smaller scale mom-and-pop retailers and towards large national chains of big box stores is often blamed in the popular media for a host of social, economic, and environmental ills. Our purpose is not to participate in this debate, but to improve the tools researchers and policymakers have at their disposal to measure changes in the structure of the retail sector and to begin to understand the forces that underlie them. 6.2.1 Basic Features of the Recent Evolution of U.S. Retail Markets To lay the groundwork for the rest of the chapter, it is useful to review, from a more macro perspective, what has been going on in the retail sector

The Role of Retail Chains: National, Regional, and Industry Results

Fig. 6.1

239

U.S. retail employment and share of the employed civilian population

Sources: Statistical Abstract of the U.S., Economic Report of the President, and own calculations from the LBD.

over the last several decades. Figure 6.1 shows the growth of U.S. retail employment from 1958 to 2000. We see that, on the Standard Industrial Classification (SIC)1 basis, retail employment grew from just under 8 million in 1958 to over 22 million in 2000. The figure also shows that the share of retail in overall U.S. employment has gone up from 12.6 percent to 16.4 percent. Retail employment saw a dramatic increase of roughly 175 percent over the 1958 to 2000 period but, as shown in figure 6.2, the number of retail establishments increased by only a modest 17 percent. It is a striking feature of the evolution of retail markets that over the last four decades of the twentieth century, the U.S. population increased by just over 100 million persons (or 56 percent), but the number of retail establishments serving them grew at a much slower rate. Figure 6.2 also shows how the composition of the increase in retail establishments is accounted for by single location establishments (mom-and-pop stores) and establishments owned by multiple location retailers (chain stores). The figure shows that the number of single location retail establishments actually decreases slightly over the 1. We use a SIC definition of the retail sector in this chapter. The Census Bureau adopted NAICS in 1997, but maintained SIC codes on its business register until 2001. Given difficulties in reclassifying all historic retail establishment data in the LBD on the North American Industry Classification System (NAICS) basis (see Bayard and Klimek 2004), we decided to use SIC definitions.

240

Fig. 6.2

Ronald S. Jarmin, Shawn D. Klimek, and Javier Miranda

Number of retail establishments

Sources: Statistical abstract of the U.S. and own calculations from the LBD.

period while the number of chain store locations more than doubles. Retail establishments operated by multiple location chain retail firms accounted for 20.2 percent of all retail establishments in 1963 and increased to 35 percent by 2000. The ascendancy of chain stores is clearly one of the most important developments in the evolution of retail markets in the United States. Chain stores differ in many ways from the single location mom-and-pop stores that once dominated retail. This has always been the case, but it has become more important over time. Figure 6.3 shows that until around 1980 single location retailers and chains had roughly equal shares of overall retail sector employment. Since 1980, the chain store share of employment has increased to almost two-thirds of total retail employment. Contrast this with figure 6.2, which shows that chain stores make up a relatively constant one-third of all retail establishments. Between 1976 and 2000, employment at single location retailers grew by roughly 2 million workers. Employment growth at the smaller number of chain store retailers, on the other hand, was slightly under 8 million. Thus, we see that all the growth in the number of retail outlets and most of the growth in retail employment has come from retail firms that operate multiple retail establishments. An obvious consequence of the faster growth of retail employment compared to retail establishments is that the average size of retail establishments has grown substantially over time. Figure 6.4 shows that the size of the average retail establishment has more than doubled between 1958 and 2000. Retail customers today are not shopping at the same kind of stores that existed forty years ago. They are far more likely to be patronizing large chain stores. Even the nature of the small single location, mom-and-pop

The Role of Retail Chains: National, Regional, and Industry Results

Fig. 6.3 Retail employment at single location and chain stores

Source: Own calculations from the LBD.

Fig. 6.4 Average retail establishment size

Source: Statistical Abstract of the U.S. and own calculations from the LBD.

stores has changed. In results discussed further in section 6.4, we see that single location retail firms have on average increased in size since 1976. This may be due to technological changes that increase optimal store sizes, or to competitive pressure exerted by the growth of large chain retailers.

6.2.2 Analyses of the Evolution of Retail Markets

Researchers have developed both theoretical and empirical models that attempt to explain many of the features of retail markets. However, researchers have been hampered by a lack of detailed and comprehensive data to produce a set of stylized facts about the structure of the retail sector. We hope that data sets such as the LBD will provide the tools researchers need to make more progress.

The feature of retail markets that attracts the most attention in the academic literature is the emergence of dominant chain firms. Bagwell, Ramey, and Spulber (1997) show how firms can come to dominate retail markets through large investments in cost reduction and vigorous price competition. Holmes (2001) explains how investments in information technology can lead to lower inventories, more frequent deliveries, and larger store sizes. Doms, Jarmin, and Klimek (2004) estimate the impact of investments in information technology on retail firm performance. They find that large firms account for nearly all the investment in IT in the retail sector and that IT improves the productivity of large firms more than that of small firms.

However, as shown in the previous section, modern retail markets are marked by the simultaneous presence of large chain stores and small mom-and-pops. While the relative importance of the two classes of retailers has changed significantly over time, the chains have not yet driven out all the mom-and-pops. Dinlersoz (2004) and Ellickson (2005) have models that explain the simultaneous presence of dominant and fringe retailers. Basically, they view retail markets as segmented between large chain firms that invest in sunk costs, such as advertising, and small mom-and-pops that do not, but instead offer other retail attributes, such as better customer service. These models predict that the number of chains operating in retail markets increases less than proportionately to increases in market size, and that the number of single location mom-and-pops grows roughly proportionately. Put differently, the average size of chain stores grows with market size and the average size of mom-and-pops does not. Also, Campbell and Hopenhayn (2005) show that models where margins decline with additional entry can explain observed market structures where the number of retailers declines with market size.

Several observers have noted the important link between structural change in the retail sector and productivity growth. Sieling, Friedman, and Dumas (2001) and McKinsey (2002) both note that competitive pressure from technology-intensive chain stores such as Wal-Mart leads to productivity growth in the sector both by displacing less efficient retailers and by stimulating productivity improvement at surviving retail firms. Foster, Haltiwanger, and Krizan (2006) use economic census data to decompose changes in aggregate productivity. They show that net entry accounts for nearly all the productivity growth in the retail sector. The entry of establishments owned by chains is especially important as they are typically more productive than even the surviving incumbents. In a detailed analysis of the displacement of existing establishments induced by the entry of a Wal-Mart, Basker (2005) shows that in the short run, Wal-Mart entry boosts county retail employment by several hundred.


She uses a data set of the entry of Wal-Marts into counties, and uses publicly available County Business Patterns (CBP) data to examine the ex post change in employment and the number of producers. Although the short run impact is positive, county retail employment eventually falls as smaller retailers exit the market. The end result is that retail employment is actually larger (by about fifty jobs) than it was prior to Wal-Mart entering the county, while the number of establishments falls. However, she also finds an adverse effect on the wholesale sector, which loses about twenty jobs.

Many of the empirical findings for retail are limited by the quality of available data. Campbell and Hopenhayn (2005) and Basker (2005), for example, both use publicly available CBP data. These data are annual with a long time series, but cannot be used to measure dynamics other than the net entry of establishments and firms. Other studies are limited to particular states or industries. Most do not have the industry coverage and detailed geography to describe changes in local retail markets.

The goal of this chapter is to use the rich establishment-level microdata contained in the LBD to construct a set of stylized facts about the dynamics of the retail sector. The data allow us to examine results for the national and county markets, different categories of firms, establishments and firms, urban and rural counties, and different industries for the universe of retailers with paid employees. Even though much of our analysis does not use microdata, most of our measures could not be constructed without it.

6.3 Data and Measurement Issues

The discussion in the previous section helps us consider the data requirements for analyzing the dynamic structure of retail markets. The concept of producer dynamics as described in economics textbooks is pretty straightforward. Producer dynamics capture the entry and exit of sellers in an abstract market for a good or service. Theoretical models describing the behavior of buyers and sellers in various market settings show that the structure (e.g., the number and/or size distribution of sellers) and the presence (or absence) of barriers to entry (e.g., sunk costs) are important factors in determining how efficiently markets operate.

In this context it is critical that we understand what defines a market. The theoretical literature abstracts away from the definition of a market, but this definition is at the very heart of empirical work. Empirical analyses of markets ideally require data at the firm-product level, where product refers to some bundle of characteristics that would include price, location, and other product characteristics. However, such detailed data are rarely available. Thus, most empirical analyses of producer dynamics do not precisely measure the concepts that are so important for understanding market outcomes. The detailed geographic codes and firm ownership information in the LBD allow us to consider some of these issues.


6.3.1 Using the Longitudinal Business Database to Study the Evolution of Retail Markets

The Census Bureau’s Longitudinal Business Database (LBD)2 is being developed by CES as part of its mandate to construct, maintain, and use longitudinal research data sets. While falling short of the ideal data set, several unique features of the LBD make it a powerful tool for studying producer dynamics and the evolution of retail markets. These include:

• Establishment (store) level data for the universe of retailers with paid employees
• Information for each establishment on the following:
  • Longitudinal linkages
  • Firm affiliation (i.e., firm structure and ownership changes)
  • Location
  • Year of birth (provides age for continuers)
  • Year of death
  • Detailed industry codes (SIC and/or NAICS)
  • Size (based on payroll and employment)
• The LBD can be linked to Economic Census and survey data at the establishment and firm levels to provide more detailed data on inputs and outputs not available from administrative sources.
• Long time series

These features allow researchers to flexibly define markets and track changes in their structure over time. Linked to data on demand conditions and other unique features of particular markets, the LBD can be an extremely useful tool to researchers interested in producer dynamics. Below we discuss how we use these features of the LBD to examine the evolution of retail markets. We also point out remaining data gaps and measurement issues.

First we provide a brief description of the basic features of the LBD. The LBD is based on the Census Bureau’s Business Register (BR)3 and contains longitudinally linked establishment data for all sectors of the economy. Currently, it covers the period between 1975 and 2001. For this chapter, the main advantage is that longitudinally linked data are available annually for all retail establishments in the United States. The quality of these links is critical to constructing accurate measures of establishment entry and exit, so a few additional points about its construction are useful (a detailed description can be found in Jarmin and Miranda [2002]).

2. The LBD contains confidential data under Titles 13 and 26 United States Code (U.S.C.). However, it can be accessed by researchers with approved projects at a Census Bureau Research Data Center (RDC). Information on accessing these and other confidential Census Bureau microdata can be found at www.ces.census.gov.
3. Formerly known as the Standard Statistical Establishment List (SSEL).


The LBD is created by linking annual snapshots of the BR files. For this purpose the BR contains a number of numeric establishment and firm identifiers that can be used to track establishments over time. In particular, the Permanent Plant Number (PPN) was introduced in 1981 to facilitate longitudinal analysis. It is the only numeric establishment identifier on the BR that remains fixed as long as the establishment remains in business at the same location. However, research using the Longitudinal Research Database (LRD)—a manufacturing sector precursor to the LBD—showed that there are breaks in PPN linkages leading to spurious establishment births and deaths. Other numeric identifiers can change over time with various changes in the status of an establishment (e.g., ownership changes). For these reasons, name and address matching was used to augment the numeric identifiers to create the longitudinal linkages for the LBD. Successive years of the BR were first linked using numeric identifiers. The matches (i.e., numerically identified continuers) were set aside and the residuals were submitted to name and address matching using sophisticated statistical record linkage software. The improved establishment level identifier allows us to create the most accurate measures of establishment entry and exit for any Census Bureau data set. Establishment and firm identifiers in the LBD combined with precise location information allow us to examine the entry and exit behavior of firms and establishments within specific geographic markets. The length and frequency are especially useful for these purposes, particularly for a sector as dynamic as retail trade. No other data source provides annual coverage for the universe of employer establishments and firms for as long a period as the LBD. Other data sources share some, but not all, of these characteristics. For example, the Census of Retail Trade also covers the universe of establishments, but only occurs every five years. 
This implies that entry and exit of retail establishments and firms between Census years would be missed. The Annual and Monthly Surveys of Retail Trade occur more frequently, allowing the measurement of changes at the annual or even monthly level, but these data only collect information from a relatively small sample of firms. This means that we no longer have universal coverage of the sector, and the entry and exit of nonsampled firms would be missed. The Bureau of Labor Statistics (BLS) also has a longitudinally linked version of their business register, but they only have information for a taxpaying unit within a state. This means that the BLS data could not be used to address questions about the role of regional or national firms, as we discuss in the following section.

Finally, it is important to stress that the LBD gives us the ability to match establishments with their parent firm. This allows us to analyze both establishment- and firm-based measures of market structure. The relationship between the two measures is not obvious. On the one hand, firm dynamics omit relevant information regarding the entry and exit of establishments, as firms already producing in the market expand the number of establishments in the market. This information is vital to understanding how firms expand their operations. On the other hand, establishment dynamics will miss vital information on the ownership and control of establishments, which may be an important determinant of establishment behavior. Given the very different nature of these alternative measures and the implications for aggregate statistics, we compute statistics for both establishments and firms.

6.3.2 Measurement Strategy and Issues

The ability to identify retail firms and to locate them in specific geographic markets is critical to our study. Firms are not homogeneous entities; some firms are large, have more resources, and may have experience in multiple markets. These differences are likely to drive differences in firm behavior and outcomes. Along these lines, there has been much popular attention regarding the displacement of small mom-and-pop stores by large national chains. In this section, we describe measurement issues related to our identification of firms and the markets in which they operate.

We use the information in the LBD to identify and distinguish between four types of retail firms in much of the analysis that follows. Our classification is based on the number of states a firm operates in, similar to Foster, Haltiwanger, and Krizan (2006). First, single store retailers are defined as one type, which we also consider to be representative of mom-and-pop stores. Second, we classify multi-unit firms into three types of chain firms: local, regional, and national. A firm is a local chain if it operates multiple establishments in only one state. A firm is a regional chain if it operates in at least two states but no more than ten states. Finally, a firm is a national chain if it operates in more than ten states.4

We use detailed information in the LBD to analyze the changes taking place in small geographic areas.
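The four-way firm classification described above can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout (firm identifier, establishment identifier, state) and the function name `classify_firms` are hypothetical and are not LBD variable names; only the thresholds (one establishment, one state, two to ten states, more than ten states) come from the text.

```python
from collections import defaultdict


def classify_firms(establishments):
    """Classify each firm as single, local, regional, or national.

    `establishments` is an iterable of (firm_id, establishment_id, state)
    tuples for one year. The layout is hypothetical, chosen for illustration.
    """
    units = defaultdict(set)   # firm_id -> set of establishment ids
    states = defaultdict(set)  # firm_id -> set of states of operation
    for firm_id, estab_id, state in establishments:
        units[firm_id].add(estab_id)
        states[firm_id].add(state)

    firm_type = {}
    for firm_id in units:
        n_states = len(states[firm_id])
        if len(units[firm_id]) == 1:
            firm_type[firm_id] = "single"    # single store retailer
        elif n_states == 1:
            firm_type[firm_id] = "local"     # multi-unit, one state
        elif n_states <= 10:
            firm_type[firm_id] = "regional"  # two to ten states
        else:
            firm_type[firm_id] = "national"  # more than ten states
    return firm_type
```

Because the rule depends only on counts of establishments and states within a year, a firm's type can change from year to year under this sketch; whether and how the chapter fixes a firm's type over time is not specified in this passage.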
This apparently simple task presents us with several challenges. Ideally we would like to define markets based on some measure of the geographic clustering of retailers and the population that they serve. However, county is the smallest reliable geographic unit of analysis that is available in the LBD. Coding to finer levels is less of a priority for the Census Bureau since few economic statistics are published for geographic units smaller than the county level, and as a result these measures are not as reliable.5 With these constraints we define local markets based on the administrative definition of a county.

4. We also explored an alternate definition using a measure of distance for all establishments within a firm. We find that this measure does differ somewhat from a definition based on the number of states. We decide to stay with the literature.
5. Depending on the availability and quality of a physical street address, the Census Bureau can, and does, assign more detailed geography codes. Depending on the year, between 60 and 75 percent of establishments have Census Block and Tract codes. In Jarmin and Miranda (forthcoming), we have assigned many of these establishments latitude and longitude coordinates.

This has both advantages and disadvantages. On the one hand, defining local markets in this fashion is clearly arbitrary. A local retail market can encompass multiple counties, particularly in metropolitan areas. At the same time one county can encompass multiple local markets, as is often the case in physically large or densely populated counties. On the other hand, even though the county unit is a relatively crude way to define retail markets, an advantage is that there is a large amount of county level information (e.g., population) that researchers have available to control for market characteristics.

One market characteristic that receives a lot of attention in the literature is size. We have a wide variety of options available in measuring the size of a county market. For this chapter, we use a parsimonious and accessible measure of market size. In addition, for the statistics we generate and report, we do not want individual counties to change market type over the period under study. Thus, we classify counties as metropolitan, micropolitan, or rural based on their 2000 Core Based Statistical Area (CBSA) code.6

Even at this crude level of geography we find that about 4 percent of establishments in the LBD have inconsistent county codes. Census assigns these codes every year based on each establishment's physical or mailing address. As a consequence it is not unusual in our data to see establishments that border county lines switching back and forth. This is primarily an artifact of updates to the census files that map street names to counties. In our empirical analysis, we assign a unique county code to establishments observed switching county codes.7 We assign the county coded during the latest census year when possible; otherwise, we assign the modal county for the establishment.
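The county assignment rule just stated (latest census year when possible, modal county otherwise) can be written down as a short sketch. The data layout, the function name `assign_county`, and the tie-breaking rule for the modal county are assumptions made for illustration; the text does not say how ties are resolved. The set of census years follows the chapter's statement that economic census years end in two or seven.

```python
from collections import Counter

# Economic census years within the LBD period (years ending in 2 or 7).
CENSUS_YEARS = {1977, 1982, 1987, 1992, 1997}


def assign_county(history, census_years=CENSUS_YEARS):
    """Pick one county code for an establishment.

    `history` maps year -> county code recorded in that year
    (a hypothetical layout, not the LBD file structure).
    """
    if not history:
        raise ValueError("establishment has no county codes")

    # Rule 1: use the county coded in the latest census year, if any.
    coded_census_years = [year for year in history if year in census_years]
    if coded_census_years:
        return history[max(coded_census_years)]

    # Rule 2: otherwise use the modal (most frequent) county.
    counts = Counter(history.values())
    top = max(counts.values())
    # Tie-break (an assumption, not from the text): among equally
    # frequent codes, keep the one observed most recently.
    for year in sorted(history, reverse=True):
        if counts[history[year]] == top:
            return history[year]
```

For example, an establishment coded to one county in 1990, 1993, and 1994 but to a neighboring county in 1991 and 1992 would be assigned the neighboring county, because 1992 is a census year and takes precedence over the mode.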
Our eventual goal is to use variation in many dimensions at the county level to control for differences in market characteristics including demographic composition, population density, tax structure, communications infrastructure, and proximity to other population centers.

There are 1,083 counties classified as metropolitan areas, 682 counties classified as micropolitan areas, and 1,336 counties classified as nonmetro areas based on CBSA codes. We refer to these nonmetro areas as rural areas. We exclude from our computations the states of Alaska and Hawaii as well as outlying U.S. territories.

Table 6.1 shows that most of the 2000 U.S. population of individuals and firms is located in metropolitan areas. Approximately 17.3 percent of the population of individuals and 13.5 percent of the population of establishments is located in rural or micropolitan areas. On average, rural areas are less than 7 percent the size (in population terms) of metropolitan areas. The average micropolitan area is about 20 percent the size of the average metropolitan area.

6. Detailed information on these new geographic definitions can be found in Office of Management and Budget (2000).
7. Miranda (2001) documents that approximately 4 percent of establishments show changes in county codes.

Table 6.1 U.S. retail markets by CBSA and rural areas in 2000

       Counties   Population    Firms  Establishments  Employment  Payroll ($1,000s)
Totals by market type
Metro     1,083  229,783,293  961,264       1,223,079  18,660,642        319,571,179
Micro       682   29,023,781  159,969         176,701   2,187,425         31,296,137
Rural     1,336   19,229,414  120,242         129,161   1,256,810         17,625,669
Averages by market type
Metro                212,173      888           1,129      17,231            295,080
Micro                 42,619      235             259       3,212             45,956
Rural                 14,404       90              96         939             13,163

Source: Own calculations from the LBD.
Note: The Firms column reports the number of firms operating in a CBSA type. Chain firms can operate in counties of, potentially, all three types. Thus, there is double counting of firms in the table. The number of retail firms operating in the U.S. in 2000 was 1,066,510.

The decision to open (or close) an establishment in a particular market is made at the firm level. In this sense, the ability to identify firm dynamics in small geographic areas is critical for understanding firm behavior as well as their response to market changes. The detailed establishment-level data in the LBD allow us to identify when a firm first enters a county, when it exits a county, and whether it has a presence in other county markets. We can also identify firm expansions or contractions in a particular market, and whether the firm does so by adjusting employment at existing establishments or by adjusting the number of establishments. Note that as a result of our focus on local markets, a firm can be an entrant simultaneously into multiple markets and also account for one or more market exits in different locations. Similarly, an establishment entry is not necessarily a firm entry if the firm was already present in that market. Finally, the closure of an establishment does not necessarily generate a firm exit if the firm remains operational in the county.

In this chapter we restrict our analysis to retail firms. The quality of the industry codes available on the LBD is critical to the construction of a retail sector micro data set. New establishments, especially those that begin operations between census years (i.e., those ending in two or seven), often have missing or poor quality industry codes. Between 1 percent and 10 percent of records have missing codes in the BR depending on the year and whether it is a single-unit or multi-unit establishment. Valid and improved codes are eventually obtained from direct Census Bureau collections or other sources and incorporated into the BR. These clean-up activities are concentrated in particular years, usually in preparation for an economic


census. To maximize the quality of industry codes on the LBD, we choose the best code available for each establishment and take advantage of codes obtained from various sources and at different times. In particular, we use census or survey collected data whenever possible, but we may use an administrative code if no other data are available.8

Industry codes are subject to change for particular establishments over time. This occurs for about 4.5 percent of the establishments classified as retail at some point in their operational existence. There are two possible reasons for this. First, an establishment may legitimately change its type of activity. Second, errors in the data are possible. We address both issues by assigning each establishment in our data a unique two-digit SIC that remains fixed over the establishment’s entire history. When possible, we use industry codes collected in surveys or the economic census for the unique SIC. Alternatively, we assign the unique SIC using the most recent SIC available on the file.

A current limitation of the LBD is that it is primarily SIC based. From 1976 to 1996, the SIC industry codes were the basis for all Census Bureau publications. From 1997 onward, data have been published on a NAICS basis. The Census Bureau continued to maintain SIC industry codes on the BR through 2001. Since 2002, the Census Bureau maintains only NAICS industry codes on the BR, resulting in a potential time series break in the LBD data. In addition, it is possible that the quality of the SIC codes declined between 1998 and 2001.

The LBD contains information on two important measures of establishment size, payroll and employment. Revenue information contained in the BR is not currently on the LBD since it is only available for single-unit firms and at the employer identification number (EIN) level for multi-unit firms.
While payroll and employment are clearly two important measures of economic activity at the establishment, they only measure inputs to retail production. Success or failure of an establishment or firm should depend on profits. This means that researchers wishing to use detailed data on establishment and firm profits (or productivity) must rely on Census Bureau censuses and surveys.

Finally, the LBD covers a relatively long period. It extends back to 1975, covers the recessions of the early 1980s and early 1990s, and spans a period of significant technical change and innovation. However, this may not be long enough to actually witness much of the structural change in the retail sector. As figure 6.3 shows, employment by chain stores surpassed that of single-establishment firms in 1977. It is likely that in order to observe the long run changes in the retail sector we would need a data set that extended back to the 1940s or 1950s, when we would expect to find relatively few chain stores and the dominance of mom-and-pop stores. As we show in the following section, different types of geographic markets might be at different stages in this process, and we focus on the long run differences from 1976 to 2000.

8. Industry codes are obtained from multiple sources and these can change depending on the year. The most reliable code is obtained from survey forms in Census years. Other sources include administrative data from the Internal Revenue Service (IRS), Social Security Administration (SSA), and the Bureau of Labor Statistics (BLS).

6.4 Results

In an average year, there are over 1.4 million retail establishments associated with over 1 million firms. The database used in this section consists of all retail establishments from 1976 to 2000. Data elements available for the period include industry, geography, payroll, and employment. In 2000, these firms employed more than 22 million workers and generated over $368.5 billion in payroll.

The section is organized in the following manner. First, we examine the trends in the national market for our four types of firms: mom-and-pops, and local, regional, and national chains. Next, we look at similar patterns, but disaggregated by the three types of county markets: metropolitan, micropolitan, and rural. Finally, we summarize the results at the two-digit SIC industry level.

6.4.1 National Market, by Type of Firm

In this subsection, we analyze some basic trends in the structure of retail markets averaged across all county markets. We first look at trends in the number and size of retail establishments (i.e., stores) by retail firm type. We then look at the basic establishment entry and exit statistics, also by retail firm type.

Basic Results on Retail Market Structure: Trends in the Number and Size of Retail Establishments

Figure 6.5 shows the mean number of retail establishments per 1,000 county residents over the 1976 to 2000 period broken out by the four types of firms. Overall, the mean number of retail establishments drops from 7.44 to 5.88 establishments for all counties. The only type of firm that experiences a decline in the number of establishments per capita over the period is the mom-and-pops.
The number of mom-and-pop stores falls from 6.2 to 4.25 stores (or 31.4 percent) during this period. All three types of chains see the number of establishments increase during this period. Overall, chain stores increase from 1.32 to 1.76 establishments, or a 36.6 percent increase. On average, the composition of firm types in these markets is shifting from mom-and-pops to chains. Figure 6.6 combines the number of establishments and employment data to examine the shift in establishment size within these types of firms. We find that all types of firms grow on average, even the mom-and-pop stores. Mom-and-pops grow on average since their employment remains

Fig. 6.5 Mean number of establishments per 1,000 residents, all counties

Source: Own calculations from the LBD.

Fig. 6.6 Mean establishment size by firm type, all counties

Source: Own calculations from the LBD.

relatively constant, but the number of establishments on average declines during this period. However, they only grow from about five employees to about seven employees. We find that firms of all types have larger store sizes during this period, with the largest increase coming from national chains. Local chain stores increase employment from roughly nine to fifteen employees, regional chains from roughly twelve to nineteen, and national chains from roughly fifteen to twenty-five.

Basic Results on Retail Market Structure: Establishment Entry and Exit

The firm entry, exit, and continuer rates in tables 6.2, 6.4, and 6.5 are defined as in Dunne, Roberts, and Samuelson (1988). We define $N_{f,t-1}$ as the number of establishments owned by retail firms of type $f$ in period $t-1$, $X_{ft}$ as the number of establishments owned by firms of type $f$ that were active in period $t-1$ but are no longer active in period $t$, and $E_{ft}$ as the number of establishments owned by firms of type $f$ that were not active in period $t-1$, but are active in period $t$. Finally, we define $C_{ft}$ as the number of establishments owned by firms of type $f$ that were active in both period $t-1$ and $t$. Entry, exit, and continuer rates are:

Entry Rate: $ER_{ft} = E_{ft} / N_{f,t-1}$,

Exit Rate: $XR_{ft} = X_{ft} / N_{f,t-1}$,

Continuer Rate: $CR_{ft} = C_{ft} / N_{f,t-1}$,

where f is in {single-unit, local chain, regional chain, national chain}. All rates are relative to the number of firms operating in the prior period, implying that $XR + CR = 1$ for each type of firm. We can also weight by employment to construct the entrant, exit, and continuer employment share.9

Table 6.2 reports these rates averaged across all years by retail firm type. The top panel shows unweighted entry, exit, and continuer rates. Recall that figure 6.5 shows a relatively large decline in the number of single-unit establishments per capita and slight increases in the number of establishments per capita for chains. The top panel of table 6.2 confirms that single location firms have higher rates of exit than entry and, thus, on average experience net exit each year. As we move to the different types of chains, the larger the chain, the lower the rates of both entry and exit (except for a slightly higher entry rate for regional chains). The overall effect is that net entry ($ER - XR$) is positive for all types of chains, and larger chains have higher rates of net entry.

In the bottom panel of table 6.2, we present entry, exit, and continuer rates weighted by employment. Across all firm types, entrants and exits tend to be smaller than continuing firms; thus the weighted entry and exit rates are lower than their unweighted counterparts. The results on employment-weighted shares show that the net entry of employment for chains is actually highest for local chains, followed by regional and then national chains, on average during the period.

9. In Dunne, Roberts, and Samuelson (1988), the entrant share of employment (ESH) is divided by the period $t$ employment, but in this chapter we divide by period $t - 1$ employment. The exit share of employment (XSH) is constructed the same way, dividing by the period t employment.

Table 6.2 Establishment entry and exit rates for the U.S. retail sector (National rates averaged across all years, 1976–2000)

                          Single   Local   Regional   National
Unweighted
  Entry Rate (ER)          0.149   0.092      0.093      0.088
  Exit Rate (XR)           0.151   0.085      0.076      0.069
  Continuer Rate (CR)      0.849   0.915      0.924      0.931
Weighted by employment
  Entrant Share (ESH)      0.078   0.078      0.065      0.055
  Exit Share (XSH)         0.108   0.056      0.046      0.043
  Continuer Share (CSH)    0.892   0.944      0.954      0.957

Source: The LBD.

6.4.2 Results by Market and Firm Type

In the previous section, we examined the national retail market; however, we have already shown that there are considerable differences across county types. In this section, we examine changes in market structure and dynamics across the three county market types and by firm type. We start by summarizing the changing nature of the distribution of the number of retail establishments and firms operating in county markets and retail employment by county type. We then look at firm entry into and exit from these county markets by county type. We focus on firm entry since the firm is the relevant decision maker in the market.

Table 6.3 describes the distribution of establishments, firms, and employment per capita. It reports the mean number of establishments, firms, and employees per 1,000 county residents within each county type for both 1976 and 2000. We also report the standard deviation to provide a sense of the variation across counties within each type of county. We see a number of important differences between the three types of county markets. At the beginning of the period, rural counties have on average two more establishments per capita than do metropolitan counties, but they also have two more firms and nine fewer employees per capita. This implies that we observe a larger number (on a per capita basis) of smaller firms in rural areas. Micropolitan counties also have more establishments and firms per capita than metropolitan counties, but not as many as rural counties. In terms of employment, micropolitan and metropolitan


Ronald S. Jarmin, Shawn D. Klimek, and Javier Miranda

Table 6.3    County retail market structure: Number of establishments, firms, and employees by market type (based on per capita county level aggregates)

                   Mean                                 Standard deviation
Year  Market type  Establishments  Firms  Employment    Establishments  Firms  Employment
1976  Metro                   6.3    5.8        47.8               1.9    1.8        25.5
1976  Micro                   7.5    7.2        48.4               2.0    1.9        18.6
1976  Rural                   8.3    8.1        38.2               2.8    2.8        18.1
2000  Metro                   5.2    4.5        70.9               1.9    1.7        37.6
2000  Micro                   6.0    5.6        71.2               2.0    1.8        28.2
2000  Rural                   6.4    6.1        52.7               2.9    2.9        28.2

Source: The LBD.

counties have roughly the same number of retail employees per capita.

From 1976 to 2000, there is a significant decline in the number of establishments and firms per capita in all types of county markets. At the same time, we observe a significant increase in retail employment per capita across all types of counties. Metropolitan and micropolitan counties continue to have roughly the same levels of retail employment, and rural counties still have significantly lower levels. The overall effect is that the average size of an establishment has grown in each type of region. Finally, we see that the variance of the establishment and firm distributions did not change over time, but that the variance of the employment distribution increased over the period from 1976 to 2000.

In table 6.4, we present average firm entry, exit, and continuer rates by metropolitan, micropolitan, and rural county types. As in table 6.2, we show the annual rates averaged over the entire period of 1976 to 2000. Like the results for establishments in table 6.2, we see that single-unit firms have higher entry and exit rates across all market types. Local chains have slightly higher rates of entry and exit than do regional and national chains. Table 6.4 shows only small differences between regional and national chains.

Table 6.4 reveals that average net entry rates for single-unit retailers are negative for all market types. This is similar to figure 6.5, which showed the drop in the average number of single-unit establishments per capita across all counties. In contrast, net entry rates are nonnegative for chain retailers.

Firm turnover rates are computed as the sum of the entry and exit rates (ER + XR). These are a measure of churning within retail markets. We see from table 6.4 that single-unit retailers experience more churning than do chain stores. More interesting perhaps is the finding that turnover rates increase with market size. Metropolitan counties, in particular, experience more turnover across all types of retail firms than do micropolitan or rural markets. The difference in retail firm turnover between metropolitan and rural county market types is 0.006, 0.017, 0.038, and 0.019 for single units,

The Role of Retail Chains: National, Regional, and Industry Results

Table 6.4    Firm entry and exit rates for the U.S. retail sector (Rates by market type averaged across all years, 1976–2000)

                      Single     Local  Regional  National
Entry Rate (ER)
  Rural                0.143     0.085     0.077     0.077
  Micro                0.144     0.087     0.082     0.077
  Metro                0.151     0.094     0.097     0.089
Exit Rate (XR)
  Rural                0.153     0.078     0.061     0.064
  Micro                0.150     0.077     0.065     0.063
  Metro                0.151     0.087     0.079     0.070
Continuer Rate (CR)
  Rural                0.847     0.922     0.939     0.936
  Micro                0.850     0.923     0.935     0.937
  Metro                0.849     0.913     0.921     0.930

Source: The LBD.
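As a worked check of the turnover measure (ER + XR), the following Python sketch recomputes turnover and net entry from the rounded rates in table 6.4. The dictionary and function names are illustrative choices, not part of the LBD or of the authors' code. Because the sketch starts from rounded table entries, the metropolitan-rural turnover gaps it produces can differ in the third decimal from figures computed on unrounded data.

```python
# Firm entry (ER) and exit (XR) rates copied from table 6.4.
ENTRY_RATE = {
    "Single":   {"Rural": 0.143, "Micro": 0.144, "Metro": 0.151},
    "Local":    {"Rural": 0.085, "Micro": 0.087, "Metro": 0.094},
    "Regional": {"Rural": 0.077, "Micro": 0.082, "Metro": 0.097},
    "National": {"Rural": 0.077, "Micro": 0.077, "Metro": 0.089},
}
EXIT_RATE = {
    "Single":   {"Rural": 0.153, "Micro": 0.150, "Metro": 0.151},
    "Local":    {"Rural": 0.078, "Micro": 0.077, "Metro": 0.087},
    "Regional": {"Rural": 0.061, "Micro": 0.065, "Metro": 0.079},
    "National": {"Rural": 0.064, "Micro": 0.063, "Metro": 0.070},
}


def turnover(firm_type, market):
    """Turnover rate: entry rate plus exit rate (ER + XR)."""
    return ENTRY_RATE[firm_type][market] + EXIT_RATE[firm_type][market]


def net_entry(firm_type, market):
    """Net entry rate: entry rate minus exit rate (ER - XR)."""
    return ENTRY_RATE[firm_type][market] - EXIT_RATE[firm_type][market]


# Gap in turnover between metropolitan and rural markets, by firm type.
metro_minus_rural = {
    firm_type: round(turnover(firm_type, "Metro") - turnover(firm_type, "Rural"), 3)
    for firm_type in ENTRY_RATE
}

for firm_type, gap in metro_minus_rural.items():
    print(f"{firm_type:<9} metro-rural turnover gap: {gap:.3f}")
```

The gap is positive for every firm type, and the net entry rates are positive for all three chain types in every market type, which is the pattern the text describes.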

local, regional, and national retail chains, respectively. Thus, we see that large metropolitan retail markets are characterized by fewer competitors per capita than rural and micropolitan county markets, but that competition in metropolitan markets is marked by higher firm turnover, and that this higher turnover is more pronounced among chain store retailers. Further, our firm turnover measure may understate the degree of volatility in county markets, since retail chains can change their scale of activity in county markets by opening or closing stores. Our measure does not capture when firms expand or contract the number of stores in a county, as long as they continue to operate at least one store in the county.

Table 6.5 shows employment-weighted entry, exit, and continuer rates. As before, we see that entering and exiting firms tend to be smaller than continuing firms, as reflected by the lower weighted entry and exit rates. This result is true across market types. Also note the net gain in employment from entry and exit of retail stores across market types for all retail chains. This is not the case for mom-and-pops, which show the highest losses in metropolitan areas.

6.4.3 Industry Differences

In this section, we look at differences in producer dynamics and the role of chain stores across two-digit retail industries. First, we compare the number of county markets served by the four firm types in 1977 and 2000. We are trying to understand how the role of these firm types within county retail markets has changed over time and to determine whether there are systematic differences in these changes across different retail industries. The results of this exercise are reported in table 6.6. One important thing to note in table 6.6 is that many county markets are


not served by all retail firm types. As expected, most of the 3,101 U.S. counties (excluding Alaska and Hawaii) are served by single-unit firms in most two-digit SIC retail industries. However, the situation is quite different when looking at the different chain types. Indeed, it is often the case that the majority of U.S. counties are not served by one or more chain types within these broad two-digit SIC industries. From table 6.1 we know that rural counties are the dominant county market type numerically, are quite small, and may not offer sufficient demand to justify the scale of many

Table 6.5    Employment-weighted firm entry and exit rates for the U.S. retail sector (Mean by market type, 1976–2000)

                      Single     Local  Regional  National
Entrant Share (ESH)
  Rural                0.078     0.072     0.055     0.060
  Micro                0.078     0.078     0.058     0.051
  Metro                0.078     0.078     0.067     0.055
Exit Share (XSH)
  Rural                0.107     0.053     0.039     0.040
  Micro                0.107     0.052     0.040     0.040
  Metro                0.109     0.057     0.047     0.043
Continuer Share (CSH)
  Rural                0.893     0.947     0.961     0.960
  Micro                0.893     0.949     0.960     0.960
  Metro                0.891     0.943     0.953     0.957

Source: The LBD.

Table 6.6    Number of county markets served by different retail firm types (1977 and 2000, by two-digit SIC)

SIC                                         Single         Local      Regional      National
                                       1977   2000   1977   2000   1977   2000   1977   2000
52 Building Materials and Hardware    3,005  2,960  1,909  1,765  1,484  1,380  1,157  1,490
53 General Merchandise                2,835  2,138  1,485    629  1,886    843  2,087  2,673
54 Food Stores                        3,089  3,072  2,327  2,352  1,891  2,277  1,770  1,806
55 Auto Dealers and Gas Stations      3,096  3,066  2,441  2,504  1,954  2,407  1,770  2,039
56 Apparel and Accessories            2,904  2,518  1,865  1,092  1,544  1,180  1,852  1,763
57 Home Furnishing and Equipment      2,848  2,792  1,666  1,429  1,020  1,035    954  1,393
58 Eating and Drinking Places         3,095  3,088  2,062  2,384  1,603  2,275  1,465  2,010
59 Miscellaneous Retail               3,067  3,060  2,480  2,224  1,631  1,804  2,101  2,204

Source: The LBD.


chain retailers. Nevertheless, some retailers such as Wal-Mart have declared intentions for substantial expansion over the next several years.10 It will be interesting to see whether chains will continue to expand into new markets.

The changes over the period in the number of county markets served by the different firm types are quite striking. We see that the number of counties served by at least one mom-and-pop retailer actually falls in every industry. The fall is not dramatic, but the fact that we observe a decline is surprising given the ubiquity of small retailers. On the other side, we find that the number of markets being served by a national chain is increasing in every industry except Apparel and Accessories, and that some of the increases are dramatic. Results for local and regional chains vary across the different industries.

General Merchandise firms show a very interesting trend. As expected, given the rise of stores such as Wal-Mart and Target and the consolidation of once-regional department stores, we see that the number of county markets served by national retail chains has grown substantially over the period. This growth is accompanied by dramatic reductions in the number of markets served by single-unit firms, local chains, and regional chains of general merchandise firms.

The trends in the number of county markets served by the various firm types differ substantially across retail industries. In Eating and Drinking Places, there is only a small reduction in the number of markets served by single-unit producers and there are large increases in the number of markets served by all types of chains. Contrast that with the trends in Apparel and Accessories, where we see that the number of markets served by all firm types decreases as the industry shrinks.

While changes in the number of markets served by the different types of firms are interesting, we also focus on how entry and exit rates (establishment and firm) differ across industries. We construct a more detailed data set with entry and exit rates defined within the county, year, two-digit SIC, and chain type. While more detailed industries at the six-digit level are potentially available in the LBD, we already have a significant number of industries at the two-digit level where we cannot construct an entry or exit rate (since $N_{ft-1} = 0$). We mitigate this problem by computing entry and exit rates as in Davis, Haltiwanger, and Schuh (1996):

$$ER_{fct} = \frac{E_{fct}}{(N_{fct} + N_{fct-1})/2}, \qquad XR_{fct} = \frac{X_{fct}}{(N_{fct} + N_{fct-1})/2}.$$

10. Wal-Mart’s 2005 annual report indicates that it plans to open 1,000 new supercenters in the United States over the next five years, and Lee Scott, Wal-Mart’s CEO, says there is room in the United States for 4,000 more Wal-Mart supercenters.
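The rates above can be sketched in a few lines of Python. This is a minimal illustration, assuming that $E$ and $X$ count entrants and exits in a cell and that $N$ counts the units active in that cell in a given year; the function name `dhs_rate` and its argument names are hypothetical and do not come from the LBD. Averaging the current and previous counts keeps the rate defined when the cell was empty in the previous year, and bounds it above by 2.

```python
def dhs_rate(flow, n_current, n_previous):
    """Entry or exit rate with an average-size denominator.

    flow        -- number of entrants (E) or exits (X) in the cell
    n_current   -- number of active units in the cell in year t
    n_previous  -- number of active units in the cell in year t-1
    Returns None when the cell is empty in both years, where no rate exists.
    """
    denominator = (n_current + n_previous) / 2
    if denominator == 0:
        return None
    return flow / denominator


# A cell that was empty last year and has two entrants this year:
# the conventional rate E / N(t-1) is undefined, the averaged rate is not.
print(dhs_rate(flow=2, n_current=2, n_previous=0))

# A stable cell of 30 units with 3 entrants.
print(dhs_rate(flow=3, n_current=30, n_previous=30))
```

Returning `None` for a cell that is empty in both years is a design choice of this sketch; dropping such cells before computing rates would serve equally well.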


We summarize industry differences in entry and exit rates using a series of simple regressions. We include dummies for both firm and county market type. We also include a series of dummies for each five-year period from 1976 through 2000. The omitted group is mom-and-pop stores in rural markets during the period of 1996 to 2000. We present entry rate results for both establishments and firms in table 6.7. Looking at the intercept terms, we see that the industry with the highest establishment and firm entry rates is SIC 58, Eating and Drinking Places (this still holds if one uses the other coefficients to calculate entry rates for chains in nonrural counties). The industry with the lowest establishment and firm entry rates is SIC 52, Building Materials and Hardware.
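To illustrate how the dummy coefficients combine, the Python sketch below adds the relevant coefficients to the intercept to obtain a fitted entry rate for a given cell. The coefficient values are the establishment entry rate estimates for SIC 52 and SIC 58 as read from panel A of table 6.7; the names `PANEL_A` and `predicted_rate` are hypothetical, and the omitted categories (single-unit firms, rural markets, 1996 to 2000) contribute zero by construction.

```python
# Establishment entry rate coefficients for two industries (table 6.7, panel A).
PANEL_A = {
    52: {
        "intercept": 0.083,
        "period": {"1976-1980": 0.032, "1981-1985": 0.023,
                   "1986-1990": 0.007, "1991-1995": -0.011},
        "market": {"Metro": 0.024, "Micro": 0.015},
        "firm": {"National": -0.035, "Regional": -0.040, "Local": -0.047},
    },
    58: {
        "intercept": 0.154,
        "period": {"1976-1980": 0.045, "1981-1985": 0.042,
                   "1986-1990": 0.030, "1991-1995": 0.009},
        "market": {"Metro": -0.003, "Micro": -0.002},
        "firm": {"National": -0.071, "Regional": -0.068, "Local": -0.059},
    },
}


def predicted_rate(sic, period="1996-2000", market="Rural", firm="Single"):
    """Fitted entry rate: intercept plus the dummies that are switched on.

    The defaults are the omitted categories, so each adds nothing.
    """
    coeffs = PANEL_A[sic]
    return (coeffs["intercept"]
            + coeffs["period"].get(period, 0.0)
            + coeffs["market"].get(market, 0.0)
            + coeffs["firm"].get(firm, 0.0))


# National chains in metropolitan counties, 1996 to 2000.
for sic in (52, 58):
    rate = predicted_rate(sic, market="Metro", firm="National")
    print(f"SIC {sic}: {rate:.3f}")
```

For national chains in metropolitan counties the fitted rates are 0.072 for SIC 52 and 0.080 for SIC 58, so the ranking of these two industries by intercept carries over to this nonrural chain cell.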

Table 6.7    Establishment and firm entry rate regressions

SIC                      52       53       54       55       56       57       58       59

Panel A: Establishment entry rates
Intercept             0.083    0.140    0.133    0.089    0.106    0.106    0.154    0.114
Time period
  1976–1980           0.032   –0.017    0.010    0.046    0.034    0.033    0.045    0.031
  1981–1985           0.023   –0.026    0.017    0.043    0.030    0.046    0.042    0.034
  1986–1990           0.007   –0.013    0.000    0.022    0.026    0.026    0.030    0.018
  1991–1995          –0.011   –0.008   –0.013    0.000    0.004    0.009    0.009    0.009
Market type
  Metro               0.024    0.017    0.021    0.010    0.022    0.018   –0.003    0.013
  Micro               0.015    0.008    0.013    0.002    0.007   –0.005   –0.002    0.001
Firm type
  National chain     –0.035   –0.078   –0.082   –0.041   –0.050   –0.053   –0.071   –0.046
  Regional chain     –0.040   –0.073   –0.090   –0.022   –0.038   –0.034   –0.068   –0.038
  Local chain        –0.047   –0.073   –0.075   –0.029   –0.063   –0.052   –0.059   –0.062

Panel B: Firm entry rates
Intercept             0.088    0.141    0.133    0.099    0.120    0.101    0.166    0.121
Time period
  1976–1980           0.032   –0.016    0.013    0.043    0.024    0.039    0.045    0.030
  1981–1985           0.024   –0.010    0.024    0.041    0.020    0.048    0.041    0.037
  1986–1990           0.010    0.010    0.010    0.014    0.019    0.026    0.028    0.021
  1991–1995          –0.012   –0.006   –0.008   –0.006    0.002    0.007    0.007    0.009
Market type
  Metro               0.026    0.009    0.028    0.011    0.020    0.031   –0.001    0.013
  Micro               0.014    0.005    0.009    0.004    0.008    0.007    0.001    0.004
Firm type
  National chain     –0.030   –0.091   –0.074   –0.029   –0.054   –0.029   –0.072   –0.031
  Regional chain     –0.028   –0.056   –0.062   –0.021   –0.036   –0.034   –0.052   –0.033
  Local chain        –0.059   –0.083   –0.080   –0.047   –0.078   –0.073   –0.078   –0.075

Source: Own calculations from the LBD.
Notes: Unit of observation is a {county, year, firm type} cell. Regressions are run by two-digit SIC with controls for time period, market type, and firm type. All coefficients are significant at the 5 percent level.


The pattern of estimated time period dummies generally shows that entry rates are declining over time. We observe monotonic declines in the time period dummies in several industries. Only with SIC 53, General Merchandise Stores, do we observe a lower entry rate in the initial period than we do in the final period. This finding holds for both establishment and firm entry rates. With the exception of Eating and Drinking Places, SIC 58, entry rates are highest in metropolitan markets and slightly higher in micropolitan markets than in rural markets. This is similar to results for the entire retail sector shown in table 6.4. We find mixed results for the chain type dummies. The negative coefficients imply that the mom-and-pop stores have the largest entry rates, regardless of industry or unit of measure (establishment or firm).

We find exit rate results for both establishments and firms similar to those for the entry rate. The results, presented in table 6.8, again show that SIC 58 has the highest establishment and firm exit rates and SIC 52 has the lowest. We also find that exit rates are declining over time, with the effect being monotonic in about half the industries. We generally find that exit rates are highest in metropolitan markets and slightly higher in micropolitan markets than in rural markets. We find mixed results for the different types of chains. The negative coefficients imply that the mom-and-pop stores have the largest exit rates, regardless of industry or unit of measure (establishment or firm). We also find that firm exit rates are next highest for regional chains for all industries, with no pattern for local and national chains across the industries. This pattern does not hold for establishment exit rates.

6.5 Conclusion

This chapter provides a rich set of stylized facts describing the evolution of U.S. retail markets over the last thirty years. We use the Longitudinal Business Database, which offers a long time series of longitudinal data covering all retail establishments with paid employees. Detailed information on establishment location and firm ownership allows us to examine changes in market structure and producer dynamics, focusing on the role of retail chains. These data allow us to corroborate several important trends already described by other empirical work, as well as document some new findings.

We document the steady ascendance of retail chains in terms of both their share of employment and establishments, as well as the decline of relatively small mom-and-pops. Customers shop at much larger stores today than they did thirty years ago. Interestingly, we find there are fewer establishments per 1,000 residents, but they are significantly larger. The absolute growth in the size of the national chain store is particularly striking in this regard. However, we also observe that single location mom-and-pop stores


Table 6.8    Establishment and firm exit rate regressions

SIC                      52       53       54       55       56       57       58       59

Panel A: Establishment exit rates
Intercept             0.095    0.147    0.130    0.103    0.164    0.114    0.164    0.130
Time period
  1976–1980           0.030    0.000    0.028    0.088   –0.017    0.031    0.043    0.029
  1981–1985           0.015    0.004    0.016    0.046   –0.022    0.019    0.022    0.010
  1986–1990           0.006   –0.005    0.014    0.028   –0.021    0.002    0.011    0.000
  1991–1995           0.003    0.003    0.005    0.004   –0.003   –0.003   –0.003    0.000
Market type
  Metro               0.018    0.015    0.019    0.010    0.017    0.021   –0.011    0.005
  Micro               0.008    0.006    0.009    0.002    0.006    0.002   –0.006   –0.001
Firm type
  National chain     –0.072   –0.114   –0.084   –0.067   –0.098   –0.082   –0.083   –0.073
  Regional chain     –0.056   –0.083   –0.095   –0.041   –0.062   –0.042   –0.100   –0.055
  Local chain        –0.057   –0.072   –0.079   –0.055   –0.056   –0.050   –0.083    0.061

Panel B: Firm exit rates
Intercept             0.097    0.158    0.136    0.115    0.162    0.119    0.173    0.139
Time period
  1976–1980           0.032   –0.014    0.020    0.083   –0.005    0.037    0.045    0.029
  1981–1985           0.018    0.000    0.017    0.044   –0.009    0.021    0.027    0.011
  1986–1990           0.012    0.000    0.017    0.017   –0.012    0.006    0.013    0.003
  1991–1995           0.006   –0.015   –0.004   –0.002    0.005   –0.004   –0.009   –0.005
Market type
  Metro               0.019    0.014    0.027    0.010    0.017    0.021   –0.007    0.004
  Micro               0.007    0.004    0.007    0.001    0.010    0.002   –0.004    0.001
Firm type
  National chain     –0.058   –0.120   –0.089   –0.056   –0.100   –0.071   –0.081   –0.058
  Regional chain     –0.036   –0.053   –0.061   –0.035   –0.055   –0.038   –0.069   –0.036
  Local chain        –0.061   –0.069   –0.074   –0.057   –0.067   –0.061   –0.081   –0.065

Source: Own calculations from the LBD.
Notes: Unit of observation is a {county, year, firm type} cell. Regressions are run by two-digit SIC with controls for time period, market type, and firm type. All coefficients are significant at the 5 percent level.

have grown larger over time, perhaps as a response to competitive pressures from chain stores. Our analysis by county market type shows that rural markets are still served by a relatively large number of small mom-and-pop stores. These areas are experiencing net losses of this type of store. Our regional analysis shows that there are fewer competitors in larger markets, but competition in these markets is marked by higher firm turnover across all firm types. The chapter also shows interesting differences across broad retail industries. Chain stores and mom-and-pop stores appear to be able to coexist in some industries better than others. Independent general merchandise stores and apparel and accessories store owners are disappearing from


many markets while independent eating and drinking places can still be found in most markets. In future work, we will delve deeper into the relationship between market size and market structure. How does the mix of ownership types change as market size changes? How does firm turnover change as market size changes? Asplund and Nocke (2006) develop a model with predictions regarding firm turnover and market size. They argue that turnover should be higher in larger markets. The LBD is ideal to look at this issue. How does firm size change in response to changes in market size? We can examine over a long period of time the relationship between establishment/firm size and how it varies across firm type.

References

Asplund, M., and V. Nocke. 2006. Firm turnover in imperfectly competitive markets. Review of Economic Studies 73 (2): 295–327.
Bagwell, K., G. Ramey, and D. F. Spulber. 1997. Dynamic retail price and investment competition. RAND Journal of Economics 28 (2): 207–27.
Basker, E. 2005. Job creation or destruction? Labor-market effects of Wal-Mart expansion. Review of Economics and Statistics 87 (1): 174–83.
Bayard, K., and S. D. Klimek. 2004. Creating a historical bridge for manufacturing between the Standard Industrial System and the North American Industry Classification System. 2003 Proceedings of the American Statistical Association, Business and Economic Statistics Section (CD-ROM): 478–84.
Campbell, J., and H. Hopenhayn. 2005. Market size matters. Journal of Industrial Economics 53 (1): 1–25.
Davis, S. J., J. C. Haltiwanger, and S. Schuh. 1996. Job creation and destruction. Cambridge, MA: MIT Press.
Dinlersoz, E. M. 2004. Firm organization and the structure of retail markets. Journal of Economics and Management Strategy 13 (2): 207–40.
Doms, M. E., R. S. Jarmin, and S. D. Klimek. 2004. Information technology investment and firm performance in U.S. retail trade. Economics of Innovation and New Technology 13 (7): 595–613.
Dunne, T., M. J. Roberts, and L. Samuelson. 1988. Patterns of firm entry and exit in U.S. manufacturing industries. RAND Journal of Economics 19 (4): 495–515.
Ellickson, P. B. 2005. Supermarkets as a natural oligopoly. Duke University Department of Economics. Working paper 05-04.
Foster, L. S., J. Haltiwanger, and C. J. Krizan. 2006. Market selection, reallocation and restructuring in the U.S. retail trade sector in the 1990s. Review of Economics and Statistics 88 (4): 748–58.
Holmes, T. J. 2001. Bar codes lead to frequent deliveries and superstores. RAND Journal of Economics 34 (4): 708–25.
Jarmin, R. S., and J. Miranda. 2002. The longitudinal research database. Center for Economic Studies. CES Working paper CES-WP-02-17.
Jarmin, R. S., and J. Miranda. Forthcoming. The impact of Hurricane Katrina on business establishments. Journal of Business Valuation and Economic Loss Analysis.


McKinsey Global Institute. 2002. How IT enables productivity growth. MGI Reports, November. Available at www.mckinsey.com/knowledge/mgi/IT/
Miranda, J. 2001. LBD documentation: Geography. Center for Economic Studies, U.S. Census Bureau, Technical Notes CES-TN-2001-02, January.
Office of Management and Budget. 2000. Standards for defining metropolitan and micropolitan statistical areas. Federal Register 65 (249): 82228–38.
Sieling, M., B. Friedman, and M. Dumas. 2001. Labor productivity in the retail trade industry 1987–99. Monthly Labor Review Online 124 (12): 3–14.
U.S. Bureau of the Census. 1971. Census of business, 1967; Vol. I, Retail trade: Subject reports. Washington, D.C.: U.S. Government Printing Office.
U.S. Bureau of the Census. 1994. Statistical abstract of the United States: 1994 (114th edition). Washington, D.C.: U.S. Government Printing Office.

Comment

Jeffrey R. Campbell

Technology introduction takes place firm-by-firm and establishment-by-establishment. Even a good idea that falls from the sky (the classic neutral technology shock) must be read and incorporated into a production plan. For this reason, the analysis of individual producers’ birth, growth, and death occupies a central place in productivity analysis. The Longitudinal Research Database provided the first observations of this process for the United States’ Manufacturing sector, and its analysis by Dunne, Roberts, and Samuelson (1988), Bartelsman and Dhrymes (1998), and others created a new appreciation of creative destruction’s contribution to productivity growth. Of course, these empirical developments would have been impossible without the contributions of Jovanovic (1982) and Hopenhayn (1992) to the theory of industry dynamics.

Manufacturing led U.S. economic growth through the 1960s, but Retail Trade and Services have worn the yellow jersey since then. Further progress relating productivity growth to industry dynamics therefore requires our empirical and theoretical work to catch up to this new leading sector. Jarmin, Klimek, and Miranda have given us a substantial push in this direction. Although they are not the first to examine producer-level data from Retail Trade, they are the first (to my knowledge) to do so in light of that sector’s central economic fact: the replacement of stand-alone mom-and-pop stores by large chain stores with low prices. Today, Wal-Mart’s rise occupies the headlines, but regional and national chain growth inspired the anti-chain-store movement of the 1920s and 1930s. The specific players and their tactics have changed, but the issues at hand remain the same: do

Jeffrey R. Campbell is a senior economist at the Federal Reserve Bank of Chicago and a faculty research fellow of the National Bureau of Economic Research. E-mail: jcampbell@frbchi.org


new low-cost retailers improve welfare by lowering prices or retard it by lowering wages and displacing other competitors? Previous work on the Manufacturing sector has left us ill prepared for these questions, because the dominant approach to that sector presumed some form of atomistic competition (either the price-taking perfect variety or the price-setting monopolistic variety) in which strategic interactions are absent. Evidence in Campbell and Hopenhayn (2005), Campbell (2006), and Yeap (2005) shows that atomistic competition cannot even rationalize basic features of the data like the dependence of establishment size, prices, and turnover on market size. The first step in understanding industry dynamics and productivity growth in the retail and service sectors is to confront the strategic aspects of their market interactions. Jarmin, Klimek, and Miranda contribute to this by delineating the important players in any retail market and by reporting useful stylized facts about trend rates of displacement and turnover. In this discussion, I wish to complement their contribution with a relatively simple model of dynamic retail competition with both chain stores and independents. Retail has great potential for strategic complexity. Firms operate several distinct technologies and differentiate themselves geographically. It is not difficult to specify a model that embodies all of these features. However, such a model’s complexity precludes its analytic characterization. There might be one equilibrium or many, and they do not lend themselves to local comparative statics results like those Hopenhayn (1992) develops for a competitive industry. With this in mind, the model I develop vastly simplifies the spatial aspects of competition so that we can learn something about competition between dominant chain producers and a fringe of high-cost independent producers. 
A Model

To build a model with nontrivial dynamics and strategic interaction, I draw on my previous work with Jaap Abbring (Abbring and Campbell 2006). We develop a model of Markov-perfect duopoly dynamics with stochastic demand, sunk costs of entry, and irreversible exit. Firms make their continuation decisions oldest first, and we focus on the unique equilibrium in which firms’ exits follow a last-in first-out pattern. In this discussion, I construct a symmetric equilibrium in a similar model with a fringe of monopolistically competitive independent producers.

Primitives

Consider a region with a central city and a large number $L$ of outlying villages. The city’s name is 0, and the villages’ names are $j = 1, \ldots, L$. The villages are arranged in a circle with the city at its center. A single road connects each village to the city. We denote the population of location $j$ in


year $t$ with $C_t^j$. These follow independent (across locations) Markov chains. With a fixed probability, $C_t^j = C_{t-1}^j$. With the complementary probability, $C_t^j$ is a draw from a uniform distribution on $[\hat{C}_v/L, \check{C}_v/L]$ for a village and $[\hat{C}_c, \check{C}_c]$ for the city.

Consumers have identical incomes measured in money ($y$), and they allocate their purchases between an outside good available everywhere at a price of 1 and the good of interest. This latter good is not necessarily available at the same price everywhere. If a consumer purchases $q$ units of this good at a price of $p$ in her home location, then her utility level is

$$\int_0^q D^{-1}(x)\,dx + y - pq.$$

If she has to travel to make the same purchase, she must pay a transportation cost $T$ (in units of money). For simplicity, assume that a villager may only travel to the city and ignore the possibility of a city dweller shopping in a village. Consumers’ travel costs are random. Their c.d.f., $\Pr[T \le x]$, governs their distribution.

There are two production technologies. One has higher fixed costs and lower variable costs than the other. I refer to these as the big-box and independent technologies. Each village has one potential entrant per period. This firm must choose between entering at that location with the independent technology or remaining out of the market. This opportunity always goes to a new firm, so the decision to remain inactive is irreversible. The city has two potential entrants each period, and each of them has access to only one of the technologies. As with the villages’ potential entrants, they cannot delay their entry decisions.

Entering with the big-box technology requires paying a sunk cost. Producing with this technology in any period after entry requires paying the fixed cost $\kappa_b$. The only way of avoiding this fixed cost is to exit irreversibly. This technology’s constant marginal cost of production is $\omega_b$. Entering with the independent technology requires no sunk cost and a per-period fixed cost of $\kappa_i < \kappa_b$. A higher marginal cost $\omega_i > \omega_b$ and a shorter life span offset these advantages. A firm entering with the independent technology can produce for only one period. Relaxing this extreme assumption in future versions of this model is clearly desirable. Firms with the big-box technology discount future profits with a constant discount factor less than one.

The model has two physical state variables: the number of firms which produced in the city in the previous period, $N_t^0$, and the vector of market populations $\vec{C}_t \equiv (C_t^0, C_t^1, \ldots, C_t^L)$. Each period, the sequence of actions proceeds as follows.
1. All potential entrants and incumbent firms (in the city) observe the realization of $\vec{C}_t$.
2. Any incumbent firms make their continuation decisions simultaneously.


3. The city’s big-box potential entrant decides whether or not to enter.
4. The city’s independent potential entrant decides whether or not to enter.
5. The villages’ potential entrants make their entry decisions simultaneously.
6. With observations of their travel costs and all firms’ entry and continuation decisions, consumers select their shopping locations.
7. After the consumers arrive at their shopping locations, firms simultaneously choose quantities. An auctioneer then sets prices to clear the locations’ markets.

Equilibrium

With the model’s primitives in place, we seek a Markov-perfect equilibrium. We first characterize the static parts of the model, which correspond to stages 5 through 7 in the previous list. With these solved, I build on results from Abbring and Campbell (2006) to characterize the dynamics of the big-box sector. To simplify the analysis, I proceed under two assumptions: entering as the third big-box producer and entering the city as an independent with a big-box firm committed to production are dominated strategies. With the proposed equilibrium in place, finding conditions that guarantee this will be the case is not hard.

Static Play

Begin with the firms’ quantity decisions. All firms’ profits are linear in the number of customers shopping at their locations, so we can consider their choices of quantity per customer. By construction, at most one firm serves each village. Its producer surplus per customer is $[D^{-1}(q) - \omega_i]q$, where $\omega_i$ is the independent technology’s marginal cost. Denote the profit-maximizing choice of $q$ with $q_i^*$ and the resulting per customer surplus with $\pi_i^* \equiv [D^{-1}(q_i^*) - \omega_i]q_i^*$. For a firm exclusively operating the big-box technology in the city, the choice of $q$ is similar. The resulting per customer profit and its associated quantity are $\pi_b^*(1)$ and $q_b^*(1)$. If there are two big-box firms operating, then their quantity decisions correspond to the standard Cournot solution. Denote the per customer duopoly profit for a firm with marginal cost $\omega_b$ facing a rival with the same marginal cost with $\pi_b^*(2)$ and the per customer duopoly quantity (summed across both firms) with $q_b^*(2)$.

The quantity choices and the resulting prices determine villagers’ choices of shopping locations. A resident of a village with no producer chooses to shop in the city if her utility gain from doing so exceeds her travel cost. That is, if

$$\int_0^{q_b^*(N_t^0)} D^{-1}(x)\,dx - D^{-1}[q_b^*(N_t^0)]\,q_b^*(N_t^0) > T.$$

The last term on the left-hand side is the total purchase cost of the $q_b^*(N_t^0)$ units of the good. If we call the left-hand side of this inequality $W_c(N_t^0)$,


Ronald S. Jarmin, Shawn D. Klimek, and Javier Miranda

then the fraction of such villagers choosing to shop in the city is $\Gamma[W_c(N_t^0)]$, where $\Gamma$ denotes the distribution function of travel costs. For residents of villages with producers, purchasing from the local producer is the alternative to shopping in the city. The utility gain from shopping locally (compared with consuming the entire budget in the outside good) is

\[ W_v(N_t^0) \equiv \int_0^{q_i^*} D^{-1}(x)\,dx - D^{-1}(q_i^*)\,q_i^*. \]

Clearly, the local producer’s profit maximization guarantees that this is positive, so the fraction of consumers choosing to shop locally is $1 - \Gamma[W_c(N_t^0) - W_v(N_t^0)]$ (with $\Gamma$ the distribution function of travel costs). The remaining consumers shop in the city. Given the number of firms serving the city, a village’s potential entrant rationally forecasts $W_c(N_t^0)$ and consumers’ travel decisions and decides to enter only if the corresponding profit is nonnegative. That is, if

\[ C_t^j\{1 - \Gamma[W_c(N_t^0) - W_v(N_t^0)]\}\,\pi_i^* - \kappa_i \geq 0. \]

Clearly, there is a threshold value of population $C_i(N_t^0)$, which sets this profit to zero. Entry into village $j$ is profitable if $C_t^j \geq C_i(N_t^0)$. Because $L$ is large and the villages’ populations are statistically independent, we can apply a law of large numbers to show that the number of villagers traveling to the city is a nonstochastic function of only $N_t^0$, the number of competitors in the city. In any given period, the number of residents of villages with no local producer equals $\frac{1}{2}\{[C_i(N_t^0)]^2 - (\hat{C}_v)^2\}/(\check{C}_v - \hat{C}_v)$. The remaining villagers have the option of purchasing from a local producer. Putting these together, we get that the number of villagers shopping in the city equals

\[ M(N_t^0) \equiv \Gamma[W_c(N_t^0)]\,\frac{[C_i(N_t^0)]^2 - (\hat{C}_v)^2}{2(\check{C}_v - \hat{C}_v)} + \Gamma[W_c(N_t^0) - W_v(N_t^0)]\,\frac{(\check{C}_v)^2 - [C_i(N_t^0)]^2}{2(\check{C}_v - \hat{C}_v)}. \]

Dynamic Big-Box Competition

The sunk costs of entry and incumbents’ priority in serving a market make the problem of an entrant using the big-box technology dynamic. To characterize the evolution of big-box competition, consider the dynamic game with only the big-box firms as players and payoffs given by the outcome of the static competition described above. I construct a very simple Markov-perfect equilibrium for this game. It is symmetric in the sense that duopolists’ continuation decisions follow the same mixed strategy. The equilibrium construction begins with the problem of a pessimistic duopolist who believes (irrationally) that the rival firm will never exit. Its current profit is $[C_t^0 + M(2)]\pi_b^*(2)/2 - \kappa_b$. It will earn this until the next time that $C_t^0$ changes, at which point the new demand value will be statistically

The Role of Retail Chains: National, Regional, and Industry Results


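independent of its current value. The conjecture that the rival will never exit allows us to show that the following piecewise-linear function of $C_t^0$ gives this duopolist’s value.

\[ v(C_t^0, 2) = \begin{cases} \dfrac{1}{1 - \beta(1 - \lambda)}\left\{\dfrac{[C_t^0 + M(2)]\,\pi_b^*(2)}{2} - \kappa_b + \beta\lambda\,\tilde{v}(2)\right\} & \text{if } C_t^0 > \underline{C}_2 \\ 0 & \text{otherwise.} \end{cases} \]

Here, $\underline{C}_2$ is the largest value of $C$ that sets $v(C, 2)$ to zero and

\[ \tilde{v}(2) \equiv \int_{\hat{C}}^{\check{C}} \frac{v(C, 2)}{\check{C} - \hat{C}}\,dC. \]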
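Because the duopolist's value depends on its own average, the two definitions pin the value down only jointly, as a fixed point. The following is a minimal numerical sketch of that fixed point, not the author's procedure. It assumes a discount factor `beta`, a per-period probability `lam` that demand is redrawn, demand uniformly distributed on an evenly spaced grid, and entirely hypothetical parameter values; all function and variable names are illustrative.

```python
def solve_pessimistic_duopolist(flow_profit, beta, lam, tol=1e-12, max_iter=10_000):
    """Iterate on the average value v_bar until it is consistent with v(C, 2).

    On the positive branch the value solves
        v = flow + beta * ((1 - lam) * v + lam * v_bar),
    and the firm exits (value 0) whenever that solution would be negative.
    flow_profit must be evaluated on an evenly spaced demand grid.
    """
    scale = 1.0 / (1.0 - beta * (1.0 - lam))

    def values_given(v_bar):
        return [max(0.0, scale * (f + beta * lam * v_bar)) for f in flow_profit]

    def average(values):
        # Trapezoid rule on an evenly spaced grid, divided by the grid length.
        pairs = zip(values, values[1:])
        return sum((a + b) / 2.0 for a, b in pairs) / (len(values) - 1)

    v_bar = 0.0
    for _ in range(max_iter):
        new_v_bar = average(values_given(v_bar))
        converged = abs(new_v_bar - v_bar) < tol
        v_bar = new_v_bar
        if converged:
            break
    return values_given(v_bar), v_bar


# Hypothetical primitives (illustrative only, not taken from the chapter).
C_LOW, C_HIGH, N_POINTS = 0.0, 10.0, 2001
M2, PROFIT2, KAPPA_B = 1.0, 0.5, 1.5
BETA, LAM = 0.95, 0.2

c_grid = [C_LOW + (C_HIGH - C_LOW) * k / (N_POINTS - 1) for k in range(N_POINTS)]
flow = [(c + M2) * PROFIT2 / 2.0 - KAPPA_B for c in c_grid]
values, v_bar = solve_pessimistic_duopolist(flow, BETA, LAM)

# Exit threshold: the largest grid point at which the value is still zero.
exit_threshold = max(c for c, v in zip(c_grid, values) if v == 0.0)
print(f"average value: {v_bar:.3f}, exit threshold: {exit_threshold:.3f}")
```

With these illustrative numbers the current profit turns negative below a demand level of 5, yet the computed exit threshold lies below that point: the prospect of a favorable redraw keeps the firm active through some periods of current losses.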

Let $\overline{C}_2$ be the unique value of $C$ which sets $v(C, 2)$ to $\phi_b$. If $C_t^0$ exceeds this threshold, then creating a duopoly through entry is rational given the pessimistic expectation that the incumbent will never exit. The next step is to consider the problem of an incumbent monopolist that expects
• the potential entrant will actually enter if and only if $C_t^0 > \overline{C}_2$, and
• the potential entrant will never exit following entry.
With these expectations, the value of such a monopolist is also piecewise linear in $C$.

\[ v(C_t^0, 1) = \begin{cases} v(C_t^0, 2) & \text{if } C_t^0 > \overline{C}_2 \\ \dfrac{1}{1 - \beta(1 - \lambda)}\left\{[C_t^0 + M(1)]\,\pi_b^*(1) - \kappa_b + \beta\lambda\,\tilde{v}(1)\right\} & \text{if } \underline{C}_1 < C_t^0 \leq \overline{C}_2 \\ 0 & \text{otherwise.} \end{cases} \]

In parallel with the case of the pessimistic duopolist, $\underline{C}_1$ is the largest value of $C$ that sets $v(C, 1)$ to zero and

\[ \tilde{v}(1) \equiv \int_{\hat{C}_v}^{\check{C}_v} \frac{v(C, 1)}{\check{C}_v - \hat{C}_v}\,dC. \]

Entry places this incumbent into the position of the pessimistic duopolist, so $v(C, 1) = v(C, 2)$ if $C > \overline{C}_2$. Otherwise, this incumbent expects to earn the monopoly profit until either $C$ decreases below $\underline{C}_1$ or increases above $\overline{C}_2$. The players in this game are any initial incumbents and the entire sequence of potential entrants. A Markovian strategy for a player is a pair of functions $A_S(N, C)$ and $A_E(N, C)$ which give probabilities of survival and entry as a function of the number of incumbent firms and the current demand state. A strategy forms a symmetric Markov-perfect equilibrium if any action it prescribes with positive probability yields a weakly higher payoff than any other action given that all other players follow the same strategy. Consider the following strategy built from the value functions $v(C, 1)$ and $v(C, 2)$.

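\[ A_E(0, C) = \begin{cases} 1 & \text{if } C > \overline{C}_1 \\ 0 & \text{otherwise,} \end{cases} \qquad A_E(1, C) = \begin{cases} 1 & \text{if } C > \overline{C}_2 \\ 0 & \text{otherwise,} \end{cases} \qquad A_S(1, C) = \begin{cases} 1 & \text{if } C > \underline{C}_1 \\ 0 & \text{otherwise,} \end{cases} \]

\[ A_S(2, C) = \begin{cases} 1 & \text{if } C > \underline{C}_2 \\ p(C) & \text{if } \underline{C}_1 < C \leq \underline{C}_2 \\ 0 & \text{otherwise,} \end{cases} \]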

where

\[ p(C) \equiv \frac{v(C, 1)}{v(C, 1) - \{[C + M(2)]\,\pi_b^*(2)/2 - \kappa_b + \beta\lambda\,\tilde{v}(2)\}}. \]

Verifying that this strategy forms a symmetric Markov-perfect equilibrium begins by showing that $v(C, 1)$ and $v(C, 2)$ give the values of a monopolist and duopolist when all firms follow this strategy. The key to this is to note that the mixed strategy $p(C)$ yields an expected payoff of zero to a firm that chooses not to exit: such a firm receives $v(C, 1)$ if its rival exits, which happens with probability $1 - p(C)$, and the term in braces if its rival stays, and $p(C)$ is exactly the probability that sets the expected value of the two to zero. Such a firm trades off low duopoly profits (partially offset by the probability of a favorable later realization of $C$) with the possibility of outlasting the rival and becoming a monopolist. With this established, deviations from the given strategy cannot improve either firm’s payoff by construction.

Equilibrium Summary

How would data generated by this equilibrium appear to an econometrician? The big-box sector will be either empty, a monopoly, or a duopoly at any given moment. Changes in demand will shift it between those three states. If we think of each village’s independent producer as an establishment, then the econometrician observes entry when the village acquires a producer and exit when the village’s producer exits. These changes will arise from idiosyncratic village-level demand shocks and in response to changes in the big-box sector. Specifically, an increase in $C_t^0$ can induce big-box entry, thereby lowering prices and drawing villagers with low transportation costs to the city. This lowers the profitability of operating the independent technology in a village of any given size, so the expansion of the big-box sector comes at the expense of the independent producers. Accordingly, the number of independent producers shrinks. These dynamics mimic the salient facts Jarmin, Klimek, and Miranda document: big-box and independent retailers compete for the same customers, and the entry and exit rates of both types of firms are positive.

What is to be Done?

The present model helps us see Jarmin, Klimek, and Miranda’s findings in the context of a single market outcome. While that in itself could be helpful and might inspire the creation of new stylized facts, it is only one small step towards quantifying the welfare and productivity contributions of chain retailers. Although the model has some obvious shortcomings, addressing all of them is not the task with the highest marginal product. I would like to focus my conclusion on one task that is central: understanding the possibilities for technological change and diffusion in the retail trade sector. Big-box retailers (and before them chain retailers) have a well-deserved reputation for deploying new technology. The macroeconomic consequences of this are large, as documented by Basu, Fernald, Oulton, and Srinivasan (2003). Nevertheless, there exists no consensus view on the constraints and possibilities for developing retail technology. Are most innovations accidental or the outcome of deliberate research? How do leading-edge technologies diffuse from their origin to the industry as a whole? How important is true innovation relative to imitation of other industries’ practices? Without answers to these questions, it will be hard to judge how impeding chain store development changes growth and welfare. I expect answers to come from theory, case studies, and further econometric work on large enterprise data sets.

References

Abbring, J. H., and J. R. Campbell. 2006. Last-in first-out oligopoly dynamics. Federal Reserve Bank of Chicago Working Paper 2006-28.
Bartelsman, E. J., and P. J. Dhrymes. 1998. Productivity dynamics: U.S. manufacturing plants, 1972–1986. Journal of Productivity Analysis 9 (1): 5–34.
Basu, S., J. G. Fernald, N. Oulton, and S. Srinivasan. 2003. The case of the missing productivity growth, or does information technology explain why productivity accelerated in the United States but not in the United Kingdom? In NBER Macroeconomics Annual 2003, ed. M. Gertler and K. Rogoff, 9–63. Cambridge, MA: MIT Press.
Campbell, J. R. 2006. Competition in large markets. Federal Reserve Bank of Chicago Working Paper 2005-16.
Campbell, J. R., and H. A. Hopenhayn. 2005. Market size matters. Journal of Industrial Economics 53 (1): 1–25.
Dunne, T., M. J. Roberts, and L. Samuelson. 1988. Patterns of firm entry and exit in U.S. manufacturing industries. RAND Journal of Economics 19 (4): 495–515.
Hopenhayn, H. A. 1992. Entry, exit, and firm dynamics in long run equilibrium. Econometrica 60 (5): 1127–50.
Jovanovic, B. 1982. Selection and the evolution of industry. Econometrica 50 (3): 649–70.
Yeap, C. 2005. Competition and market structure in the food services industry: Changes in firm size when market size expands. University of Minnesota. Unpublished manuscript.

7
Entry, Exit, and Labor Productivity in U.K. Retailing
Evidence from Micro Data
Jonathan Haskel and Raffaella Sadun

7.1 Introduction

The retail sector has gradually become one of the most prominent industries of the U.K. economy, absorbing approximately 20 percent of total employment in 2004 and experiencing average employment growth of about 1 percent per annum over the last decade (EUKLEMS 2008). The expansion of the sector does not seem to be matched by an equally impressive productivity performance. As documented by Basu et al. (2003), while retail trade, hotels, and catering account for about three-quarters of the U.S. total factor productivity (TFP) acceleration between 1995 and 2003 (Domar-weighted industry TFP growth), the same sector seems to account for about a third of the U.K. TFP deceleration. These stylized facts have made the retail industry an area of both policy and academic interest.

The purpose of this chapter is to inform the recent debate surrounding the productivity of the U.K. retail sector with new evidence arising from previously unexplored micro data sources. The chapter investigates the U.K. retail sector using store- and firm-level data between 1998 and 2003. First, we present what is, to the best of our knowledge, the first exhaustive description of the U.K. retail sector using micro data sources.1 Second, in the spirit of Foster, Haltiwanger, and Krizan (2006), we look at the contributions of firm entry and exit to the productivity growth of the sector. Third, we provide some new evidence of the recent shift of large U.K. retailers toward smaller retail formats (also documented by Griffith and Harmgart [2005]), which followed the introduction of new and more restrictive planning constraints for the opening of large retail stores. Based on a companion work (Haskel and Sadun 2007), we suggest that this change in the store configurations of the major U.K. retailers might be one of the factors behind the recent TFP slowdown experienced by the industry in the United Kingdom.2

The plan of the chapter is as follows. In section 7.2 we document the data sources, then describe, in section 7.3, entry and exit. Section 7.4 looks at productivity levels and growth and regulations that might have affected it, and section 7.5 concludes.

Jonathan Haskel is head of the economics department at Queen Mary, University of London. Raffaella Sadun is a research officer at the Centre for Economic Performance, London School of Economics. Financial support for this research comes from the ESRC/EPSRC Advanced Institute of Management Research, grant number RES-331-25-0030, and the research is carried out at CeRiBA, the Centre for Research into Business Activity, at the Business Data Linking Branch at the ONS; we are grateful to all institutions concerned for their support. This work contains statistical data from ONS which is crown copyright and reproduced with the permission of the controller HMSO and Queen’s Printer for Scotland. The use of the ONS statistical data in this work does not imply the endorsement of the ONS in relation to the interpretation or analysis of the statistical data. We thank Ralf Martin for helpful discussions and Felix Ritchie (ONS) for help on the data. We also thank participants at the CRIW conference and our discussant David Audretsch. Any errors are our own.

7.2 Data

7.2.1 Time Period and Industries

The data in this chapter comes from the Annual Respondents Database (ARD). This is a comprehensive business database that is based on the Annual Business Inquiry (ABI) performed by the Office for National Statistics (ONS). Regarding time period, the data available to us is annual from 1997. As we shall see, however, the 1997 data is not accurate; in practice, therefore, our analysis starts in 1998. At the time of writing the 2003 data was the final period available.3 As for industries, the ARD database covers almost all firms with Standard Industrial Classification (SIC) codes from 2010 to 93050. The retailing sector is covered by SIC92 codes from 52111 to 52740 (i.e., all codes beginning with 52).
Retailing is then split into seven broad categories, as listed in table 7.1.

1. With the exception of Haskel and Khawaja (2003), an early version of this chapter. The main difference between this chapter and the previous version is that this one uses an extra year of data, and computes numbers using a different employment measure. The latter turns out to make a substantial difference since the earlier employment measure was available only for a subset of firms, causing many firms to be dropped. This affects the productivity decompositions.
2. See Haskel and Sadun (2007) and Haskel et al. (2007).
3. We particularly thank Felix Ritchie for helping in the timely provision of the 2002 and 2003 data.

Table 7.1 Industries covered in UK ARD retailing data

SIC code | Industry | Notes
521 | Retail sales in nonspecialized stores covering food, beverages, or tobacco (for example) | Includes supermarkets and department stores
522 | Food, beverages, tobacco in specialized stores |
523 | Pharmaceutical and medical goods, cosmetic, and toilet articles | Includes chemists
524 | Other retail sales of new goods in specialized stores | Includes sales of textiles, clothing, shoes, furniture, electrical appliances, hardware, books, newspapers and stationery, cameras, office supplies, computers. Clothing is the biggest area
525 | Secondhand | Mostly secondhand books, secondhand goods, and antiques
526 | Not in stores | Mostly mail order and stalls and markets
527 | Repair | Repair of personal goods, boots and shoes, watches and clocks

7.2.2 Units of Analysis

A crucial issue in what follows will be whether the analysis is by store, chain of stores, or chain of chains of stores. This section sets out in some detail what data are available to us.4 To summarize:

1. Employment, entry, and exit data are available at the store level. The store is defined as a Local Unit (LU).
2. Productivity data are available at the firm level. The firm is defined as a Reporting Unit (RU).

Business Structure: Enterprises, Enterprise Groups, and Local Units

The fundamental business data set in the United Kingdom is the Interdepartmental Business Register (IDBR). This business register is compiled using a combination of tax records on Value Added Tax (VAT) and Pay-As-You-Earn (PAYE), information lodged at Companies House, Dun and Bradstreet data, and data from other surveys. The IDBR has been operating since 1994 (before that, business register information was rather uncoordinated across different government departments). The IDBR tries to capture the structure of ownership and control of firms and plants or business sites that make up the U.K. economy using three aggregation categories: local units, enterprises, and enterprise groups. Their meaning is best illustrated by means of an example set out in figure 7.1.

Consider the left-hand panel. Suppose that Brown is a single business, operating in a single location, producing goods for a single industry. Now consider the right side of the panel. Smith and Jones Holdings is a holding company, registered in London. In turn, it owns two businesses, Smith and Jones, which are involved in separate industrial activities. Smith has four shops (or more generally plants/business sites, that is, particular geographic locations where trade occurs): Smith North, Smith South, Smith East, and Smith West. Jones has a shop, Jones North, and a Research and Development lab, Jones R&D. Brown, being responsible for a single business activity, is an enterprise. Smith and Jones Holdings, owning businesses with distinct business activities, is called an enterprise group.5 Smith and Jones are two enterprises. All business sites, each a business entity at a single mailing address, are called local units. Consequently, if Jones R&D is located at a different site than Jones North, the enterprise Jones would consist of two local units. If Jones R&D were located at the same site as Jones North, the two would form one local unit for the IDBR.6 (The diagram also refers to reporting units; this will be explained later.)

Maintaining Information on Business Structure: Enterprise Groups, Enterprises, and Local Units

The Annual Register Inquiry (ARI) is designed to maintain the business structure information on the IDBR (Jones 2000). It began operation in July 1999 and is sent to large enterprises (over 100 employees) every year, to enterprises with twenty to ninety-nine employees every four years, and to smaller enterprises on an ad hoc basis. The ARI currently covers around 68,000 enterprises, consisting of about 400,000 local units. It asks each enterprise for employment, industry activity, and the structure of the enterprise. This is straightforward for the Brown enterprise in our example. A multisite enterprise such as Smith receives a form and is asked to report on its overall activity and employment. It will also be sent four extra forms to report the same for each local unit. If Smith has closed a local unit, it must report this on the form. If a local unit has opened, Smith has to fill out extra forms, which are obtained from ONS by an automated procedure. Returns from the ARI update the IDBR in the summer of each year.

4. It follows closely Criscuolo, Haskel, and Martin (2003).
5. A holding company responsible for a number of enterprise groups is called an apex enterprise.
6. The two could nevertheless be separate local units depending on the survey. If, for example, an R&D survey which collects data just for the R&D part of the business was undertaken, this would identify them as distinct. Thus, some care has to be taken in matching businesses using different surveys.

Fig. 7.1 Plants and firms in the IDBR
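The hierarchy in the figure can be written down as a small data structure. The sketch below is only an illustration of the Brown, Smith, and Jones example as the text describes it, including the reporting units shown at the bottom of the figure. The class and function names are hypothetical, Brown's single site is simply labeled "Brown", and Jones R&D is assumed to sit at a different site from Jones North, so that Jones has two local units.

```python
from dataclasses import dataclass, field


@dataclass
class Enterprise:
    name: str
    local_units: list        # business sites, one per mailing address
    reporting_units: list    # each reporting unit is a list of local-unit names


@dataclass
class EnterpriseGroup:
    name: str
    enterprises: list = field(default_factory=list)


def check_partition(enterprise):
    """Every local unit should be covered by exactly one reporting unit."""
    covered = [lu for ru in enterprise.reporting_units for lu in ru]
    return sorted(covered) == sorted(enterprise.local_units)


# Brown: a stand-alone enterprise with one local unit and one reporting unit.
brown = Enterprise("Brown", ["Brown"], [["Brown"]])

# Smith: four shops, reported on through two reporting units.
smith = Enterprise(
    "Smith",
    ["Smith North", "Smith South", "Smith East", "Smith West"],
    [["Smith North", "Smith South"], ["Smith East", "Smith West"]],
)

# Jones: a shop and an R&D lab, reported on jointly.
jones = Enterprise(
    "Jones",
    ["Jones North", "Jones R&D"],
    [["Jones North", "Jones R&D"]],
)

holdings = EnterpriseGroup("Smith and Jones Holdings", [smith, jones])

enterprises = [brown] + holdings.enterprises
n_local_units = sum(len(e.local_units) for e in enterprises)
n_reporting_units = sum(len(e.reporting_units) for e in enterprises)
print(n_local_units, n_reporting_units)
```

Counting stores (local units) and counting firms (reporting units) give different totals even in this tiny example, which is why the level at which entry and exit are measured matters.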

Maintaining Information on Employment, Turnover, and Other Data

As well as the structure of business information, the IDBR holds other data, such as address and SIC code. However, since the IDBR is based mostly on tax data (plus old records from previous inquiries), it also sometimes contains other data. Output information on the IDBR comes from VAT records if the original source of business information was VAT data. Employment information comes from PAYE data if that is the source of the original inclusion. Thus, as long as the single-local unit enterprise Brown is large enough to pay VAT (the threshold was £52,000 in 2000/01), it would have turnover information at the enterprise and local unit level. On the other hand, if Brown does not operate a PAYE scheme, it will have no employment information. However, employment data is required to construct sampling frames and hence is interpolated from turnover data. For the multi-local unit enterprise Smith, no turnover information will be available for Smith’s local units, since most multi-local unit enterprises do not pay VAT at the local unit level. If the PAYE scheme is operated at the local unit level, it would have independent employment data.

7.2.3 The ABI and the ARD

While the IDBR holds much useful information, more data is required on outputs and other inputs in order to calculate GDP. Thus, the ONS conducts a business survey based on the IDBR called the Annual Business Inquiry (ABI). The ABI covers production, construction, and some service sectors, but not public services, defense, and agriculture.7 The ARD consists of the panel micro-level information obtained from successive cross-sections of the ABI.

The questions asked on the ABI for retailing vary somewhat. Firms are required to provide details on turnover (total and broken down into retail and nonretail components, and by commodity sold), expenditures (employment costs, total materials, and taxes), items defined as work in progress, and capital expenditures (separately for acquisitions and disposals). They also have to answer sections related to import or export of services, the use of e-commerce, and employment, with further data on part-timers. However, the survey form can be sent in a long or in a short format. The main difference between the two formats is that in the long format firms are required to provide finer detail on the broad sections defined previously. For instance, in the long format firms break down their disposals and acquisitions information into about twenty different items, whereas in the short format they only report the aggregate values. Also, in the long format, firms answer questions such as the total number of sites and the number of square meters they consist of.

Reporting Units, Selected and Nonselected Data

The ABI is covered by the Statistics of Trade Act (1947); therefore, firms are obliged by law to provide data if they get a form.8 To reduce compliance costs, however, the ABI is not a census of all local units. This is so in two respects: aggregation and partial sampling.

Regarding aggregation, enterprises normally report on all their local units jointly. There are two major exceptions. First, if the enterprise has local units in both Britain and Northern Ireland, there is a legal requirement for the ONS to keep data for these two areas separate, and therefore enterprises are required to report data separately in this case. Second, there is separate reporting on LUs if a business explicitly requests such a split. So, for example, Smith may decide to report on North and South combined and, separately, on East and West combined. Returned data is at what is called the reporting unit (RU) level. Some examples of the possible RU structures are shown for our example at the bottom of figure 7.1. Brown forms one RU (A) only, whereas Smith has two RUs (comprising Smith North and Smith South, and Smith East and Smith West). Jones has one RU, comprising Jones North and Jones R&D.9 Thus, these RUs are the fundamental unit for reported data on the ARD. It is worth noting at this point that the RU and LU distinction is crucial for our analysis. For example, entry and exit at the LU level might look very different to that at the RU level. Regional issues are also important here; looking at RU data when an RU reports on a number of LUs where the LUs are based in different regions may give a very different picture to looking at LUs.

Regarding sampling, to reduce costs, only reporting units above a certain employment threshold (currently 250; see note 10) are all sent an ABI form every year. Smaller reporting units are sampled by size-region-industry bands.11 In the ARD, all data returned from reporting units is held on what is called the selected file. Other data is held on the nonselected file. Since the nonselected RUs are not sent a form, the nonselected data is of course the IDBR data.

7. The ABI replaces the Annual Employment Survey, the Annual Census of Production and Construction (ACOP/ACOC), and the six following Annual Inquiries: wholesale, retail, motor trades, catering, property, service trades. In Catering and Allied Trades, between 1960 and 1979 there was a benchmark inquiry into catering roughly every four years, but from 1979 the inquiry became annual. There has been a property inquiry since the mid-1950s, but until 1994 data was only collected on capital expenditure. From 1995, the range of data was extended to bring the inquiry in line with the other DS inquiries. The first major inquiry into Wholesaling and Dealing was carried out in respect of 1950, as part of the Census of Distribution. Subsequently, periodic large-scale detailed inquiries were conducted in respect of 1959, 1965, 1974, and 1990, but simpler annual inquiries were conducted for most intervening years and for all years since 1991. The first major inquiry into motor trades was carried out in 1950 as part of the Census of Distribution. Subsequently, periodic large-scale inquiries were conducted in respect of 1962, 1967, and 1972, although simple annual inquiries were carried out in most intervening years. By 1977 the annual inquiry was collecting detailed information on turnover and purchases. Regarding retailing, from 1950 periodic Censuses of Distribution were conducted, the last of which was in 1971. Full-scale inquiries covering every retail business and every retail outlet were taken for 1950, 1961, and 1971, with large-scale inquiries for 1957 and 1966. The first annual retailing inquiry was conducted in respect of 1976 with a sample of 30,000 units. Throughout the late 1970s and 1980s the inquiry varied from year to year in terms of both sample size and the amount of information collected. From 1991 to 1997 the sample remained reasonably constant at around 12,000.
8. Companies who have to fill out a form can refer to http://www.statistics.gov.uk/about/business_surveys/abi/default.asp for help and information.

7.2.4 Firms (RU) and Stores (LU) in UK Retailing

We now document some basic facts regarding the number of retail firms (RU) and stores (LU) operating in the United Kingdom.
Table 7.2 sets out some of the relevant data for 2003, the most recent period available. First, in column 1, top panel, there were 196,286 RUs in all retailing in 2003 and 285,291 LUs. Recall that RUs can report on one or more LUs, so the higher number of LUs is to be expected. Many of these RUs and LUs, by number, are in “Other Retail,” “Food, Beverages, Tobacco,” and “Nonspecialized Stores.” The remainder of the top panel shows data on the numbers of LUs that RUs report on. Column 3 shows that 10,745 RUs report on more than one LU. Thus, as column 4 shows, 185,541 RUs, the bulk of the RUs, just report on one LU (i.e., these are stand-alone firms). The remaining columns sum up to the 10,745 in column 3. So, for example, the final column shows that only 171 RUs report on more than 100 LUs. In sum, approximately two-thirds of retailing outlets were accounted for by stand-alone businesses (185,541/285,291). Looking at the individual sectors, the distribution of units is the same in all seven.

These data are just numbers of RUs and LUs. The lower panel shows the average employment that these units account for. Here the picture, not surprisingly, is rather different. Columns 1 and 2 of the lower panel show that mean employment per RU and per LU (headcount, not FTE) is 14.14 and 9.73 in all retailing, respectively. Mean employment for Reporting Units with a single Local Unit is 3.66. But looking at the last column, the RUs that report on more than 100 LUs have average employment per RU of over 9,000. This figure suggests a very high concentration of employment across few retail firms, especially in Nonspecialized Retail.

Table 7.2 suggests there are many LUs and RUs by number and considerable concentration of employment. Table 7.3 gives some more details on this. Consider the top left panel, which shows data for all industries. The first number, 185,541, is the same as in table 7.2, column 4, top cell, namely, the number of RUs that are stand-alone. As the second column shows, this group accounts for 94.4 percent of the total number of RUs. Reading further across the table, however, total employment in these LUs is 678,496, which accounts for 24.4 percent of all employment. By contrast, looking at the bottom row of the top left panel, those reporting on more than 100 local units (171 RUs, just 0.1 percent of the total number of RUs) account for 56.7 percent of employment in all retailing. For “Nonspecialized Stores” (mostly supermarkets), 77.2 percent of employment is accounted for by just 37 RUs, which make up less than 1 percent of the total number of RUs. Likewise in “Pharmaceuticals” and “Other,” the largest group accounts for a very small share of RUs by number, but 47.5 and 47.9 percent of total employment, respectively. By contrast, secondhand stores are concentrated by both number and size in small groups, and so is, to a lesser extent, “Food, Beverages, and Tobacco.”12 The concentration of employment is also shown in table 7.4, which reports the percentage of the sector’s employment in the top 5 and 10 RUs and LUs. Looking at the RU data, in nonspecialized stores just ten RUs account for over half of total employment.13

9. On other surveys the RU structure might be slightly different; for example, on the R&D survey Jones might report on Jones R&D only, and that would be its RU for that survey. This matters when matching surveys.
10. The threshold was lower in the past. See Barnes and Martin (2002) for more details.
11. The employment size bands are 1–9, 10–19, 20–49, 50–99, and 100–249; the regions are England and Wales combined, Scotland, and Northern Ireland (NI). Within England and Wales industries are stratified at 4-digit level, NI is at two-digit level, and Scotland is at a hybrid 2/3/4-digit level (oversampling in Scotland and NI is by arrangement with local executives). See Partington (2001).
12. One issue for us is whether significant RUs change industry over time (e.g., many retailers are wholesalers as well and could be classified in different industries over time). To check this, we looked at the six largest supermarkets in the data set and found that they were consistently classified to one industry (SIC52119). Evidently, we do not have this problem in the data set for these companies.
13. The previous data has shown the relation between RUs and LUs. Above RUs are of course enterprise groups; in unreported tables we computed that most enterprise groups consist of one RU (i.e., the mean number of RUs that each enterprise group consists of is 1.01 in all sectors).
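The shares quoted for all retailing can be cross-checked with a few lines of arithmetic. The sketch below uses only figures stated in the text. Total employment is not reported directly, so it is backed out from the mean employment figures, which is an assumption of this illustration; the variable names are likewise illustrative.

```python
# Figures quoted in the text for all retailing, 2003.
N_RU = 196_286            # reporting units (firms)
N_LU = 285_291            # local units (stores)
N_RU_SINGLE = 185_541     # RUs reporting on exactly one LU
MEAN_EMP_PER_RU = 14.14   # headcount
MEAN_EMP_PER_LU = 9.73    # headcount
EMP_SINGLE = 678_496      # employment in stand-alone RUs

n_ru_multi = N_RU - N_RU_SINGLE           # RUs reporting on more than one LU
share_of_rus = N_RU_SINGLE / N_RU         # stand-alone share of firms
share_of_lus = N_RU_SINGLE / N_LU         # stand-alone share of stores

# Total employment implied by each mean; the two should roughly agree.
total_emp_from_ru = MEAN_EMP_PER_RU * N_RU
total_emp_from_lu = MEAN_EMP_PER_LU * N_LU

emp_share_single = EMP_SINGLE / total_emp_from_ru
mean_emp_single = EMP_SINGLE / N_RU_SINGLE

print(f"multi-LU RUs: {n_ru_multi}")
print(f"stand-alone share of RUs: {share_of_rus:.1%}, of LUs: {share_of_lus:.1%}")
print(f"stand-alone employment share: {emp_share_single:.1%}")
print(f"mean employment, stand-alone RUs: {mean_emp_single:.2f}")
```

The implied totals from the RU and LU means agree closely, and the stand-alone employment share and mean employment match the quoted 24.4 percent and 3.66. The stand-alone share of RUs computed this way is within about a tenth of a percentage point of the quoted 94.4 percent.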

9.73 22.15 4.34 7.58 7.64 2.62 6.05 3.61

10,745 915 1,653 768 6,841 161 266 141

# of RU with more than 1 LU (3) 9,425 749 1,478 667 6,020