Large Sample Methods in Statistics (1994): An Introduction with Applications. ISBN 1138106011, 9781138106017

This text bridges the gap between sound theoretical developments and practical, fruitful methodology by providing a solid justification for asymptotic statistical methods at an intermediate level.


LARGE SAMPLE METHODS IN STATISTICS
AN INTRODUCTION WITH APPLICATIONS

Pranab K. Sen and Julio M. Singer

First published 1993 by Chapman & Hall/CRC
Taylor & Francis Group, 6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
Reissued 2018 by CRC Press
© 1993 by Taylor & Francis
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

A Library of Congress record exists under LC control number: 92046163

Publisher's Note: The publisher has gone to great lengths to ensure the quality of this reprint but points out that some imperfections in the original copies may be apparent.

Disclaimer: The publisher has made every effort to trace copyright holders and welcomes correspondence from those they have been unable to contact.

ISBN 13: 978-1-138-10601-7 (hbk)
ISBN 13: 978-0-203-71160-6 (ebk)

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com

To our inspiring mothers Kalyani Sen and Edith Singer

Contents

Preface

1 Objectives and Scope: General Introduction
  1.1 Introduction
  1.2 Large sample methods: an overview of applications
  1.3 The organization of this book
  1.4 Basic tools and concepts
  1.5 Concluding notes
  1.6 Exercises

2 Stochastic Convergence
  2.1 Introduction
  2.2 Modes of stochastic convergence
  2.3 Probability inequalities and laws of large numbers
  2.4 Inequalities and laws of large numbers for some dependent variables
  2.5 Some miscellaneous convergence results
  2.6 Concluding notes
  2.7 Exercises

3 Weak Convergence and Central Limit Theorems
  3.1 Introduction
  3.2 Some important tools
  3.3 Central limit theorems
  3.4 Projection results and variance-stabilizing transformations
  3.5 Rates of convergence to normality
  3.6 Concluding notes
  3.7 Exercises

4 Large Sample Behavior of Empirical Distributions and Order Statistics
  4.1 Introduction
  4.2 Preliminary notions
  4.3 Sample quantiles
  4.4 Extreme order statistics
  4.5 Empirical distributions
  4.6 Functions of order statistics and empirical distributions
  4.7 Concluding notes
  4.8 Exercises

5 Asymptotic Behavior of Estimators and Test Statistics
  5.1 Introduction
  5.2 Asymptotic behavior of maximum likelihood estimators
  5.3 Asymptotic properties of U-statistics and related estimators
  5.4 Asymptotic behavior of other classes of estimators
  5.5 Asymptotic efficiency of estimators
  5.6 Asymptotic behavior of some test statistics
  5.7 Concluding notes
  5.8 Exercises

6 Large Sample Theory for Categorical Data Models
  6.1 Introduction
  6.2 Nonparametric goodness-of-fit tests
  6.3 Estimation and goodness-of-fit tests: parametric case
  6.4 Asymptotic theory for some other important statistics
  6.5 Concluding notes
  6.6 Exercises

7 Large Sample Theory for Regression Models
  7.1 Introduction
  7.2 Generalized least-squares procedures
  7.3 Robust estimators
  7.4 Generalized linear models
  7.5 Generalized least-squares versus generalized estimating equations
  7.6 Nonparametric regression
  7.7 Concluding notes
  7.8 Exercises

8 Invariance Principles in Large Sample Theory
  8.1 Introduction
  8.2 Weak invariance principles
  8.3 Weak convergence of partial sum processes
  8.4 Weak convergence of empirical processes
  8.5 Weak convergence and statistical functionals
  8.6 Weak convergence and nonparametrics
  8.7 Strong invariance principles
  8.8 Concluding notes
  8.9 Exercises

References

Index

Preface

Students and investigators working in Statistics, Biostatistics or Applied Statistics in general are constantly exposed to problems which involve large quantities of data. Since, in such a context, exact statistical inference may be computationally out of reach and in many cases not even mathematically tractable, they have to rely on approximate results. Traditionally, the justification for these approximations was based on the convergence of the first four moments of the distributions of the statistics under investigation to those of some normal distribution. Today we know that such an approach is not always theoretically adequate and that a somewhat more sophisticated set of techniques based on the convergence of characteristic functions may provide the appropriate justification. This need for more profound mathematical theory in statistical large sample theory is even more evident if we move to areas involving dependent sequences of observations, like Survival Analysis or Life Tables; there, some use of martingale structures has distinct advantages. Unfortunately, most of the technical background for the understanding of such methods is dealt with in specific articles or textbooks written for an audience with such a high level of mathematical knowledge that they exclude a great portion of the potential users.

This book is intended to cover this gap by providing a solid justification for such asymptotic methods, although at an intermediate level. It focuses primarily on the basic tools of conventional large sample theory for independent observations, but also provides some insight into the rationale underlying the extensions of these methods to more complex situations involving dependent measurements. The main thrust is on the basic concepts of convergence and asymptotic distribution theory for a large class of statistics commonly employed in diverse practical problems. Chapter 1 describes the type of problems considered in the text along with a brief summary of some basic mathematical and statistical concepts required for a good understanding of the remaining chapters. Chapters 2 and 3 contain the essential tools needed to prove asymptotic results for independent sequences of random variables as well as an outline of the possible extensions to cover the dependent sequence case. Chapter 4 explores the relationship between order statistics and empirical distribution functions with respect to their asymptotic properties and illustrates their use in some applications. Chapter 5 discusses some general results on the asymptotics of estimators and test statistics; their actual application to Categorical Data and Regression Analysis is illustrated in Chapters 6 and 7, respectively. Finally, Chapter 8 deals with an introductory exposition of the technical background required to deal with the asymptotic theory for statistical functionals. The objective here is to provide some motivation and the general flavor of the problems in this area, since a rigorous treatment would require a much higher level of mathematical background than the one we contemplate. The eight chapters were initially conceived for a one-semester course for second year students in Biostatistics or Applied Statistics doctoral programs as well as for last year undergraduate or first year graduate programs in Statistics. A more realistic view, however, would restrict the material for such purposes to the first five chapters along with a glimpse into Chapter 8. Chapters 6 and 7 could be included as supplementary material in Categorical Data and Linear Models courses, respectively. Since the text includes a number of practical examples, it may be useful as a reference text for investigators in many areas requiring the use of Statistics.

The authors would like to thank the numerous students who took Large Sample Theory courses at the Department of Biostatistics, University of North Carolina at Chapel Hill, and the Department of Statistics, University of São Paulo, providing important contributions to the design of this text. We would also like to thank Ms. Denise Morris, Ms. Mónica Casajús and Mr. Walter Vicente Fernandes for their patience in the typing of the manuscript. The editorial assistance provided by Antonio Carlos Lima with respect to handling TeX and LaTeX was crucial to the completion of this project. We are also grateful to Dr. José Galvão Leite and Dr. Bahjat Qaqish for their enlightening comments and careful revision of portions of the manuscript. Finally, we must acknowledge the Cary C. Boshamer Foundation, University of North Carolina at Chapel Hill, as well as Conselho Nacional de Desenvolvimento Científico e Tecnológico, Brazil, and Fundação de Amparo à Pesquisa do Estado de São Paulo, Brazil, for providing financial support during the years of preparation of the text.

CHAPTER 1

Objectives and Scope: General Introduction

1.1 Introduction

Large sample methods in Statistics constitute the general methodology underlying fruitful simpler statistical analyses of data sets involving a large number of observations. Drawing statistical conclusions from a given data set involves the choice of suitable statistical models relating to the observations which incorporate some random (stochastic) or chance factors whereby convenient probability laws can be adopted in an appropriate manner. It is with respect to such postulated probability laws that the behavior of some sample statistics (typically, an estimator in an estimation problem or a test statistic in a hypothesis testing problem) needs to be studied carefully so that the conclusions can be drawn with an adequate degree of precision. If the number of observations is small and/or the underlying probability model is well specified, such stochastic behavior can be evaluated in an exact manner. However, with the exception of some simple underlying probability laws (such as the normal or Poisson distributions), the exact sampling distribution of a statistic may become very complicated as the number of observations in the sample becomes large. Moreover, if the data set actually involves a large number of observations, there may be a lesser need to restrict oneself to a particular probability law, and general statistical conclusions as well can be derived by allowing such a law to be a member of a broader class. In other words, one may achieve more robustness with respect to the underlying probability models when the number of observations is large. On the other hand, there are some natural (and minimal) requirements for a statistical method to qualify as a valid large sample method. For example, in the case of an estimator of a parameter, it is quite natural to expect that as the sample size increases, the estimator should be closer to the parameter in some meaningful sense; in the literature, this property is known as the consistency of estimators. Similarly, in testing a null hypothesis, a test should be able to detect the falsehood of the null hypothesis (when it is not true) with more and more confidence when the sample size becomes large; this relates to the consistency of statistical tests. In either case, there is a need to study general regularity conditions under which such stochastic convergence of sample statistics holds. A second natural requirement for a large sample procedure is to ensure that the corresponding exact sampling distribution can be adequately approximated by a simpler one (such as the normal, Poisson, or chi-squared distributions), for which extensive tables and charts are available to facilitate the actual applications. In the literature, this is known as convergence in distribution or central limit theory. This alone is a very important topic of study and is saturated with applications in diverse problems of statistical inference. Third, in a given setup, there is usually more than one procedure satisfying the requirements of consistency and convergence in distribution. In choosing an appropriate one within such a class, a natural criterion is optimality in some well-defined sense. In the estimation problem, this optimality criterion may relate to minimum variance or minimum risk (with respect to a suitable loss function), and there are vital issues in choosing such an optimality criterion and assessing its adaptability to the large sample case. In the testing problem, a test should be most powerful, but, often, such an optimal test may not exist (especially in the multiparameter testing problem), and hence, alternative optimality (or desirability) criteria are to be examined. This branch of statistical inference dealing with asymptotically optimal procedures has been a very active area of productive research during the past fifty years, and yet there is room for further related developments! Far more important is the enormous scope of applications of these asymptotically optimal procedures in various problems, the study of which constitutes a major objective of the current book. In a data set, observations generally refer to some measurable characteristics which conform to either a continuous/discrete scale or even to a categorical setup where the categories may or may not be ordered in some well-defined manner. Statistical analysis may naturally depend on the basic nature of such observations. In particular, the analysis of categorical data models and their ramifications may require some special attention, and in the literature, analysis of qualitative data (or discrete multivariate analysis) and generalized linear models have been successfully linked to significant applications in a variety of situations. Our general objectives include the study of large sample methods pertinent to such models as well.

Pedagogically, large sample theory has its natural roots in probability theory, and the past two decades have witnessed a phenomenal interaction between the theory of random (stochastic) processes and large sample statistical theory. In particular, weak convergence (or invariance principles) of stochastic processes and related results have paved the way for a vast simplification of the asymptotic distribution theory of various statistical estimators and test statistics; a complete coverage of this recent development is outside the scope of the current book. There are, however, other monographs [viz., Serfling (1980), LeCam (1986) and Pfanzagl (1982)] which deal with this aspect in much more detail and at a considerably higher level of sophistication. Our goal is entirely different. We intend to present the basic large sample theory with a minimum coating of abstraction and at a level commensurate with the usual graduate programs in Applied Statistics and Biostatistics; nevertheless, our book can also be used for Mathematical Statistics programs (with a small amount of supplementary material if planned at a more advanced stage). As such, a measure theoretic orientation is minimal, and the basic theory is always illustrated with suitable examples, often picked up from important practical problems in some areas of Applied Statistics (especially, Biostatistics). In other words, our main objective is to present the essentials of large sample theory of Statistics with a view toward its application to a variety of problems that generally crop up in other areas.

To stress our basic motivation, we start with an overview of the applications of large sample theory in various statistical models. This is done in Section 1.2. Section 1.3 deals with a brief description of the basic coverage of the book, and Section 1.4 with a review of some background mathematical tools.
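To fix ideas about the two requirements discussed above, consistency and convergence in distribution, the following minimal simulation sketch shows the sample mean of an exponential population concentrating around the population mean, and its standardized version behaving approximately like a standard normal variable, as the sample size grows. The choice of Python/NumPy, the exponential population, and the sample sizes are our own illustrative assumptions, not part of the text.

```python
import numpy as np

rng = np.random.default_rng(42)
mu = 2.0          # population mean of the exponential distribution (its standard deviation is also mu)
n_reps = 10_000   # number of simulated samples per sample size

for n in (10, 100, 1000):
    # Draw n_reps independent samples, each of size n.
    samples = rng.exponential(scale=mu, size=(n_reps, n))
    xbar = samples.mean(axis=1)

    # Consistency: the sample mean concentrates around mu as n increases.
    print(f"n={n:4d}  average |Xbar - mu| = {np.mean(np.abs(xbar - mu)):.4f}")

    # Convergence in distribution: sqrt(n)(Xbar - mu)/sigma is approximately N(0, 1),
    # so roughly 95% of the standardized means should fall in [-1.96, 1.96].
    z = np.sqrt(n) * (xbar - mu) / mu
    print(f"          P(|Z| <= 1.96) ~= {np.mean(np.abs(z) <= 1.96):.3f}")
```

For small n the normal approximation can be visibly rough; quantifying such approximation errors is one of the themes of Chapters 2 and 3.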

1.2 Large sample methods: an overview of applications

In a very broad sense, the objective of statistical inference is to draw conclusions about some characteristics of a certain population of interest based on the information obtained from a representative sample thereof. In general, the corresponding strategy involves the selection of an appropriate family of (stochastic) models to describe the characteristics under investigation, an evaluation of the compatibility of such models with the available data (goodness of fit) and the subsequent estimation of or tests of hypotheses about the parameters associated with the chosen family of models. Such models may have different degrees of complexity and depend on assumptions with different degrees of restrictiveness. Let us examine some examples.

Example 1.2.1: Consider the problem of estimating the average height µ of a population based on a random sample of n individuals. Let Y denote the height of a randomly selected individual and F the underlying distribution function. In such a context, three alternative (stochastic) models include the following (among others):


a) F is assumed symmetric and continuous with mean µ and finite variance σ², and the observations are the heights Y_1, ..., Y_n of the n individuals in the sample;

b) F is assumed normal with mean µ and known variance σ², and the observations are the heights Y_1, ..., Y_n of the n individuals in the sample;

c) The assumptions on F are as in either (a) or (b) but the observations correspond to the numbers of individuals falling within each of m height intervals (grouped data). This is, perhaps, a more realistic model, since, in practice, we are only capable of coding the height measurements to a certain degree of accuracy (i.e., to the nearest millimeter).

Example 1.2.2: Consider the OAB blood classification system (where the O allele is recessive and the A and B alleles are codominant). Let p_O, p_A, p_B and p_AB respectively denote the probabilities of occurrence of the phenotypes OO, (AA, AO), (BB, BO) and AB in a given population; also, let q_O, q_A and q_B respectively denote the probabilities of occurrence of the O, A and B alleles in that population. This genetic system is said to be in Hardy-Weinberg equilibrium if the following relations hold: p_O = q_O², p_A = q_A² + 2q_O q_A, p_B = q_B² + 2q_O q_B and p_AB = 2q_A q_B. A problem of general concern to geneticists is to test whether a given population satisfies the Hardy-Weinberg conditions based on the evidence provided by a sample of n observational units for which the observed phenotype frequencies are n_O, n_A, n_B and n_AB, respectively.

Example 1.2.3: Consider a factor X that may be set at one of s (≥ 2) possible choices, designated as x_1, ..., x_s, respectively. It is conjectured that the level may have some influence on the mean life of a lamp. For each level consider a set of n lamps taken at random from a production lot, and let Y_ij denote the life length (in hours, say) corresponding to the j-th lamp in the i-th lot (i = 1, ..., s; j = 1, ..., n). It may be assumed that the Y_ij are independent for different i (= 1, ..., s) and j (= 1, ..., n). Further, assume that Y_ij has a distribution function F_i, defined on R+ = [0, ∞), i = 1, ..., s. In this setup, it is quite conceivable that F_i depends in some way on the level x_i, i = 1, ..., s. Thus, as in Example 1.2.1, we may consider a variety of models relating to the F_i, among which we pose the following:

a) F_i is normal with mean µ_i and variance σ_i², i = 1, ..., s, where the σ_i² may or may not be the same.

b) F_i is continuous (and symmetric) with median µ_i, i = 1, ..., s, but its form is not specified.

c) Although F_i may satisfy (a) or (b), since the Y_ij are recorded in class intervals (of width one hour, say), we have to consider effectively appropriate (ordered) categorical data models.

In model (a), if we assume further that σ_1² = ... = σ_s² = σ², we have the classical (normal theory) multisample location model; in (b), if we let F_i(y) = F(y − µ_i), i = 1, ..., s, we have the so-called nonparametric multisample location model. Either (a) or (b) may be made more complex when we drop the assumption of homogeneity of the variances σ_i² or allow for possible scale perturbations in the shift model, i.e., if we let F_i(y) = F{(y − µ_i)/σ_i}, i = 1, ..., s, where the σ_i² are not necessarily the same. For the multisample location model (normal or not), we may incorporate the dependence on the levels of X by writing µ_i = µ(x_i), i = 1, ..., s; furthermore, if we assume that the µ_i(x_i) are linear functions of known coefficients z_ik, which may depend on x_i, we may write µ_i = Σ_k β_k z_ik, i = 1, ..., s, where the β_k are the (unknown) regression parameters. This leads us to the linear model. For example, we may take z_ik = x_i^k, k = 0, 1, . . .
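To make the polynomial special case concrete, the following minimal sketch builds a design matrix with entries z_ik = x_i^k and estimates the regression parameters by ordinary least squares. The factor levels, mean lamp lifetimes, polynomial degree, and the use of Python/NumPy are hypothetical choices made only for illustration, not values from the text.

```python
import numpy as np

# Hypothetical factor levels x_i and observed mean lamp lifetimes (in hours) at each level.
x = np.array([100.0, 110.0, 120.0, 130.0, 140.0])
ybar = np.array([1500.0, 1440.0, 1390.0, 1330.0, 1280.0])

# Design matrix with entries z_ik = x_i**k for k = 0, 1, 2 (a quadratic polynomial in x_i).
degree = 2
Z = np.vander(x, N=degree + 1, increasing=True)

# Ordinary least-squares estimates of the unknown regression parameters beta_0, ..., beta_degree.
beta_hat, *_ = np.linalg.lstsq(Z, ybar, rcond=None)

print("estimated regression parameters:", beta_hat)
print("fitted means mu_i:", Z @ beta_hat)
```

The large sample behavior of least-squares and related estimators in such regression models is the subject of Chapter 7.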