Parametric Statistical Theory (de Gruyter Textbook) [Reprint 2011 ed.] 3110140306, 9783110140309

339 58 87MB

English Pages 390 [388] Year 1994

Report DMCA / Copyright

DOWNLOAD FILE

Polecaj historie

Parametric Statistical Theory (de Gruyter Textbook) [Reprint 2011 ed.]
 3110140306, 9783110140309

Table of contents :
Preface
Introduction
Chapter 1 Sufficiency and completeness
1.1 Introduction
1.2 Sufficiency and factorization of densities
1.3 Sufficiency and exhaustivity
1.4 Minimal sufficiency
1.5 Completeness
1.6 Exponential families
1.7 Auxiliary results on families with monotone likelihood ratios
1.8 Ancillary statistics
1.9 Equivariance and invariance
1.10 Appendix: Conditional expectations, conditional distributions
Chapter 2 The evaluation of estimators
2.1 Introduction
2.2 Unbiasedness of estimators
2.3 The concentration of real valued estimators
2.4 Concentration of multivariate estimators
2.5 Evaluating estimators by loss functions
2.6 The relative efficiency of estimators
2.7 Examples on the evaluation of estimators
Chapter 3 Mean unbiased estimators and convex loss functions
3.1 Introduction
3.2 The Rao-Blackwell-Lehmann-Scheffé-Theorem
3.3 Examples of mean unbiased estimators with minimal convex risk
3.4 Mean unbiased estimation of probabilities
3.5 A result on bounded mean unbiased estimators
Chapter 4 Testing hypotheses
4.1 Basic concepts
4.2 Critical functions, critical regions
4.3 The Neyman-Pearson Lemma
4.4 Optimal tests for composite hypotheses
4.5 Optimal tests for families with monotone likelihood ratios
4.6 Tests of Neyman structure
4.7 Most powerful similar tests for a real parameter in the presence of a nuisance parameter
Chapter 5 Confidence procedures
5.1 Basic concepts
5.2 The evaluation of confidence procedures
5.3 The construction of one-sided confidence bounds and median unbiased estimators
5.4 Optimal one-sided confidence bounds and median unbiased estimators
5.5 Optimal one-sided confidence bounds and median unbiased estimators in the presence of a nuisance parameter
5.6 Examples of maximally concentrated confidence bounds
Chapter 6 Consistent estimators
6.1 Introduction
6.2 A general consistency theorem
6.3 Consistency of M-estimators
6.4 Consistent solutions of estimating equations
6.5 Consistency of maximum likelihood estimators
6.6 Examples of ML estimators
6.7 Appendix: Uniform integrability, stochastic convergence and measurable selection
Chapter 7 Asymptotic distributions of estimator sequences
7.1 Limit distributions
7.2 How to deal with limit distributions
7.3 Asymptotic confidence bounds
7.4 Solutions to estimating equations
7.5 The limit distribution of ML estimator sequences
7.6 Stochastic approximations to estimator sequences
7.7 Appendix: Weak convergence
Chapter 8 Asymptotic bounds for the concentration of estimators and confidence bounds
8.1 Introduction
8.2 Regular sequences of confidence bounds and median unbiased estimators
8.3 Sequences of confidence bounds and median unbiased estimators with limit distributions
8.4 The convolution theorem
8.5 Maximally concentrated limit distributions
8.6 Superefficiency
Chapter 9 Miscellaneous results on asymptotic distributions
9.1 Examples of ML estimators
9.2 Tolerance bounds
9.3 Probability measures with location- and scale parameters
9.4 Miscellaneous results on estimators
Chapter 10 Asymptotic test theory
10.1 Introduction
10.2 Tests for a real valued functional
10.3 The asymptotic envelope power function for tests for a real valued functional
References
Author Index
Subject Index
Notation Index

Citation preview

de Gruyter Textbook Pfanzagl · Parametric Statistical Theory

Johann Pfanzagl

Parametric Statistical Theory With the assistance of R. Hamböker

w DE

G

Walter de Gruyter Berlin · New York 1994

Author Johann Pfanzagl Mathematisches Institut Universität Köln Weyertal 86-90 D-50931 Köln Germany 7997 Mathematics Subject Classification: 62-01 Keywords: Theory of estimation, testing hypothesis, confidence intervals, asymptotic theory, asymptotic optimality

® Printed on acid-free paper which falls within the guidelines of the ANSI to ensure permanence and durability.

Library of Congress Cataloging-in-Publication Data Pfanzagl, J. (Johann) Parametric statistical theory / Johann Pfanzagl. p. cm. — (De Gruyter textbook) Includes bibliographical references and index. ISBN 3-11-014030-6 (acid-free). - ISBN 3-11-013863-8 (pbk. ; acid-free) 1. Mathematical statistics. I. Title. II. Series. QA276.PA77 1994 519.5'4-dc20 94-21850 CIP

Die Deutsche Bibliothek — Cataloging-in-Publication Data Pfanzagl, Johann: Parametric statistical theory / Johann Pfanzagl. — Berlin ; New York de Gruyter, 1994 (De Gruyter textbook) ISBN 3-11-013863-8 Pb. ISBN 3-11-014030-6 Gewebe

© Copyright 1994 by Walter de Gruyter & Co., D-10785 Berlin. All rights reserved, including those of translation into foreign languages. No part of this book may be reproduced in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher. Printed in Germany. Printing: Arthur Collignon GmbH, Berlin. Binding: Lüderitz & Bauer GmbH, Berlin.

Preface

Scire tuum nihil est, nisi te scire hoc sciat alter. (Persius, I, 27)

This book presents a survey of advanced parametric statistical theory. It requires a basic knowledge of measure theory and probability theory, at about the level of Ash (1972) or Dudley (1989). For the German speaking reader, familiarity with the material in Bauer's (1990a,b) "Maß- und Integrationstheorie" and about half of his "Wahrscheinlichkeitstheorie" (with an English edition forthcoming) can be considered as an excellent preparation. Auxiliary results from measure theory and probability theory which are not easily found in textbooks (like uniform versions of limit theorems, or measurable selection theorems) are summarized in the last section of each chapter. Students with these mathematical prerequisites and some knowledge of elementary statistics will have no problem in mastering the material in this book. It corresponds to a two-term course, totalling about 150 hours (including exercises). The book is a result of classes held over many years for students of mathematics at the University of Cologne. It owes much to some of my former assistants who cooperated with me in the supervision of exercises and diploma theses. Among these are C. Hipp, U. Einmahl, L. Schroder, and R. Hamböker. I am particularly obliged to R. Hamböker. Without his unresting help I would have been unable to bring the manuscript into its final shape. The TßX version was done by E. Lorenz.

Cologne, January 31, 1994

J. Pfanzagl

Introduction

Mathematical statistics is a tool for drawing conclusions from realizations of a random variable to the underlying probability measure. A basic knowledge of the physical background of the random phenomenon usually gives some information about the general nature of this probability measure: Prior to the observation we know that the probability measure belongs to some family of probability measures, say ψ. Statistical theory becomes particularly simple if this basic family is parametric, i.e. if φ = {P$ : ϋ e Θ}, where θ is a subset of a Euclidean space. In asymptotic theory we consider in this book parametric families (P$ : ι? 6 Θ}, where the observations are independent and identically distributed. Our approach to the statistical procedures is simple-minded and earthbound, in a sense. Starting from the possible uses of an estimator, we try to develop concepts for the evaluation of estimators, and methods for the construction of "good" estimators. To consider estimation as a special decision procedure seems to be of no help. To squeeze every statistical problem into the Procrustean-bed of decision theory prevents the statistician from considering the possible solutions according to criteria inherent to the particular problem. Paraphrazing a statement of Wittgenstein's in one of his letters to Russell, a book consists of two equally important parts: The part which has been written, and the part which has been omitted. In accordance with our simple-minded approach we have omitted some of the traditional contents of textbooks on mathematical statistics when they appeared to us neither practically useful nor mathematically interesting. As an example we mention the celebrated CramerRao bound. This bound is not attainable, in general, thus lacking any clear operational significance for finite sample sizes, and it plays no role in asymptotic theory since it is based on the assumption of unbiasedness for every sample size. Other concepts which are omitted as irrelevant for applications can be characterized by the catchwords "admissible" and "minimax". Robust methods and Bayesian theory fell victim to space limitations. Throughout the book, the considerations are restricted to dominated families of probability measures, to avoid complications of purely mathematical nature, without relevance for applications. Most results are presented under the more restrictive assumption of mutual absolute continuity. We think that this is no serious shortcoming for the statistician interested in applications. (Thereby, we also avoid the discussion about the interpretation of statistical procedures the results of which are a composite of probabilistic and "exact" assertions.)

viii

Introduction

The book contains some historical remarks. They reflect the predilections of the author, and do not cover the whole field in a systematic way. Parameters and functionals The inference based on the realization of a random variable usually aims at a particular feature of the underlying probability measure, expressed by some functional κ, defined on ty. The considerations of this book are mainly restricted to finite dimensional functionals κ : φ —* Κ9. Basically it is the functionals (like means, or quantiles, or failure rates) which have a meaningful interpretation. Parameters which are used for a description of the basic family φ may have a meaningful interpretation if they play a role as a functional. For the purpose of illustration, let {Ea : α > 0} be the family of exponential distributions. If Ea describes a life distribution, α is expected duration of life, α log 2 the median life time etc. If one is uncertain whether the exponential distribution fits reality with sufficient accuracy, one may be inclined to consider the larger family of gamma distributions {Γα>5 : α, b > 0} (which contains Ea as Γ α> ι) as a more realistic model. But what is the use of an estimate for the parameter α under the assumption that the observations are distributed as Fa)b? In this case, the mean life time is αδ, the median life time οΓ(6) (with T(6) defined by ^(b) xb~le~xdx = Γ(6)/2). Embedding {Ea : a > 0} in the larger family modifies the meaning of the parameter a, and we have to get clear about which functional we wish to estimate. Some statisticians might be inclined to consider the true distribution Γ αι & as a "distorted" or "contaminated" exponential distribution, and to find the way back to the original "undistorted" distribution by projecting Γα)ί, into the family {Ea : a > 0}, i.e. to assign to each pair (a, 6) the value α(α, 6) for which a certain distance between Ea and Γ Λ) 6 is minimal. The functional κ(Γ αι (>) = α(α, b) thus defined has the advantage of admitting robust estimators. It has the shortcoming of lacking any meaningful interpretation. To answer a meaningful question one has to decide which functional is to be estimated, and whether the interest is in a functional of the true (if distorted) distribution, or in a functional of the original distribution (in case this functional is identifiable from observations originating from the distorted distribution). "Exact" versus asymptotic theory Chapters 3, 4 and 5 contain results of "exact", i.e. nonasymptotic, nature about confidence coefficients, significance levels, unbiasedness and optimality. These results are exact only within the model. Since models are hardly ever accurate images of reality, the results of the "exact" theory hold approximately only as soon as we turn to applications. Therefore, nothing is lost by the application of asymptotic results, and much can be gained. Roughly speaking, "exact"

Introduction

ix

solutions in parametric theory are feasible for exponential families only. Even in this restricted area, certain problems remain unsolved. This is typically the case for "curved" exponential families. The most popular example for the failure of the exact theory is the Behrens-Fisher problem. Asymptotic theory takes advantage of the regularity brought in by many independent repetitions of the same random experiment, a regularity which finds its simplest expression in the Central Limit Theorem. Whereas distributions of different estimators are incomparable in general, and "optimal" estimators a rare exception, limit distributions are normal in regular cases, and this makes a comparison feasible. Asymptotic theory is "general" in the sense that it applies to arbitrary parametric families, subject to some regularity conditions. If only results of the exact theory were available, many practical problems would necessarily remain unsolved or — more probably — they would be treated under unrealistic assumptions (e.g., an omnipresence of normal distributions), thus risking a model bias the amount of which is unknown. In contrast to this, asymptotic theory offers the possibility of representing reality by a more accurate model, thus reducing the danger of a model bias. The crucial drawback of asymptotic theory: What we expect from asymptotic theory are results which hold approximately (like estimators which are approximately median unbiased and the concentration of which is approximately maximal). What asymptotic theory has to offer are limit theorems. By taking a limit theorem as being approximately true for large sample sizes, we commit an error the size of which is unknown. To obtain bounds for such errors is not impossible. (As an example think of the Berry-Esseen bound for the accuracy of the normal approximation to the distribution of sample means.) However: Such error bounds will usually grossly overestimate the true errors. Edgeworth expansions are an efficient tool for reducing the errors of asymptotic results and for clarifying their general structure. Realistic information about the remaining errors may be obtained by simulations. The asymptotic considerations in this book are mainly restricted to the i.i.d. case, i.e. to independent, identically distributed observations. Moreover, we confine ourselves to an asymptotic theory of first order, i.e. to an approximation by limit distributions. We hope that the reader who thoroughly understands the problems connected with the simplest case will have a solid base for the study of more general and more refined asymptotic methods.

Contents

Preface Introduction

v vii

Chapter 1 Sufficiency and completeness 1.1 Introduction 1.2 Sufficiency and factorization of densities 1.3 Sufficiency and exhaustivity 1.4 Minimal sufficiency 1.5 Completeness 1.6 Exponential families 1.7 Auxiliary results on families with monotone likelihood ratios 1.8 Ancillary statistics 1.9 Equivariance and invariance 1.10 Appendix: Conditional expectations, conditional distributions

1 1 4 8 12 17 22 31 44 49 55

Chapter 2 The evaluation of estimators 2.1 Introduction 2.2 Unbiasedness of estimators 2.3 The concentration of real valued estimators 2.4 Concentration of multivariate estimators 2.5 Evaluating estimators by loss functions 2.6 The relative efficiency of estimators 2.7 Examples on the evaluation of estimators

65 65 68 74 82 90 93 95

Chapter 3 Mean unbiased estimators and convex loss functions 3.1 Introduction 3.2 The Rao-Blackwell-Lehmann-ScherTe-Theorem 3.3 Examples of mean unbiased estimators with minimal convex risk 3.4 Mean unbiased estimation of probabilities 3.5 A result on bounded mean unbiased estimators

101 101 104 109 112 120

xii

Contents

Chapter 4 Testing hypotheses 4.1 Basic concepts 4.2 Critical functions, critical regions 4.3 The Neyman-Pearson Lemma 4.4 Optimal tests for composite hypotheses 4.5 Optimal tests for families with monotone likelihood ratios 4.6 Tests of Neyman structure 4.7 Most powerful similar tests for a real parameter in the presence of a nuisance parameter

123 123 128 133 135 139 143

Chapter 5 Confidence procedures 5.1 Basic concepts 5.2 The evaluation of confidence procedures 5.3 The construction of one-sided confidence bounds and median unbiased estimators 5.4 Optimal one-sided confidence bounds and median unbiased estimators 5.5 Optimal one-sided confidence bounds and median unbiased estimators in the presence of a nuisance parameter 5.6 Examples of maximally concentrated confidence bounds

157 157 160

Chapter 6 Consistent estimators 6.1 Introduction 6.2 A general consistency theorem 6.3 Consistency of M-estimators 6.4 Consistent solutions of estimating equations 6.5 Consistency of maximum likelihood estimators 6.6 Examples of ML estimators 6.7 Appendix: Uniform integrability, stochastic convergence and measurable selection

187 187 189 193 200 203 209

Chapter 7 Asymptotic distributions of estimator sequences 7.1 Limit distributions 7.2 How to deal with limit distributions 7.3 Asymptotic confidence bounds 7.4 Solutions to estimating equations 7.5 The limit distribution of ML estimator sequences 7.6 Stochastic approximations to estimator sequences 7.7 Appendix: Weak convergence

225 225 231 236 240 247 251 255

146

167 171 174 180

213

Contents

Chapter 8 Asymptotic bounds for the concentration of estimators and confidence bounds 8.1 Introduction 8.2 Regular sequences of confidence bounds and median unbiased estimators 8.3 Sequences of confidence bounds and median unbiased estimators with limit distributions 8.4 The convolution theorem 8.5 Maximally concentrated limit distributions 8.6 Superefficiency

xiii

263 263 269 276 278 288 296

Chapter 9 Miscellaneous results on asymptotic distributions 9.1 Examples of ML estimators 9.2 Tolerance bounds 9.3 Probability measures with location- and scale parameters 9.4 Miscellaneous results on estimators

305 305 316 320 332

Chapter 10 Asymptotic test theory 10.1 Introduction 10.2 Tests for a real valued functional 10.3 The asymptotic envelope power function for tests for a real valued functional

337 337 338

References Author Index Subject Index Notation Index

345 361 365 371

341

Chapter 1 Sufficiency and completeness

1.1 Introduction Motivates the concepts of "exhaustive" and "sufficient" statistics which preserve all "information" in the sample.

Let (X, A) be a measurable space, and S : [Χ, Λ) —> (Υ, Β) a measurable map. Moreover, let φ denote a family of p-measures (probability measures) P\A. If, instead of the observation x, only S(x) is known, this will, in general, result in some loss of "information": It will be impossible to match each statistical procedure based on χ by a statistical procedure based on S(x). This is, however, not necessarily so. It may occur, that for each statistical procedure based on χ there is a statistical procedure based on S(x) which is at least as good. If this is the case, we call the function 5 "exhaustive". Turning this intuitive idea into a mathematical concept meets with the difficulty that there is no general concept for comparing statistical procedures, varied as tests, confidence procedures or estimators. As far as tests are concerned, one could think of requiring that for any critical function φ : X —* [0,1] there is a critical function ψ ο S : Χ —> [0,1] with the same power function on φ (i.e. with Ρ (φ) -· Ρ(φ ο S) for all P e φ). The situation is almost hopeless if one tries to describe in mathematical terms that one estimator is "at least as good" as another. (See the discussion in Chapter 2.) Even if one would succeed in each instance: It is not clear, in advance, whether "exhaustive for tests" is the same as "exhaustive for estimators", say. One does not depend on special criteria for the evaluation of various statistical procedures if it is possible to obtain, smarting from 5(x), a random variable χ with the same distribution as the original observation x. Any statistical procedure, applied to χ, has then — for every P e φ — exactly the same performance as the statistical procedure applied to x. With P e φ unknown, the device leading from S(x) to x has to be independent of P, of course. In technical terms the transition from S(x) to x is accomplished by a Markov kernel M\Y x A.

2

1. Sufficiency and completeness

Definition 1.1.1. A statistic S : (Χ,Α) —* (Υ, Β) is exhaustive for φ if there exists a Markov kernel M\Y χ A such that M(y, A)P ο S(dy) = Ρ (A)

for A e A and Ρ € φ.

(1.1.2)

Recall Theorem 1.10.33 according to which, for Polish spaces (X, A), for any Markov kernel Μ\Y x A there exists a function m : Υ χ (0,1) —> X such that, for every y e Y, the induced distribution of u —» m(y, u) under £/ (the uniform distribution over (0,1)) is identical to M(y, ·), i.e. U o (u —> ra(y, u)) = M(y, ·). If S is exhaustive for φ, i.e. / M (y, A)P ο S(dy) = P(A) for A e Λ. and P G φ, we obtain (Ρ χ [/) ο ((χ, u) -> τη(5(χ), u)) = P. If we know S(x] and determine a realization u from the uniform distribution over (0,1), we obtain with m(S(x), u) a random variable which has exactly the same distribution as the original x, for every P e φ. There is a convenient alternative if the sufficient statistic S is boundedly complete and if there exists an ancillary statistic T : (X, A) —> (Z, C) such that x = /(S"(x),T(x)) for x 6 X. Then S and T are stochastically independent for P 6 φ by Basu's Theorem 1.8.2, and, with P0 e φ arbitrarily fixed, (Ρ χ PO) ο ((χ,η) -» /(S(x),T(u))) = Ρ for every Ρ 6 φ by the Addendum to Proposition 1.8.11. Example 1.1.3. For t? > 0 let EU denote the exponential distribution, given by its Lebesgue dentity x —* ύ~ι exp[i?~1x], x > 0. The function 5 η (χι,..., x n ) = Σ™1" ls exhaustive for the family {E$ : ϋ > 0}. Τϊ(χι,..., χη) := Xi/ Σ" χν·, i = l , . . . , τι, is ancillary, and ZT = 5' η (χι,...,χ η )Γί(χι,...,χ η ). Hence the condition χ — /(θ'(χ),Τ(χ)) is fulfilled with f ( y , z ) = yz. If we determine a realization (u\,..., un) from E™, then

t 1

1

Ui

η

" * / Σ > ···> Σ 1

XvUn

1

η

l ΣUi) 1

has for every ϋ > 0 the same distribution E$ as (xi, . . . , x n ). Definition 1.1.4. A statistic S : (X, A) —* (Y,B) is sufficient for φ if for every A e A there exists a conditional expectation, given S, say let

._

My)/My) for y e S»

Then hpoS (Y,B) is sufficient for φ iff for every pair P', P" e φ and every A e A there exists φ A '· (Y, B) —> ([0,1], BO) such that P' ο S( ([0, l],Bo) such that Ρ ο 3(φΑ,η) < P(A)

and Pn ο 3(φΑ,η) > Pn(A}.

By Lemma 1.3.12 and Corollary 1.3.14, applied with ψΑ,η ο 5 in place of φΑ, there exists a density of P with respect to P+Pn, say pn, which is S~1Bmeasurable. By Lemmas 1.3.18 and 1.10.6(1) this implies the existence of a density of P with respect to P, which depends on χ through S(x) only. Hence S is sufficient for φ. (ii) If S is sufficient for φ, there exists φΑ e Γι{Ρ81Α : P 6 φ}, which implies f

φΑ(ν)Ρ ο S(dy) = P(A)

for Α Ε Α, Ρ 6 φ . D

Lemma 1.3.12. Let P', P" be p-measures on A. Assume that for every A e A there exists φΑ : (X, A) —>· ([0,1],1Βο) such that ) < P'(A)

and

Ρ"(ΦΑ) > P"(A).

(1.3.13)

Then the densities ofP'\A and P"\A with respect to (P'+P")\A are measurable with respect to the σ-algebra generated by {A e A : φ A = IA (P1 + P")~a.e.}. Corollary 1.3.14. If φ A is Ao-measurable for every A e A, then there exist Ao-measurable densities of P'\A and P"\A with respect to (P' + P")\A. Proof of Lemma 1.3.12. Since the assumption is symmetric with respect to P' and P", it suffices to prove the assertion for P'. Let p' be a density of P'\A

1.3 Sufficiency and exhaustivity

11

with respect to (P1 + P"}\A. For r > 0 let Ar := {x e X : p'(x) < r (I - p'(x))}.

(1.3.15)

By (1.3.13) there exists ΦΑΓ such that Ρ'(φΑΓ} 0 was arbitrary, p' is .A-measurable. Π Proof of Corollary 1.3.14. φΑ = ΙΑ (Ρ' + P")-*.e. implies A = {x e X : ΨΑ(Χ) = 1} (P' + P"}· Since φ A is .4o-measurable, this implies A e AQ (P' + P") (i.e. there exists AQ e AQ such that (Ρ' + Ρ")(ΛΔ AQ) = 0, where "Δ" denotes the symmetric difference of sets). Hence A is in the completion of AQ with respect to (P' + P"). Since p' is Α-measurable, there exists (see Lemma 1.10.3) an AQ -measurable function which agrees (P' + P")-a.e. with p'. This is the .Ao-measurable version of the density of P'\A with respect to (P' + P")\A. D The following lemma is due to Halmos and Savage (1949, p. 239, Lemma 12). Lemma 1.3.18. Let AQ C A be a sub-σ-algebra. Let φ be a dominated family of p-measures on A. Assume that for every pair P', P" G φ there exists a density of P'\A with respect to (P' + P")\A which is AQ -measurable. Then for every P e φ there exists a density with respect to P+\A (defined in Lemma 1.2.1) which is AQ-measurable. Proof. Let P» = Σ~2~ η Ρ η . For η € IN let pn 6 dP/d(P + Pn) be AQmeasurable. W.l.g. we assume 0 < pn < 1. From P(A) = (P + Pn)(pnl/0 we obtain P((l - p n )l^) = Ρη(ρ·η\Α) for A G A, hence P((l - pn)f) = Pn(pn/) for every ^-measurable function / > 0. (1.3.19) With An := {pn = 1} and Bn := {pn — 0} we obtain Pn(An) = PniPnUj = P((l - Pn)\A J = 0

and

12

1. Sufficiency and completeness P(Bn) = P((l -pn)lBn)

= P„(pnl J = 0-

Let Λ« := ΗΓ^η, 5* := υΓ β η· Then ρη(Λ*) = Ο for η e IN implies Ρ* (Λ*) = Ο and therefore Ρ (A*) — 0. Moreover, P(Bn) = 0 for n e IN implies ,) = 0. Let

θ

Prom (1.3.19), applied with / = l^nsjlA^/Pni we obtain P((l -Pn) — lA;nBjlyl) = Pn(h*l>l;nB;lyl)· Pn

n

Multiplication by 2~ and summation over n e IN yields (1.3.20) Since P(^ Π 5«) = 1, we have P(IA^B^A) = P(A}. Since h*(x) = 0 for χ £ Λ£ Π P.£, we have P(h*lAcnBelA) = P*(h*lA). Hence (1.3.20) implies P(A) = P*(/i»l / i) ) i.e. h» e dP/ίίΡ». Since /ι* is ^lo-measurable, this proves the assertion. D

1.4 Minimal sufficiency Introduces the concept of a "minimal" sufficient statistic, rendering a maximal reduction, and gives a criterion for "minimality".

Throughout this section, φ is α dominated family ofp-measures on (X,A). To obtain a maximal reduction of the data χ to S(x) one should try to find a sufficient statistic 5» which is coarser than any other sufficient statistic. Definition 1.4.1. The sufficient statistic «S1* : (X, A) -+ (Υο,Βο) is minimal sufficient if for any sufficient statistic S : (Χ, Λ) -^ (Υ, Β) there exists a function Η : (Υ, Β] -» (Υο, Bo) such that 5, = H o S φ-a.e. Notice that some authors require H : ( S ( X ) , Β Π S ( X ) ) only. If S is sufficient, then with P* defined by (1.2.2) there exist (by Factorization Theorem 1.2.10 and Proposition 1.2.9) P*-densities of P which depend on χ through S(x) only. This suggests to start from densities qp e dP/dP» and to consider the partition consisting of the elements {ξ € X : hp(£) =

1.4 Minimal sufficiency

13

hp(x) for P e φ}, χ € X, with the intention to find a statistic 5*|(Χ, Λ) inducing this partition (i.e., a statistic which is constant on each element, and attains different values on disjoint elements of this partition). Since densities qp are unique up to P-null sets only, and the definition of a partition given above involves uncountably many P e φ, this intuitive idea has to be modified for technical reasons. In the following we show first that a minimal sufficient statistic exists under mild conditions (see Theorem 1.4.2). More useful for practical purposes is Theorem 1.4.4 which can be applied to show that a given sufficient statistic is, in fact, minimal. Finally we show that a sufficient statistic S with φ S boundedly complete is necessarily minimal (Proposition 1.4.8). These results are essentially due to Lehmann and Scheffe (1950, Section 6). Theorem 1.4.2. If Λ is countably generated, there exists a minimal sufficient statistic S* : (X,A) -* (IR+.Bj). Proof. Since A is countably generated, there exists a countable subset φ0 := {Pn : η e IN} C φ, which is dense in φ with respect to the sup-metric, such that φ < P, := ^2~nPn (see Remark 1.2.4). For η e IN let pn e dPn/dP* be a fixed version. Let S* : X —» IR+ be defined by

(i) 5» is sufficient for φ0, since pn(x) = Π η (5*(χ)), χ 6 X (with Πη denoting the projection of 1R+ onto its n-th component). Since φ0 is dense in φ, St is sufficient for φ by Exercise 1.2.6(i). (ii) If S : (X, A) —> (y, B] is sufficient, there exist measurable functions Λη : (K, ) -> (1R+, B+) such that hn o S 6 dPn/dPt. With the function H : Υ -» 1R+ , defined by

H(y} = (My)) neN , y e y , we obtain H(S(x)) = hence for φ-a.a. χ Ε X.

D

Since there exists a bimeasurable 1—1 map from (IR + ,B + ) to (H, IB) (see Parthasarathy, 1967, p. 14, Theorem 2.12), the minimal sufficient statistic given in Theorem 1.4.2 can always be assumed to be real valued. In spite of this, Theorem 1.4.2 is hardly useful for practical purposes. One will always prefer

14

1. Sufficiency and completeness

more natural versions of the minimal sufficient statistic, for instance continuous functions if X is a topological space (even if these statistics attain their values in IR , say, rather than in IR). Hence more useful than the general existence Theorem 1.4.2 are theorems which provide a tool for identifying a given sufficient statistic as a minimal one. Theorem 1.4.4. Let Λ be countably generated, and let (Y, ) be a Polish space. Assume that S : (Χ, Λ) —> (Υ, Β) is sufficient for φ, and that the functions hp occurring in factorization (1.2.8) have the following property. There exists a countable subset φ0 C φ such that, for y', y" 6 Y, hp(y') = hP(y") Then S is minimal

for all Ρ e φ0, implies y' = y".

sufficient.

Proof. W.l.g. we assume that φ0 = {Pn : n e IN} is dense in φ with respect to the sup-metric. Let Η : Y -» IR+ be defined by H(y) = (/ip n (y)) n6N · As can be seen from the proof of Theorem 1.4.2, h0(x\, . . . ,x n ) is continuous, relation (1.4.6) implies *Ζ1 «

1 + exp[-z'J

foreverv£ (Υ, Β) is a minimal sufficient statistic. Then any Markov kernel M\Y χ A fulfilling

/

M(y,A)PoS(dy} = P(A)

for A e Α, Ρ € φ,

(1.4.10)

is a conditional distribution ofx, given S, and therefore unique in the following sense: If Μ>\Υ χ A, i = 1,2, fulfill (1.4-10), thenM^y,·) = M2(y)·) fortyoSa.a. y 6 Y. Proof. From Lemma 1.4.11, applied with φΑ(χ) — M(S(x), of P, we obtain M(S(x),A)lAo(x)

= M(S(x), Α η A0)

) and P» in place

for P»-a.a. χ e X,

for A e A and A0 e A := {A e A : M(S(·), A) = \A P»-a.e.}.

Since Μ is a Markov kernel, A is a σ-algebra. By integration with respect to P we obtain ί M(S(x},A)lAo(x}P(dx)

=

for A e A, AQ e Α,

That M(·, A) is a conditional expectation of 1^, given S, follows if we prove that S~^B C A (P*). This follows from the minimality of S. From Lemmas 1.3.12 and 1.3.18, applied with φΑ(χ) = M(S(x), A), P' = P and P" = P», we obtain that the density of P\A with respect to P*\A is .Α-measurable. Since A is countably generated, the map χ —+ (pn(x))n N is sufficient (by Theorem

1.5 Completeness

17

1.4.2). Since S is minimal sufficient, there exists Η : (IR+,B+) —>· (Υ, Β) such that S(x) = H((pn(x))n£w) for P»-a.a. χ e X. Since pn is .Α-measurable, this implies S~l C Λ (P.). D Lemma 1.4.11. Assume that for every A e Λ there exists φ A '· (X,A] —> ([0, I],BQ) with the following properties (i) Ρ(ΨΑ) = P(A) for AeA, (ii) ΨΑ+Β = ¥M + ΨΒ P-a.e. ifAr\B = 0. Let .4 := {Λ e .4 : ¥>Λ = U

P-a.e. }. Then

P-a.e. for Α ε A, A0 e A. Proof. For every A e .4, A0 Ε A

Λ

Λ,1Αο}

(Ρ)

(Ρ) Λ

Since AQ £ «4 implies ^4« e A

we nave

= φΑ1Αο.

^n^0 < V^l/i c > hence °

^ = ψ Α, and therefore φΑηΑ0 — ΨΑ^ΑΟ

P-a.e.

D

1.5 Completeness Introduces the concept of [boundedly] complete families of p-measures and gives sufficient criteria for completeness and symmetric completeness. Definition 1.5.1. The family of p-measures φ \Α is [boundedly] complete if for every [bounded] function / : (X, A) -> (IR,B), P(/) = 0 for all P e φ implies / = 0 φ-a-e. If convenient, we follow the common abuse of language and speak of a [boundedly] complete statistic S if the family φ S is [boundedly] complete. The need for two different concepts, namely "completeness" and "bounded completeness", originates from statistical applications: Test theory uses "critical regions" or, more generally, "critical functions" which are bounded, whereas estimators are usually unbounded. The concepts of completeness and bounded completeness were introduced by Lehmann and Scheffe (1947, 1950). They isolate the properties of families of p-

18

1. Sufficiency and completeness

measures which are essential for the uniqueness of optimal unbiased estimators and optimal similar tests, respectively (which were implicitly used before by Scheffe, 1943 and Halmos, 1946). Every complete family is boundedly complete, and there are examples of families which are boundedly complete without being complete. (A first example of this kind was given by Lehmann and Scheffe, 1950, p. 312, Example 3.1. For some recent results see Mattner, 1991.) Lemma 1.5.2. Let $ be a class of functions f : (X, A) —* (IR, IB). The family φ of all p-measures P\A which are equivalent to some σ-finite measure μ, and which fulfill the condition P(\f\) < oo for f e #, is complete. Proof. Assume there exists a function / : (X, A) —> (IR, IB) with P(|/|) < oo for Ρ e φ. Assume that PQ{f ^ 0} > 0 for some (and therefore all) P0 e φ. We shall show that there exists PI e φ such that PI(/) / 0. If P0{/ > 0} > 0, there exists a p-measure PI with Po-density c'l{^>0j. + c"l^ 0. Since PI(|/|) < max{c',c"}P0(|/Q < oo, we have PI € φ. D Exercise 1.5.3. If φ0 C φ is [boundedly] complete and if P(A) = 0 for all P G Φο implies P(A) = 0 for all Ρ 6 φ, then φ is [boundedly] complete Regrettably, it is usually not an easy task to find out whether a given family of ^-measures is complete or not. Moreover, the routine applications in statistical theory involve samples of arbitrary size. Of course, families of product measures with identical components, say {Pn : Ρ € φ}, are usually not complete, even if φ is large. The relevant question is whether there is a sufficient statistic Sn\Xn such that {Pn ο Sn : P e φ} is complete. The most useful result for parametric theory concerns the completeness of various statistics in exponential families. It will be given in Theorem 1.6.10. If (xi,...,xn) is a realization from a p-measure Pn, the order in which E I , . . . , xn occur is irrelevant: For any permutation ( i i , . . . , in) of (1,..., n), the realization (x^,..., Xin) has the same distribution as ( z i , . . . , x n ) . A function Sn\Xn which is maximal invariant under permutations (i.e. fulfills S^x^,... ,x'n) =Sn(x",. -.,χ'ή) iff there is a permutation such that x" =Xj for j = l , . . . , n ) will be called order statistic (a name originating from the case X C IR in which Sn(x\,..., z n ) can be thought of as mapping (x\,...,xn) to (xi : n ,... • ••iXn:n))· Obviously, the order statistic is sufficient for every family {Pn : Ρ € φ}. (This, by the way, is a case where the concept of a "sufficient sub-σalgebra" — the sub-a-algebra of all measurable sets in An which are invariant under permutations — is more natural than the concept of a "sufficient statistic".)

1.5 Completeness

19

The functions on Xn depending on ( x j , . . . , x n ) through the order statistic are the functions of (χι,...,χ η ) which are invariant under all permutations of (ii,...,x n ). "Completeness of the order statistic" for {Pn : P e φ} is, therefore, the same as "symmetric completeness" of {Pn : P G φ}. Definition 1.5.4. {Pn : Ρ Ε φ} is symmetrically complete if for any permutation invariant function fn\Xn,

I

, . . .,xn)P(dxl)... P(dxn) = 0

for all P e φ

implies fn = 0 Pn-a.e. for P e φ. Exercise 1.5.5. If {Pn : Ρ 6 φ} is [boundedly] symmetrically complete for some sample size n, then φ is [boundedly] complete. Symmetric completeness of (Pn : Ρ £ φ} can be expected only if the family φ is very large. Completeness of φ is certainly a necessary condition for completeness of {Pn : Ρ e φ}. If φ is complete and closed under convex combinations, this suffices for symmetric completeness for arbitrary sample sizes. This is the content of Theorem 1.5.10. For smaller families φ, {Pn : P € φ} may fail to be symmetrically complete, even if the order statistic Sn is minimal sufficient. As an example consider a location parameter family φ — {P^ : i9 € H} with P$ = PO ο (χ —> χ+ϋ). If PO is the Cauchy- or the logistic distribution, Sn is minimal sufficient for {P£ : ϋ € H} (see Example 1.4.5 and Exercise 1.4.7). Yet the order statistic fails to be complete for any location parameter family. (Let v?|lR be a bounded function with φ(χ) + φ(—χ] Φ 0 and / φ(χ\ —X2)Po(dxi)Po(dx2) = 0. Then V ; 2(^i,^2) := ψ(^ι — ^2) + φ(χ (IR, B) i=i

/ ·

l)

*

'···'·

We shall show that (1.5.7) implies f TT I /n(zi,··. ,xn}\\^Ai(xi)Pi(dx\). ..Pn(dxn) — 0 (1.5.8) J ι for Ai E Ai and Pi Ε φί; i = l,..., η. n

The system of all sets A E X .Ai for which / fn(xi,...,xn}IA(x1,...,xn)P1(dxi)...Pn(dxn)

=0

n

is a Dynkin system. By (1.5.8) it contains all sets X Ai with Ai E Ai and is n

n

therefore equal to x Ai. This implies fn = 0 X Pj-a.e. It remains to prove (1.5.8). Let ·= I

fn(xi,X2,---,Xn}

with PJ 6 iPi, i = 2, . . . ,n, arbitrary and fixed. Since J g(xi)Pi(dx) — 0 for all PI £ Φΐ) completeness of ^ implies 5 = 0 ^-a.e. Therefore, /^(xi)!^! (xi)Pi(dxi) = 0 for AI E A\ and P\ Ε φ. This implies . . . P n (rfx n ) = 0. (1.5.8) follows by induction.

D

Lemma 1.5.9. If φ is closed under convex combinations, the following holds true for every η e IN: // a measurable function fn : Xn —» IR is permutation invariant, then P n (/ n ) = 0 for all Ρ Ε φ implies (x Ρ,) (/η) = 0 for all Pi Ε φ, i = Ι,.,.,η. Proo/. αΡ + (l - a)Q E φ for P,Q e φ, α G [0, 1] implies Σ?αίΉ e φ for Pi G φ, i = l, ... , n and Oj > 0, i = l, . . . , n, with j^" c*i = 1. Hence

1.5 Completeness

21

The right hand side is homogeneous in (αϊ,. ..,α η ). Hence this relation also holds true without the restriction £)" α»=1. Being a polynomial in ( α ϊ , . . . , αη) which is identically 0 for all a» > 0, i — 1,..., n, its coefficients are 0. The coefficient of Π" ai 1S / _^

ij/\JTri'/

\.

with the summation extending over all permutations (ii,... ,z n ) of (1,... , n). Since fn is permutation invariant, all terms are equal and therefore equal to 0. D

Theorem 1.5.10. Assume that φ is complete and closed under convex combinations. Then {Pn : P G φ} is symmetrically complete for every n G IN. Proof. Follows from Proposition 1.5.6 and Lemma 1.5.9.

D

Without φ being closed under convex combinations, the assertion is not necessarily true. Example: {Ν(μ^ : μ G IR} is complete, yet {N? ^ : μ G IR} is not symmetrically complete: The permutation invariant function η

l-nxl - (n ι

fulfills N ( f

n

) = 0 for μ G IR.

Corollary 1.5.11. Let $ be a class of functions f : (X, A) -> (H,B). Let φ be the family of all ρ -measures P\A which are equivalent to some σ-finite measure μ and which fulfill P ( \ f \ ) < oo for f e $. Then {Pn : Ρ € φ} is symmetrically complete for every n G IN. Proof. Follows immediately from Lemma 1.5.2 and Theorem 1.5.10.

D

If φ0 = {Ρ € φ : P(u) = 0}, where u is a given function, the family {P : Ρ € φ0} is not symmetrically complete: For any permutation invariant function / n _i, n

/

«(rc^/n-iCxi, . . .,xv-i,xv+it. . . ,xn)P(dxl) . . .P(dxn} = 0 (1.5.12) v=\

for P G φ0 ·

An interesting result of Hoeffding (1977, pp. 279/80, Theorems IB, 2B; see also Fr ser, 1954, p. 48, Theorem 2.1 for a more special result) implies that any permutation invariant function with expectation zero under Pn for every

22

1. Sufficiency and completeness

P e φ0 is °f the type (1.5.12) if φ consists of all p-measures equivalent to a σ-fmite measure μ. Moreover, if u is //-unbounded (i.e. μ{\η\ > c} > 0 for every c > 0), then {Pn : Ρ e φ0} is boundedly symmetrically complete.

1.6 Exponential families Surveys basic properties of exponential families, including results on minimality and completeness of sufficient statistics.

Exponential families play an important role in nonasymptotic parametric theory for two reasons: (i) Many important families of distributions are exponential, (ii) the practically useful results of nonasymptotic statistical theory are more or less limited to exponential families. The reader interested in a more complete presentation is referred to Barndorff-Nielsen (1978) or L.D. Brown (1986). Definition 1.6.1. A family φ of p-measures is of exponential type if it has — with respect to some σ-finite measure — densities of the following type. m

x -» C(P)g(x}^p[ai(P)Ti(x)],

(1.6.2)

with Ti: (Χ,Α) -> (IR,B), i = 1,... ,m. Observe that the factor g in (1.6.2) can always be eliminated by an appropriate choice of the dominating measure. The p-measures in φ are mutually absolutely continuous. To simplify our notations we assume throughout the following that the dominating measure μ is equivalent to φ. In the sequel we assume that (1.6.2) is a representation where the functions a,, i = 1,..., m, are affinely independent (i.e. Σ™ c^a^P) = CQ for all Ρ € φ implies Q = 0 for i = 0,1,..., m) and the functions T,, i = 1,..., m, are affinely μ-independent (i.e. J^™CiTj(z) — CQ for μ-a.a. χ Ε Χ implies Ci = 0 for i = 0, l , . . . , m ) . Exercise 1.6.3. Show that such a "minimal" representation always exists. If φ is of exponential type, then so is {Pn : P e φ}: η

TO

JJ (C(P}g(xv} exp[^a, v=l

i=l

1.6 Exponential families

23

t=l

with C„(P) - C(P)n, o\\ for η 6 INo, where || · || denotes the Euclidean norm. As (εη : η e NO} is a bounded subset of IROT, there exists EQ and INi C INo such that (e n )n€Ni —> ^o- We shall show that (α η ) η£ Κι —> αο· Since the representation (1.6.4) is assumed to be minimal, Pao is nondegenerate, i.e. its support is not contained in an (m — l)-dimensional flat. Hence, we have Pao{t e IRm : ε$ί = r} < 1 for all r e IR. It is easy to see that this implies the existence of r',r" € IR with r' < r" such that Pao{t e lRm : ejt < r'} > 0 and Pao{t G IRm : ε^ί > r"} > 0. Therefore, there exists a bounded open set B' C {t e IRm : ejt < r'} with Pao(B'} > 0 and a bounded closed set B" C {t e IRm : ε^ί > r"} with Pao(B") > 0. Let s',s" e IR be such that r' < s' < s" < r". As jeji-ejil < ||εη-ε0||·||ί||, the boundedness of B' implies the existence of n' e INi such that η > η' forn e INi implies \£^t— e$t\ < s'—r'y and therefore ε^ί < s' for all t e jS'. Hence a„t - a^t < \\an - ao\\s' for all t 6 B' and n e INi with n > n'. Since (Ρατι)η6Μι ~* Pa0 weakly, we obtain Pao(B'}< lim Pan(B')= lim n€Mi

1

lim C(an)exp[||an - OQ\\S'].

As Pa0(B') > 0, this implies C(a0) < lim C(on)exp[||an - a0||s'j. The dual argument yields lim £7(an)exp[||an - a0||s"] < C(a0).

n€l^i

Hence lim (7(αη)βχρ[||αη -α0||5"] , lim exp[L on - a0 (s - s J) < —M-—— -^-.,— f < 1. hm Cia^jexp \\\an - an\\s'\

T—

Γιι

n / //

/\1 ^--

n

This implies lim ||an — ao|| = 0, since s" > s'.

D

Theorem 1.6.9. Let φ be an exponential family of p-measures Ρ with densities (1.6.2). The statistic S : (X,A) -» (IR m ,B m ), defined by S(x} := (Τι(χ), . . . , T m (x)), is sufficient for φ. 5 is minimal sufficient if the functions Ρ —> a» (P), i = 1, . . . , m, ore affinely independent.

26

1. Sufficiency and completeness

Proof. With P0 € φ fixed, let

Then Ρ has P0-density /IP o S. (i) By Factorization Theorem 1.2.10 this implies that 5 is sufficient. (ii) To prove minimal sufficiency, let φ0 C φ be a countable subset such that {(ai(P),...,a m (P)) : Ρ € %} is dense in {(ai(P),. . . ,a m (P)) : Ρ 0} with b > 0 fixed, and ]T)" log xv is complete sufficient for every subfamily {Γ£ b : b > 0} with α > 0 fixed.

28

1. Sufficiency and completeness

If the interior of A := {(αι(Ρ), . . . ,a m (P)) : P 6 φ} is empty, but A is not contained in a (τη — l)-dimensional flat, 5p is called a curved exponential family. In this case, the sufficient statistic (Ti, . . . ,T m ) is minimal (according to Theorem 1.6.9.). It may be complete or not. Example 1.6.15 presents an exponential family, where this sufficient statistic is complete. Theorem 1.6.23 shows that (7i,.. . ,T m ) is not complete, if (fli(P), . . . ,a m (P)), Ρ e φ, are polynomial dependent and P o (Tl5 . . . , Tm) < A m . Example 1.6.15. A curved exponential family may be complete. For α > 0 let Π0 denote the Poisson distribution with parameter a, and let Ρ,, := IV ι χ Πβ,φΐ^],

t? > 0.

Since

this is a curved exponential family. We shall show that {Pj : ϋ > 0} is complete. The existence of some function / : ({0} U IN) —> IR fulfilling fy

oo

oo

Σ Σ /(Μ)ίΜ(Μ)} = 0

for ϋ > 0

k=0 t=Q ty

is equivalent to the existene of some function g : ({0} U IN) —> IR fulfilling 0

fortf>0.

(1.6.16)

k=0 £=0

In the following we shall show that g Ξ 0, hence / = 0. From (1.6.16), applied for Ό — 1, we obtain oo

oo

££>(*, 01 exp[-/|< oo.

(1.6.17)

k=oe=o Since for k e {0} U IN, t e IN, m 6 TL and ΰ > 1, relation (1.6.17) implies oo

oo

lim Σ \]g(k,£}umexp[-ue} = 0 ϋ ^°°ί^7^ι Rewriting (1.6.16) as oo

for m (Ξ 2.

(1.6.18)

oo

^] = 0

k=oe=i

for ύ > 0,

(1.6.19)

1.6 Exponential families

29

we obtain for ϋ —» oo that p(0,0) — 0. Assume now that g(k, 0) = 0 holds true for k = 0,..., Κ - 1. Multiplying both sides of (1.6.19) by ύκ and letting ϋ -> oo, we obtain from (1.6.18) and (1.6.19) that g(K,Q) = 0. Hence g(k,0) = 0 for all k 0.

(1.6.20)

k=Q£=L

Multiplying both sides by exp[Li?] and replacing ί by t + L, we may rewrite relation (1.6.20) as oo

oo

Σ Σ 2(fc' t+L)u~k exp[-^] = 0

for all ϋ > 0.

fc=01=0

This is equation (1.6.16) for the function (fc,£) —> g(k, £+L), sothat 0 let ^|B+ denote the exponential distribution with Lebesgue density χ —> ϋ~1€χρ\—χ/ϋ},

χ > 0.

Given c > 0, let P0 := EU ° (x -* xl(o, c )(^) + cl[ Ci00 )(x)). The p-measures P,s|IBn(0,c] are mutually absolutely continuous. It is easy to check that P$ has PI -density

This is an exponential family with ΓΙ(Ϊ)=Χ> a 1 (P u ) = -t?-1, Since a 1,02 are affinely independent, η

η

S„(ii ,...,!„) := Ε^'Σ1^}^)) l

(1.6.22)

l

is minimal sufficient for {P$ : ϋ > 0} by Theorem 1.6.9, but not complete. The following theorem, due to Wijsman (1958, p. 1031), explains why curved exponential families are usually not boundedly complete. Theorem 1.6.23. Let φ be an exponential family with coefficients

30

1. Sufficiency and completeness

which are polynomial dependent, i.e.: There exists a polynomial Π : ]Rm —>· 1R such that Π(αι(Ρ),.. . ,a m (P)) = 0 for every Ρ e φ. Assume that Ρ ο (Γι, . . . ,Tm) < λ"1. Γ/ien ί/ie statistic (Γι, . . . ,T m ) is ηοί complete. If the \m -density of Ρ ο (Γι, . . . , Tm) is bounded away from 0 on some sei of positive Xm -measure, then (Ti, . . . ,T m ) zs noi even boundedly complete. Since Ρ ο (Τι, . . . ,Tm) 0 fixed, and £]"(zi/ —A 4 ) 2 is complete sufficient for the subfamily {Ν? σ3 0} with μ € 1R fixed. For the subfamily with given coefficient of variation, Ρμ := Ν(μ ,^μ»), Sn remains minimal sufficient. {Ν^μ 0}. Since a\ , 02 are polynomial dependent (we have c2ai (Ρμ)2+ = 0), Sn is not complete. For instance " < °} = Φ(- \/^/c) for every μ > 0.

1.7 Auxiliary results on families with monotone likelihood ratios

Contains auxiliary results on families with monotone likelihood ratios, and establishes that exponential families are the only ones which have monotone likelihood ratios for arbitrary sample sizes.

32

1. Sufficiency and completeness

Throughout this section let θ = ($', ϋ") be an interval. To avoid technicalities irrelevant for statistical applications, we assume that the p-measures P#, ϋ € Θ, are mutually absolutely continuous. We start our considerations with some auxiliary results concerning p-measures Q0| IB. Let μ| IB denote a dominating measure, q(·, ύ) a μ-density of Q$. If Q$, "d e Θ, are mutually absolutely continuous, we may assume w.l.g. the existence of a set BQ C Β such that q(t, ϋ) > 0 for t e BQ, and Q$ (Bo) = 1 for ι?€θ. Definition 1.7.1. The family {Q$ : ϋ e Θ} has isotone likelihood ratios if the densities q(-,u) can be chosen such that t -» g(i, t?2)/i(t, t?i) is isotone if τ?ι < ι?2·

(1.7.2)

This condition is equivalent to 0 -> g(i2,0)/9(ti,0) is isotone if ij < t 2 .

(1.7.3)

The following criterion is due to Karlin (1957, p. 283, Theorem 1). Criterion 1.7.4. If ϋ —> g(i, i?) is differentiate for every t e BQ, ί/ie family : i? G θ} /ias isotone likelihood ratios i f f t — * · ^ logg(i, ι?) is isotone. Proof, (i) If t —> g(i,i92)/9(i,^i) is isotone for all $1 < 1?2, the function

is isotone. soone. (ii) If (^q(t^))/q(t^}

< (^q(s,u))/q(s,ti)

for ί < s, we obtain

Hence ϋ —> q(s,u)/q(t,u) is isotone, which is equivalent to (1.7.2).

D

Example 1.7.5. Let p|IR be the positive Lebesgue density of a p-measure. Let {Qu : ϋ e IR} be the location parameter family generated by p, i.e. Q$ has Lebesgue density χ —» p(x — ϋ}. Then {Q$ : ϋ e IR} has isotone likelihood ratios iff QQ is strongly unimodal (i.e. log ρ is concave). If ρ is differentiable, this follows immediately from Criterion 1.7.4. For the general case see Lehmann (1986, p. 509, Example 1). Examples of location parameter families having isotone likelihood ratios are normal, logistic and Laplace distributions.

1.7 Auxiliary results on m.l.r. families

33

Further examples of families with isotone likelihood ratios are the noncentral ί-, χ2- and F-distributions (see Lehmann, 1986, p. 295 and p. 428, Problem 4(i)), moreover the distributions of the correlation coefficients (see Example 1.7.12 and Exercise 1.7.13). Monotonicity of likelihood ratios implies other order relations and relations of topological nature. (See Pfanzagl, 1969a, for more details.) Definition 1.7.6. {Q$ : Ό G θ} is stochastically isotone if ύ —> Q$(— οο,ί] is antitone (equivalently: ϋ —> Q$[t, oo) isotone) for every t e JR. Proposition 1.7.7. A family with isotone likelihood ratios is stochastically isotone. More precisely: If {Q# : ι? £ Θ} has isotone likelihood ratios, then (i) t —> QUI(— oo>i]/Qtfi(— οο,ί] and t —> Q$3 [t, oo)/Q^1 [t, cc) are isotone on the interior of the "convex support" i/$i < i?2/ (ii) either one of the relations under (i) implies that ϋ —» Q$(— oo, i] is antitone for every t 6 IR. Proof, (i) ΌΙ < i?2 implies

,


h^2(y)/h^1(y)

is isotone on y if t?i < t?2-

Recall that S is sufficient for {P^ : τ? e Θ} by Factorization Theorem 1.2.10. An alternative definition starts from the weaker assumption that for arbitrary t?j G θ with $1 < ι?2, there exists an isotone function H^l^7 such that p(x, ύ·ζ)/ρ(χ, ΌΙ) = Ηΰι,ϋ2(Τ(χ)) for μ-a.a. χ Ε X, with the exceptional //-null set depending on $l,t?2· Then the densities can always be chosen such that (1.7.11) holds true (see Pfanzagl, 1967). If {P0 : ϋ e Θ} admits a sufficient statistic S with isotone likelihood ratios, then the family {P^ o S : ύ e θ} of p-measures on Β has isotone likelihood ratios in the sense of Definition 1.7.1. Observe that Definition 1.7.10 requires much more: that S is, in addition, sufficient for {P$ : ϋ G θ}. Example 1.7.12. Let φ = {^(nMl)M2) Ο, ρ 6 (-1,1)} and V* /x \ [ τ

Z^ \ v

α ^ I j i / l j ) · · · i \ n,yn)) ·— ΤΙ

tt-t

Ι

Ιx Ύ*

*)/

1 1

* —'

— \1 /1 7 1

xι*

n)\yv

— \ It 1

yn)

^^_^^_«^_^^^^^^^^^^^^^^^^^^^_^_^__^_^_^^_

n

n

,2/f ~yn)~) v=\

1/2

1.7 Auxiliary results on m.l.r. families

According to Fisher (1915) the distribution of rn under JVJ* Lebesgue density •η τ . - o -

σ2 σζ

35

Λ has

r e (-1,1),

with

-

fc=0

Μ" ·

Since £(ρ η (»·,02)/Ρη( Γ >0ι)) > 0 for ρι < ρ2, the family {^("1ιμ2ι Ο, ρ G (—1,1)} has isotone likelihood ratios. The statistic rn is, however, not sufficient for ψ. Therefore, Theorem 4.5.2 on the existence of uniform most powerful tests does not apply. Tests for the hypothesis ρ < QQ using a critical region {rn > Οη(ρο)} are certainly useful (thanks to their asymptotic properties for large n). They are, however, not uniformly most powerful (with n fixed). Exercise 1.7.13. Let φ = {Ν^>μ^ ^ : μ, G R, σ2 > Ο, ρ e (-1,1)}, and

v=\

v=\

According to Hsu (1940, p. 418, relation (36)) the distribution of Rn under Ν σ * σ ΐ has Lebesgue density

Since ^ logpn(r, ρ) = (n - 1)(1 - ρτ)~ 2 > 0, the family {^(μ,^,^,^,β) ° ^ = / e IR, σ2 > Ο, ρ e (-1, 1)} has isotone likelihood ratios by Criterion 1.7.4 (but Rn is not sufficient for φ). Example 1.7.14 (see Ghurye and Wallace, 1959). Let {P^|B : ϋ 6 IR} be a location parameter family with positive Lebesgue densities and isotone likelihood ratios. Then, for every n € IN, the family of convolution products (i.e. {PU ° ( ( ^ i j - ' - i ^ n ) —* Za 3 ^) : ^ e E*·}) nas isotone likelihood ratios, but Σ™ xv is sufficient for the location parameter family {P£ : ύ € IR}, n > 2, only if PO is a normal distribution (see Dynkin, 1951, and Ferguson, 1962).

36

1. Sufficiency and completeness

Proof. The location parameter family generated by PQ has isotone likelihood ratios iff PQ is strongly unimodal (see Example 1.7.5). Since strong unimodality of PO implies strong unimodality of the convolution product P*n (by Corollary 2.3.24), the location parameter family generated by P*n has isotone likelihood ratios by Example 1.7.5. D Proposition 1.7.15. Let {P$ : ϋ e Θ} be an exponential family, with μdensities χ -» C(d}g(x) exp [a(u)S(x)] ,

ϋ e Θ.

If a is isotone, the family {P^ : ϋ G θ} has isotone likelihood ratios in S, with

y e JR. Recall that this proposition covers, in particular, the following special cases: Binomial, Poisson, {Ν(μ 0} and {Γα){, : α, 6 > 0} with either parameter fixed. Since families of p-measures admitting a sufficient statistic with isotone likelihood ratios have various favorable statistical properties (see Theorem 4.5.2 and Theorem 5.4.3), it is of interest to know which families have a sufficient statistic with isotone likelihood ratios for arbitrary sample sizes. Regrettably, one-parameter exponential families are the only ones. For the following theorem see Borges and Pfanzagl (1963, p. 112, Theorem 1) or Heyer (1982, p. 98, Theorem 14.2). It characterizes 1-dimensional exponential families by the existence of isotone likelihood ratios for infinitely many sample sizes. A characterization without monotony of likelihood ratios requires severe regularity conditions on the densities and/or the sufficient statistic. The first results of this kind go back to Darmois (1935), Koopman (1936) and Pitman (1936). The best result available so far is due to Hipp (1974) who requires for some sample size n > 2 the existence of a sufficient statistic fulfilling a local Lipschitz condition, and p-measures on (IR, IB) equivalent to the Lebesgue measure. That something beyond continuity of the sufficient statistic is needed follows from the existence of a continuous function from Hn into IR which is 1-1 An-a.e. (see Denny, 1964). Theorem 1.7.16. Let φ be a family of mutually absolutely continuous pmeasures on a measurable space (X, A). Let PQ\ be ap-measure equivalent to



Assume there exists an infinite subset INo C IN, containing 1, with the following property: For every n € INo there exists a function Tn : (Xn,An) —> (IR, IB) and isotone functions Hp , P e φ, such that Η ρ oTn is a PQ -density o/P n .

1.7 Auxiliary results on m.l.r. families

37

Then there exists a function S : (X, A] -+ (H, B) and functions C : φ —> (0, oo) and a : φ —»IR such that x-+C(P}exp[a(P)S(x)], is a PQ-density of P. Proof, (i) Writing Τ and HP for ΓΙ and H^ we obtain for all π e JN0 P n

o -a-e.

(1.7.17)

We shall replace (1.7.17) by a similar set of equations which hold everywhere. These equations will then yield exponentiality by Lemma 1.7.41. (ii) Let C denote the class of all sets {x e X : HP(T(x)) < c} with P e φ, c > 0. Since the elements of C are of the form {x e X : T(x) < k} or {x € X : T(x] < fc}, the class C is totally ordered by inclusion. By Lemma 1.7.27 there exists a function SO : X —+ [0,1] fulfilling S0(y) = P0{x € X : S0(x) < S0(y)}

for all y e X

(1.7.18)

such that for every Ρ e φ, c > 0,

{χ y n ) ^ Ap(c). Since ( y i , . . . , y n ) € {~)™AP^Q) was arbitrary, this implies Pl^^p^Cj) C Ap(c). Similarly, Ροι(Πι° APi(°i)) ^ PO(AP(C}} implies Pl^Ap^Ci) D ^4p(c). This verifies condition (1.7.30). Hence Lemma 1.7.27 implies the existence of a measurable function Sn : Xn -» [0,1] such that for every P e φ, c > 0, τ < ^r\\ •JUl/ ^ Ι ι _^ i/=l

with 4n)(c) := J^{(xi,...,x„) 6 Xn : EI?Gp(So(x v ))
ψ_{T⁻¹C} : (Y,𝔹) → ([0,1],𝔹₀), which is a conditional expectation of 1_{T⁻¹C}, given S, with respect to every P ∈ 𝔓, i.e.

    ∫ ψ_{T⁻¹C}(S(x)) 1_B(S(x)) P(dx) = P(T⁻¹C ∩ S⁻¹B)   for B ∈ 𝔹.   (1.8.3)

(i) Since ψ_{T⁻¹C} ∘ S = P(T⁻¹C) P-a.e., relation (1.8.3) yields P(T⁻¹C ∩ S⁻¹B) = P(T⁻¹C) P(S⁻¹B) for B ∈ 𝔹, i.e. the functions S, T are P-independent.

(ii) Let P₀ ∈ 𝔓 be fixed. Since P∘T|𝒞 = P₀∘T|𝒞, relation (1.8.3), applied with B = Y, yields

    P(ψ_{T⁻¹C} ∘ S − P₀(T⁻¹C)) = 0   for P ∈ 𝔓.

Since 𝔓∘S is boundedly complete, this implies ψ_{T⁻¹C} ∘ S = P₀(T⁻¹C) 𝔓-a.e. From this, P-independence of S, T follows as in (i). □


Observe that bounded completeness in 1.8.2(ii) cannot be dispensed with. The sample correlation coefficient rₙ is ancillary for {N(μ₁,μ₂,σ₁²,σ₂²,ρ₀)ⁿ : μᵢ ∈ ℝ, σᵢ² > 0} (with ρ₀ ≠ 0 fixed), and (see (4.7.16))

    S((x₁,y₁),…,(xₙ,yₙ)) = (x̄ₙ, ȳₙ, sₙ(x), sₙ(y), rₙ(x,y))

is equivariant and minimal sufficient for this family, but rₙ and S are obviously not stochastically independent.

Remark 1.8.6. A number of applications of Theorem 1.8.2(ii) is of the following type: {P_{ϑ,τ} : ϑ ∈ Θ, τ ∈ T} is such that the conditions of 1.8.2(ii) are fulfilled for every subfamily {P_{ϑ,τ} : ϑ ∈ Θ} with τ ∈ T fixed, i.e. there exists a statistic S|X which is sufficient for {P_{ϑ,τ} : ϑ ∈ Θ} for every τ ∈ T, and {P_{ϑ,τ} ∘ S : ϑ ∈ Θ} is boundedly complete. Then S and T are stochastically independent under P_{ϑ,τ} for every ϑ ∈ Θ, τ ∈ T, provided P_{ϑ,τ} ∘ T depends on τ only.

Example 1.8.7. For a, b > 0 let Γ_{a,b}|𝔹₊ denote the gamma distribution with Lebesgue density

    x → (a^b/Γ(b)) x^{b−1} exp[−ax],   x > 0.

Then x̄ₙ and x̄ₙ/(∏₁ⁿ x_ν)^{1/n} are stochastically independent under Γ_{a,b}ⁿ for all a, b > 0: The statistic x̄ₙ is sufficient for {Γ_{a,b}ⁿ : a > 0} for every b > 0, and {Γ_{a,b}ⁿ ∘ x̄ₙ : a > 0} is complete.

Moreover, the distribution of x̄ₙ/(∏₁ⁿ x_ν)^{1/n} under Γ_{a,b}ⁿ does not depend on a: this statistic is invariant under the scale transformations x_ν → c x_ν, c > 0, so that its distribution under Γ_{a,b}ⁿ coincides with its distribution under Γ_{1,b}ⁿ. Hence the independence of x̄ₙ and x̄ₙ/(∏₁ⁿ x_ν)^{1/n} follows according to Remark 1.8.6.
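The following small simulation (an illustrative sketch of our own, not part of the original text; Monte Carlo agreement supports but of course does not prove the exact statement) checks a necessary consequence of the asserted independence, namely vanishing correlation between transformations of the two statistics:

    import numpy as np

    # Illustrative check of Example 1.8.7: under i.i.d. Gamma(a, b) samples,
    # the mean and the ratio mean / geometric mean should be independent.
    rng = np.random.default_rng(0)
    a, b, n, reps = 2.0, 3.0, 5, 100_000

    x = rng.gamma(shape=b, scale=1.0 / a, size=(reps, n))  # density ~ x^{b-1} e^{-a x}
    mean = x.mean(axis=1)
    geo_mean = np.exp(np.log(x).mean(axis=1))
    ratio = mean / geo_mean

    # Independence implies zero correlation, also between transformed versions.
    print(np.corrcoef(mean, ratio)[0, 1])              # ~ 0
    print(np.corrcoef(mean**2, np.log(ratio))[0, 1])   # ~ 0 as well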

Example 1.8.7 infers properties of statistics for a given family of p-measures. For a converse result, characterizing the gamma distributions by properties of statistics, see Laha (1954) and Lukacs (1956, Section 7).

Example 1.8.8. For the family {N(μ,σ²) : μ ∈ ℝ, σ² > 0}, the statistics x̄ₙ and Tₙ(x₁,…,xₙ) are stochastically independent under N(μ,σ²)ⁿ for every μ ∈ ℝ, σ² > 0, provided Tₙ(x₁+μ,…,xₙ+μ) = Tₙ(x₁,…,xₙ). Since x̄ₙ is sufficient for {N(μ,σ²)ⁿ : μ ∈ ℝ} for every fixed σ² > 0, and {N(μ,σ²)ⁿ ∘ x̄ₙ : μ ∈ ℝ} is


complete, this follows according to Remark 1.8.6. As a particular consequence we obtain that x̄ₙ and (n⁻¹ ∑₁ⁿ (x_ν − x̄ₙ)²)^{1/2} are stochastically independent under N(μ,σ²)ⁿ for every μ ∈ ℝ, σ² > 0.

Exercise 1.8.9. The stochastic independence of x̄ₙ and (n⁻¹ ∑₁ⁿ (x_ν − x̄ₙ)²)^{1/2} is not implied by the equivariance and invariance, respectively, under the transformation x → x + μ. Show that x̃ₙ (the sample median) and n⁻¹ ∑₁ⁿ |x_ν − x̃ₙ| are not stochastically independent. (Hint: N³{x̃₃ ∈ (m, m+1), ∑₁³ |x_ν − x̃₃| ∈ (0,1)} / (N³{x̃₃ ∈ (m, m+1)} N³{∑₁³ |x_ν − x̃₃| ∈ (0,1)}) tends to 0 as m tends to infinity.)

Exercise 1.8.10. Show that (x̄ₙ, ȳₙ) and (sₙ²(x), sₙ²(y), rₙ(x,y)) are stochastically independent under N(μ₁,μ₂,σ₁²,σ₂²,ρ)ⁿ. If ρ = 0, then rₙ(x,y) is stochastically independent of (x̄ₙ, ȳₙ, sₙ²(x), sₙ²(y)).

Proposition 1.8.11. Let T : (X,𝒜) → (Z,𝒞) be ancillary. Assume that S : (X,𝒜) → (Y,𝔹) and T are stochastically independent under every P ∈ 𝔓. Then the following holds true: For f : (Y×Z, 𝔹×𝒞) → (W,𝒟), the conditional distribution of x → f(S(x),T(x)), given S, is the Markov kernel

    M(y,D) := P₀∘T{z ∈ Z : f(y,z) ∈ D},

which does not depend on P ∈ 𝔓. If, in addition, there exists f₀ : (Y×Z, 𝔹×𝒞) → (X,𝒜) such that

    f₀(S(x),T(x)) = x   for x ∈ X,

then S is sufficient for 𝔓. This result has the following intuitive interpretation: Let x be a realization governed by P. If instead of x only S(x) is known, compute f₀(S(x),T(u)), where u is a realization governed by P₀. The random variable generated by this combined experiment (which can be carried through without the knowledge of P) is distributed according to P.
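To illustrate this interpretation, here is a small numerical sketch for the normal location family (an example of our own choosing, not from the text; "f₀" is the map reassembling a sample from its mean and residuals):

    import numpy as np

    # Illustration of Proposition 1.8.11 for N(mu,1)^n: S(x) = mean(x) is
    # sufficient, T(x) = x - mean(x) is ancillary, and f0(s, t) = s + t
    # reconstructs x.  Combining S(x) with residuals T(u) of an independent
    # sample u from P0 = N(0,1)^n reproduces the distribution of x.
    rng = np.random.default_rng(1)
    mu, n, reps = 1.5, 4, 200_000

    x = rng.normal(mu, 1.0, size=(reps, n))
    u = rng.normal(0.0, 1.0, size=(reps, n))   # realizations governed by P0

    s = x.mean(axis=1, keepdims=True)          # S(x), the only thing we keep
    t_u = u - u.mean(axis=1, keepdims=True)    # T(u), ancillary residuals
    x_reconstructed = s + t_u                  # f0(S(x), T(u))

    # Moments should match those of x: mean mu, identity covariance matrix.
    print(x_reconstructed.mean(), x_reconstructed.var())
    print(np.cov(x_reconstructed.T).round(2))  # ~ identity matrix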


Proof. Since M(·,D) is measurable for every D ∈ 𝒟, and since M(y,·)|𝒟 is a p-measure for y ∈ Y, it remains to be shown that y → M(y,D) is a conditional expectation of 1_D(f ∘ (S,T)), given S. Since S, T are P-independent, and since P∘T = P₀∘T, this follows from

    ∫ 1_D(f(S(x),T(x))) 1_B(S(x)) P(dx)
        = ∫∫ 1_D(f(y,z)) 1_B(y) P∘S(dy) P∘T(dz)
        = ∫∫ 1_D(f(y,z)) 1_B(y) P∘S(dy) P₀∘T(dz)
        = ∫ M(y,D) 1_B(y) P∘S(dy)   for B ∈ 𝔹.   □

Criterion 1.8.13. Let S : (X,𝒜) → (Y,𝔹) and T : (X,𝒜) → (Z,𝒞) be measurable functions, P|𝒜 a p-measure.

(i) If there exists a constant version of P^S 1_{T⁻¹C} for every C ∈ 𝒞 (which necessarily is P(T⁻¹C)), then S and T are P-independent.

(ii) If S and T are P-independent, then P(T) ∈ P^S T (if T is P-integrable). Since P-independence of S and T implies P-independence of S and 1_{T⁻¹C} for every C ∈ 𝒞, this implies P(T⁻¹C) ∈ P^S 1_{T⁻¹C}.

Proof. (i) If there exists a constant K_C ∈ P^S 1_{T⁻¹C}, we have for all B ∈ 𝔹

    P(T⁻¹C ∩ S⁻¹B) = ∫ K_C 1_B(y) P∘S(dy) = K_C P(S⁻¹B).

Applied for B = Y this yields K_C = P(T⁻¹C), hence P(T⁻¹C ∩ S⁻¹B) = P(T⁻¹C) P(S⁻¹B).

(ii) P-independence of S and T implies P-independence of 1_{S⁻¹B} and T for every B ∈ 𝔹, hence

    ∫ T(x) 1_{S⁻¹B}(x) P(dx) = P(T) P(S⁻¹B) = ∫ P(T) 1_{S⁻¹B}(x) P(dx).

Therefore, P(T) ∈ P^S T.   □

The reader interested in more details about the concept of ancillarity may consult the survey paper by Lehmann and Scholz (1992).


1.9 Equivariance and invariance

Introduces transformation groups on X and equivariant statistics. A family of mutually absolutely continuous p-measures generated by a transformation group is dominated by the Haar measure. Every minimal sufficient statistic is almost equivariant. The conditional expectation of an almost invariant function with respect to an equivariant sufficient statistic is almost invariant.

Let (X,𝒜) be a measurable space and G a group of 1-1 transformations a : X → X, endowed with a σ-algebra 𝒟, such that

    (a,b) → ab is 𝒟×𝒟, 𝒟-measurable,   (1.9.1)

    a → a⁻¹ is 𝒟, 𝒟-measurable,   (1.9.1′)

and

    (a,x) → ax is 𝒟×𝒜, 𝒜-measurable.   (1.9.1″)

Throughout the following we assume that a𝒜 = 𝒜 and a𝒟 = 𝒟 for a ∈ G. Gx := {ax : a ∈ G} is the orbit of x, generated by G. Two orbits Gx′, Gx″ are either identical or disjoint.

Definition 1.9.2. A function S : X → Y is equivariant if

    S(x′) = S(x″) implies S(ax′) = S(ax″)   for x′, x″ ∈ X, a ∈ G.   (1.9.3)

The function S is invariant if

    S(ax) = S(x)   for x ∈ X, a ∈ G.   (1.9.4)

An invariant function is constant on each orbit. A function is maximal invariant if it attains different values on different orbits. Given a measure μ|𝒜, a measurable function S is μ-almost equivariant [μ-almost invariant] if relation (1.9.3) [respectively (1.9.4)] holds for all x′, x″ outside a μ-null set which may depend on a ∈ G.

If S : X → Y is equivariant, the transformation group G, acting on X, induces a transformation group, say Ḡ, acting on Ȳ := S(X), with elements ā defined by

    āy := S(aS⁻¹{y}),   y ∈ Ȳ.

(Notice that S(aS⁻¹{y}) consists of a single element of Ȳ.) The induced transformation group Ḡ fulfills (1.9.1) with 𝔹̄ := 𝔹 ∩ Ȳ in place of 𝒜, and 𝒟̄, the set of all subsets D̄ of Ḡ such that {a ∈ G : ā ∈ D̄} ∈ 𝒟, in place of 𝒟. The map a → ā is a homomorphism, i.e. (ab)‾ = āb̄.
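As a concrete illustration (a standard example, added here for orientation and not part of the original text): let G be the group of translations x → (x₁+a,…,xₙ+a), a ∈ ℝ, acting on X = ℝⁿ. Then

    S(x) := x̄ₙ is equivariant, with induced group āy = y + a on Ȳ = ℝ,
    T(x) := (x₁ − x̄ₙ, …, xₙ − x̄ₙ) is invariant;

T is even maximal invariant, since T(x′) = T(x″) implies x″ = x′ + (x̄ₙ″ − x̄ₙ′)·(1,…,1), so that x′ and x″ lie on the same orbit.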


A measure ν|𝒟 is left invariant if

    ν(aD) = ν(D)   for D ∈ 𝒟, a ∈ G.

The definition of right invariance is an obvious modification. The group G, endowed with a Hausdorff topology 𝒰, is a topological group if the operations (a,b) → ab and a → a⁻¹ are continuous.

Theorem 1.9.5. If G is a locally compact topological group, there exists a left (as well as a right) invariant regular measure, the so-called left invariant Haar measure. Either of these measures is unique up to multiplication by a constant and finite [σ-finite] if G is compact [σ-compact].

Proof. Nachbin (1965), p. 65, Theorem 1, and p. 75, Proposition 4. □

As a typical example of a transformation group we mention the linear transformations on ℝ, assigning to each pair (a,c) ∈ ℝ×(0,∞) the transformation x → a + cx, x ∈ ℝ. The group operation is (a″,c″)(a′,c′) = (a″ + c″a′, c″c′); the pertaining left invariant Haar measure on (ℝ×(0,∞), 𝔹×𝔹₊) has λ²-density (u,v) → v⁻², (u,v) ∈ ℝ×(0,∞).

Starting from a p-measure P|𝒜, let P_a|𝒜 be the p-measure defined for a ∈ G by

    P_a(A) := P(a⁻¹A),   A ∈ 𝒜.

In other words, P_a is the p-measure induced by P and the map x → ax. The relationship (P_a)_b = P_{ba} is straightforward.

Lemma 1.9.6. Let ν|𝒟 be left invariant and σ-finite, and μ|𝒟 σ-finite. Then ν(D) = 0 implies μ(aD) = 0 for ν-a.a. a ∈ G.

Proof. For D ∈ 𝒟 with ν(D) = 0 we have

    0 = μ×ν{(a,b) ∈ G×G : b ∈ aD};

hence, by Fubini's theorem, μ(aD) = 0 for ν-a.a. a ∈ G. □

Proposition 1.9.7. Given a p-measure Q|𝒟, let Q_a(D) := Q(a⁻¹D) for D ∈ 𝒟, a ∈ G. If {Q_a : a ∈ G} is dominated by some σ-finite measure, then this family is dominated by the Haar measure.


Proof. Let μ|𝒟 denote the dominating σ-finite measure. If ν(D) = 0, there exists by Lemma 1.9.6 an element a₀ ∈ G such that μ(a₀D) = 0. Since {Q_a : a ∈ G} is dominated by μ, this implies Q_{a₀a}(a₀D) = 0, hence Q_a(D) = Q_{a₀a}(a₀D) = 0, for every a ∈ G. □

The following theorem, giving conditions under which any almost invariant statistic is equivalent to an invariant statistic, is due to Stein. The same result under conditions of a different nature occurs in Berk and Bickel (1968, p. 1573, Theorem).

Theorem 1.9.8. Assume there exists a p-measure Π|𝒟 such that Π(D) = 0 implies Π(Da) = 0 for a ∈ G. Assume, moreover, that T : (X,𝒜) → (Z,𝒞) is almost invariant with respect to P|𝒜, where 𝒞 is countably generated, and {z} ∈ 𝒞 for z ∈ Z. Then there exists an invariant function T̄ : (X,𝒜) → (Z,𝒞) such that T = T̄ P-a.e.

Proof. Let 𝒜₀ := {A ∈ 𝒜 : P(A Δ aA) = 0 for a ∈ G} and

    𝒜_I := {A ∈ 𝒜 : A = aA for a ∈ G}.

We have 𝒜_I ⊂ 𝒜₀. We shall show that 𝒜₀ ⊂ 𝒜_I (P). By Lemmas 1.10.3, 1.10.4 and 1.10.6(ii), this implies that for any 𝒜₀-measurable function T there exists an 𝒜_I-measurable function T̄ such that P{T ≠ T̄} = 0. We have

    (T⁻¹C) Δ (aT⁻¹C) ⊂ {x ∈ X : T(a⁻¹x) ≠ T(x)}.

Hence T is 𝒜₀-measurable if it is almost invariant. Since any 𝒜_I,𝒞-measurable function is invariant, this implies the assertion.

It remains to prove 𝒜₀ ⊂ 𝒜_I (P). For A ∈ 𝒜₀ and a ∈ G we have 1_A(x) = 1_A(ax) for P-a.a. x ∈ X. This suggests to define the invariant set equivalent to A by

    A_I := {x ∈ X : ∫ 1_A(bx) Π(db) = 1}.

We shall show that A_I ∈ 𝒜_I. x ∈ A_I implies that D_x := {b ∈ G : bx ∉ A} is a Π-null set. Hence D_x a = {b ∈ G : ba⁻¹x ∉ A} is a Π-null set too, so that ∫ 1_A(ba⁻¹x) Π(db) = 1, i.e. a⁻¹x ∈ A_I, or x ∈ aA_I. Therefore, A_I ⊂ aA_I for a ∈ G, which implies A_I = aA_I for a ∈ G.

It remains to be shown that P(A Δ A_I) = 0 for A ∈ 𝒜₀. If A ∈ 𝒜₀, we have

    ∫ |1_A(x) − 1_A(ax)| P(dx) = 0   for a ∈ G,

hence, by Fubini's theorem,

    ∫ |1_A(x) − 1_A(ax)| Π(da) = 0   for P-a.a. x ∈ X.

This implies 1_A(x) = ∫ 1_A(ax) Π(da) for P-a.a. x ∈ X, hence P(A Δ A_I) = 0. □


Lemma 1.9.9. If G is σ-locally compact, then there exists a p-measure Π|𝒟 such that Π(D) = 0 implies Π(Da) = 0 for a ∈ G.

Proof. By Theorem 1.9.5 there exists a right invariant Haar measure which is σ-finite, say ν|𝒟. Let G = ∑₁^∞ Dₙ with Dₙ ∈ 𝒟, 0 < ν(Dₙ) < ∞. The p-measure Π|𝒟, defined by

    Π(D) := ∑_{n=1}^∞ 2⁻ⁿ ν(D ∩ Dₙ)/ν(Dₙ),

is mutually absolutely continuous with respect to ν. Hence Π has the asserted property if ν does. This is the case, since ν is right invariant. □

Theorem 1.9.10. Let 𝔓 be a family of mutually absolutely continuous p-measures P|𝒜 which is closed under G, i.e. P ∈ 𝔓 implies P_a ∈ 𝔓 for a ∈ G. If S : (X,𝒜) → (Y,𝔹) is minimal sufficient for 𝔓, then S is 𝔓-almost equivariant.

Proof. Let A ∈ 𝒜 be arbitrary. Since S is sufficient, there exists a conditional expectation of 1_{aA}, given S, say ψ_a : (Y,𝔹) → ([0,1],𝔹₀), which does not depend on P, i.e. ψ_a ∈ P^S 1_{aA} for every P ∈ 𝔓.

If h_C is a conditional expectation of 1_{T⁻¹C}, given S, then h_C ∘ S is 𝔓-almost invariant by Proposition 1.9.11. According to Theorem 1.9.8, Lemma 1.9.9 and Lemma 1.10.6(i), there exists a version h̄_C of the conditional expectation such that h̄_C ∘ S is invariant. Since {S(ax) : a ∈ G} = S(X), this version is constant on S⁻¹Y and therefore equal to P(T⁻¹C). Hence

    ∫ 1_C(T(x)) 1_B(S(x)) P(dx) = ∫ P(T⁻¹C) 1_B(S(x)) P(dx) = P(T⁻¹C) P(S⁻¹B)

for B ∈ 𝔹, C ∈ 𝒞. □

Since any invariant statistic is ancillary for {P_a : a ∈ G}, this result comes close to Basu's Theorem 1.8.2 (which requires the sufficient statistic to be boundedly complete). It seems doubtful whether there are natural applications which are not covered by Basu's theorem. A result which comes close to Corollary 1.9.15 for the special case of a location parameter family was obtained by Ghurye (1958, p. 160, Lemma 2). Corollary 1.9.15 was stated explicitly (without regularity conditions) by Fraser (1966, p. 148, Theorem 2). For a more detailed study of this and related questions see Hall, Wijsman and Ghosh (1965), Berk (1972), and Landers and Rogge (1973).

To avoid pitfalls, the reader interested in results relating "invariance" and "sufficiency" should be aware of the fact that some results of this type might be close to trivial: If the equivariant sufficient statistic is real valued and continuous (for some sample size greater than one), then the family of p-measures can be represented as a location parameter family of normal distributions, or a scale parameter family of gamma distributions. This theorem was proved by Dynkin (1951) for 1-dimensional exponential families. For a proof under mild regularity conditions (and more details on the history of this subject) see Pfanzagl (1972). In this paper, the transformation group is assumed to be commutative


(a condition which is necessary anyway). Hipp (1975) avoids assuming commutativity from the beginning by strengthening the topological conditions on the transformation group (assuming local compactness).

1.10 Appendix: Conditional expectations, conditional distributions

Contains definitions and results on conditional expectations and conditional distributions which are needed in the text. Proofs can be found in Ash (1972), Witting (1985), and Bauer (1990b).

Throughout this section, P is a p-measure on a measurable space (X,𝒜), and 𝒜₀ ⊂ 𝒜 is a sub-σ-algebra.

Definition 1.10.1. For a sub-σ-algebra 𝒜₀ ⊂ 𝒜, f₀ : (X,𝒜₀) → (ℝ,𝔹) is a conditional expectation of f ∈ ℒ₁(X,𝒜,P), given 𝒜₀, if

    P(f₀ 1_{A₀}) = P(f 1_{A₀})   for A₀ ∈ 𝒜₀.   (1.10.2)

A function f₀ fulfilling (1.10.2) always exists. It is unique up to a P-null set. The symbol P^{𝒜₀} f denotes the equivalence class of all functions fulfilling (1.10.2). In many cases the following variant of Definition 1.10.1 is more natural.

Definition 1.10.1′. For a function S : (X,𝒜) → (Y,𝔹), f₀ : (Y,𝔹) → (ℝ,𝔹) is a conditional expectation of f ∈ ℒ₁(X,𝒜,P), given S, if

    P∘S(f₀ 1_B) = P(f 1_{S⁻¹B})   for every B ∈ 𝔹.   (1.10.2′)

The symbol P^S f denotes the equivalence class of all functions fulfilling (1.10.2′). Abusing this notation we occasionally write P^S f and P^{𝒜₀} f for a particular element. If f₀ is a conditional expectation of f, given S, then f₀ ∘ S is a conditional expectation of f, given S⁻¹𝔹. Conversely, any conditional expectation of f, given S⁻¹𝔹, can be represented as a contraction of S by the Factorization Lemma 1.10.6.
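Definition 1.10.1′ has a transparent discrete analogue which may help intuition: a version of the conditional expectation given S averages f over each level set of S. The following sketch (an illustration of our own, not part of the original text; all names are ours) checks the defining equation (1.10.2′) empirically:

    import numpy as np

    # For discrete S, a version f0 of the conditional expectation of f given S
    # is the average of f over each level set {S = y}.
    rng = np.random.default_rng(2)
    x = rng.normal(size=10_000)

    S = np.sign(x)                  # statistic S with values -1.0, 1.0
    f = x**2                        # integrand f

    f0 = {y: f[S == y].mean() for y in np.unique(S)}   # f0 : Y -> R, group averages

    # Check (1.10.2') with B = {1}:
    #   integral of f0 1_B d(P∘S)   vs   integral of f 1_{S^{-1}B} dP
    B = 1.0
    lhs = f0[B] * (S == B).mean()
    rhs = f[S == B].sum() / len(x)
    print(lhs, rhs)                 # equal up to rounding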


Lemma 1.10.3. Let (X,𝒜) be a measurable space, and (Y,𝔹) a Polish space. If f : X → Y fulfills f⁻¹𝔹 ⊂ 𝒜(P) for a p-measure P|𝒜, then there exists f′ : (X,𝒜) → (Y,𝔹) such that f = f′ P-a.e.

Proof. There exists (apply Dudley (1989), p. 97, Proposition 4.2.6) a sequence of f⁻¹𝔹-measurable simple functions fₙ, n ∈ ℕ, converging to f. For every n ∈ ℕ there exists an 𝒜-measurable function fₙ′ = fₙ P-a.e. Hence f is P-a.e. the limit of a sequence of 𝒜-measurable functions. Since the domain of convergence of fₙ′, n ∈ ℕ, is measurable, this implies the assertion. □

Lemma 1.10.4. If 𝒜 is countably generated, then it can be induced by a real function, i.e. there exists f : X → ℝ such that 𝒜 = f⁻¹𝔹.

Proof. Let 𝒜 be generated by Aₙ, n ∈ ℕ, and let

    f := ∑_{n=1}^∞ 3⁻ⁿ 1_{Aₙ}.   (1.10.5)

As f is 𝒜-measurable, we have f⁻¹𝔹 ⊂ 𝒜. It remains to be shown that Aₙ ∈ f⁻¹𝔹 for n ∈ ℕ; this holds since Aₙ can be recovered from the n-th digit of the ternary expansion of f. □

(iii) ⇒ (i): Relation (iii) implies M(y, S⁻¹{y}) = 1 for P∘S-a.a. y ∈ Y, and hence by Lemma 1.10.24

    M(y,A) 1_B(y) = M(y, A ∩ S⁻¹B)

for P∘S-a.a. y ∈ Y. This implies (i) by (iii). □

Proposition 1.10.26. Let M|Y×𝒞 be a conditional distribution of T, given S, with respect to P. Then for any f : (Z×Y, 𝒞×𝔹) → (ℝ,𝔹), the function

    y → ∫ f(t,y) M(y,dt)

is a conditional expectation of x → f(T(x),S(x)), given S, provided this function is P-integrable.

Corollary 1.10.27. Let M|Y×𝒜 be a conditional distribution of x, given S, with respect to P. Then for any f ∈ ℒ₁(X,𝒜,P), the function y → ∫ f(x) M(y,dx) is a conditional expectation of f, given S.

Proof of Proposition 1.10.26. We have to show that ∫ f(t,y) M(y,dt) exists for P∘S-a.a. y ∈ Y, that

    y → ∫ f(t,y) M(y,dt) is 𝔹-measurable,   (1.10.28)

and that

    ∫∫ f(t,y) M(y,dt) 1_B(y) P∘S(dy) = ∫ f(T(x),S(x)) 1_{S⁻¹B}(x) P(dx)   for B ∈ 𝔹.   (1.10.29)

Since f is approximable by 𝒞×𝔹-measurable elementary functions, it suffices to prove relations (1.10.28) and (1.10.29) for f = 1_D, with D ∈ 𝒞×𝔹. Let 𝒮 denote the class of all sets D ∈ 𝒞×𝔹 such that (1.10.28) and (1.10.29) hold true for f = 1_D. The class 𝒮 contains C×B for C ∈ 𝒞, B ∈ 𝔹, since M|Y×𝒞 is a conditional distribution of T, given S. As 𝒮 is a Dynkin system, this implies 𝒮 = 𝒞×𝔹. □

If (Z,𝒞) is a Polish space, then for every Markov kernel M|Y×𝒞 there exists a function m : (Y×(0,1), 𝔹×𝔹₀) → (Z,𝒞) such that, for every y ∈ Y, the p-measure M(y,·)|𝒞 is induced by U and u → m(y,u), i.e.

    M(y,C) = U{u ∈ (0,1) : m(y,u) ∈ C}   for C ∈ 𝒞, y ∈ Y,

where U|𝔹₀ denotes the uniform distribution on (0,1).

Proof. (i) Let φ : Z → (0,1) be a 1-1 𝒞,𝔹₀-measurable map such that φ(C) ∈ 𝔹₀ for C ∈ 𝒞. (See Parthasarathy, 1967, p. 12, Theorem 2.8, and p. 14, Theorem 2.12 for existence.) Let ψ : (0,1) → Z be defined by ψ(u) = z if u = φ(z), and ψ(u) = z₀ (an arbitrary element of Z) if u ∉ φ(Z). Since φ(Z) ∈ 𝔹₀, ψ is 𝔹₀,𝒞-measurable. Moreover, ψ(φ(z)) = z for z ∈ Z.

(ii) Any measure μ|𝒞 can be induced by the measure μ∘φ|𝔹₀ and the map ψ : (0,1) → Z. Let ν = μ∘φ. We shall show that μ = ν∘ψ. ν(B) = μ{z ∈ Z : φ(z) ∈ B} for B ∈ 𝔹₀ implies

    ν∘ψ(C) = ν(ψ⁻¹C) = μ{z ∈ Z : φ(z) ∈ ψ⁻¹C} = μ{z ∈ Z : ψ(φ(z)) ∈ C} = μ(C)   for C ∈ 𝒞,

i.e. μ = ν∘ψ.

(iii) Given a Markov kernel N|Y×𝔹₀, let

    F(y,t) := N(y,(0,t])   for y ∈ Y, t ∈ (0,1).

It is straightforward to show that y → F(y,t) is measurable for t ∈ (0,1), and that t → F(y,t) is isotone and right continuous for y ∈ Y. Hence (y,t) → F(y,t) is 𝔹×𝔹₀-measurable by Lemma 6.7.3(ii). Let G : Y×(0,1) → (0,1) be defined by

    G(y,u) := inf{t ∈ (0,1) : F(y,t) ≥ u}.


Since F(y,·) is isotone and right continuous, we have

    G(y,u) ≤ t iff F(y,t) ≥ u.

Hence (y,u) → G(y,u) is 𝔹×𝔹₀-measurable, and

    U{u ∈ (0,1) : G(y,u) ≤ t} = U{u ∈ (0,1) : F(y,t) ≥ u} = F(y,t).

Hence the Markov kernel N|Y×𝔹₀ can be induced by U and the map u → G(y,u), i.e. for y ∈ Y,

    N(y,B) = U{u ∈ (0,1) : G(y,u) ∈ B}   for B ∈ 𝔹₀.
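Step (iii) is the familiar quantile transformation. The following sketch (our own illustration, not from the text; the kernel F used here is a hypothetical example) shows how a Markov kernel on 𝔹₀ is simulated from uniform variables via its generalized inverse:

    import numpy as np

    # Sampling from a Markov kernel N(y, .) on (0,1) via the generalized
    # inverse G(y, u) = inf{t : F(y, t) >= u} of its c.d.f. F(y, .).

    def F(y, t):
        """C.d.f. of the example kernel: N(y, .) has density ~ t^(y-1)."""
        return t**y

    def G(y, u, grid=np.linspace(1e-6, 1.0, 10_000)):
        """Generalized inverse: smallest grid point t with F(y, t) >= u."""
        return grid[np.searchsorted(F(y, grid), u)]

    rng = np.random.default_rng(3)
    y = 2.0
    u = rng.uniform(size=100_000)
    sample = G(y, u)            # distributed (approximately) as N(y, .)

    # With F(y, t) = t^y, the exact inverse is u^(1/y); compare the two.
    print(np.abs(np.sort(sample) - np.sort(u**(1.0 / y))).max())  # small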

(iv) Let now N|Y×𝔹₀ be the Markov kernel defined by N(y,B) := M(y,·)∘φ(B), B ∈ 𝔹₀. By (iii), N(y,·) is induced by U and the map u → G(y,u); hence, by (ii), M(y,·) = N(y,·)∘ψ is induced by U and the map u → m(y,u) with m(y,u) := ψ(G(y,u)). □