Computational Statistics: Wolfgang Wetzel zur Vollendung seines 60. Lebensjahres [Reprint 2019 ed.] 9783110844405, 9783110084191


German Pages 367 [368] Year 1981




Computational Statistics Editors: Herbert Büning • Peter Naeve

Walter de Gruyter • Berlin • New York 1981

CIP-Kurztitelaufnahme der Deutschen Bibliothek

Computational statistics: [Wolfgang Wetzel zur Vollendung seines 60. Lebensjahres] / ed.: Herbert Büning; Peter Naeve. - Berlin; New York: de Gruyter, 1981. ISBN 3-11-008419-8 NE: Büning, Herbert [Hrsg.]; Wetzel, Wolfgang: Festschrift

Library of Congress Cataloging in Publication Data

Computational statistics. Festschrift dedicated to Dr. Wolfgang Wetzel. English and German. 1. Mathematical statistics - Data processing - Addresses, essays, lectures. 2. Wetzel, Wolfgang. I. Büning, Herbert. II. Naeve, Peter. III. Wetzel, Wolfgang. QA276.4.C57 519.5'028'54 81-12585 ISBN 3-11-008419-8 AACR2

© Copyright 1981 by Walter de Gruyter & Co., vormals G. J. Göschen'sche Verlagshandlung, J. Guttentag, Verlagsbuchhandlung Georg Reimer, Karl. J. Trübner, Veit & Comp., Berlin 30. Alle Rechte, insbesondere das Recht der Vervielfältigung und Verbreitung sowie der Übersetzung, vorbehalten. Kein Teil des Werkes darf in irgendeiner Form (durch Photokopie, Mikrofilm oder ein anderes Verfahren) ohne schriftliche Genehmigung des Verlages reproduziert oder unter Verwendung elektronischer Systeme verarbeitet, vervielfältigt oder verbreitet werden. - Printed in Germany. Druck: Karl Gerike, Berlin. - Bindearbeiten: Dieter Mikolai, Berlin.

Wolfgang Wetzel zur Vollendung seines 60. Lebensjahres

Vorwort

'Computers, the second revolution in statistics' war das Thema, das F. Yates für die erste Fisher Memorial Lecture im Jahr 1966 gewählt hatte. Nun scheint es ein Charakteristikum unserer Zeit zu sein, im wissenschaftlichen wie im nichtwissenschaftlichen Bereich fortwährend 'Revolutionen' zu verkünden. Viele dieser 'Revolutionen' entpuppen sich dann im Nachhinein als das Produkt einer übertreibenden Sprache. Es erscheint daher angebracht, einmal nach der von Yates konstatierten Revolution und ihren Folgen zu fragen. Der vorgelegte Sammelband versucht, darauf eine Antwort zu geben.

Die erste Idee, einem skeptischen Frager zu begegnen, ist sicherlich, eine Reihe von 'state of the art'-Artikeln renommierter Statistiker zusammenzutragen und durch den - einmal unterstellten - positiven Tenor der einzelnen Beiträge die Zweifel an der Revolution zurückzuweisen oder noch besser auszuräumen. Diesen Weg haben die Herausgeber nicht beschritten. Sie laden den Leser ein, gleichsam dem Wissenschaftler bei der Arbeit über die Schulter zu sehen. Die Beiträge stellen Momentaufnahmen dar, die zeigen, welche Rolle der Computer in ihrer wissenschaftlichen Arbeit spielt. Die Herausgeber meinen, daß aus den vorgelegten Artikeln durchaus der Schluß gezogen werden kann, daß der Computer für die Verfasser ein Stück alltäglicher Praxis ist. Bedenkt man, daß viele der Autoren noch als 'jüngere Wissenschaftler' klassifiziert werden können, dann kann man allen Zweiflern an der Aussage Yates' berechtigt entgegenhalten: diese Revolution lebt und verwirklicht ihre Vorstellungen.

Spricht man über Computer und Statistik, liegt es nahe, das Thema wie folgt zu strukturieren:

1. Der Einfluß des Computers auf die statistische Theorie.
2. Der Einfluß des Computers auf die statistische Praxis.
3. Der Einfluß des Computers auf die Statistik-Ausbildung.

Eine solche Gliederung schwebte den Herausgebern bei der Vorbereitung des Buches vor. In der Konzeption, mit der der Verlag für die Publizierung des Buches gewonnen wurde, finden sich zur Beschreibung der drei Gliederungspunkte die folgenden Ausführungen:

zu 1: Die große Rechengeschwindigkeit der Computer machte plötzlich schon in der Theorie vorhandene Ansätze attraktiv, da die Aussicht bestand, daß man zu anwendbaren Verfahren kommen kann. So wurden dadurch viele theoretische Arbeiten erst angeregt. Beispiele sind die verteilungsfreie Statistik (insbesondere der multivariate Zweig) und die Theorie stochastischer Prozesse (ARIMA-Prozesse, bilineare Prozesse). Insbesondere im Rahmen der Untersuchung robuster Verfahren der Schätz- und Testtheorie einschließlich der Berechnung der Gütefunktion, die auf analytischem Wege häufig nicht möglich ist, sind in jüngster Zeit zahlreiche und umfangreiche Simulationsstudien unter Einsatz von Großrechnern durchgeführt worden. Aber auch ganz neue und dazu noch mathematisch sehr anspruchsvolle Gebiete taten sich auf, wie z.B. die Zufallszahlengeneratoren.

zu 2: Der Computer kann nicht nur die Arbeit eines menschlichen Rechenknechtes übernehmen, sondern auch - und darin liegt sein eigentlicher Wert - Arbeiten in kurzer Zeit ausführen, die der menschliche Rechenknecht oft nicht in seinem ganzen Leben beenden kann. Dadurch wurden für praktische Arbeiten Verfahren erschlossen, die ohne den Rechner, wenn überhaupt, nur in irgendeinem Zeitschriftenartikel vergilbten. Schlagendes Beispiel sind die weltweit angewandten Verfahren zur Saisonbereinigung von Zeitreihen (Berliner Verfahren, X-11-ARIMA), die alle die Theorie stochastischer Prozesse ausnutzen und meilenweit von den guten alten Phasendurchschnittsverfahren entfernt sind. Außerdem eröffnet erst der Computer durch die Übernahme der lästigen und für den Menschen zeitraubenden Rechenarbeit die Möglichkeit zu wahren statistischen Arbeiten. Man kann jetzt Daten interaktiv mit unterschiedlichen Verfahren analysieren.

zu 3: Hier sind zwei Bereiche zu unterscheiden. Wenn der Rechner für die praktische und theoretische Arbeit des Statistikers ein nicht mehr wegzudenkendes Hilfsmittel ist, dann ist es auch notwendig, ihn in das Curriculum einzubeziehen. Zum anderen stellt der Computer aber auch ein didaktisches Medium dar, das nutzbringend in der Statistikausbildung verwandt werden kann. Mit Hilfe des Computers - der Zugang müßte dem Dozenten beispielsweise durch ein Terminal im Hörsaal gegeben sein - ließe sich dann praktische statistische Datenanalyse im Hörsaal demonstrieren.

Als dann aber die Arbeiten bei den Herausgebern eintrafen, zeigte sich einmal mehr die Mehrdimensionalität statistischer Arbeiten. Die meisten Beiträge ließen sich trotz kooperativer Zusammenarbeit mit den Autoren nicht befriedigend nur einem der drei Themenkreise zuordnen. In einer solchen Situation erinnert man sich dann immer gerne der durch das Alphabet vorgegebenen Ordnung, und so finden sich die Beiträge nach diesem Prinzip angeordnet wieder. Aufmerksamen Lesern wird nicht entgehen, daß an einer Stelle die alphabetische Ordnung durchbrochen wurde. Der Beitrag von Seppo Mustonen fällt auch typographisch aus dem Rahmen. In dem Artikel wird die Verbindung von einem komfortablen Textverarbeitungssystem mit einem statistischen Programmsystem beschrieben. Da die Herausgeber glauben, daß hiermit die Richtung angezeigt wird, um die Feststellung Mustonens 'it is quite common that when writing a research report ... the output from the computer cannot be used as such ...' aufzuheben, haben sie diese Arbeit gleichsam als Ausblick auf die hoffentlich nahe Zukunft an das Ende gestellt.

Dieses Buch will aber nicht nur einen Bericht über die Verbindung von Statistik und Computer zu Computational Statistics abgeben. Es ist vor allem gedacht als Ehre und Dank für Professor Wetzel, der im Herbst 1981 die Vollendung seines 60. Lebensjahres feiert. Die Autoren wollen aus diesem Anlaß mit ihren Beiträgen ihre persönliche Wertschätzung seiner Person zum Ausdruck bringen.

Viele haben sicher die großen Möglichkeiten gesehen, die in einer 'Ehe' von Computer und Statistik liegen. Aber nur wenige wie Professor Wetzel haben die Kraft und die Ausdauer aufgebracht, die notwendigen Maßnahmen zur Realisierung dieser Möglichkeiten durchzusetzen. Die Probleme beginnen bei einer ignoranten Administration - sei es innerhalb, sei es außerhalb der Universität -, deren unausgesprochene Leitlinie oft zu sein scheint: die Arbeit ging ja bislang auch ohne Rechner, warum also unbedingt einen solchen anschaffen. Die Schwierigkeiten sind noch lange nicht zu Ende, wenn die Bewilligung und Finanzierung eines Computers sich im Gestrüpp der Zuständigkeiten der verschiedenen Einheiten wie Universität, Land, Bund abzuzeichnen beginnen. Da gibt es dann noch immer die etablierte Zunft der Mathematiker, Physiker und anderer 'rechnender' Naturwissenschaftler, die dem 'weichen Wissenschaftler' aus der Statistik den Zugang zum gerade erstrittenen Rechner verwehren. Man muß schon einmal selbst an einem solchen Vorgang beteiligt gewesen sein, um die ganze Leistung eines erfolgreichen Kämpfers für die tatsächliche Verbindung von Statistik und Computer beurteilen zu können. Einem Publikum, das sogar noch 'rational' mit Wissenschaft assoziiert, dürften detaillierte Schilderungen zu sehr nach Baron von Münchhausen klingen.

Professor Wetzel ist eine der wenigen Personen, die den geschilderten Prozeß erfolgreich beendet haben. Das von ihm im Jahr 1965 an der Freien Universität Berlin aufgebaute Institut für Angewandte Statistik bot für jeden, der die Möglichkeit der Verbindung von Statistik und Rechner ausschöpfen wollte, hervorragende Arbeitsmöglichkeiten. Die Ausstattung des Instituts mit Personal, Bücherei und Sachmitteln wurde gekrönt durch eine für die damalige Zeit, zumindest in Deutschland, unglaubliche Rechnerkapazität. Das Institut hatte bereits eine eigene (jetzt schon legendäre) IBM 1130, als die Universität noch nicht einmal ein Rechenzentrum hatte. An der später einsetzenden Ausweitung der Rechnerkapazität innerhalb und außerhalb der Freien Universität konnte das Institut dank der bahnbrechenden Bemühungen von Professor Wetzel in herausragender Weise partizipieren. Es bestand direkter Zugriff durch Terminals oder RJE-Stationen zu fast allen Großrechnern in Berlin.

Die Generation von Professor Wetzel ist doppelt benachteiligt worden. Durch den 2. Weltkrieg an der Aufnahme des Studiums gehindert, mußte sie in der Nachkriegszeit das Studium unter ungünstigeren Bedingungen in einem Lebensalter absolvieren, in dem heute bereits die ersten Stufen einer wissenschaftlichen Laufbahn abgeschlossen sind. Seine altruistische Wegbereitung für andere gewinnt vor diesem Hintergrund noch eine ganz besondere Qualität.

Eine Reihe von Autoren hatten das Glück, auf dem von Professor Wetzel vorbereiteten Weg gehen zu können. Für sie ist das vorgelegte Buch auch ein Rechenschaftsbericht an ihn. Sie hoffen, ihm durch ihre Arbeiten zu zeigen, daß sich sein Einsatz und seine Mühe gelohnt haben. Nicht zuletzt wünschen die Autoren Herrn Professor Wetzel noch viele Jahre fruchtbarer Arbeit in der Forschung und Lehre.

Allen Damen und Herren, die an der Anfertigung der Reinschrift des Manuskripts beteiligt waren, sei für ihre sorgfältige Arbeit und das Bemühen um die Einhaltung einer einheitlichen äußeren Form gedankt. Auch dem Verlag de Gruyter und insbesondere Herrn Schuder gilt unser Dank für die Unterstützung bei der Verwirklichung unserer beiden Ziele.

H. Büning Berlin

P. Naeve Bielefeld

Preface

When in 1966 the first Fisher Memorial Lecture was held by F. Yates, he spoke about 'Computers, the second revolution in statistics'. Reports on revolutions seem to be a characteristic phenomenon of our days, both in the scientific and in the public community. And it is true too that most of those revolutions prove to be caused by exaggerated language rather than by plain facts. So there is a need to ask what has happened to the revolution proclaimed by Yates. This omnibus volume tries to answer the question whether there are any lasting consequences caused by this revolution.

It seems to be a good idea to collect some articles on the state of the art written by high-ranking statisticians. Given their positive tenor, those papers would turn the sceptic down. The editors decided not to take this way. They invite the reader to look over the shoulder of some scientists and see them work. The articles are snapshots showing the role computers play in the researchers' work.

The editors feel that one may rightly infer that computers have become an integral part of the authors' everyday work. Considering the fact that most of the authors might be classified as 'junior scientists', the following statement is surely justified and appears to be a convincing argument against those who do not share Yates' opinion: this revolution is still alive and brings its ideas into action.

When speaking about computers and statistics it seems quite natural to structure the subject according to the following scheme:

1. The impact of computers on statistical theory
2. The impact of computers on statistical practice
3. The impact of computers on the teaching of statistics

Such a structure underlay our argumentation when we discussed the project with the publisher. Our final conception included detailed statements concerning the subsections of the scheme, such as:

ad 1: Facing the ever increasing speed of modern computers, many statisticians were stimulated (or 'encouraged') to deal with known but still unsolved theoretical problems. The prospect of obtaining highly applicable solutions was surely a strong incentive, and out of this work many theoretically oriented papers originated. Examples may be found in nonparametric statistics (especially in the multivariate branch) and in the theory of stochastic processes (ARIMA processes, bilinear processes). Studying robust procedures, for example, the investigator may very soon run into severe difficulties if he tries to solve the problems (e.g. finding power functions) analytically. Therefore in recent years much research has been based on large simulation studies done on a computer. But also new and mathematically attractive fields of research developed, such as random number generators.

ad 2: The computer not only can do the job of the ready reckoner but moreover can finish in short time tasks which could not be completed by human beings within their lifetime. This latter fact accounts for the value of computers. Only due to their high speed in doing calculations could many methods buried in some journal article be made accessible for practical work. Striking examples are the world-wide applied procedures for seasonal adjustment of time series (e.g. X-11-ARIMA). All of these procedures are based on the theory of stochastic processes and are miles away from the old paper-and-pencil methods. Letting the computer do all the tedious and, for humans, time-consuming calculations gives free way for real statistical work. Now one has the opportunity to interactively analyse data using quite different methods.

ad 3: Two aspects are to be distinguished. If it is true that computers are such a valuable tool for the statistician's practical and theoretical work, it seems necessary to teach students how to utilize them. On the other hand, the computer is a useful didactic tool which can be successfully applied in teaching statistics. For instance one could demonstrate practical data analysis within the classroom, given the teacher has access to the computer via a terminal.

But when the papers began pouring in, the manifoldness of statistical work showed itself once again. Even in cooperative work with the authors, the editors were unable to establish a convincing mapping of the articles onto those three areas. In such a situation one happily remembers alphabetic order, and this is what the editors ultimately followed, with one exception. The paper by Seppo Mustonen not only breaks the alphabetic order but also appears in a different typographic design. The paper describes the linkage of a comfortable editor system to a statistical program system. The editors believe that this paper shows a way to overcome Mustonen's finding: 'it is quite common that when writing a research report ... the output from the computer cannot be used as such'. Placing his article at the end of this book is meant as an outlook into a hopefully near future.

This book is not only a report on the marriage of computers and statistics to become computational statistics. First and foremost it is dedicated with respect and gratitude to Professor Wetzel on the occasion of his 60th birthday. Surely many scientists have noticed the big opportunities offered by a linkage of the computer to statistical work. And many did prove that those opportunities could be successfully realised when they got access to a computer. But only few pulled together enough strength and steadiness, as Professor Wetzel did, to go through all the steps necessary to provide at last the computer facilities and financial support for work in computational statistics.

A long story could be told about the struggle with an ever ignorant administration, and about the arrogant colleagues from the math or science departments for whom statistics belongs to the soft sciences. In their eyes computer facilities were not a necessity for a department of statistics. But it is better not to tell it: although everything would prove right, the narrator would be looked at as a direct descendant of the famous story-teller Baron Münchhausen. Nevertheless Professor Wetzel did succeed and was able to build up a very fine statistical department at the Free University of Berlin. In doing so he proved to be a real statistician. For as Sir Elderton once said, a statistician must be a utilitarian. This is certainly true of Professor Wetzel. He is a member of that generation which had to pay twice. Firstly the second world war kept them away from universities, and after the war they had to complete their studies under very restricted and unpleasant conditions at an age at which nowadays the first steps of a scientific career are completed.

Several of the authors could luckily take the way Professor Wetzel provided. They hope that their papers prove that his engagement and labour were worthwhile. Last but not least the editors want to thank everyone who in one way or the other contributed to the completion of the book: the authors who let themselves be pushed to finish their papers in time, those who did the fine secretarial work to make the papers all look alike with respect to typographic form, and the publisher, especially Mr. Schuder, for support and encouragement.

H. Büning Berlin

P. Naeve Bielefeld

Inhalt / Contents

On the Investigation of Some Statistical Properties of the Most Probable Number (MPN)-Procedure for Estimating the Density of Microorganisms by Use of Computer Simulations   1
DR. GISELA ARNDT, PROF. DR. HARTMUT WEISS and DR. J. FELIX HAMPE, Free University of Berlin

On Bilinear Models for Time Series Analysis and Forecasting   19
DR. WOLFGANG BIRKENFELD, University of Bielefeld

Tests auf Gleichverteilung - Ein Gütevergleich   35
PROF. DR. HERBERT BÜNING, Freie Universität Berlin

The Classical Ruin Problem and Electronic Roulette Machines   55
PROF. DR. ULRICH DIETER, University of Graz

Price Formation in the Chemical Industry of the Federal Republic of Germany - Estimation of an Econometric Model with Parameter Restrictions Across Equations   71
PROF. DR. JOACHIM FROHN, University of Bielefeld

Vektorielle AR-Prozesse in der Makroökonomie   89
PROF. DR. HERMANN GARBERS, Universität Zürich

The Robustness of Some Distributed Lag Estimators   99
PROF. DR. GERD HANSEN, University of Kiel

Robust Estimates in Linear Regression - A Simulation Approach   115
PROF. DR. SIEGFRIED HEILER, University of Dortmund

Nichtlineare Regression: Parameterschätzung in linearisierbaren Regressionsmodellen   137
PROF. DR. MAX-DETLEV JÖHNK, Universität Hannover

A Test for Independence of Dichotomous Stochastic Variables Distributed Over a Regular Two-Dimensional Lattice   151
PROF. DR. PETER KUHBIER and DIPL. KFM. JOACHIM SCHMIDT, Free University of Berlin

Computational Experiences with an Algorithm for the Automatic Transfer Function Modelling   167
PROF. DR. HANS-JOACHIM LENZ, Free University of Berlin

Vergleiche zwischen empirischen und theoretischen Kenngrößen von ARMA-Modellen im Zeitbereich - Eine zusätzliche Möglichkeit der Modell-Validierung   185
DR. WALTER MOHR, Universität Kiel

APL and the Teaching of Statistics   215
PROF. DR. PETER NAEVE, University of Bielefeld

Optimale Schichtabgrenzung bei optimaler Aufteilung unter Annahme einer bivariaten Lognormalverteilung   231
PROF. DR. KARL-AUGUST SCHÄFFER, Universität Köln

Nichtparametrische Tests auf Zufälligkeit bei verschiedenen Stufen der Diskretisierung von Beobachtungsfolgen   249
DR. RAINER SCHLITTGEN, Freie Universität Berlin

A Linear Combination of Estimators in an Errors-in-Variables Model - A Monte Carlo Study   263
PROF. DR. HANS SCHNEEWEISS and DIPL. MATH. HORST WITSCHEL, University of Munich

The Robustness of Sampling Plans for Inspection by Variables   281
DR. HELMUT SCHNEIDER and PROF. DR. PETER-THEODOR WILRICH, University of Berlin

Über eine Verallgemeinerung der Spektralanalyse   297
DR. BERND STREITBERG, Freie Universität Berlin

On a Generalized Iteration Estimator   315
DR. GÖTZ TRENKLER, University of Hannover

Statistical Computing with a Text Editor   337
PROF. DR. SEPPO MUSTONEN, University of Helsinki

On the Investigation of Some Statistical Properties of the Most Probable Number (MPN)-Procedure for Estimating the Density of Microorganisms by Use of Computer Simulations

Gisela Arndt, Hartmut Weiß, and J. Felix Hampe

1. INTRODUCTION

The number of aerobic microorganisms ("plate count") has been one of the more commonly used microbiological indicators of the quality of food. There are various kinds of specific pathogenic microorganisms which should not be contained in food. For quality control it is in general not sufficient to decide only on the presence or absence of pathogenic microorganisms in food. Furthermore, in many situations a quantitative examination of food for some specific microorganisms is important in deciding on the level of hazard for the consumer of this food. On the other hand it is impossible to find precise microbiological procedures for the estimation of the density of specific pathogenic microorganisms using classical plate-count techniques. The two main reasons are:

(1) The average density of the majority of pathogenic microorganisms per ml or gm food is too low to be cultivated on plates.
(2) Various kinds of pathogenic microorganisms cannot be cultivated on plates with non-liquid media at all.

Taking into account these reasons, Phelps (1908) and McCrady (1915) developed the MPN-procedure for a quantitative examination of pathogenic microorganisms by use of liquid media for cultivation. The first general discussion of the MPN-procedure from a statistical viewpoint was given by Cochran (1950). From his critique we quote:

"We have seen that the m.p.n. is an estimate of the density of organisms. Considered more generally, it is a 'procedure for obtaining estimates' since the same argument could be applied to other statistical problems. The only justification which I have mentioned for the procedure is that it seems intuitively reasonable. From a reading of the literature I am not certain as to the reasons which led early investigators to select this estimate, though either the intuitive approach or an appeal to a theory of inverse probability may have been responsible."

And finally we read: "... Consequently the m.p.n. method is now generally used in a great variety of problems of statistical estimation, though it more frequently goes by the name of the 'method of maximum likelihood'."

A further important contribution on the MPN-procedure, due to de Man (1975), regarding the statistical as well as the microbiological characteristics, will be discussed in detail below. As the literature on the MPN-procedure shows a lack of detailed investigations of the statistical properties of the procedure in the case of small trials, the present paper tries to answer some of these questions by use of simulation studies. Finally the results lead us to deduce some guidelines for its application within microbiological research and routine.

2. ESTIMATION OF THE DENSITY OF MICROORGANISMS BY THE MOST PROBABLE NUMBER (MPN)-PROCEDURE

The usual definition of "most probable number" from a microbiological viewpoint can be stated as follows: MPN is the estimated number (per ml or per 100 ml) of microorganisms present in a sample unit, based on the presence or absence of these microorganisms in replicated aliquots (tubes) which are prepared by dilution. This definition can be used only if the microbiological examination method fulfills the following requirements (for further details see ICMSF, 1978):

(1) Standardized preparation of food homogenates as well as dilutions.
(2) Standardized performance of the different steps (e.g. incubation at constant temperatures) within the microbiological procedure.

At the end of the examination method one registers at each level of dilution the number of tubes which show presence of the microorganisms of interest. For example one could have found the recording X = (3,1,0) as a sampling combination in an examination with the dilution levels 1:10, 1:100, 1:1000 and 3 tubes at each level. So one has for a dilution level of 1:10 only "positive tubes" (presence of the specific microorganisms) and on the third dilution level 1:1000 only "negative tubes". The density of microorganisms per ml of the homogenate should therefore lie between 10 and 100. With the MPN-procedure one would find a "most probable number" of 70, as will be shown below.

From a statistical viewpoint the examination method is based on the following assumptions:

(1) The microorganisms are regarded as being randomly distributed within the homogenate (liquid). This implies that a germ is equally likely to be found in any portion of the homogenate and that there is no tendency of clustering or repulsion.
(2) The growth of every germ within the homogenate. If one has a poor medium or another factor resulting in incomplete growth of the microorganisms, the "most probable number" will underestimate the true germ density.
(3) Furthermore it is necessary that the volumes v_i of the sampled portions of homogenate are very small compared to the original amount of homogenate V at each level of dilution i (i = 1,2,3,...,k).

When there is only one germ in our homogenate, the probability that this germ lies in the sampled portion v is simply the ratio of the volume of the sample to the whole homogenate, i.e. v/V. The probability that the germ does not lie in the sample is therefore (1 - v/V). By use of the multiplication theorem in probability and on the basis of the above listed assumptions, we can write the probability that none of all b germs (say) in the homogenate is contained in the portion v_i as

    (1 - v_i/V)^b

which can be approximated for small ratios v_i/V by

    exp{-v_i b/V} = exp{-v_i δ} = π_i                                    (1)

with δ = b/V characterizing the real density of germs per ml of our homogenate. π_i is the probability that the sample v_i is sterile. Therefore we are able to give the probability for a positive reaction of a sample at the i-th dilution level:

    1 - π_i = 1 - exp{-v_i δ}                                            (2)

The recording X = (x_1, x_2, ..., x_i, ..., x_k) gives the numbers of positive tubes of the examination at all dilution levels V = (v_1, ..., v_i, ..., v_k) with N = (n_1, ..., n_i, ..., n_k) replications at each level. By use of the binomial distribution the probability of this combination may be written as

    P = L = ∏_{i=1}^{k} C(n_i, x_i) π_i^(n_i - x_i) (1 - π_i)^(x_i)      (3)

where C(n_i, x_i) denotes the binomial coefficient "n_i choose x_i". After substituting expressions (1) and (2) in (3) we can write

    P = L = ∏_{i=1}^{k} C(n_i, x_i) (exp{-v_i δ})^(n_i - x_i) (1 - exp{-v_i δ})^(x_i)      (4)

Now we can estimate the unknown density δ by the likelihood function. In order to derive an estimate ("most probable number") we have to maximize this likelihood function with respect to δ, or equivalently the corresponding log-likelihood function. (An example of a likelihood function is shown in figure 1.)

Fig. 1: Likelihood function for the design V = (5, 1, 0.2); N = (5,5,5) and given X = (5,3,1)

This leads to

    Σ_i (n_i - x_i) v_i = Σ_i x_i v_i exp{-v_i δ} / (1 - exp{-v_i δ})    (5)

which can be solved only by use of an iterative algorithm.
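As a small numerical sketch (not code from the paper), equation (5) can be solved for δ by bisection, exploiting that its right-hand side is strictly decreasing in δ. The tube volumes v = (0.1, 0.01, 0.001) ml are our assumed translation of the 1:10, 1:100, 1:1000 dilution levels in the X = (3,1,0) example above.

```python
import math

def mpn_estimate(x, n, v, lo=1e-9, hi=1e6):
    """Solve the likelihood equation (5) for the germ density delta:
       sum_i (n_i - x_i) v_i = sum_i x_i v_i exp(-v_i d) / (1 - exp(-v_i d)).
    The right-hand side is strictly decreasing in d, so bisection works."""
    lhs = sum((ni - xi) * vi for xi, ni, vi in zip(x, n, v))

    def rhs(d):
        return sum(xi * vi * math.exp(-vi * d) / (1.0 - math.exp(-vi * d))
                   for xi, vi in zip(x, v) if xi > 0)

    for _ in range(200):          # shrink the bracket until it is negligible
        mid = 0.5 * (lo + hi)
        if rhs(mid) > lhs:
            lo = mid              # rhs too large -> root lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Recording X = (3,1,0), three tubes per level, assumed volumes per ml
# of homogenate for the dilution levels 1:10, 1:100, 1:1000.
x, n, v = (3, 1, 0), (3, 3, 3), (0.1, 0.01, 0.001)
delta_hat = mpn_estimate(x, n, v)
```

Under these assumed volumes the estimate falls, as the text argues, between 10 and 100 organisms per ml; the exact value depends on how the dilutions are mapped to volumes.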

3. SOME REMARKS ABOUT THE SAMPLING RESULTS

We can estimate without difficulty the MPN for each positive sampling result. But we have to remember the dependence between v_i \delta and the resulting x_i: for given \delta we would expect for decreasing v_i (increasing

Gisela Arndt, Hartmut Weiß, and J. Felix Hampe

dilution levels) that the number of positive tubes x_i is also decreasing.

If we had chosen four suitable dilution levels and N = (3,3,3,3) we would expect for instance a combination X = (3,2,1,0). If we instead achieved X = (0,0,0,3) the question would arise whether this result agrees with our assumptions of the MPN-procedure. In his discussion of this problem de Man offers a strategy to follow in practice which is briefly described below. First we derive an estimate of \delta by the MPN-procedure as described above, leading to \hat\delta. Next we calculate the likelihood function at the point \hat\delta for all other possible sampling combinations X. The result is a discrete probability distribution, as can be seen from (4) when substituting \hat\delta for \delta. Sorting all the \prod_{i=1}^{k} (n_i + 1) sampling combinations in descending order of probability allows us to register the combinations which satisfy the inequality

\sum_{j = j_u + 1}^{j_o} P(X_{(j)}) \le \alpha_{j_o} - \alpha_{j_u}, with \alpha_{j_u} = 0 for j_u = 0.   (6)

De Man (1975, 1977) suggests eliminating all those combinations fulfilling condition (6) for \alpha_{j_o} = 0.01 and j_u = 0, as they can be regarded as either senseless or improbable. The remaining combinations are divided into two categories. Category 1 includes all combinations satisfying (6) with \alpha_{j_o} = 0.05 and \alpha_{j_u} = 0.01. Category 2 includes all combinations satisfying (6) for \alpha_{j_o} = 1 and \alpha_{j_u} = 0.05. Figure 2 illustrates this segmentation for the sampling combinations X = (1,1,0) and X = (3,0,2). It should be clear that the different probability distributions depend on the chosen dilution levels. Consequently, between differently chosen dilution levels one will observe large differences with respect to the "probable" combinations belonging to one of the categories. This fact can be seen either from figure 2 or from table 1 for different MPN-designs.

Fig. 2: The effect of different dilution ratios on the probability distribution of all 64 possible combinations for three given designs and two realized sampling combinations.
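The ranking of sampling combinations that underlies de Man's categories can be sketched as follows. The evaluation point delta = 1.0 and the design V = (2, 1, 0.5), N = (3, 3, 3) are purely illustrative assumptions, and the cumulative \alpha-thresholds of (6) are not applied here:

```python
import math
from itertools import product

def combo_prob(delta, v, n, x):
    """Probability of the sampling combination x at density delta, equation (4)."""
    p = 1.0
    for vi, ni, xi in zip(v, n, x):
        pi = 1.0 - math.exp(-vi * delta)
        p *= math.comb(ni, xi) * pi**xi * (1.0 - pi)**(ni - xi)
    return p

def ranked_combinations(delta, v, n):
    """All prod(n_i + 1) combinations sorted by descending probability."""
    combos = product(*(range(ni + 1) for ni in n))
    scored = [(combo_prob(delta, v, n, c), c) for c in combos]
    scored.sort(key=lambda t: -t[0])
    return scored

# Hypothetical estimate delta-hat = 1.0 for the design V = (2, 1, 0.5), N = (3, 3, 3)
ranked = ranked_combinations(1.0, (2.0, 1.0, 0.5), (3, 3, 3))
probs = [p for p, _ in ranked]
```

The cumulative sums over this descending list are what conditions of type (6) are checked against.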

Table 1: Calculated possible and "probable" combinations for various MPN-designs: for each design (e.g. V = (10, 1, 0.1) with N = (3, 3, 3) or N = (5, 5, 5)) the number of possible positive combinations and the number of "probable" positive combinations (belonging to category 2).

Estimation of an Econometric Model

The nonnegative error terms u^*_{Mt}, u^*_{Kt}, u^*_{Lt} represent productivity changes

not explained by the respective equation. It is assumed that the relationships between the input-variables and the output hold on the average, i.e.:

E(u^*_{it}) = 1   for all t, i = M, K, L.   (2.4)

Furthermore the error terms are assumed to be independent over time with variance independent of t:

Var(u^*_{it}) = \sigma_i^2   for all t, i = M, K, L.   (2.5)

2.2 The demand equation for the product

Corresponding to the log-linear structure of the input-output-relationships the demand equation is specified as:

V_t = a\, p_t^{\varepsilon} \prod_{i=1}^{l} x_{it}^{\varepsilon_i}\, u_{Vt}   (2.6)

with p_t: price of the product, x_{it}: further explanatory variables (in the empirical study the import price of the product, and an indicator of the economic situation of those sectors buying chemical goods were used).

The same assumptions as for the error terms in equations (2.1a) to (2.3a) are made for u_{Vt}. It seems plausible to assume independence of u_{Vt} and the error terms in the input-output-relations, since demand and production decisions are usually made by different agents.


Joachim Frohn

2.3 The conditions on the factor markets

It is assumed that for the process of price formation the factor prices are given. This assumption assures that the factor prices q_t (for intermediate goods), z_t (for capital), and w_t (for labour) are included as exogenous variables in the price equation as derived from the entrepreneurial objective function.

2.4 The entrepreneurial objective

As entrepreneurial objective, maximization of the expected profit per period is assumed (see [5]). This must be regarded as a first approximation. Alternative and/or complementary objectives (like, for instance, price continuity) shall be investigated in further studies. Maximization of the expected profit leads to the following objective function:

E(p_t V_t - q_t M_t - z_t K_t - w_t L_t) = \max_{p_t}   (2.7)

Substituting (2.1a) to (2.3a) and (2.6) into (2.7) yields:

E\Big\{ a\, p_t^{\varepsilon+1} \prod_{i=1}^{l} x_{it}^{\varepsilon_i}\, u_{Vt} - \Big[ a\, p_t^{\varepsilon} \prod_{i=1}^{l} x_{it}^{\varepsilon_i}\, u_{Vt} \Big]^{1/r} \Big( q_t\, b^* u^*_{Mt} + z_t\, c^* e^{-(\lambda_K/r) t} u^*_{Kt} + w_t\, d^* e^{-(\lambda_L/r) t} u^*_{Lt} \Big) \Big\} = \max_{p_t}   (2.8)

Taking the first derivative with respect to p_t, setting it equal to zero, and making use of the assumptions for the error terms yields:

1 - \frac{\varepsilon}{r(\varepsilon+1)}\, E(u_{Vt}^{1/r})\, \frac{\big[ a\, p_t^{\varepsilon} \prod_{i=1}^{l} x_{it}^{\varepsilon_i} \big]^{1/r - 1}}{p_t} \big( b^* q_t + c^* e^{-(\lambda_K/r) t} z_t + d^* e^{-(\lambda_L/r) t} w_t \big) = 0   (2.9)

Solving this equation for p_t leads to a rather complicated price equation,

which subsequently will make estimation of the parameters impossible.

To overcome this problem, the expected output E(V_t), which corresponds to the optimal price, is substituted for a\, p_t^{\varepsilon} \prod_{i=1}^{l} x_{it}^{\varepsilon_i} in (2.9); this leads to the following implicit price equation:

p_t = \frac{\varepsilon}{r(\varepsilon+1)}\, E(u_{Vt}^{1/r})\, [E(V_t)]^{1/r - 1} \big[ b^* q_t + c^* e^{-(\lambda_K/r) t} z_t + d^* e^{-(\lambda_L/r) t} w_t \big]   (2.10)

For estimation purposes a useful indicator for E(V_t) has to be found. In the empirical investigation the observed values of the output V_t were used.

As equation (2.10) will hold only approximately, a multiplicative error term u_{pt}, subject to the same assumptions as for the other error terms, is added:

p_t = \frac{\varepsilon}{r(\varepsilon+1)}\, E(u_{Vt}^{1/r})\, V_t^{1/r - 1} \big[ b^* q_t + c^* e^{-(\lambda_K/r) t} z_t + d^* e^{-(\lambda_L/r) t} w_t \big]\, u_{pt}   (2.11)

3. THE DATA

In this section only a very brief discussion of the data used in this study can be given. A detailed description of all time series is included in [3].

The output (V_t) of the chemical industry is represented by gross product as published by Statistisches Bundesamt (StBA). The data for the capital input (K_t) have been interpolated from yearly values calculated by Deutsches Institut für Wirtschaftsforschung (DIW), Berlin. (Actually utilized capital should be used as an input in the production function. Due to the specific calculation of the only available DIW-index of utilization, which implies a log-linear dependence between utilized capital and output, this variable obviously could not be used in the empirical study.) Labour input (L_t) is measured in hours worked, calculated on the basis of official data by assuming that workers and employees have the same number of working hours per month. The time series for intermediate goods (M_t) is taken from the input-output-tables of the DIW, Berlin. As there are only input-output-tables for 1962, 1967, 1972, and 1976, the values in between have been interpolated in correspondence with the development of gross product.

The product price (p_t) is represented by the index of producer prices which is published by StBA. A price index for capital goods (z_t) does not exist. Therefore capital costs (consisting of depreciation and interest) have been calculated and then related to capital (K_t). As price for labour (w_t) wages plus entrepreneurial payments for social security are used. The price index of intermediate goods (q_t) has been obtained as an average of the producer prices weighted by the respective amounts of the intermediate goods.

As further variables the DIW-index of utilization of capital in the chemical industry, the index of import prices of chemical goods and the price deflator of net social product of the Federal Republic of Germany have been used; the latter two time series are published by StBA.

4. PARAMETER ESTIMATION

For estimation purposes the log-linear version of the model (2.1a), (2.2a), (2.3a), (2.6), (2.11) is used:

\ln M_t = \ln b^* + \frac{1}{r} \ln V_t + \ln u^*_{Mt},   (4.1)

\ln K_t = \ln c^* - \frac{\lambda_K}{r}\, t + \frac{1}{r} \ln V_t + \ln u^*_{Kt},   (4.2)

\ln L_t = \ln d^* - \frac{\lambda_L}{r}\, t + \frac{1}{r} \ln V_t + \ln u^*_{Lt},   (4.3)

\ln V_t = \ln a + \varepsilon \ln p_t + \sum_{i=1}^{l} \varepsilon_i \ln x_{it} + \ln u_{Vt},   (4.4)

\ln p_t = \ln[E(u_{Vt}^{1/r})] + \Big(\frac{1}{r} - 1\Big) \ln V_t + \ln \frac{\varepsilon}{r(\varepsilon+1)} + \ln\big(b^* q_t + c^* e^{-(\lambda_K/r) t} z_t + d^* e^{-(\lambda_L/r) t} w_t\big) + \ln u_{pt}.   (4.5)

As V_t is used for E(V_t), the model (4.1) to (4.5) is an interdependent model, which is characterized by prior restrictions involving parameters of several equations. This follows since the price equation (4.5) has been derived from an objective function that is constrained by the demand equation and the input-output-relations. To reveal the specific parameter restrictions of the model, equations (4.1) to (4.5) are rewritten as follows:

\ln M_t = a_1 + a_2 \ln V_t + \ln u^*_{Mt},   (4.1a)

\ln K_t = b_1 + b_2 t + b_3 \ln V_t + \ln u^*_{Kt},   (4.2a)

\ln L_t = c_1 + c_2 t + c_3 \ln V_t + \ln u^*_{Lt},   (4.3a)

\ln V_t = d_1 + d_2 \ln p_t + \sum_{i=1}^{l} d_{2+i} \ln x_{it} + \ln u_{Vt},   (4.4a)

\ln p_t = e_1 + e_2 \ln V_t + e_3 \ln\big[e_4 q_t + e_5 e^{e_6 t} z_t + e_7 e^{e_8 t} w_t\big] + \ln u_{pt}.   (4.5a)

Comparing (4.1a) to (4.5a) with (4.1) to (4.5) shows that the model is subject to the following restrictions:

a_1 = \ln e_4, \quad b_1 = \ln e_5, \quad c_1 = \ln e_7, \quad b_2 = e_6, \quad c_2 = e_8, \quad a_2 = b_3 = c_3, \quad e_2 = a_2 - 1, \quad e_3 = 1.   (4.6)

In case of a linear interdependent model with linear restrictions across equations the method of restricted three-stage least squares (R3SLS) (see for instance [4], p. 523 and [1], p. 245 ff) is an appropriate estimation procedure which takes into account all prior restrictions. Since model (4.1a) to (4.5a) as well as the parameter restrictions (4.6) are nonlinear, the R3SLS-estimates of the parameters were determined in an iterative procedure: At a first stage the production system (4.1a) to (4.3a) was estimated by R3SLS, enforcing the restrictions a_2 = b_3 = c_3. From the resulting estimates, \hat b^*, \hat c^*, (\hat\lambda_K/r), \hat d^*, and (\hat\lambda_L/r) were calculated and used to determine 'observations' of a new variable y_t:

y_t = \hat b^* q_t + \hat c^* e^{-(\hat\lambda_K/r) t} z_t + \hat d^* e^{-(\hat\lambda_L/r) t} w_t   (4.7)

At a second stage the whole system with y_t as one of the two explanatory variables in (4.5) was estimated by R3SLS, imposing the restrictions a_2 = b_3 = c_3, e_2 = a_2 - 1 and e_3 = 1. This leads to new estimates for b^*, c^*, (\lambda_K/r), d^*, and (\lambda_L/r). According to the discrepancy between these estimates and those obtained from estimation at the first stage, new values for these five parameters and hence for the 'observations' of y_t were determined. Then again the whole system was estimated by R3SLS, and so on until convergence. In the empirical study reported in this paper the solution was found after only six iterations. Of course, using this procedure one cannot be sure to have reached the optimal solution according to the R3SLS-criterion since there may be various local minima. However, if the final set of estimates is quite close to the starting point it seems plausible to accept this final set as the R3SLS-estimates.
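The second-stage ingredients can be sketched with synthetic data: the 'observations' y_t of (4.7) are built from the numerical first-stage values quoted in the text (2.394, 16.866, 0.116, 0.0019, 0.0158), and the auxiliary price regression is estimated by OLS. All series below are simulated for illustration; this is neither the study's data nor its program (EPS):

```python
import numpy as np

# First-stage estimates as quoted in the text (illustrative use only)
b_star, c_star, d_star = 2.394, 16.866, 0.116
lam_K_r, lam_L_r = 0.0019, 0.0158

def y_indicator(t, q, z, w):
    """'Observations' of the composite factor-price variable y_t, eq. (4.7)."""
    return (b_star * q + c_star * np.exp(-lam_K_r * t) * z
            + d_star * np.exp(-lam_L_r * t) * w)

rng = np.random.default_rng(0)
T = 60
t = np.arange(T)
q, z, w = rng.uniform(0.8, 1.2, (3, T))   # synthetic factor prices
V = rng.uniform(50, 100, T)               # synthetic output
y = y_indicator(t, q, z, w)

# OLS for ln p_t = e1 + e2 ln V_t + e3 ln y_t with known synthetic coefficients
lnp = 0.1 - 0.14 * np.log(V) + 1.0 * np.log(y) + rng.normal(0.0, 0.01, T)
X = np.column_stack([np.ones(T), np.log(V), np.log(y)])
e_hat, *_ = np.linalg.lstsq(X, lnp, rcond=None)
```

With simulated data the OLS estimates recover the generating coefficients closely, which is the logic behind comparing 'free' and restricted estimates of e_2 and e_3 in section 5.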

5. THE RESULTS

After having specified the basic model two problems remained: the inclusion of further explanatory variables in the demand equation (4.4), and the question whether nominal prices or prices divided by a price deflator should be used.

Some preliminary tests on the basis of OLS-estimation showed that in case of the second problem, 'deflated' price variables lead to significantly better results. Therefore all price variables were divided by the price deflator of net social product. (Nevertheless the same symbols have been used for the 'deflated' prices.)

As far as the first problem is concerned, the index of import prices for chemical goods turned out to be an important additional explanatory variable. In addition an indicator for the economic situation of the main buyers of chemical goods had to be included in the demand equation. As no adequate quarterly sectoral data are available, the DIW-index of utilization, lagged by one period, was used. The results of the estimation show that this variable can only be regarded as a very rough indicator (see below).

As described in the preceding section estimation of the parameters of the model was separated into two stages: At the first stage the input-output-relations were estimated by R3SLS; the price equation was estimated by OLS, subject to the restrictions resulting from the R3SLS-estimation of the production submodel; and the demand equation was estimated by OLS. The estimation was performed by using the program system EPS (see [2]).

R3SLS-estimation of the input-output-relations yields the following results:

\ln M_t = 0.8731 + 0.8605 \ln V_t   (4.8)
          (±0.1173)  (±0.0123)

\ln K_t = 2.8253 - 0.0019 t + 0.8605 \ln V_t   (4.9)
          (±0.1104)  (±0.0003)   (±0.0123)

\ln L_t = -2.1535 - 0.0158 t + 0.8605 \ln V_t   (4.10)
          (±0.1123)  (±0.0003)   (±0.0123)

The values in brackets give the estimated standard deviations. The observed and the estimated values of the endogenous variables are depicted in figures 1 to 3. Regarding the simple structure of the production model and the data problem in case of K_t, the fit seems to be tolerable: Except for the last two or three years the general development of the inputs is approximated quite well. Before estimating (4.5) subject to all restrictions, OLS-estimates of the parameters of the following equation were determined:

\ln p_t = e_1 + e_2 \ln V_t + e_3 \ln Y_t   (4.11)

with

Y_t = 2.394 q_t + 16.866 z_t e^{-0.0019 t} + 0.116 w_t e^{-0.0158 t}.

The results of this estimation could indicate how close 'free' estimates of e_2 and e_3 would come to those values to be expected on the basis of the specification of the model and the R3SLS-estimation of the input-output-relations, i.e.: e_2 = -0.1395, e_3 = 1.0.

This 'free' OLS-estimation yields:

\ln p_t = -0.6446 - 0.1212 \ln V_t + 1.4361 \ln Y_t   (4.12)
          (±0.3734)  (±0.0249)       (±0.1133)

R^2 = 0.9626 ; DW = 1.3404

Despite the estimate of e_3 this result is not too far away from the expected one.

A restricted OLS-estimation, imposing the constraints e_2 = -0.14 and e_3 = 1.0, gives:

\ln p_t = 0.0990 - 0.14 \ln V_t + 1.0 \ln Y_t   (4.13)
          (±0.0035)

Observed and estimated values of \ln p_t according to (4.12) and (4.13) are given in figures 4 and 5.

Finally (4.4) was estimated by OLS with ... chemical goods (lp_t) ...

\psi(x) = \operatorname{sign}(x) \cdot \begin{cases} |x|, & |x| \le a \\ a, & a < |x| \le b \\ a\,\frac{c - |x|}{c - b}, & b < |x| \le c \\ 0, & |x| > c \end{cases} \quad with c = 3.0.

\psi(x) = \begin{cases} \sin(x/k), & |x| \le k\pi \\ 0, & |x| > k\pi \end{cases}

\psi(x) = \begin{cases} x\,[1 - (x/k)^2]^2, & |x| \le k \\ 0, & |x| > k \end{cases} \quad with k = 6.0.

The 4 proposals are used with the scale S1 according to Huber's proposal 2 (HU1, HA1, AN1, TU1) and with the robust scale S2 (HU2, HA2, AN2, TU2).
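The role of a \psi-function in an M-estimate can be sketched for the simplest case, a location estimate with fixed scale. The Huber constant k = 1.345 is a common default and not necessarily the study's choice, and the fixed scale stands in for the scales S1/S2:

```python
import numpy as np

def psi_huber(x, k=1.345):
    """Huber's monotone psi; k = 1.345 is a common default (an assumption here)."""
    return np.clip(x, -k, k)

def psi_tukey(x, k=6.0):
    """Tukey's biweight psi with k = 6.0 as quoted in the text."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= k, x * (1.0 - (x / k) ** 2) ** 2, 0.0)

def m_location(y, psi, scale, tol=1e-10, max_iter=200):
    """Iteratively reweighted estimate of location solving
    sum psi((y_i - mu)/scale) = 0 -- a one-dimensional stand-in for the
    regression M-estimates HU*, HA*, AN*, TU*."""
    y = np.asarray(y, dtype=float)
    mu = np.median(y)
    for _ in range(max_iter):
        r = (y - mu) / scale
        nz = r != 0.0
        w = np.where(nz, psi(r) / np.where(nz, r, 1.0), 1.0)  # psi(r)/r -> 1 at 0
        mu_new = np.sum(w * y) / np.sum(w)
        if abs(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu

y = np.array([0.10, -0.20, 0.05, 0.00, 100.0])   # one gross outlier
mu_rob = m_location(y, psi_huber, scale=1.0)
```

The outlier receives weight psi(r)/r far below 1, so the estimate stays near the bulk of the data, while the ordinary mean is pulled to about 20.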

3.2 Linear combinations of order statistics

L-estimates were extended to the linear model by P. Bickel (1973). The most promising proposal in this direction was made by Koenker and Bassett in 1978. They extend the sample quantile to regression models. Let 0 < \alpha < 1 and

Robust Estimates in Linear Regression

\rho_\alpha(x) := \begin{cases} \alpha x, & x \ge 0 \\ (\alpha - 1)\,x, & x < 0. \end{cases}

Then an \alpha-regression quantile is defined as a solution, b_\alpha, of the minimization problem

\sum_{i=1}^{n} \rho_\alpha(y_i - x'(i)\,b) = \min_b !

These regression quantiles can be evaluated relatively easily by solving a linear programming problem. The simplest example of this kind is the regression median (MED) with \alpha = 1/2. It determines a hyperplane to which (at least) half of the observations have a non-negative and (at least) half of the observations have a non-positive distance.

With the above regression quantiles any linear combination of order statistics may be constructed. In the study the trimean (TRI), b_{.25}/4 + b_{.50}/2 + b_{.75}/4, and the estimate proposed by Gastwirth in 1966 (GAS), 0.3\,b_{.33} + 0.4\,b_{.50} + 0.3\,b_{.66}, are included.
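For small samples the minimization defining b_\alpha can be carried out exactly without a linear program, using the fact that an optimal basic solution interpolates p observations (here p = 2 for a line). This is a sketch, not the study's implementation:

```python
from itertools import combinations

def check_loss(a, r):
    """Koenker-Bassett check function rho_a applied to a residual r."""
    return a * r if r >= 0 else (a - 1.0) * r

def regression_quantile(x, y, a):
    """alpha-regression quantile for the simple model y = b0 + b1*x.
    Exhaustive search over all point pairs is exact for small n, since an
    optimal solution interpolates two observations (in general position)."""
    best, best_loss = None, float("inf")
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        b1 = (y[j] - y[i]) / (x[j] - x[i])
        b0 = y[i] - b1 * x[i]
        loss = sum(check_loss(a, yi - (b0 + b1 * xi)) for xi, yi in zip(x, y))
        if loss < best_loss:
            best, best_loss = (b0, b1), loss
    return best

# Regression median (MED): a = 1/2 balances observations above and below
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 1.0, 2.2, 2.9, 4.1]
b0, b1 = regression_quantile(x, y, 0.5)
```

For larger n one would solve the equivalent linear program instead; the enumerated solution here passes exactly through two of the five points.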

A 2\alpha-trimmed regression, proposed by Ruppert and Carroll, is given by deleting all observations from the sample which have a non-positive distance to x'(i) b_\alpha or a non-negative distance to x'(i) b_{1-\alpha}, i.e. for which y_i - x'(i) b_\alpha \le 0 or y_i - x'(i) b_{1-\alpha} \ge 0. To the remaining observations least squares is applied. Two such trimmed least squares regression estimates with \alpha = 0.1 (TR1) and \alpha = 0.2 (TR2) are used in the study. If it is known in an application that the error distribution is skew, then it is reasonable to consider asymmetric trimming. One such estimator (ATR), which deletes the observations lying below a lower and above an upper regression quantile chosen asymmetrically, is also included.

Since in the before-mentioned proposal the points on the limiting hyperplanes are also deleted, the number of actually skipped observations is greater than [2\alpha n]. This may have adverse effects in small samples. Therefore Ruppert and Carroll (1979) proposed estimates where a certain proportion of the residuals from some initial estimate b_0 of \beta is deleted. As initial estimates b_0, the average (b_\alpha + b_{1-\alpha})/2 (KB1, KB2 for \alpha = 0.1 resp. \alpha = 0.2), the regression median (ME1, ME2) and the least squares estimate (OL1) are used. Then the [n\alpha] smallest and largest residuals are deleted before applying least squares. Another version starts like KB1; then the [2n\alpha] (\alpha = 0.1) largest absolute residuals are rejected (AB2).

Siegfried Heiler

3.3 Rank-estimates

Hodges and Lehmann proposed in 1963 estimates of location based on rank statistics. This idea was extended to the regression model by Adichie (1967), Jureckova (1971) and Jaeckel (1972). Jaeckel introduced the distribution-free measure of dispersion

D(z) := \sum_{i=1}^{n} z_i\, a_n(R_i)

for z := (z_1, \ldots, z_n)', where a_n(\cdot) is some monotone score function and R_i is the rank of z_i in the ordered sample of the z's. He then proposed to estimate \beta by a vector b that minimizes the dispersion D(b) of the residuals. Since D is a non-negative, continuous and convex function of b, the minimum can be evaluated relatively easily. For \sum_{i=1}^{n} a_n(i) = 0, D is location-invariant. Therefore an intercept cannot be estimated by Jaeckel's proposal. Such an estimate can be obtained by applying a Hodges-Lehmann location estimator to the residuals from Jaeckel's proposal, i.e. by minimizing a signed rank statistic S(b_1) := \sum_{i=1}^{n} a_n^+(R_i^+)\, \operatorname{sign}(z_i - b_1) with respect to b_1. If by the minimum an interval is defined, the midpoint is taken as the estimate \hat b_1. In S(b_1) the z_i are the residuals of Jaeckel's proposal, R_i^+ is the rank of |z_i - b_1| in the ordered sample of the absolute values, and a_n^+(\cdot) is some signed rank score function.

In the study, as score functions the Wilcoxon scores (WIL) (this is the original proposal of Hodges and Lehmann), the median scores (MSC), the van der Waerden scores (VDW) and the normal scores (NOR) are considered.
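Jaeckel's proposal can be sketched for a single slope parameter. Wilcoxon-type scores a_n(i) = i - (n+1)/2, which sum to zero, are used, and a golden-section search exploits the convexity of D; this is a toy illustration, not the study's program:

```python
import numpy as np

def jaeckel_dispersion(b, x, y):
    """Jaeckel's rank dispersion of the residuals z = y - b*x with
    Wilcoxon-type scores a_n(i) = i - (n+1)/2 (which sum to zero)."""
    z = y - b * x
    n = len(z)
    ranks = np.argsort(np.argsort(z)) + 1     # ranks of the residuals
    scores = ranks - (n + 1) / 2.0
    return float(np.sum(z * scores))

def jaeckel_slope(x, y, lo=-10.0, hi=10.0, iters=200):
    """Golden-section search for the minimizing slope; D is convex in b."""
    g = (np.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        m1 = hi - g * (hi - lo)
        m2 = lo + g * (hi - lo)
        if jaeckel_dispersion(m1, x, y) < jaeckel_dispersion(m2, x, y):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

rng = np.random.default_rng(2)
x = rng.uniform(-1.0, 1.0, 60)
y = 0.5 * x + rng.normal(0.0, 0.1, 60)   # model without intercept:
b_hat = jaeckel_slope(x, y)              # D is location-invariant anyway
```

An intercept would be recovered afterwards from a signed-rank location estimate of the residuals, as described above.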

4. THE SIMULATED SAMPLE SITUATIONS

4.1 The Designs

To meet various situations that may occur in practice 4 different designs are investigated. They are taken from Carroll (1978).

Design 1: y_i = 1 + .5 x_{2i} + e_i, n = 20, x_{2i} = -.95 + .1(i-1); \frac{1}{n}X'X is diagonal with entries 1 and .3325.

Design 2: y_i = 1 + .5 x_{2i} + e_i, n = 40, x_{2i} = -.342 + .000641\, i(i-1); second-moment entry of x_2 in \frac{1}{n}X'X: .0933.

Design 3: y_i = 1 + .5 x_{2i} + .25 x_{3i} + e_i, n = 30, x_{2i} = (2i-31)/30, x_{3i} = .60148 + (i-1)(i-30)/225; \frac{1}{n}X'X = \operatorname{diag}(1, .3333, .08833).

Design 4: y_i = 1 + .5 x_{2i} + .25 x_{3i} + e_i, n = 30, x_{2i} = -.34435 + .001149\, i(i-1), x_{3i} = (x_{2i} + .34465)^2; second-moment entry of x_2 in \frac{1}{n}X'X: .21374.

No. 1 is a simple linear regression with a balanced design. No. 2 has a highly unbalanced design, since the x_{2i} are very unsymmetrically distributed around zero. The sample size is therefore increased to n = 40. In the multiple example No. 3 the two regressors are orthogonal, whereas in No. 4 they are highly multicollinear (correlation = .96).


4.2 The error distributions

Pseudo-random numbers from a uniform (0,1) program were used to generate the 18 error distributions. To obtain a balanced situation between the variability of the regressors and the errors, the normal distribution with variance \sigma^2 = 0.02 (\sigma = 0.141), N(0; .02), was chosen as standard. All other distributions were 'standardized' in a manner that their density has the same value at zero as this reference distribution. Asymmetrical distributions were shifted such that their expectation was zero.

Index of error distributions (tags in parentheses):

Symmetrical: normal (N); logistic (L); double (two-sided) exponential (DE); t-distribution with 5 degrees of freedom (T); Cauchy (C).

Asymmetrical: log-normal (LN); chi-square with 1 degree of freedom (CHI).

Contaminated: F = (1-c) N(0; .02) + c N(0; \sigma_2^2) with P(c=1) = d, P(c=0) = 1-d, d = .05, .10, .20 and \sigma_2^2 = .25 (gentle situation) or \sigma_2^2 = 2.25 (vigorous situation) (N5N1, N10N1, N20N1, N5N2, N10N2, N20N2); F = (1-c) N(0; .02) + c C with d = .05, .10, .20 (N5C, N10C, N20C).

Autocorrelated: autocorrelated normal, e_i = 0.5 e_{i-1} + \xi_i, \xi_i \sim N(0; .02) (AN); autocorrelated contaminated normal, e_i = 0.5 e_{i-1} + \xi_i with \xi_i distributed as in N10N1 (ACN).
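The contaminated situations can be generated as sketched below (a hypothetical helper, not the study's generator). For the 'N10N2' case the theoretical variance is .9(0.02) + .1(2.25) = 0.243:

```python
import numpy as np

def contaminated_normal(rng, n, d, var2, base_var=0.02):
    """Draw n errors from F = (1-c) N(0; base_var) + c N(0; var2) with
    P(c=1) = d, as in the situations N5N1, ..., N20N2 (the second arguments
    of N(.;.) are variances, following the notation of the text)."""
    c = rng.random(n) < d
    e = rng.normal(0.0, np.sqrt(base_var), n)
    e[c] = rng.normal(0.0, np.sqrt(var2), c.sum())
    return e

rng = np.random.default_rng(3)
e = contaminated_normal(rng, 100_000, d=0.10, var2=2.25)   # 'N10N2' situation
emp_var = e.var()
theo_var = 0.9 * 0.02 + 0.1 * 2.25                         # = 0.243
```

The 'N5C'-type situations would replace the second normal component by Cauchy draws.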

5. PROPERTIES OF THE ESTIMATES

5.1 Criteria of Assessment

For each of the 18 error distributions M = 500 replications of pseudo-random numbers were generated and attached to the four designs. With these 500 simulations the properties of the estimators were assessed mainly by the following criteria.

(i) The mean of the estimates

M(b_i) = \frac{1}{M} \sum_{j=1}^{M} b_{ij}, \quad i = 1, 2, 3, \quad M = 500,

where b_{ij} is the estimator for \beta_i in the j-th simulation run and \beta_1 = 1 (intercept), \beta_2 = 0.5, \beta_3 = 0.25.

(ii) The mean square error of the estimates

MSE(b_i) = \frac{1}{M} \sum_{j=1}^{M} (b_{ij} - \beta_i)^2, \quad i = 1, 2, 3.

(iii) The mean square deviation of the regression line resp. regression plane

MSD = \frac{1}{Mn} \sum_{j=1}^{M} \sum_{i=1}^{n} (\hat y_{ij} - y_i^0)^2

with y_i^0 = 1 + 0.5 x_{2i} (+ 0.25 x_{3i}) and \hat y_{ij} the estimator of y_i^0 in the j-th replication.

(iv) Within each of the four designs and for each of the 18 residual distributions the estimates are ranked according to their MSD-values, and following the Princeton study the deficiencies

DEF = 1 - MSD of best estimator / MSD

are evaluated. (They depend of course on the arbitrariness of the respective reference estimator.)

(v) The computer time needed.
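Criteria (iii) and (iv) can be computed as sketched below; the numbers in the example dictionary are hypothetical and are not values from the study's tables:

```python
import numpy as np

def msd(y_true, y_hat):
    """Mean square deviation of the fitted regression surface over M
    replications: y_hat has shape (M, n), y_true shape (n,)."""
    return float(np.mean((y_hat - y_true) ** 2))

def deficiencies(msd_values):
    """DEF = 1 - MSD(best) / MSD for each estimator, following the
    Princeton-study convention quoted in the text."""
    best = min(msd_values.values())
    return {name: 1.0 - best / v for name, v in msd_values.items()}

# Hypothetical MSD values for three estimators under one error distribution
defs = deficiencies({"A": 0.10, "B": 0.08, "C": 0.16})
```

The best estimator always gets DEF = 0, so the deficiencies are only comparable relative to that reference, as the parenthetical remark in (iv) points out.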


For assessing normality, measures of skewness and kurtosis of the joint and the marginal distributions of the estimates have been introduced. For the time being these measures have only been applied to a part of the estimates. Therefore this aspect is not discussed here.

Since space is limited only a small excerpt of the simulation results can be exhibited here. Furthermore only 14 of the 25 procedures are included in the appended tables. Hence the main outlines of the study will be summarized by letter. Detailed tables may be obtained by writing to the author.

5.2 Classification of Situations

In the Princeton study the sample situations are broken down as gentle or vigorous, as modified Gaussians or alternatives, as reasonable or as notional. The Cauchy is classified as notional, i.e. unreasonable. In our case and with our parameters (see 4.2) the two-sided exponential and also the logistic have to be put into the same category. The class of gentle and reasonable situations consists of the t-distribution, the skew distributions log-normal and \chi_1^2 ...

Siegfried Heiler

130

TABLE

l .

3

:

MEAN 0

PROC t n s :

I |

MSO

SHUARE

DEVIATION

S I G N

E 1

DEF

0

RANK

1

OLS

1 1 | | 0 . 0 5 9 8

HUI

1 1 1 1 0 . 0 4 8 7

1 0 . 0 2 3

3

ANI

1 1 H O .

1 0 . 0 3 3

4

HAI

1 1 I 1 0 . 0 4 7 6

HU 2

1 1 I 1 0 . 0 5 3 7

1 0 . 1 1 4

AH 2

I I I 1 0 . 0 4 7 9

1 0 . 0 0 6

HA2

1 1 | 10 . 0 5 3 8

WIL

1 1 1 | 0 . 0 4 9 7

VDW

1 1 I 1 0 . 0 4 9 5

TR 1

1 1 1 1 0 . 0 5 2 5

TF 2

1 1 I 1 0 . 0 5 5 9

K81

1 1 1 1 0 . 0 5 1 6

KB 2

1 1 1 1 0 . 0 5 5 5

GAS

TABLE

+

:

OLS HUI ANI HAI HU2 AN 2 HA 2 W!L VOM TRI TR2 KB1 KB 2 GAS

1 1 t 1 1 j

10

1 0 . 0 9 3 1 0 . 1 4 8

8 13 7

1 0 . 0 7 8 1 0 . 1 4 2 1 0 . 1 2 8

0 . 0

12 11

1 1 I 1 0 . 0 5 2 8

0 . 0 0 7 0 . 0 9 4 0 . 0 2 7 0 . 0 4 9 0 . 0 4 6

MSO

1 1 0 . 9 1 3 3 i 1 1 0 . 1 1 7 5 j

11 0 . 3 5 4 4 I 1 1 1 ! 1 0 . 1 5 6 6 | j 1 I l 1 i 1 1 |

1 0 . 1 0 1 3 i l 1 0 . 0 9 7 0

1 j i 1 i 1 1 |

1 0 . 1 2 4 3 ) 1 0 . 1 6 5 5

1 1 0 . 0 8 6 7 |

1 1 0 . 1 9 6 9 |

1 1 0 . 1 1 3 7 1 |

I

SQUARE

1 0 . 9 0 5

I

DEVIATION

14

10.262

6

1 0 . 7 5 5

13

1 0 . 4 4 6

9

1 0 . 1 4 4

3

1 0 . 1 0 6

2

10.0

1

10.302

d

1 0 . 4 7 6

10

1 0 . 5 6 0

11

1 0 . 2 3 7

5

]|

0.1140 . 1 4 4

OF

MSD

1! 1 1 0 . 9 3 6 7 t i 1 1 1 1 0 . 0 8 5 9 j j

1 11

1

| 1 0 . 0 7 1 6 j j

I 1 1 i 1 | 1 1

12

14

OEF

1 0 . 0 2 1

2

1 1 0 . 0 7 3 0 | j

I 0 . 0 2 1

2

1

1 1 0 . 0 7 9 4 | ]

1 0 . 0 9 8

7

1 1 0 . 0 8 1 7 j |

1 0 . 1 2 5

8

1

I 1 0 . 0 7 5 3 j j

1 0 . 0 5 0

5

1 1 0 . 0 7 5 3 | |

1 0 . 0 5 0

4

1

7

1 1 0 . 0 7 6 7 • t

1 0 . 0 6 6

6

1 1 0 . 0 7 7 | j

0

1 0 . 0 7 2

6

1

6

| 1 0 . 0 8 0 2 i i

1 0 . 1 0 7

8

1 | 0 . 0 8 4 8 | j

1 0 . 1 5 7

10

1

10

1 1 0 . 0 8 5 0 i j

1 0 . 1 5 8

12

I 1 0 . 0 8 9 8 j |

1 0 . 2 0 4

12

1

11

1

9

1 I l I 1

3

1•

8

I 1 0 . 0 8 4 1 I I

1 0 . 1 4 9

10

I 1 0 . 0 8 7 9 | |

1 0 . 1 8 7

11

I 1 0 . 0 8 7 3 j j

1 0 . 1 8 0

13

1 1 0 . 0 9 1 1 j |

1 0 . 2 1 5

13

1

1 1 0 . 0 8 4 9

1 0 . 1 5 7

11

1 10 . 0 8 3

1 0 . 1 4 4

9

1

13

N ( 0 ; 0 .0 2 ) 2 0 N ( 0 ; 2« 2 5 )

I|

1 0 . 9 3 4

14

1 1 1 1 1 . 4 7 2 7 j |

1 0 . 2 7 9

4

MSO

1 1 0 . 1 7 7 3 ! I 1 1 I 1 0 . 4 8 3 0 j |

I

I

G N

DEF

|

3

O E S

RANK

14

1 0 . 4 4 8

5

1 0 . 7 9 8

13

10.481

6

1 1 0 . 2 0 9 7 | j

1 0 . 2 1 2

3

1 1 0 . 1 3 9 8 | j

1 0 . 3 0 0

3

2

I 1 0 . 1 0 9 4 | j

1 0 . 1 0 6

2

1 0 . 0

1

1 1 0 . 0 9 7 8 j |

1 0 . 0

1 0 . 3 6 1

7

jI j 1 0 . 2 0 6 0

1 0 . 5 2 8

10

1 1 0 . 2 7 7 4 | j

1 0 . 1 8 0 3 I

1 0 . 6 5 7

11

1 1 0 . 3 7 9 0 | j

1 0 . 7 4 2

|1 0 . 1 0 0 6

1 0 . 3 8 5

8

1 1 0 . 1 9 1 4 j |

1 0 . 4 8 9

1 0 . 2 1 7 7

1 1 0 . 4 7 1 7 j |

1 0 . 7 9 3

12

1 I 1 I

IO.601

9

1 0 . 4 4 0

4

!

1 0 . 6 6 0

12

1 0 . 7 1 6

13

1 0 . 2 9 7

7

i j 1 1 0 . 1 0 4 6 j j

1 0 . 4 0 8

9

1 1 0 . 1 0 7 4

1 0.

4

1 1 0 . 0 8 8 5

1 0 . 3 0 1

5

II

1 0 . 2 4 5 2

t i 1 0 . 1 7 4 «

1 1

MSD 1 1 1 1 1 . 4 7 9 2 j |

1 0 . 9 3 4

1 0 . 0 7 3

|i 1j 0 . 2 5 5 0

II

:

O E S

1 1 0 . 1 2 3 3 i |

193

FOR

2

5

1 1

1 1

1 0 . 0 6 6 8

I

|

I 1 0 . 0 8 1 0 | |

1 0 . 0 7 3 6 I

j

1

7

| 1 1 1 0 . 0 7 1 5 | |

] 1 0 . 1 8 8 6 j j

110.06X9 I1 I1 ] 1 0 . 0 9 6 9 i i I 1 1 1 0 . 1 3 1 2 1i li

1

1

1 1 0 . 0 7 5 7

1

6

i

5

1 0 . 0

4

12

1 1 i I j

1 0 . 0 5 5

1 0 . 0 4 0 10.0

1 0 . 6 8 3

1 0 . 1 9 5 2 I 1 1 0 . 0 8 8 6 |

1

1 0 . 0 3 4

1

9

2

I

1

3

1 1 0 . 0 7 3 8 | |

1 0 . 1 1 7

I

RANK

1

14

I 1 0 . 0 9 6 2 j |

3

1 0 . 0 8 1 1

G N

1 0 . 2 5 7 1 0 . 0 3 1

14

1 j I 1 0 . 0 7 3 1 | J

1

1

DEF

1 0 . 1 1 7

ESTIMATORS I

1 0 . 0 7 4 6

RANK

1 0 . 2 6 0

1

4

S I G N

E

1 1 MSO 1 s a s a t e s a : | |

DEF

1 0 . 3 0 1

1 I 1 1 j

! 1 l j1

I i

D

3 RANK

1

MSO

11 1 0 . 0 9 6 7 1

4

1 1 I 1 I t 1

0 . 0 7 8

O E S RANK

14

11 1 I

!

I G N I D£F

1 1

S I G N

E

I 1 0 . 0 7 4 1 j 1

1

11

MEAN

RANK

]

0 . 1 0 8

N ( 0 : 0 . 0 2 ) S N ( 0 ; 0 . 2 5 ) D

]

1 1 0 . 0 5 0 7 1 1 1 0 . 0 4 9 0

1 1 1 I 0 . 0 5 1 0

0 . 1 1 5

:

5

i I1 | 11

0 . 0 3 4

1 0 . 0 5 1 1 1 1 0 . 0 4 5 5 i 1 1 0 . 0 4 9 9

1 1 I 1 0 . 0 4 7 4

1 1 1 1

0 . 0 3 8

FOR

2 1

0 * 2 2 5

t

5

1 1 t 1 1

Dfc F

1

1 1 I 1 0 . 0 4 6 5 i 1 1 1 I 1 0 . 0 4 7 5

6

1 0 . 0 3 9

O E S II PROC S83I3

1 i 1 1 I 1 I

2

10.042

1 1 I 10.0546 1 1

1.

9

1 0 . 1 1 5

MSD

ESTIMATORS

S I G N

1 1 I 1 0 . 0 4 7 0 1 1 1 1 1 1 0 . 0 4 0 8 i I 1 1 1 1 0 . 0 4 5 2 I 1 1 1

1

1 0 . 0

II

14 E

1 1 1 1 0 . 0 5 8 3

14

1 0 . 2 0 4

0492

Or

1

1 1 0 . 1 9 3 3 1 i 1 1 1 1 0 . 5 2 6 8 | |

|

I

G N

DEF

1 0 . 9 2 2

|

4 RANK

14

1 0 . 4 0 1

4

1 0 . 7 8 0

13

1 0 . 4 4 8

5

1 [ 0 . 1 7 1 3 | f

1 0 . 3 2 4

3

1 1 0 . 1 1 9 5 j j

1 0 . 0 3 1

2

1

1 1 0 . 1 1 5 8 | j

10.0

1

1 0 . 5 2 5

8

7

10

1 1 0 . 2 2 1 4 j 1 t 1 1 1 0 . 2 9 7 0 j

1 0 . 4 7 7

1 0 . 6 4 7

1 0 . 6 1 0

9

11

1 1 0 . 3 6 5 2 j |

1 0 . 6 8 3

11

7

1 1 0 . 2 2 6 6 j |

1 0 . 4 8 9

8

1 1 0 . 4 7 2 7 | j

I 0 . 7 5 5

12

1 1 0 . 3 2 9 1 | j

1 0 . 6 4 8

10

1 1 0 . 2 1 4 0

1 0 . 4 5 9

6

1 1

I

R o b u s t E s t i m a t e s in L i n e a r

TABLE

2.

1

: DEFICIENCIES

D E S I G N I DIST

||

DE F

I 10.0899 I 10.0 I 10.0729 I I 0 .2644 I

10.0226

I lo.012a I 10.0 I 10.0730 I

10.0600

I 10.0381 I I 0.3817 I 10.1352 I

I RANK

131

Regression

AND

D I

RANKS

E S DEF

O.C87a

OF

:

HU1

G N 2 RANK

7

0.0

1

0.C782

7

0.4911

3

0.0383

5

0.0439

3

0.0373

3

0.0734

6

0.0626

6

0.0505

4

0.1C29

4

0.1652

4

I 0.2621

0.2794

4

I 10.0079 I 10.0520 I I 0.3879 I 10.0390 I 10.0292 I

0.0

1

0.0736

5

0.2829

6

0.0355

7

0.0510

4

D E S I |I

D;F

G I

£I(33C sss: I 10.0858 I 10.3041 1 10.0702 I 10.3658 I 10.0337 I 10.0293 I 10.0507 I 10.0700 I 10.0549 I 10.0456 I 10.1057 I 10.1942 I 10.4484 I 10.0140 I 10.0298 I 10.3170 I 10.0391 I 10.0198 I

N 3

RArtK

D E II

S

DEF

0.0928 0.0352 0.0775 0.3402 0.0312 0.0243 0.038Ó

0.0611 0.0556 0.0590

0.1122 0.1971 0.4009 0.0305 0.0078 0.2103 0.0346

0.0260

G N RANK

4 II

Siegfried Heiler

132

TABLE 2. 2 : DEFICIENCIES ANO RANKS OF : DES I G N I

AN2

DESI G N2

D1ST 1 1 DE" 1 RANK I1 DEF | RANK • i -1 1 I I1 I I 1 1 i 1 N 1(0.1060 1 7 ! 0.0706 | 6 i t 1 i 1i 1 < 1 1 1 4 | 0.0132 1 6.5 T 1 ! 0.0407 I t i 1 i J 11 1 1 110.0895 I 7 l 0.0609 | 6 IN 1 1i 1 l [ l 1 9 I 0.5718 1 9 CHI 1 | 0.39 89 i 1 1 1 1I 1 1| ! N5N1 1|0.0063 1 2 ] O.OC66 1 2 jI 1 i l I N10N1 ) 10.0 1 1 l 0.0 |j 1 i i 11 1 1 N20N1 110.0360 1 4 1 0.0315 | 2 i i i 1 1 1I 1 I 1 N5C 1I0.0668 1 4 ! 0.0404 | 5 j I i 1 1 I1 1 NIOC 1I0.0457 i 3 1 0.0210 1 3 I ] 1 t 1 I i ! i N20C 110.0244 1 2 ! 0.0 1 1 i| 1 1 1 1 1 1 N5N2 1 10.0 1 1 1 0.0 1 1 1 1 11 1 1 1 N10N2 1 1 0.0 1 1 1 0.0 1 1 1 1 1 1 1 11 1 1 I ! 2 ] 0.0734 | 2 N20N2 110.1062 1 1 i I 11 1 1 i 1 L 1|0.0492 1 6 i 0.0356 | 5 i I I 11! I l ] DE 1I0.1007 1 8 { 0.1435 j1 8 i| 1 i i 1 C 1I0.1822 1 3 I 0.1863 1 4 1I I 1 \ 1 1 1] 1 AN 110.0252 1 5 I 0.0197 | 6 1 I i i 1 1I 1 1 ACN 110.0236 1 3 1 0.0295 | 2 1i I 1 1

DES : G N 3

DES I G

1 1 DEF 1 RA IK 1 1 DEF 1 RA aasasasas= 3 3 3333 33:313«a a«333 1 l 1 1i 1 11 I 10.0829I 6 110.0832 1 6 ti •i i 1 I 1 i10.0357 1 4 110.0352 1 3 1 1 1 1 I1 11 i 7 I 10.0789 1 7 1 | 10.0731 j 1I | l 110.4970 1 9 1I0*4846 1 9 1 I 1 1l 11 i10.0205 1 2 I10.0205 1 2 |j jj 1 1 1jio.o 1 1 1 1 10.0 1 1 1 110.0465 1 3 i10.0540 I 4 t i 11 I 1i 1 |10.0590 1 3 110.0387 1 3 i iJ 1I I 110.0387 1 3 110.0244 1 3 i t 1 t li I1 110.0108 1 2 1 lo.0 1 1 i1 1 1 1i 1 110.0209 1 2 1 Io.o I 1 1 Ij I 1I 10.0 1 l 1 10.0 1 1 1 1 1 1 1 11 11 I io.10601 2 1¡0.0310 1 2 || i i 11 I|10.03461 6 |10.0467 1 6 1 lI I 1 7 110.0410 1 6 1 j 10.0973 | !j 1 1 1 I j 1 I

10.19641 3 i 1 10.03821 6 j 10.0051 1 2 1

! 10.2509 1 6 t 1I 1 110.0332 1 6 1 i 1i I Io.o 1 1 II

133

Robust Estimates in Linear Regression

TABLE

2. 3

: DEFICIENCIES 0 = S :

DIST

assess:

II

DS=

0.0925 0.0360 0.0748 0.1664 0.0421 0.0489 0.0514 0.0737 0.0711 0.0548

0.1212 0.1950 0.3025 0.0344 0.0739 0.3953 0.0321 0.0504

AND RAIJKS Or

G N L

I RANK

DEF

I 10.0684 IC I 10.0236

!

10.0519 1 10.3589 I 10.0274 I 10.C614 I 10.1045 I 10.0359 I 10.0413 I I 0.0563

!

I0.1C92 I 10.2115 I 10.3613 I 10.0134 I 10.0S52 I 10.3379 I 10.0185 1 I0.C809

WIL

D E S I

D E S I I

:

I

6 N 3

RANK II DEF I RANK a^s : : u : i a i : 3 : 3

D E S I G N 4 II

DEF

0.C719

0.0753

0.0208

0.0404

0.0566

0.0540

0.2639

0.2483

0.0497

0.J499

0.0558

0.0598

0.1019

0.0°32

0.0708

0.0589

0.0640

0.0609

0.0761

0.0878

0.1316

0.1416

0.2639

0.2494

0.5253

0.4769

0.0168

0.0252

0.0452

0.0117

0.3689

0.2960

0.0366

0.0304

0.0447

0.0486

I RANK

II

Siegfried Heiler

TABLE 2.4: DEFICIENCIES AND RANKS, DESIGNS 1-4

DIST   | DESIGN 1      | DESIGN 2      | DESIGN 3      | DESIGN 4
       | DEF     RANK  | DEF     RANK  | DEF     RANK  | DEF     RANK
N      | 0.1980  12.5  | 0.1955  14    | 0.2216  12    | 0.2150  12
T      | 0.0730  10    | 0.0719   9    | 0.1066   9    | 0.1115   9
LN     | 0.1844  13    | 0.2012  14    | 0.2116  12    | 0.2365  12
CHI    | 0.4398  12    | 0.6488  12    | 0.5415   ?    | 0.5100  10
N5N1   | 0.1282  11    | 0.1439  13    | 0.1567  11    | 0.1437   ?
N10N1  | 0.0970   9    | 0.1407  13    | 0.1106   9    | 0.1136   7
N20N1  | 0.0574   6    | 0.1029   ?    | 0.0797   5    | 0.0900   5
N5C    | 0.1832  11    | 0.1789  13    | 0.1944  11    | 0.1752  10
N10C   | 0.1654  11    | 0.1637  13    | 0.1696  10.5  | 0.1584   9
N20C   | 0.1209   9    | 0.1277  13    | 0.1417   9    | 0.1447   ?
N5N2   | 0.1616   9    | 0.1926  13    | 0.2209  10    | 0.2080   9
N10N2  | 0.1767   6    | 0.2301   8    | 0.2543   6    | 0.2835   ?
N20N2  | 0.1927   4    | 0.3006   5    | 0.4399   4    | 0.4509   6
L      | 0.0883  12    | 0.0791  14    | 0.1259  11    | 0.1242   ?
DE     | 0.0293   2    | 0.0320   2    | 0.0482   4    | 0.0033   2
C      | 0.2862   4    | 0.1595   3    | 0.3301   6    | 0.1931   2
AN     | 0.0931  12    | 0.1006  14    | 0.0860  10    | 0.0612   9
ACN    | 0.0279   ?    | 0.0535   5    | 0.0512   ?    | 0.0541   6

(Ranks marked "?" are illegible in the scan.)



Nonlinear Regression: Parameter Estimation in Linearizable Regression Models

Max-Detlev Jöhnk

1. INTRODUCTION AND OVERVIEW

Among the nonlinear regression models, the linearizable ones play a prominent role. This paper attempts to make a small contribution to parameter estimation in the linearizable regression model. The second section, following this introduction, mainly serves to make the notion of a linearizable regression model precise. The third section presents the most important distribution-free methods of parameter estimation in the nonlinear regression model, each with the special case of the linearizable model. These methods ultimately rest on iteration, so their usefulness depends on a good approximation as starting value for the iteration process. The fourth section proposes a method for estimating the parameters of the regression model approximately. If the approximation is considered insufficient, the resulting estimate can be used as a starting value for one of the iterative methods presented in the third section. The concluding fifth section shows how the proposed estimation methods behave for an exponential function as regression model.

2. LINEARIZABLE REGRESSION MODELS

In what follows, a regression model is understood to be a model that represents the dependence, overlaid by random disturbances, of a quantity y on k explanatory quantities x_1, ..., x_k. The model is completed by parameters β_1, ..., β_p, which are unknown and must be estimated from the observations (y_t, x_{1t}, ..., x_{kt}), t = 1, ..., n. The corresponding disturbance-free model will be called the regression function.

A nonlinear regression function will be called linearizable if it can be made linear in the model parameters β_1, ..., β_p by a suitably chosen transformation of the dependent variable; in this case p = k. What matters is that the transformation be completely known and free of unknown quantities. For the estimation methods presented below, the transformation function must moreover be at least twice differentiable; the common regression functions of this type satisfy this requirement. A. Hald (1, p. 560) points out that the transformations 1/y, exp(y) and ln(y) alone, combined with appropriate nonlinear transformations of the explanatory variables, already yield a variety of regression functions that usually suffices for practical purposes. In addition, transformations of the kind y^a (a ≠ 1) are conceivable, of which 1/y is a special case; then, however, a must be given in advance.

Let x' denote the row vector formed from the variables x_1, ..., x_k (in the case of an inhomogeneous regression, including the dummy variable) and β the column vector of the model parameters. A linearizable regression function can then be written as

y = g(x'β)   (1)

or

h(y) = x'β,   (1a)

where h(.) is the transformation mentioned above and g(.) the corresponding inverse transformation.

The relation described by (1) or (1a) can, as a rule, only be observed in disturbed form. The disturbance is usually attributed to neglected influences and/or observation errors. Accordingly, three models for the point of attack of the disturbances can be distinguished:

Model 1: model with external disturbance

y_t = g(x_t'β + u_t)   (2)

h(y_t) = x_t'β + u_t   (2a)

Model 2: model with internal disturbance

y_t = g(x_t'β) + v_t   (3)

h(y_t - v_t) = x_t'β   (3a)

Model 3: model with double disturbance

y_t = g(x_t'β + u_t) + v_t   (4)

h(y_t - v_t) = x_t'β + u_t   (4a)

The model relations hold for t = 1, ..., n; u_t and v_t denote the disturbances attached to the observations (y_t, x_t).

Since the model relations (2)-(4) hold both for the model variables and for the observations, the notation below does not distinguish between the two; where necessary, it is pointed out whether the variable or the observation is meant, so there is no danger of confusion.
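To make the three variants concrete, the following sketch (our own illustration, not part of the paper; the choice g = exp is just an example) generates data from each of (2), (3) and (4):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = np.linspace(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x])   # inhomogeneous regression with constant
beta = np.array([0.5, 1.0])
eta = X @ beta                          # x_t' beta
u = rng.normal(0.0, 0.1, n)             # external disturbance u_t
v = rng.normal(0.0, 0.1, n)             # internal disturbance v_t

y_model1 = np.exp(eta + u)              # (2): y_t = g(x_t' beta + u_t)
y_model2 = np.exp(eta) + v              # (3): y_t = g(x_t' beta) + v_t
y_model3 = np.exp(eta + u) + v          # (4): y_t = g(x_t' beta + u_t) + v_t
```

In model 1 the transformed values ln(y) = x_t'β + u_t are homoskedastic; in model 2 it is y itself that is homoskedastic — the distinction that drives the weighting introduced in Section 4.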

The usual assumptions are made about the disturbance variables u_t and v_t. With

u = (u_1, ..., u_n)'  and  v = (v_1, ..., v_n)',

it is assumed that

E(u) = 0   (5a)

E(v) = 0   (5b)

E(uu') = σ_u² I   (5c)

E(vv') = σ_v² I   (5d)

E(uv') = 0   (5e)

In model 1, let ȳ_t = E g(x_t'β + u_t) and v_t = y_t - ȳ_t, which yields the representation of model 2. The distinction between models 1, 2 and 3 therefore derives its point chiefly from the fact that a broad range of variance models can be represented on the basis of homoskedastic disturbances.

In the following, the observations y_t and x_t' are often collected, for brevity, as

y = (y_1, ..., y_n)'  and  X = (x_1, ..., x_n)'.

Furthermore, let

h(y) = (h(y_1), ..., h(y_n))'.

Other notations in which a vector appears as the argument of a function on ℝ, such as g(Xβ) or h'(y), are to be understood accordingly.

3. GENERAL ESTIMATION METHODS WITH AN ADDITIVE DISTURBANCE TERM

In version 2 the linearizable regression model can be viewed as a special case of the general nonlinear model

y_t = f(x_{1t}, ..., x_{kt}; β_1, ..., β_p) + v_t = f(x_t, β) + v_t,   (6)

in which v_t satisfies conditions (5b) and (5d). In this model it is natural to estimate the parameter vector β by the method of least squares. There are essentially two ways of doing so.

3.1. Direct method

In direct minimization, the least-squares principle is applied immediately. This leads to minimizing the expression

S² = Σ_t (y_t - f(x_t, β))²   (7)

with respect to β. Either the minimum of (7) is sought directly, or, with

f_j(x_t, β°) := ∂f(x_t, β)/∂β_j |_{β=β°},

the nonlinear system of equations

Σ_t (y_t - f(x_t, β)) · f_j(x_t, β) = 0   for j = 1, ..., p   (8)

is solved. Goldfeld and Quandt (2, Chpt. 1) give hints on suitable minimization techniques. In the special case of the linearizable model, (8) becomes

Σ_t (y_t - g(x_t'β)) · g'(x_t'β) · x_{jt} = 0   for j = 1, ..., k.   (9)

3.2. Indirect method

Indirect minimization starts from an approximation β*. By means of the Taylor expansion

f(x_t, β) = f(x_t, β*) + Σ_j f_j(x_t, β*)(β_j - β_j*) + R_t

with remainder R_t, one obtains the approximate linear representation

y_t - f(x_t, β*) = Σ_j f_j(x_t, β*)(β_j - β_j*) + v_t,   (10)

which in the linearizable case can be written as

y_t - g(x_t'β*) = Σ_j g'(x_t'β*) x_{jt} (β_j - β_j*) + v_t.   (11)

Let β*⁰ be an approximation and Z the matrix formed from the elements

z_{jt} = f_j(x_t, β*⁰),

or, in the linearizable case,

z_{jt} = g'(x_t'β*⁰) x_{jt}.

Then, with y_t(β*⁰) = f(x_t, β*⁰), respectively y_t(β*⁰) = g(x_t'β*⁰),

β*¹ = β*⁰ + (Z'Z)⁻¹ Z'(y - y(β*⁰))   (12)

yields a new estimate, which can be improved further, as often as desired, by the same principle. J.M. Chambers (3, p. 5) gives further details of the method.
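Iteration (12) can be sketched in a few lines of code; the sketch below is ours (not the author's program), uses the linearizable case z_{jt} = g'(x_t'β*)x_{jt}, and takes the exponential function of Section 5 as example.

```python
import numpy as np

def iterate_linearizable(X, y, g, g_prime, beta0, steps=50, tol=1e-10):
    """Iteration (12): beta* <- beta* + (Z'Z)^{-1} Z'(y - g(X beta*)),
    with Z built from z_jt = g'(x_t' beta*) x_jt."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(steps):
        eta = X @ beta                   # x_t' beta*
        Z = g_prime(eta)[:, None] * X    # z_jt = g'(x_t' beta*) x_jt
        step = np.linalg.solve(Z.T @ Z, Z.T @ (y - g(eta)))
        beta = beta + step
        if np.max(np.abs(step)) < tol:   # stop once the update is negligible
            break
    return beta

# Exponential regression function y = exp(alpha + beta*x); noise-free here,
# so the iteration should recover the true parameters
x = np.arange(11.0)
X = np.column_stack([np.ones_like(x), x])
y = np.exp(2.0 + 0.2 * x)
beta_hat = iterate_linearizable(X, y, np.exp, np.exp, beta0=[1.9, 0.18])
```

Starting from the rough approximation β*⁰ = (1.9, 0.18), the iteration converges to (α, β) = (2, 0.2); a poor starting value can make the step overshoot, which is exactly why Section 4 proposes a cheap approximate estimator as start.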

4. APPROXIMATE PARAMETER ESTIMATION IN LINEARIZABLE REGRESSION MODELS

4.1. Model 1

For linearizable regression functions the customary procedure is to linearize the model by the transformation h(.) and to estimate the model parameters β from the linearized model by the method of least squares. If h(y) is, as in Section 2, the column vector formed from the elements h(y_t), the estimator accordingly reads

β̂ = (X'X)⁻¹ X'h(y).   (13)

This procedure is optimal when the transformed random variable h(y) is homoskedastic. This condition holds in model 1; for that model the method of variable transformation is sensible and appropriate. Moreover, in this model β̂ estimates the parameter vector β without bias, since from

E(h(y_t)) = x_t'β + E(u_t) = x_t'β

it follows that E(β̂) = β.

4.2. Model 2

In model 2 the random variables y_t themselves are homoskedastic. Consequently the variable transformation y_t → h(y_t) leads to heteroskedastic random variables, and the ordinary least-squares estimator is no longer optimal. It is, however, possible to determine the influence of the variable transformation on the variance approximately and to neutralize it. To this end, consider the Taylor expansion of h(y_t - v_t) about y_t, truncated after the linear term, which gives

h(y_t - v_t) ≈ h(y_t) - v_t h'(y_t).

Since h(y_t - v_t) = x_t'β, it follows that

h(y_t) ≈ x_t'β + v_t h'(y_t).

Consequently, with

D = {diag(h'(y))}⁻²,

the parameter vector β can be estimated by

β̂ = (X'DX)⁻¹ X'D h(y).   (14)

This estimator is no longer unbiased. To determine the bias approximately, consider the Taylor expansion of h(y_t) about ȳ_t = g(x_t'β). From

h(y_t) = h(ȳ_t) + v_t h'(ȳ_t) + ½ v_t² h''(ȳ_t) + R(v_t)

it follows, because h(ȳ_t) = x_t'β, that

E(h(y_t)) = x_t'β + ½ σ_v² h''(ȳ_t) + E(R(v_t)).

Under the assumption that E(R(v_t)) can be neglected, with

D̃ = {diag(h'(ȳ))}⁻²

one has

E(β̂) ≈ β + ½ σ_v² (X'D̃X)⁻¹ X'D̃ h''(ȳ).   (15)

An estimate of σ_v² can be obtained, with ŷ = g(Xβ̂), from the residuals y_t - ŷ_t as

σ̂_v² = (1/(n-k)) Σ_t (y_t - ŷ_t)²,

while h'(ȳ) and h''(ȳ) can be estimated by h'(ŷ) and h''(ŷ). With these, the bias of β̂ can be corrected if required.
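For h(y) = ln y, so that h'(y) = 1/y and h''(y) = -1/y², estimator (14) and the bias approximation (15) are easy to write down. The following sketch is our own illustration (the function names are ours), using noise-free data so that the estimator's output can be checked against the true parameters.

```python
import numpy as np

def weighted_linearized(X, y, h, h_prime):
    """Estimator (14): beta = (X'DX)^{-1} X'D h(y) with D = {diag h'(y)}^{-2}."""
    d = h_prime(y) ** -2.0           # diagonal of D
    XtD = X.T * d                    # X' with columns scaled by the weights
    return np.linalg.solve(XtD @ X, XtD @ h(y))

def bias_approx(X, beta, g, h_prime, h_second, sigma2_v):
    """Correction term of (15): (1/2) sigma_v^2 (X'DX)^{-1} X'D h''(ybar),
    evaluated at ybar = g(X beta)."""
    ybar = g(X @ beta)
    d = h_prime(ybar) ** -2.0
    XtD = X.T * d
    return 0.5 * sigma2_v * np.linalg.solve(XtD @ X, XtD @ h_second(ybar))

# Exponential model with h = ln: weights 1/h'(y)^2 = y^2
x = np.arange(11.0)
X = np.column_stack([np.ones_like(x), x])
y = np.exp(2.0 + 0.2 * x)            # noise-free for illustration
beta_hat = weighted_linearized(X, y, np.log, lambda u: 1.0 / u)
bias = bias_approx(X, beta_hat, np.exp, lambda u: 1.0 / u,
                   lambda u: -1.0 / u ** 2, sigma2_v=25.0)
beta_corrected = beta_hat - bias     # bias-corrected estimate
```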

4.3. Model 3

If the mixed model is to be treated similarly to model 2, then at least the ratio of the variances σ_u² and σ_v² must be known. Indeed, if the Taylor expansion of h(y_t - v_t) about y_t is again truncated after the linear term, one obtains

h(y_t) = x_t'β + u_t + v_t h'(y_t) + R(u_t, v_t).

Let z_t = u_t + v_t h'(y_t) be the combined disturbance; then, with a negligible remainder,

Var(z_t) = σ_u² + σ_v² [h'(y_t)]² = σ_v² (κ² + [h'(y_t)]²).

If the ratio κ² = σ_u²/σ_v² were known, then, as in model 2, with

D = diag{ (κ² + [h'(y_1)]²)⁻¹, ..., (κ² + [h'(y_n)]²)⁻¹ },

the parameter vector β could be estimated by

β̂ = (X'DX)⁻¹ X'D h(y).   (16)

As in model 2, the bias of β̂ can be determined approximately by means of the Taylor expansion about ȳ_t = g(x_t'β). One has

h(y_t) = x_t'β + u_t + v_t h'(ȳ_t) + ½ {u_t² [g'(x_t'β)]² + 2 u_t v_t g'(x_t'β) + v_t²} h''(ȳ_t) + ...

and therefore

E(h(y_t)) = x_t'β + ½ {σ_u² [g'(x_t'β)]² + σ_v²} h''(ȳ_t).

Let

G = diag(g'(Xβ))   (17)

H = diag(h'(ȳ)) = G⁻¹   (18)

D = (κ²I + H²)⁻¹   (19)

then consequently

E(β̂) ≈ β + ½ (X'DX)⁻¹ X'D (σ_u² G² + σ_v² I) h''(ȳ) = β + ½ σ_v² (X'DX)⁻¹ X'G² h''(ȳ).   (20)

For this, however, σ_v² must also be known, or estimable from the data, in addition to κ².

5. COMPARISON OF THE ESTIMATION METHODS FOR THE EXPONENTIAL FUNCTION

The case of a linearizable regression model met most frequently in econometric models is no doubt the exponential function, which serves as a growth model or, with logarithmized explanatory variables, as a model of demand and production functions. For this reason this regression model was used for some first trial computations.

In its simplest form the regression function reads y = e^{α+βx}. For model 2 the parameters can be estimated relatively easily by the direct method of least squares. The estimator β̂ is obtained as the solution of

Σ_t x_t y_t e^{β̂x_t} · Σ_t e^{2β̂x_t} = Σ_t y_t e^{β̂x_t} · Σ_t x_t e^{2β̂x_t},

and α̂ can then be determined from

e^{α̂} = Σ_t y_t e^{β̂x_t} / Σ_t e^{2β̂x_t}.

By the approximate method of Section 4.2, since h(y) = ln(y) and h'(y) = 1/y, the estimators read

(α̂)   ( Σ_t y_t²       Σ_t x_t y_t²  )⁻¹ ( Σ_t y_t² ln y_t     )
(β̂) = ( Σ_t x_t y_t²   Σ_t x_t² y_t² )   ( Σ_t x_t y_t² ln y_t ).

The bias can be estimated by means of (15), using h'(y) = 1/y and h''(y) = -1/y².

To compare the various estimation methods, five data series were generated with x = 0(1)10, α = 2, β = 0.2 and σ_v² = 25. From each data series four estimates of α and β were computed, namely:

NLKQ   nonlinear least squares (Section 3.1)
ELRL   simple linear regression with logarithmized values
ARL1, ARL2   approximately weighted linear regression with logarithmized values, with and without correction

This produced the following estimates of β:

Data series   1       2       3       4       5
NLKQ          0.159   0.197   0.207   0.253   0.205
ELRL          0.118   0.331   0.204   0.195   0.167
ARL1          0.146   0.162   0.195   0.227   0.172
ARL2          0.153   0.181   0.199   0.240   0.186

This overview shows that the estimates obtained by approximately weighted linear regression lie, on average, considerably closer to those obtained by nonlinear regression, and are therefore better suited, e.g., as starting values for an iterative solution than the estimates obtained by simple linear regression of the logarithmized values.
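The experiment is easy to replicate in outline. The sketch below is ours, not the author's program: it draws one data series from model 2 with α = 2, β = 0.2, σ_v² = 25 and x = 0(1)10, and computes the simple log-regression (ELRL) and the approximately weighted log-regression without bias correction.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.arange(11.0)
X = np.column_stack([np.ones_like(x), x])
y = np.exp(2.0 + 0.2 * x) + rng.normal(0.0, 5.0, size=x.size)  # sigma_v = 5
y = np.clip(y, 1e-3, None)   # ln y must exist; guard against negative draws

hy = np.log(y)

# ELRL: ordinary least squares on the logarithmized values
beta_elrl = np.linalg.solve(X.T @ X, X.T @ hy)

# Approximately weighted regression: weights 1/h'(y)^2 = y^2, cf. (14)
XtD = X.T * y ** 2
beta_arl = np.linalg.solve(XtD @ X, XtD @ hy)
```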

References

1. Hald, A., Statistical Theory with Engineering Applications, New York 1952.
2. Goldfeld, S.M., Quandt, R.E., Nonlinear Methods in Econometrics, Amsterdam 1972.
3. Chambers, J.M., Fitting nonlinear models: numerical techniques, Biometrika 60, 1-13 (1973).

A Test for Independence of Dichotomous Stochastic Variables Distributed Over a Regular Two-Dimensional Lattice

Peter Kuhbier and Joachim Schmidt

1. INTRODUCTION

In economics and other social sciences "it is often necessary to consider the geographical distribution of some quality or phenomenon in the counties or states of a country, and one of the questions we may ask is whether the presence of some quality in a county makes its presence in neighbouring counties more or less likely" (Cliff and Ord 1973, p. 1). There are at least two basic ways of answering such a question: a) The observed quality in each county is considered to be the realization of a corresponding stochastic variable, and, provided a suitable test statistic is at hand, a test is carried out determining whether these stochastic variables are uncorrelated with (or independent of) those in neighbouring counties. b) Presumed functional relationships between observations in neighbouring counties are formulated in terms of a model, and again, provided there is a suitable test statistic, a test of whether the model gives a satisfactory description of the data is carried out.

Depending on the aim of the investigation under consideration, possibility a) and/or b) will be chosen. In this article, we shall confine our analysis to the problem of choosing a test statistic for spatial correlation. The reason there are difficulties in defining a suitable test statistic is that any such statistic, first of all, crucially depends on the definition of which counties are neighbours and which are not, and, secondly, on the type of data (qualitative or quantitative) that is available. In order to facilitate the analysis, we make two simplifying assumptions:


ASSUMPTION 1 The counties under consideration form a regular lattice of squares (or rectangles) in the plane.

ASSUMPTION 2 The quality or phenomenon under consideration is dichotomous. Its two possible realizations are denoted by "0" and "1", respectively.

Given assumption 1 there are three straightforward definitions of neighbourhood: Two rectangles are neighbours if they have (i) a common border (rook's case), (ii) a common vertex, but not a common border (bishop's case), (iii) a common vertex and/or a common border (queen's case). Furthermore, given assumption 2 two simple join count statistics (BB and BW) have been proposed which can serve as test statistics for spatial independence in all three cases (see for instance Cliff and Ord (1973), p. 4, and Moran (1948), p. 243):

BB = ½ Σ_{i,j} w_{ij} x_i x_j   (1)

and

BW = ½ Σ_{i,j} w_{ij} (x_i - x_j)²,   (2)

where i,j = 1,...,N label the regions under consideration, x_i is the realization of X_i in region i (x_i ∈ {0,1}), and w_{ij} is the weight of contingency between regions i and j (while Moran (1948) only deals with w_{ij} ∈ {0,1}, Cliff and Ord (1973) allow arbitrary weights w_{ij} ∈ ℝ₊).

If the hypothesis is valid that all realizations "0" or "1" occur, with probabilities p₀ and p₁ respectively, independently of any realizations in neighbouring regions, where neighbourhood is defined by means of the weights w_{ij}, then, asymptotically, both test statistics are normally distributed and the usual tests of hypothesis can be carried out. However, these statistics have certain disadvantages, since even in the simple case of a chessboard-like pattern of observations three different results are obtained (positive, negative, or no correlation) depending on which of the above defined types of neighbourhood is employed.


The most general approach to a description of spatially distributed phenomena in terms of their joint or conditional distribution is found in an article written by Besag (1974). There it is shown that, provided certain additional conditions hold (Besag 1974, pp. 195 and 197), the joint distribution P(x₁,...,x_N) of the variables at all sites, or the conditional distribution P(x_i | all other sites) of variable X_i given the values of the variables at all other sites, can, in the discrete case, be derived from the following function:

Q(x₁,...,x_N) = Σ_{i=1}^N x_i G_i(x_i) + Σ Σ_{1≤i<j≤N} x_i x_j G_{ij}(x_i, x_j) + ...

Here, all functions [...]

[...] If the variable under consideration may assume k different states (k > 2), then the number of degrees of freedom of S_j is (k³ - 1)(k - 1) for j = 1,2,3,4 (cf. Anderson and Goodman 1957, p. 102). Nothing else changes. If, on the other hand, more than one observation at each site is available, then n_vw equals the sum Σ_T n_vw(T), where n_vw(T) is the number of observations {(X_i(T) = w) ∧ (X_{i-1}(T) = v); i ∈ I₁} at time T or in the T-th sample; n_tuw, n_stvw, and n_stuvw are defined analogously. In this case, i.e. if there is more than one set of observations, one can also obtain ML-estimates p̂_vw(T), p̂_tuw(T), etc. separately for each sample and thus may test the hypothesis of spatial independence of X₁(T),...,X_N(T) in each sample as well as the hypothesis of identical distributions in all samples (cf. Anderson and Goodman 1957 for details in case of a Markov chain model).

There is one important difference, however, between spatial models and a Markov chain model: in the Markov chain model the variables X_i form a sequence in a natural and given way, while there is no a priori knowledge of how to assign indices to regions in the spatial model. Hence it is to be expected that, depending on the way regions have been provided with indices, tests of hypothesis will lead to different results for different assignments of indices. However, since the proposed test statistics take into account the full information on spatial dependencies up to the second order, divergent results of comparable statistics should be due to random variations only and not to systematic components. In this context, statistics are defined as comparable if the same number of neighbours in fixed constellations is taken into account.

One could even consider assigning indices to regions randomly. In this case the number of constellations of neighbours with respect to a given region increases drastically: there are eight different positions if there is one neighbour among all predecessors; there are (8 choose 2) = 28 positions if there are two neighbours among all predecessors; etc. Therefore, since all these different constellations have to be considered as different cases, a random assignment of indices to regions leads, in general, to statistics with too many degrees of freedom, so that enormous amounts of data are required to yield significant results, even if there is no doubt that the quality under consideration is spatially correlated.

References

Anderson, T.W., Goodman, L.A. (1957), Statistical Inference about Markov Chains, The Annals of Mathematical Statistics, 28, 89-110.

Besag, J. (1974), Spatial Interaction and the Statistical Analysis of Lattice Systems, Journal of the Royal Statistical Society (B), 36, 192-225.

Cliff, A.D., Ord, J.K. (1973), Spatial Autocorrelation, Pion Ltd., London.

Cliff, A.D., Ord, J.K. (1975), Model Building and the Analysis of Spatial Pattern in Human Geography, Journal of the Royal Statistical Society (B), 37, 297-328.

Moran, P.A.P. (1948), The Interpretation of Statistical Maps, Journal of the Royal Statistical Society (B), 10, 243-251.

Statistisches Landesamt Berlin: Ergebnisse der Volks-, Berufs- und Arbeitsstättenzählung in Berlin (West) am 27. Mai 1970, I, No. 218.

Computational Experiences with an Algorithm for the Automatic Transfer Function Modelling

Hans-Joachim Lenz

1. INTRODUCTION

Consider the following model relating the stationary input (u_k)_{k=1,...,N} to the stationary output (y_k)_{k=1,...,N}, which is stationarily coupled with it:

y_k = (B(L)/A(L)) u_{k-b} + n_k,   (1)

where

A_r(L) = 1 - a_1 L - ... - a_r L^r

and

B_s(L) = b_0 - b_1 L - ... - b_s L^s.

The function T(L) = B(L)L^b/A(L) is called a rational transfer function. (n_k)_{k=1,...,N} is assumed to be an auto-correlated noise process of the ARMA type, i.e.

n_k = (D(L)/C(L)) e_k,   (2)

where

C_p(L) = 1 - c_1 L - ... - c_p L^p,
D_q(L) = 1 - d_1 L - ... - d_q L^q.

(e_k) is assumed to be a white noise process, i.e. E(e_k) = 0, Var(e_k) = σ², and Cov(e_k, e_h) = 0 for all k ≠ h.

For notational convenience we shall drop the arguments and the subscripts of the polynomials A, B, C and D.

Given a bivariate time series (u_k, y_k)_{k=1,...,N} of length N, and admitting only models of the type defined in (1) and (2), the problem is to find automatically an appropriate specification (r,b,s) × (p,q) and the corresponding estimates (Â, B̂, Ĉ, D̂, σ̂²). Of course, selecting a model only by looking at the data makes it necessary to test against model inadequacies. The iteration between specification, estimation and residual analysis was studied systematically by Box/Jenkins (1970). However, they had a manual approach in mind.

Fig. 1: The gross structure of the automatic specification and estimation

We shall not be concerned here with the estimation of A, B, C, D and σ² for given orders (r,b,s) × (p,q); this is described in some detail by Box/Jenkins (1970). We are interested more in the specification itself and are looking for an algorithm to select orders (r,b,s) × (p,q) which fit "well enough" to the data. The attribute "well enough" means that the selected orders must fulfill certain model adequacy tests.

2. TECHNIQUES OF ORDER SELECTION

There are many ways to select the orders of the polynomials involved. One may use, for instance, (i) pattern recognition techniques or (ii) combinatorial search techniques.

Using the first technique it is necessary to decompose the procedure of order selection into at least two sub-procedures. This is due to the fact that the selection of the orders (p,q) must be based on the noise process (n_k)_{k=1,...,N}, which, however, is not observable. Using cross-correlation techniques, the first sub-procedure has to detect patterns in the estimated cross-correlation function CCF between the prewhitened input (u_k) and output (y_k), cf. Box/Jenkins (1970). Furthermore, the detected patterns must be linked tentatively to the orders (r,b,s). When the orders are selected, the parameters in the polynomials A and B can be estimated. Then the residuals (n̂_k) can be reconstructed.

Fig. 2: Decomposition of the order selection procedure

The second sub-procedure of the order selection method then starts. Using auto-correlation techniques, the procedure has to detect patterns in the estimated auto-correlation function ACF of the tentatively specified and estimated noise process (n̂_k). As described by Box/Jenkins (1970), these patterns are linked to the orders (p,q). For fixed orders the unknown parameters in the polynomials C, D can be estimated and the corresponding residuals ê_k of the noise model computed.

However, notice that the specification and the estimation are processed sequentially instead of simultaneously. This will usually cause a bias.

This bias may be reduced by iterating between the transfer modelling and the noise modelling, with a final step for the simultaneous estimation of all parameters on each iteration step. This is sketched in Fig. 3, where (u_k*) means the prewhitened input series (u_k).

Fig. 3: An order selection technique based on pattern recognition [flowchart: START → tentative transfer function modelling (A, B; r,b,s) → tentative noise modelling (C, D; p,q) → simultaneous estimation of all the parameters (A, B, C, D) → white-noise checks on the residuals, looping back on failure → STOP]

The main difficulties with the order selection technique by pattern recognition are the detection of patterns in the ACF and CCF, which are not free of estimation errors, the lack of powerful white noise tests, and the direction of re-specification in the case of any backtrack.

Alternatively, the order selection could be performed by making use of a combinatorial search technique as follows. It is reasonable to restrict the orders of the polynomials and the delay time, e.g. to 0 ≤ r, s, p, q ≤ 3. The steps of the search procedure SPEZI that can be recovered from the scan are:

SPEZI2: (Order search for transfer models) IF the candidate list CLT is exhausted THEN STOP ELSE choose the l-th entry (r,b,s) of CLT, obtained by calling TRANSF for different values of l, and compute the noise residuals n̂_k = y_k - (B/A) u_{k-b}.

SPEZI3: (Order search for noise models) CALL ARMA for the given (n̂_k).

SPEZI4: (Retrieve j-th best ARMA model) IF j > 3 THEN STOP ELSE retrieve (p,q) ∈ CLN as the j-th entry of CLN.

SPEZI5: (Check residuals) Compute Q = (N - b - max{p,q}) Σ_τ ACF²(τ), where ACF is the estimated auto-correlation function of (ê_k) = ([C/D] n̂_k). IF Q > χ²_{1-α; K-p-q} THEN j ← j+1 and GOTO SPEZI4.

SPEZI6: (Estimate parameters simultaneously) Compute all the parameter estimates in the polynomials A, B, C, D for the given order (r,b,s) × (p,q).

SPEZI7: (Compute residuals) Compute ê_k = (C/D)[y_k - (B/A) u_{k-b}] for all k and estimate the auto-correlation function ACF.

SPEZI8: (Check residuals) IF Q = (N - b - max{p,q}) Σ_{τ=1}^K ACF²(τ) > χ²_{1-α; K-p-q} THEN j ← j+1 and GOTO SPEZI4.

SPEZI9: IF the corresponding check statistic for the transfer part exceeds χ²_{1-α; K-r-s} THEN l ← l+1 and GOTO SPEZI2, ELSE STOP.

and GOTO SPEZI2 ELSE, STOP.
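The residual checks in SPEZI5 and SPEZI8 can be illustrated outside APL. The following Python sketch (function names and the degrees-of-freedom argument n_params are ours, not from the paper) computes a Box-Pierce style portmanteau statistic Q from the estimated autocorrelation function of a residual series.

```python
def acf(x, max_lag):
    """Estimated autocorrelation function of a series x for lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x) / n
    out = []
    for tau in range(1, max_lag + 1):
        c = sum((x[k] - mean) * (x[k + tau] - mean) for k in range(n - tau)) / n
        out.append(c / c0)
    return out

def portmanteau_q(residuals, max_lag, n_params):
    """Q = (N - n_params) * sum of squared ACF values; large Q speaks against white noise."""
    n = len(residuals)
    r = acf(residuals, max_lag)
    return (n - n_params) * sum(v * v for v in r)
```

In the algorithm above, Q would then be compared against the appropriate chi-square quantile before backtracking.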

The following two algorithms outline the order search procedure for a transfer function model and for a set of corresponding ARMA models ordered by increasing AIC values.

Order_search_for_transfer_function

The algorithm TRANSF selects the orders r, b, s of the transfer function T(L) = B(L)/A(L)·L^b in the model

    y_k = T(L) u_k + e_k ,

where

    A(L) = 1 - a_1·L - ... - a_r·L^r ,
    B(L) = b_0 - b_1·L - ... - b_s·L^s .

Input: the estimated cross-correlation function CCF between the suitably prewhitened input (u_k*) and output (y_k*); the series length N; the factor λ of the λ-rule; the largest lag K.

Output: tentatively specified orders (r,b,s) of T(L).

TRANSF1: (Test_for_non-zero_CCF) IF there exists a lag τ, 0 ≤ τ ≤ K, with |CCF(τ)| > λ/√(N-τ) THEN

    b = min {τ | |CCF(τ)| > λ/√(N-τ), 0 ≤ τ ≤ K} .

nach (28); Berechnung der AKF nach (29) und (23): γ̂_xx(k) für k=1,...,s sowie γ̂_xx(0).

... Gewichtsfolgen (g_k > 0, Σ g_k = 1), die z.B. monoton fallen oder bei saisonalen Modellen in der Umgebung der Vielfachen des Saisonlags relativ groß sind.

24) Für AR(m)-Prozesse gilt bekanntlich φ_kk = 0 für k > m. Bei T.W. Anderson, Seite 216 und 221, wird für solche Modelle gezeigt: √n·φ̂_kk → N(0,1) für k > m bzw. n·Σ_{k=m+1}^{p} φ̂²_kk → χ²_{p-m}, d.h. φ̂_kk ~ N(0, 1/n) für k > m (sog. Quenouille-Test).
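Der in Fußnote 24) angesprochene Sachverhalt läßt sich mit einer kleinen Python-Skizze nachvollziehen (eine freie Illustration, kein Programm aus dem Text): die partiellen Autokorrelationen φ_kk werden mit der Durbin-Levinson-Rekursion aus den Autokorrelationen ρ(1),...,ρ(p) berechnet; für einen AR(m)-Prozeß verschwinden sie für k > m.

```python
def pacf(rho):
    """Durbin-Levinson-Rekursion: partielle Autokorrelationen phi_kk
    aus den Autokorrelationen rho = [rho(1), rho(2), ...]."""
    phi = []   # Koeffizienten phi_{k-1,j}, j = 1..k-1, der vorigen Stufe
    out = []
    for k in range(1, len(rho) + 1):
        if k == 1:
            phikk = rho[0]
        else:
            num = rho[k - 1] - sum(phi[j] * rho[k - 2 - j] for j in range(k - 1))
            den = 1.0 - sum(phi[j] * rho[j] for j in range(k - 1))
            phikk = num / den
        phi = [phi[j] - phikk * phi[k - 2 - j] for j in range(k - 1)] + [phikk]
        out.append(phikk)
    return out
```

Für einen AR(1)-Prozeß mit Koeffizient 0.5 (ρ(k) = 0.5^k) liefert die Rekursion φ_11 = 0.5 und φ_kk = 0 für k > 1, wie in der Fußnote behauptet.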

25) Vergleiche Bhansali, insb. Seite 559 sowie Hosking, Seite 225.

26) Man könnte höchstens versuchen, die Aussage von Fußnote 24) zu dualisieren.

27) Siehe Mohr (1979) und Mohr (1980).

28) Ein Grund dafür ist sicherlich, daß sich bei der Schätzung des reinen MA-Modells (M3) keine numerische Konvergenz eingestellt hat und im Verlauf der Iterationen die Modellkoeffizienten stärker schwanken (Mohr (1980), S. 177).

29) Man vergleiche auch die empirische PAKF und IAKF für ein Beispiel bei Chatfield, Seite 372.

30) Es soll darauf hingewiesen werden, daß in Anlehnung an Box und Jenkins in der Literatur die AR- bzw. MA-Koeffizienten des Modells meist negativ festgelegt werden und dann φ_kk und θ_kk die partiellen Folgen sind. Wenn man wie hier die Modellkoeffizienten positiv ansetzt, muß man zwangsläufig - wie im zweiten Abschnitt geschehen - die Folgen -φ_kk und -θ_kk als partielle Folgen definieren, wenn die Konvention beibehalten werden soll.

31) Vergleiche Mohr (1979), S. 281 und Mohr (1980), S. 179.


Walter Mohr

Literatur

Anderson, T.W.: The Statistical Analysis of Time Series. Wiley, New York 1971.
Bhansali, R.J.: Autoregressive and window estimates of the inverse correlation function. Biometrika 67, 551-566 (1980).
Box, G.E.P., Jenkins, G.M.: Time Series Analysis, Forecasting and Control, 2. Printing. Holden-Day, San Francisco 1971.
Chatfield, C.: Inverse Autocorrelations. Journal of the Royal Statistical Society A, 142, 363-377 (1979).
Cleveland, W.S.: The Inverse Autocorrelations of a Time Series and their Applications. Technometrics 14, 277-298 (1972).
Durbin, J.: The Fitting of Time Series Models. Review of the International Statistical Institute 28, 233-243 (1960).
Hipel, K.H., McLeod, A.I., Lennox, W.C.: Advances in Box-Jenkins modeling, 1. Model Construction. Water Resources Research 13, 567-575 (1977).
Hosking, J.R.M.: The asymptotic distribution of the sample inverse autocorrelations of an autoregressive-moving average process. Biometrika 67, 223-226 (1980).
Kashyap, R.L., Ramachandra Rao, A.: Dynamic Stochastic Models from Empirical Data. Academic Press, New York 1976.
Kitagawa, G.: On A Search Procedure For The Optimal AR-MA Order. Ann. Inst. Stat. Math. 29, 319-322 (1977).
McLeod, A.I.: Derivation of the Theoretical Autocovariance Function of Autoregressive-Moving Average Time Series. Applied Statistics 24, 255-256 (1975).
Mohr, W.: Univariate Autoregressive-Moving-Average Prozesse und die Anwendung der Box-Jenkins-Technik in der Zeitreihenanalyse. Physica-Verlag, Würzburg 1976.
Mohr, W.: Prognoseuntersuchung für die Zeitreihe der registrierten Arbeitslosen in der BRD. Statistische Hefte 20, 276-283 (1979).
Mohr, W.: Grobidentifikation und Modellvergleich bei ARIMA-Modellen. Allgemeines Statistisches Archiv 64, 164-183 (1980).
Nerlove, M., Grether, D.M., Carvalho, J.L.: Analysis of Economic Time Series. Academic Press, New York 1979.

APL and the Teaching of Statistics Peter Naeve

1. INTRODUCTION

Undoubtedly the statistical community has become aware of the great possibilities of modern computer facilities. This may be seen from the ever increasing list of books and articles dealing with such themes as 'statistical computing', from the growing number of conferences and symposia devoted to the impact of the computer on statistics, and from the numerous computer programs ready-made for statisticians' use. As there are already many papers on the special topic 'computers in the teaching of statistics' as well (Evans (1973), Naeve (1978)), a justification must be given for why one more paper is added.

With respect to teaching, all these books, articles and conferences do not tell the true story. One gets the impression of a world that is wishful thinking compared with the everyday teaching of statistics. This is especially true for the role of APL in this respect. With respect to APL, computer scientists and programmers are divided into two hostile parties. This attitude seems to have been carried over to statisticians without paying much attention to whether an argument that might be convincing from a computer scientist's point of view keeps its place when considered from a statistician's point of view, or whether there are other properties of APL which, although of little value to a computer scientist, make APL such a rich tool for a statistician.

The following lines do not intend to give an introduction to APL; the APL novice may take Gilman, Rose (1974), Pakin (1972) or Polivka (1975) as a guide. They are meant as a list of arguments for why one should use it.



2. APL AS A NOTATIONAL LANGUAGE IN STATISTICS

The richness of built-in primitive functions and their wide scope allows for an almost one-to-one mapping between formulas written in conventional mathematical notation and executable 'programs'. To give one example, if one had to evaluate

SS = B'C(C'GC)⁻¹C'B

this could be done in APL as follows:

    CB ← (⍉C)+.×B
    SS ← (⍉CB)+.×(⌹(⍉C)+.×G+.×C)+.×CB .

As Evans (1973) points out, 'the programming details are minimized and one can get on to the statistical implications'.
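For readers without an APL system at hand, the same expression can be checked numerically. The following pure-Python sketch (our illustration, with C taken as a single column so that C'GC is a scalar) evaluates SS = B'C(C'GC)⁻¹C'B.

```python
def dot(u, v):
    """Inner product u'v of two vectors."""
    return sum(a * b for a, b in zip(u, v))

def mat_vec(m, v):
    """Matrix-vector product m v, with m given as a list of rows."""
    return [dot(row, v) for row in m]

def ss(B, C, G):
    """SS = B'C (C'GC)^(-1) C'B for a single column C; the inverse is a scalar division."""
    cb = dot(C, B)                 # C'B
    cgc = dot(C, mat_vec(G, C))    # C'GC
    return cb * cb / cgc
```

With G the identity matrix, SS reduces to (C'B)²/(C'C), which makes the structure of the quadratic form easy to see.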

The feature of user-defined functions eases the development of an even more statistically tailored notation. Direct function definition, even if not yet always implemented, is of special value in this respect. The following example is a modified one from Rosenkrands (1974).

    CORR : (SPD ⍵) ÷ SQRT (SSD ⍵)∘.×SSD ⍵
    SSD  : 1 1⍉SPD ⍵
    SPD  : (⍉⍵)+.×⍵
    POL3 : (×\¯1↓1,(⍴⍺)⍴⍵)+.×⍺
    POL4 : ⍵⊥⌽⍺
    POL5 : (⍵∘.*¯1+⍳⍴⍺)+.×⍺
    POL6 : (×\0 ¯1↓1,⍉((⍴⍺),⍴⍵)⍴⍵)+.×⍺
    POL7 : (,⍵)[((⍴⍵),1)⍴⍳⍴⍵]⊥⌽⍺

Let C denote the vector of coefficients and X the argument at which the polynomial shall be evaluated; then the result is given by calling

    C POLi X



The last three functions are also capable of handling vector arguments, i.e. the polynomial with coefficient vector C is evaluated at several points, e.g. X = 1 2 5 . The functions POL3 to POL7 are adapted from Iverson (1972).

APL and efficient statistical computing are not contradictory. There is always the temptation to turn APL down because it is an interpretative language and therefore a natural second best compared to compiler languages with their run-time superiority.
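The different POLi variants correspond to the usual ways of evaluating a polynomial; in Python terms (our sketch, not part of the paper) the power-basis form and the Horner / base-value form look as follows.

```python
def pol_powers(c, x):
    """Evaluate the polynomial with ascending coefficients c at x via explicit powers,
    as in the outer-product style variant."""
    return sum(ck * x ** k for k, ck in enumerate(c))

def pol_horner(c, x):
    """The same polynomial via Horner's rule, the idea behind the base-value variant."""
    acc = 0.0
    for ck in reversed(c):
        acc = acc * x + ck
    return acc
```

With C = 1 2 3 and X = 1 2 5 both variants give 6, 17 and 86, mirroring the call C POLi X with a vector argument.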

But first of all we must be aware that this is a computer-oriented type of argument. While a computer specialist's main concern lies in the effective use of his computer with respect to the optimization (minimization) of CPU time and memory space, this cannot necessarily be a statistician's point of view. It is easy to produce a lot of garbage very fast with highly efficient code. So a second thought will show that it is not the time of the computer but our time that matters, as Dennis Evans emphasized during the discussion following the presentation of his paper (Evans (1980)) at COMPSTAT 1980.

Statistics calls for insight into the process evoked by a statistical procedure, i.e. the link between data and algorithm, the sequence of patterns which are built up, and so on. And here APL is a useful tool, as hopefully has been shown by the foregoing arguments. So statisticians should hold their own interpretation of efficiency against that of the computer scientist when negotiations about implementing APL at the computer center are held. This does not mean that the other interpretation of efficiency shall be completely denied. As Kennedy and Gentle (1980) again and again point out: 'the purpose ... is to present selected computational methods, algorithms, and other subject matter which is important to the effective use of the given methods and algorithms in the computer'.

As may be seen in the before-mentioned example by Abrams (1973), one might combine those questions of efficiency in the sense of Kennedy and Gentle with the aim of getting deeper insight into the problem and the applied methods, to reach a really efficient (fast and clear) solution.

To write efficient programs in any high-level language demands some knowledge of the structure and properties of the host computer and some insight into how the compiler does its job. But there is nothing different with APL. As Sykes (1973) states, the essential point is 'know your operators'. After some trial and error and some thought, the results given in the following table concerning the times for the functions POLi, i=5,6,7 on an IBM 5110 will become clear.

    POL | C←⍳51, X←⍳20 | C←.5+⍳51, X←⍳20 | C←.5+⍳51, X←.5+⍳20 | C←⍳126, X←.5+⍳50
     5  |      36      |      31.4       |        31.7        |       41.1
     6  |     23.3     |      19.4       |        18.6        |       24.9
     7  |      12      |       9.6       |         8.8        |       12.9

Fig. 2: Computing time (in sec)
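The same kind of 'know your operators' comparison can be repeated today with any interpreter; the following Python sketch (a rough analogue of the POLi timings in Fig. 2, not a reproduction of them) times two evaluation variants with timeit.

```python
import timeit

def pol_powers(c, x):
    """Power-basis evaluation, analogous to the outer-product variant."""
    return sum(ck * x ** k for k, ck in enumerate(c))

def pol_horner(c, x):
    """Horner's rule, analogous to the base-value variant."""
    acc = 0.0
    for ck in reversed(c):
        acc = acc * x + ck
    return acc

def time_variants(c, x, number=1000):
    """Seconds spent on `number` evaluations of each variant."""
    t1 = timeit.timeit(lambda: pol_powers(c, x), number=number)
    t2 = timeit.timeit(lambda: pol_horner(c, x), number=number)
    return t1, t2
```

Both variants return the same value; only their operator mix, and hence their speed, differs.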

6. WHAT KIND OF STATISTICS SHOULD WE TEACH ?

It seems wise to stop for a moment in talking about the advantages of APL. The phrase 'the teaching of statistics' raises two questions.

Firstly:  What should be taught, i.e. what kind of statistics?
Secondly: How could the goals be achieved, i.e. in our context: can APL help in reaching these goals?

Although the answers to both questions cannot be given independently, it appears as if the enthusiasm over that powerful tool, the computer, has often made people skip the first question. As a result, much effort and money (computer time, programming cost etc.) has been wasted on projects which rank low in an overall list of teaching goals and connected means.

The present author shares the point of view expressed by Tukey (1980). If one agrees that statistics is both confirmatory and exploratory, this calls for a more subtle analysis of the role the computer may play in teaching statistics.



7. APL AND CONFIRMATORY DATA ANALYSIS

As Tukey puts it: 'confirmatory data analysis, especially as sanctification, is a routine relatively easy to teach and, hence a routine easy to computerize'. Confirmatory analysis uses a body of well established statistical methods and procedures. Tukey is not to be misunderstood. Implementing any such method on a computer as a dependable, efficient and useful computer program is not an easy task. Many things have to be considered among which the choice of the programming language is only one. APL is here just one out of several competitors.

In passing let us notice that the body of methods used in confirmatory analysis has been widened tremendously through the existence of computers. Many more methods became computable with real data which had to be abandoned before in the framework of pencil-and-paper calculations.

Confirmatory analysis cannot be done without understanding the methods which will be used. This obvious fact can be made more specific by breaking it into three parts.

Understanding the methods means

firstly:  knowing how to interpret the results, knowing what kind of decisions and conclusions might be drawn from them,
secondly: understanding the algorithm within the statistical procedure,
thirdly:  knowing the way algorithm and data are linked together, understanding the dynamic properties of the procedure.

As has been shown elsewhere (Naeve (1978)) the computer can be used in the process of teaching statistics on three different levels - named black box, glass box and open box - which roughly correspond to the before mentioned aspects of understanding. Due to lack of space this point will not be elaborated here any further, the interested reader is asked to see the cited paper for more details.

At the moment there is some discussion whether one should rely on an interactive or batch mode in using the computer (Klensin and Dawson (1980), Cooper and Emerson (1980)). This truly is an important question, and he who is free in the choice of his tools should spend considerable time in weighing the pros and cons. But all the other users have to take their computer as it stands, i.e. bound to batch or capable of interactive usage. But independently of the computer at hand, all designers of statistical software have to answer the question as to where the statistician should be placed - inside the computer or in front of it. To make this question more transparent, two examples of computer output are presented. The first is part of a terminal session with the regression analysis module of the IBM product APL Statistical Library.

    THE SIGNIFICANCE OF ABOVE F = 0.9787
    THIS FIT IS NOT CORRECT AT THE SIGNIFICANCE LEVEL: 0.95
    TRY A HIGHER ORDER OR DIFFERENT MODEL

The objections are twofold. Although for an experienced statistician there is nothing wrong with the first two lines, things change when one imagines a statistical novice or non-expert sitting at that terminal. Far too often he will not understand the information given in the first line and so is blindly caught by the message

    THIS FIT IS NOT CORRECT AT THE SIGNIFICANCE LEVEL: 0.95

Like most users of this type he only knows two significance levels, i.e. 0.95 or 0.99, and applies them without giving much thought to the meaning of the concept of a significance level. A program should not enforce such an attitude. A prompt for the significance level at the start of the program - as in this case - is no cure. One might say, with respect to understanding: garbage in, garbage out. The advice given in the third line seems to be a nice feature. But is this really so? What if there is no linear model at all which would fit the data? Is this incorporated in

    OR DIFFERENT MODEL

or is the meaning of



these words: try another linear model, i.e. take other independent variables into consideration? So the advice might either be misleading or be of the kind: try something else.

This point of view may look a bit dogmatic. But one has to be somewhat strict in the beginning if one wants to implement statistical thinking in the user's head and not in his computer.

The following samples from the output produced by a similar program are more along the recommended line of approach. The program was designed by a student of mine.

    KONFIDENZINTERVALLE FUER DIE REGRESSIONSKOEFFIZIENTEN

    Konfidenzintervall für den Niveaukoeffizienten:
    P(362.07 ± T(1-α/2;8) × 13.906) = 1-α
    Konfidenzintervall für den Regressionskoeffizienten:
    P(53.314 ± T(1-α/2;8) × 3.627) = 1-α
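The spirit of such output can be captured in a few lines. This hypothetical Python sketch computes such an interval once the user has chosen α, using the standard library's normal quantile in place of the t(8) quantile printed above.

```python
from statistics import NormalDist

def confidence_interval(estimate, std_error, alpha=0.05):
    """Two-sided interval: estimate +/- z(1 - alpha/2) * std_error
    (normal approximation instead of the t(8) quantile of the original output)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return estimate - z * std_error, estimate + z * std_error
```

For the regression coefficient above, confidence_interval(53.314, 3.627) yields an interval of roughly 46.2 to 60.4; the point is that the user, not the program, chooses and interprets α.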

8. APL AND EXPLORATORY ANALYSIS

Exploratory analysis is an ever growing body of helpful techniques. Although many of them can be applied at the pencil-and-paper level (perhaps with the need to use multi-color pencils, as Tukey (1977) would say), computer assistance might be welcomed. Several software products are available; see Dawson, Klensin and Yntema (1980) or McNeil (1977). As might be judged by the latter, APL can contribute substantially in this respect.

But exploratory analysis is more than just a bundle of useful techniques and procedures; it is an attitude, as Tukey (1980) puts it. Exploratory analysis in the long run cannot be done with fixed tools and routine lines of approach. Creativity ranks high and should not be hindered by the provided programs. Although at first glance the non-routine data analytic applications use many of the standard procedures and data processing tools, there is a distinguishing characteristic - the necessity to allow for human thought and intervention (Heiberger (1979)).


There are many reasons why APL is best suited to the requirements of exploratory analysis. Exploratory analysis calls for rethinking and reconstructing one's tools. As has been shown, this task is eased with APL as the programming language. The workspace-oriented concept of APL, the distinction between global and local variables and the concept of a function quite easily offer the opportunity to enlarge the bundle of tools. The possibility to process every global variable in desk-calculator mode gives way to all kinds of ad hoc manipulations of the data. The built-in function ⍎ (activate) allows for user intervention within software products. This may be seen from the following example.

Let us consider univariate Box-Jenkins time series analysis. The time series data are kept in a variable TDATA. Usually the program would offer (for instance using a menu technique) some options to transform the data. One might bet that even the most carefully planned program will face a user who is not satisfied by the options provided. Here the activate feature may help. The following line of code is a first simple solution

    TDATA ← ⍎⍞

The user might for instance answer the prompt by typing

HxTDATA -

TDATA , 0

or whatever kind of transformation he wants. This line of approach surely can be widened to incorporate syntax checking (avoiding an ERROR in ⍎) or to take input via shared variables into consideration.
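The effect of the activate (execute) prompt can be imitated in other interpreted languages. The following Python sketch is an illustration only - eval on user input is shown here for its flexibility, as in the APL original, not as a safe design - applying a user-supplied expression to the series TDATA.

```python
import math

def transform(tdata, expression):
    """Evaluate a user-supplied expression with TDATA bound to the data,
    mimicking TDATA <- execute(user input)."""
    namespace = {"__builtins__": {}, "TDATA": tdata, "math": math}
    return eval(expression, namespace)
```

The user can then type any transformation the menu forgot, e.g. a rescaling or a log transform, and the program continues with the transformed series.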

Last but not least, if exploratory analysis is an attitude with creativity as a vital ingredient, then APL seems to be the right host language, for it supports creativity.

9. LEARNING BY DOING

To reach an attitude is a very difficult task. One promising approach



could be this: what has been said about computers and statistics can be combined in the following display.

An interactive statistical system is something like the synthesis of the arguments and facts given before. It is a statistician-oriented system (speaking and understanding his language) which does not pin the user down to thinking in terms of computer science. Instead it allows for human intervention. The idea of such a system has been brought up by Mustonen (1977, 1980). He also showed that it can be realized even on a desk computer. It is evident that the teaching of statistics would benefit if such a system were at hand for the teacher.

If such a system is not available, then start and implement one. All that is needed is APL. If the students are allowed to participate in building modules for such a statistical system, one will find out that they learn much more than APL (best learned on the job) and certain statistical procedures. They develop step by step that kind of attitude Tukey calls for. This process benefits from the ease of rewriting programs (which means rethinking in the statistical field) as given by APL.

The interpretative nature of APL allows for merging the usually separated phases of design, coding and checking (syntax, logic and acceptance). Leading principles in the design, such as 'the user is a statistician and not a computer specialist' and 'the statistical decisions should be made in front of and not inside the computer', can quite easily be pushed down onto the different levels of application during the process of program development.



This is something like a principle-guided trial and error process. And students participating in that work will benefit, for they get an ever growing understanding of what statistics as an attitude really means. This is not wishful thinking. At the moment the author is working on such a statistical system to be implemented on an IBM 5110. APL, as the reader may guess, is the chosen programming language. Several students participate in this work. And they really demonstrate that the before-mentioned effect is achieved. They all show a shift in their understanding of the nature of statistics in the direction Tukey wants us to teach.

References

Abrams Ph.S. (1973), Program writing, rewriting and style, APL Congress 73, Ed. Gjerløv P., Helms H.J., Nielsen J..
Cooper B.E., Emerson I.J. (1980), Interactive or batch, Compstat 1980, proceedings in computational statistics, Ed. Barrit M.M., Wishart D..
Dawson R., Klensin J.C., Yntema D.B. (1980), The consistent system, The American Statistician vol 34 p 169 f..
Evans D.A. (1973), The influence of the computer on the teaching of statistics, J.R.Stat.Soc. A 136 p 135 f..
Evans D.A. (1980), APL 84 - an interactive, APL based, statistical computing package, Compstat 80, proceedings in computational statistics, Ed. Barrit M.M., Wishart D..
Gilman L., Rose A.J. (1974), APL - an interactive approach, 2. ed., J. Wiley, New York.
Heiberger R.M. (1979), Software for statistical theory and practice, Technical report 42, Department of Statistics, The Wharton School, University of Pennsylvania.
Iverson K.E. (1972), Algebra, an algorithmic treatment, Addison Wesley, Reading.
Klensin J.C., Dawson R. (1980), Interactive computing versus computing in an interactive environment, Compstat 1980, proceedings in computational statistics, Ed. Barrit M.M., Wishart D..
McNeil D.R. (1977), Interactive data analysis, Wiley, New York.
Metzger R.C. (1980), Extended direct definition of APL functions, APL 80, Ed. v.d. Linden G..
Mustonen S. (1977), Survo 76, a statistical data processing system, Research report 6, Department of Statistics, University of Helsinki.
Mustonen S. (1980), Interactive analysis in Survo 76, Compstat 1980, proceedings in computational statistics, Ed. Barrit M.M., Wishart D..
Naeve P. (1978), CAI and computational statistics, Compstat Lectures 1, Ed. Skarabis H., Sint P.P..
Naeve P. (1979), Some aspects of the teaching of statistics, discussion paper 60, University of Bielefeld, Department of Economics.
Pakin S. (1972), APL\360 Reference manual, Science Research Associates.
Polivka R.P. (1975), APL: the language and its usage, Prentice Hall, Englewood Cliffs.



Rosenkrands B. (1974), APL as a notational language in statistics, Compstat 1974, proceedings in computational statistics, Ed. Bruckmann G., Ferschl F., Schmetterer L..
Smillie K.W. (1974), APL\360 with statistical application, Addison Wesley, Reading.
Snyder M. (1973), Interactive data analysis and nonparametric statistics, APL Congress 73, Ed. Gjerløv P., Helms H.J., Nielsen J..
Sykes R.A. (1973), Use and misuse of APL, efficient coding techniques, Scientific Time Sharing Corporation, Share XL.
Tukey J.W. (1977), Exploratory data analysis, Addison Wesley, Reading.
Tukey J.W. (1980), We need both exploratory and confirmatory, The American Statistician vol 34 p 23 f..

Optimale Schichtabgrenzung bei optimaler Aufteilung unter Annahme einer bivariaten Lognormalverteilung

Karl-August Schäffer

ZUSAMMENFASSUNG

Optimale Schichtgrenzen bei optimaler Aufteilung des Stichprobenumfanges auf die Schichten werden unter der Voraussetzung berechnet, daß die gemeinsame Verteilung der untersuchten Variablen und der Schichtungsvariablen eine bivariate Lognormalverteilung ist. Die Ergebnisse werden verglichen mit den Grenzen, die optimal sind, falls die vollständige Erfassung einer Randschicht gefordert wird. Die Abhängigkeit der Schichtung von der Korrelation zwischen den Variablen, von der Zahl der Schichten und vom Gesamtauswahlsatz wird untersucht.

SUMMARY

In the context of stratified random sampling, the estimation variable and the stratification variable are assumed to have a bivariate lognormal distribution. Optimal boundaries for the strata are obtained in the case of optimal allocation. The results are compared with the boundaries that are optimal if complete enumeration of a marginal stratum is required. The dependence of the stratification on the correlation between the variables, on the number of strata and on the overall sampling fraction is investigated.

1. EINLEITUNG

Die Aufgabe, Schichten für das Ziehen einer Zufallsstichprobe vorteilhaft abzugrenzen, ist bereits 1950 von Dalenius als Optimalproblem formuliert worden. Die von ihm angegebenen Bedingungen für die Lösung des Problems, die Fehlervarianz einer Schätzfunktion zu minimieren, haben jeweils die Form eines Gleichungssystems. Diese Dalenius-Gleichungen



- sind verhältnismäßig schwer lösbar, - geben zwar notwendige, aber nicht immer hinreichende Bedingungen für die Lösung des Problems und - gelten nur unter Annahmen, die in der Praxis höchstens ausnahmsweise erfüllt sind.

Die erste Schwierigkeit kann überwunden werden durch systematisches Probieren - vgl. Zindler (1956) und Strecker (1957) - oder durch Lösen eines einfacheren Ersatz-Gleichungssystems, das den Originalbedingungen unter vereinfachenden Annahmen über die Verteilung des betrachteten Merkmals äquivalent ist (vgl. Dalenius und Hodges (1957) und Ekman (1959)).

Das Aufkommen von leistungsfähigen Computern ab 1960 hat die Möglichkeit eröffnet, die Dalenius-Gleichungen ohne Näherungsannahmen zu lösen (vgl. z.B. Sethi (1963)). Die Untersuchungen von Schneeberger (1967) haben jedoch gezeigt, daß diese Gleichungssysteme in manchen Fällen mehrere Lösungen besitzen, die nicht notwendig dem globalen Minimum der Fehlervarianz entsprechen, sondern auch relative Minima sowie Sattelpunkte der Zielfunktion enthalten können.

Die numerische Lösung des Optimalproblems über die Dalenius-Gleichungen ist ein unnötiger Umweg. Er läßt sich unter Zuhilfenahme von Computern vermeiden, indem Minimalstellen der Zielfunktion unmittelbar mit einem Optimierungsverfahren berechnet werden. Dieser unmittelbare Weg ist ganz allgemein gangbar, führt allerdings nicht mit Sicherheit zum globalen Minimum. Das globale Minimum kann jedoch nach Bühler und Deutler (1974) in einigen Fällen sicher mit Hilfe der dynamischen Programmierung erreicht werden.

Die meisten Untersuchungen zur optimalen Schichtabgrenzung sind bisher (vgl. Deutler, 1976/77) durchweg von der vereinfachenden Annahme ausgegangen, die für die Abgrenzung der Schichten verwendete Variable (im folgenden kurz "Schichtungsvariable" genannt) sei identisch mit der Untersuchungsvariablen. Die vorliegende Arbeit verzichtet auf diese unrealistische Annahme.

Weil die empirische Verteilung von vielen wirtschaftlich relevanten Größen überraschend gut durch eine zweiparametrische Lognormalverteilung (vgl. Aitchison und Brown (1957)) approximiert werden kann, wird im folgenden vorausgesetzt, daß die gemeinsame Verteilung der Schichtungsvariablen und der Untersuchungsvariablen eine bivariate Lognormalverteilung ist.

Unter dieser Voraussetzung werden optimale Schichtgrenzen bei optimaler Aufteilung des Stichprobenumfanges auf die Schichten berechnet. Die Ergebnisse werden mit den optimalen Grenzen verglichen, die sich bei Totalerfassung einer Randschicht und optimaler Aufteilung des restlichen Stichprobenumfanges auf die übrigen Schichten ergeben. Die Abhängigkeit der Schichtung von der Korrelation zwischen den Variablen, von der Zahl der Schichten und vom Gesamtauswahlsatz wird untersucht.

2. PROBLEMSTELLUNG

2.1. Theoretische Grundlagen

Die Schichtung einer Gesamtheit, d.h. ihre Einteilung in paarweise disjunkte Teilgesamtheiten ("Schichten") und das Ziehen von unabhängigen Zufallsstichproben aus den Teilgesamtheiten, erlaubt es, den Mittelwert eines Untersuchungsmerkmals in der Gesamtheit mit einer Fehlervarianz zu schätzen, die in der Regel wesentlich kleiner ist als die Fehlervarianz des arithmetischen Mittelwertes in einer uneingeschränkten Stichprobe vom gleichen Stichprobenumfang.

Die Verkleinerung der Fehlervarianz hängt von folgenden Planungsgrößen ab:

- von der Anzahl L der Schichten, - von der Abgrenzung der Schichten, - vom Gesamtumfang n der Stichprobe und - von der Aufteilung des Gesamtumfanges n auf die Schichten.

Bei endlicher Gesamtheit ist ferner die Entscheidung wesentlich, ob die Zufallsstichproben mit oder ohne Zurücklegen gezogen werden sollen. In



dieser Arbeit wird vorausgesetzt, daß die betrachtete Gesamtheit endlich ist und die Auswahl stets ohne Zurücklegen vorgenommen wird.

Unter der Annahme, außer der Anzahl L der Schichten sei auch ihre Abgrenzung vorgegeben, sind die folgenden Größen für die h-te Schicht (h=1,...,L) determiniert:

    N_h   = Anzahl der Einheiten,
    μ_Yh  = Mittelwert des Untersuchungsmerkmals Y,
    σ²_Yh = Varianz des Untersuchungsmerkmals Y.

Daraus folgt für den Umfang N der Gesamtheit

    N = Σ_{h=1}^{L} N_h ,

für den Anteil P_h der h-ten Schicht an der Gesamtheit

    P_h = N_h / N

und für den Mittelwert μ_Y des Untersuchungsmerkmals Y in der Gesamtheit

    μ_Y = Σ_{h=1}^{L} P_h · μ_Yh .

Der Mittelwert μ_Y kann bei geschichteter Auswahl geschätzt werden durch die Funktion

    μ̂_Y = Σ_{h=1}^{L} P_h · Ȳ_h ,                    (1)

in der Ȳ_h den Stichprobenmittelwert des Untersuchungsmerkmals Y aus der h-ten Schicht bezeichnet (h=1,...,L).

Vorgegeben sei ferner der Gesamtumfang n der Stichprobe und damit der Gesamtauswahlsatz

    f = n / N .                                      (2)



Jede Aufteilung des Gesamtumfanges auf die L Schichten ist darstellbar als L-Tupel (n_1,...,n_h,...,n_L) von ganzen Zahlen n_h, das die Bedingungen

    0 < n_h ≤ N_h    für h=1,...,L

und

    Σ_{h=1}^{L} n_h = n

erfüllt. Sofern n_h/N_h und f für alle h=1,...,L nicht wesentlich voneinander abweichen, hat die Schätzfunktion (1) die Fehlervarianz

    V{μ̂_Y} = Σ_{h=1}^{L} P_h² · (N_h - n_h)/N_h · σ²_Yh / n_h .    (3)
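Die Formeln (1) und (3) lassen sich unmittelbar nachrechnen; die folgende kleine Python-Skizze (eine freie Illustration, nicht Teil der Arbeit) berechnet Schätzwert und Fehlervarianz aus den Schichtgrößen.

```python
def geschichtete_schaetzung(schichten):
    """Schaetzer (1) und Fehlervarianz (3); schichten ist eine Liste von
    Tupeln (N_h, n_h, ybar_h, s2_h) mit Schichtumfang, Stichprobenumfang,
    Stichprobenmittel und Varianz des Untersuchungsmerkmals."""
    N = sum(Nh for Nh, _, _, _ in schichten)
    mu = sum((Nh / N) * ybar for Nh, _, ybar, _ in schichten)
    V = sum((Nh / N) ** 2 * (Nh - nh) / Nh * s2 / nh
            for Nh, nh, _, s2 in schichten)
    return mu, V
```

Der Faktor (N_h - n_h)/N_h ist die Endlichkeitskorrektur der Auswahl ohne Zurücklegen.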

Sie hängt, wie Formel (3) unmittelbar zeigt, wesentlich von der Aufteilung (n_1,...,n_L) des Stichprobenumfanges n ab. Unter den oben genannten Vorgaben wird das Minimum der Fehlervarianz erreicht mit der von Neyman (1934) und von Tschuprow (1923) angegebenen Aufteilung

    n_h* = n · P_h σ_Yh / Σ_{h'=1}^{L} P_h' σ_Yh' ,   h=1,...,L,    (4)

sofern die so berechneten Stichprobenumfänge n_h* die Nebenbedingungen

    n_h* ≤ N_h ,   h=1,...,L,                                       (5)

erfüllen (die Einhaltung der Forderung, daß die n_h ganze Zahlen sein sollen, ist bei größerem Stichprobenumfang praktisch nicht erheblich). Falls eine Nebenbedingung, z.B. für die L-te Schicht, verletzt ist, müssen die Umfänge der Stichprobe in den Schichten nach der folgenden Vorschrift bestimmt werden:

    n_L* = N_L ,   n_h* = (n - N_L) · P_h σ_Yh / Σ_{h'=1}^{L-1} P_h' σ_Yh' ,   h=1,...,L-1.
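Die Aufteilung (4) mit der Vollerhebungs-Regel bei Verletzung von (5) läßt sich so skizzieren (eine vereinfachte, hypothetische Umsetzung; bei mehrfachen Verletzungen wird rekursiv weiter aufgeteilt):

```python
def neyman_aufteilung(n, schichten):
    """Neyman-Tschuprow-Aufteilung (4); schichten = Liste von (N_h, sigma_h).
    Verletzt eine Schicht die Nebenbedingung (5), wird sie voll erhoben und
    der restliche Umfang auf die uebrigen Schichten aufgeteilt."""
    gesamt = sum(Nh * s for Nh, s in schichten)
    nh = [n * Nh * s / gesamt for Nh, s in schichten]
    for h, (Nh, _) in enumerate(schichten):
        if nh[h] > Nh:
            rest = [paar for i, paar in enumerate(schichten) if i != h]
            teil = neyman_aufteilung(n - Nh, rest)
            teil.insert(h, float(Nh))
            return teil
    return nh
```

Bei gleichen Schichtumfängen wird der Stichprobenumfang proportional zu den Schichtstandardabweichungen verteilt; eine kleine, stark streuende Schicht wird voll erhoben.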


L(AQL) = 1-α. Furthermore, lots with quality LQ are not acceptable to the consumer and should be rejected with at least probability 1-β. Lieberman and Resnikoff (1957) established the following procedure of variables acceptance sampling. If the specification is an upper limit U, the value

    t = x̄ + kσ    if σ is known,
    t = x̄ + ks    if σ is unknown,                    (2)

of the test statistic T is compared with the specification limit U. If the specification is a lower limit L, x̄ - kσ or x̄ - ks is compared with L, respectively.

On the basis of this comparison, each lot is either accepted (t ≤ U) or rejected (t > U). Accordingly, a variables plan is specified by the parameters n (sample size) and k (acceptance constant). The desired (n,k)-plan has to fulfil the condition that the OC-curve of the plan will pass through the points defined by (AQL, 1-α) and (LQ, β).

[Fig. 1: OC-curve of the (n,k)-plan]

To compute the sample size n and the acceptance constant k, we have to analyse the distribution of T = X̄ + kσ and T = X̄ + kS, respectively, for a given fraction defective p. The OC-curve is given as

  L(p) = P(T ≤ U | p).        (3)
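For known σ the probability in (3) has a closed form under the normal model: writing u_{1-p} = Φ⁻¹(1-p) for the standardized distance of U from the process mean at fraction defective p, one gets L(p) = Φ(√n · (u_{1-p} - k)). A minimal sketch of this evaluation (function name and parametrization are mine, standard library only):

```python
from math import sqrt
from statistics import NormalDist

_phi = NormalDist()  # standard normal distribution

def oc_known_sigma(p, n, k):
    """OC-curve L(p) = P(Xbar + k*sigma <= U | fraction defective p)
    for a variables plan (n, k) with known sigma."""
    u = _phi.inv_cdf(1.0 - p)           # (U - mu)/sigma at fraction defective p
    return _phi.cdf(sqrt(n) * (u - k))  # Xbar is N(mu, sigma^2/n)
```

Evaluating this for a plan matched to (AQL, 1-α) and (LQ, β) reproduces an OC-curve of the shape sketched in Fig. 1.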

Helmut Schneider and Peter-Theodor Wilrich


If the variance of X is known, the variable T is normally distributed. For unknown variance, T can be transformed to a variable with a noncentral Student distribution. Hence, taking into consideration the requirements

  L(AQL) = P(T ≤ U | AQL) = 1-α        (4)

  L(LQ) = P(T ≤ U | LQ) = β        (5)

it is not difficult, in principle, to calculate the parameters (n,k), either exactly (Lieberman and Resnikoff 1957), or approximately (Wilrich 1970).
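Under the normality assumption the two conditions (4) and (5) can be solved in closed form for known σ; for unknown σ a common approximation inflates n by the factor (1 + k²/2). The sketch below follows these standard normal-theory formulas and is my own illustration, not the exact procedure of Lieberman and Resnikoff (1957) or of Wilrich (1970):

```python
from math import ceil
from statistics import NormalDist

_inv = NormalDist().inv_cdf  # standard normal quantile function

def nk_plan(aql, lq, alpha, beta, sigma_known=True):
    """Approximate (n, k) whose OC-curve passes through
    (AQL, 1-alpha) and (LQ, beta) for a normal characteristic."""
    u_aql, u_lq = _inv(1 - aql), _inv(1 - lq)
    u_a, u_b = _inv(1 - alpha), _inv(1 - beta)
    # closed-form solution of (4) and (5) for known sigma
    k = (u_aql * u_b + u_lq * u_a) / (u_a + u_b)
    n = ((u_a + u_b) / (u_aql - u_lq)) ** 2
    if not sigma_known:
        n *= 1 + k * k / 2   # inflation for an estimated standard deviation
    return ceil(n), k
```

For AQL = 1 %, LQ = 5 %, α = 5 %, β = 10 % this gives n = 19, k ≈ 1.94 for known σ, and n = 54 for unknown σ.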

To obtain protection close to that given by the OC-curve of the variables plans discussed above, the distribution of the quality characteristic being inspected must be normal within the lot. All calculations used in constructing the plans are based upon this assumption. Tests for verifying the assumption of normality may be used to assure the practitioner that the application of variables plans is justified. If there seems to be a possibility that the product has been screened and hence has a truncated normal distribution, or if for any other reason a normal distribution cannot be assumed, the user might want to know the deviation of the true OC-curve from the one that holds under the normality assumption. If the deviation is large, one might decide to apply sampling by attributes instead of variables plans.

2. ASYMPTOTIC OC-CURVE FOR KNOWN SKEWNESS AND EXCESS

In most cases where the assumption of normality does not hold, the true distribution is not known. But on occasion the excess and skewness of the quality characteristic might be available, at least approximately. In this case the asymptotic OC-curve can be computed. Following Cramér (1964), the test statistic T = X̄ + kS is asymptotically normally


distributed with mean

  E[X̄ + kS] = μ + kσ = μ_T        (6)

and variance

  V[X̄ + kS] = (σ²/n) · (1 + (k²/4)(β₂ - 1) + k √β₁) = σ_T²        (7)

where μ is the mean, σ² the variance, √β₁ the skewness, and β₂ the excess of the quality characteristic inspected. Thus the OC-curve is given as

  L(p) = P(X̄ + kS ≤ U | p) = Φ((U - μ_T)/σ_T).
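Formulas (6) and (7) turn directly into a computation: insert μ_T and σ_T into the normal distribution function. The sketch below is my own illustration; note that for √β₁ = 0 and β₂ = 3 (the normal values) the variance factor reduces to the familiar (1 + k²/2)/n:

```python
from math import sqrt
from statistics import NormalDist

_phi = NormalDist()  # standard normal distribution

def asymptotic_oc(mu, sigma, n, k, U, skew=0.0, excess=3.0):
    """Asymptotic acceptance probability P(Xbar + k*S <= U) from (6)-(7),
    for a characteristic with skewness sqrt(beta1) = skew and excess
    beta2 = excess (beta2 = 3 for the normal distribution)."""
    mu_t = mu + k * sigma                                               # (6)
    var_t = sigma ** 2 / n * (1 + k * k / 4 * (excess - 1) + k * skew)  # (7)
    return _phi.cdf((U - mu_t) / sqrt(var_t))
```

Comparing this curve with the normal-theory OC-curve for the same (n,k)-plan quantifies the robustness question raised above: deviations of skewness and excess from 0 and 3 change σ_T and hence the true acceptance probabilities.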

(X'X + kI)⁻¹ X'y

Iterative shrunken estimator (0 < s ≤ 1; n = 0, 1, 2, ...)

  β̂_ISE = (1 - (1 - s)^{n+1}) (X'X)⁻¹ X'y

On a Generalized Iteration Estimator

Define the class T by considering the estimators from (ii) and (iii) with a control parameter n

which is now allowed to attain all nonnegative integers. These statistics have two control parameters and converge to the ordinary LS-estimator for increasing n if k > 0 and 0 < s ≤ 1. For k_j ≥ 0, j = 1, ..., p, we obtain the generalized ridge estimator

  β̂_K = (X'X + P'KP)⁻¹ X'y

(cf. Goldstein/Smith 1974), whereas for n > 0 we may introduce the

Iterative generalized ridge estimator (k_j ≥ 0; n = 0, 1, 2, ...)

  β̂_IGRE = Σ_{i=0}^{n} [(X'X + P'KP)⁻¹ P'KP]^i (X'X + P'KP)⁻¹ X'y

which has p+1 control parameters.
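Since X'X = P'ΛP with orthogonal P, the iterative generalized ridge estimator decouples in the spectral coordinates γ̂ = Pβ̂ into one scalar geometric series per eigenvalue λ_j. The sketch below (naming mine) evaluates that series and lets one check numerically both the closed form (1 - (k_j/(λ_j + k_j))^{n+1}) γ̂_j and the convergence to the LS-component as n grows:

```python
def igre_component(gamma_ls, lam, kj, n):
    """j-th spectral component of the iterative generalized ridge estimator:
    sum_{i=0}^{n} r^i * lam/(lam + kj) * gamma_ls with r = kj/(lam + kj).
    The geometric sum collapses to (1 - r^{n+1}) * gamma_ls, which tends
    to the LS-component gamma_ls as n -> infinity (for kj >= 0)."""
    r = kj / (lam + kj)   # contraction factor, 0 <= r < 1
    return sum(r ** i for i in range(n + 1)) * lam / (lam + kj) * gamma_ls
```

For λ_j = k_j = 1, for instance, the first iterate (n = 0) recovers half of the LS-component, and fifty iterations essentially all of it; k_j = 0 reproduces the LS-component at once.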

Let us now return to the more general case where A = P'ΔP. From condition (iv) of lemma 1 we can conclude that

  β̂_{n,A} = A_n y = A_n X β̂ ,

which shows β̂_{n,A} to be a linear transform of β̂. Obviously, β̂_{n,A} is a homogeneous linear estimator whose squared euclidean length is given by

  β̂'_{n,A} β̂_{n,A} = β̂' (A_n X)² β̂ .

Hence, by condition (i) of lemma 1 we can see that with probability one β̂_{n,A} is shorter than β̂.

The next theorem is an easy consequence of the preceding results. The proof will be omitted since it requires only some knowledge about the characteristics of homogeneous linear estimators; see for example Trenkler (1981, p. 29).


Götz Trenkler

Theorem 4: Let β̂_{n,A} be a member of T. Then β̂_{n,A} has the following characteristics:

Expectation

  E(β̂_{n,A}) = A_n Xβ = P' (I - (I - ΔΛ)^{n+1}) Pβ

Bias

  B(β̂_{n,A}) = (A_n X - I) β = -P' (I - ΔΛ)^{n+1} Pβ

Quadratic bias

  D(β̂_{n,A}) = β' (A_n X - I)² β = β'P' (I - ΔΛ)^{2n+2} Pβ = Σ_{j=1}^{p} (1 - δ_j λ_j)^{2n+2} γ_j² ,

  where γ = Pβ

Covariance matrix

  Cov(β̂_{n,A}) = σ² A_n A_n' = σ² P' (I - (I - ΔΛ)^{n+1})² Λ⁻¹ P

Total variance

  V(β̂_{n,A}) = tr(Cov(β̂_{n,A})) = σ² Σ_{j=1}^{p} (1 - (1 - δ_j λ_j)^{n+1})² λ_j⁻¹

Mean squared error

  G(β̂_{n,A}) = V(β̂_{n,A}) + D(β̂_{n,A}) = Σ_{j=1}^{p} [ σ² (1 - (1 - δ_j λ_j)^{n+1})² λ_j⁻¹ + (1 - δ_j λ_j)^{2n+2} γ_j² ]

Mean squared error matrix

  M(β̂_{n,A}) = E[ (β̂_{n,A} - β)(β̂_{n,A} - β)' ] = Cov(β̂_{n,A}) + B(β̂_{n,A}) B(β̂_{n,A})'
             = P' [ σ² (I - (I - ΔΛ)^{n+1})² Λ⁻¹ + (I - ΔΛ)^{n+1} γγ' (I - ΔΛ)^{n+1} ] P
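The componentwise quantities in Theorem 4 are easy to evaluate numerically. The helper below (my notation) returns variance, squared bias, and their sum G for one spectral component; for δ_j λ_j = 1 the bias vanishes and the component risk equals the LS-value σ²/λ_j, which is also the limit for n → ∞ whenever 0 < δ_j λ_j < 1:

```python
def component_risk(sigma2, lam, delta, gamma, n):
    """Variance, squared bias, and their sum for component j of Theorem 4:
    var   = sigma2 * (1 - (1 - delta*lam)^{n+1})^2 / lam
    bias2 = (1 - delta*lam)^{2n+2} * gamma^2
    """
    shrink = (1.0 - delta * lam) ** (n + 1)   # (1 - delta_j*lambda_j)^{n+1}
    var = sigma2 * (1.0 - shrink) ** 2 / lam
    bias2 = shrink ** 2 * gamma ** 2
    return var, bias2, var + bias2
```

Varying n shows the bias-variance trade-off directly: the bias term decays geometrically while the variance rises toward the LS-variance σ²/λ_j.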


When evaluating the quality of an estimator it is customary to use the mean squared error, which coincides with the total variance when the estimator under consideration is unbiased. But mean squared error is only one measure of the goodness of an estimator β̃. Another is the generalized mean squared error given by

  G_H(β̃) = E[ (β̃ - β)' H (β̃ - β) ]

where H is a nonnegative definite matrix. This measure allows for weights on the components of β̃. However, there is an infinite number of weight matrices, making it rather difficult to select one. Fortunately, we have the following result due to Theobald (1974): If

  M(β̃_j) = E[ (β̃_j - β)(β̃_j - β)' ]

denotes the mean squared error matrix of any two estimators β̃_j, j = 1, 2, then it holds that G_H(β̃_1) ≥ G_H(β̃_2) for all positive definite matrices H if and only if M(β̃_1) - M(β̃_2) is nonnegative definite. As a consequence, it makes sense to compare two estimators by inspecting the difference of their mean squared error matrices. It also appears natural to judge the performance of an estimator β̃ by a comparison with the widely used LS-estimator. Indeed, by choosing the control parameters suitably, an extensive subclass of T is now shown to be better than the LS-estimator.

Theorem 5: Suppose that