PhD thesis with early description of backpropagation algorithm for neural networks, also known as reverse mode automatic
341 48 36MB
English Pages 454 Year 1974
Table of contents :
PJWthesis_best_front_to_xi......Page 2
PJWthesis_best_xiui_to_7_original......Page 13
PJWthesis_IIp1_to_45_best_original......Page 28
PJWthesis_IIp46_to_99end_best_original......Page 73
PJWthesis_chIII_best_original......Page 127
PJWthesis_chIII_p44_to_end_a_fixbest_original......Page 170
PJWthesis_chIV_best_original......Page 179
PJWthesis_chVI_p1_to_70_best_original......Page 249
PJWthesis_chVI_p71_to_end_best_original......Page 319
PJWthesis_chV_best_original......Page 389
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/35657389
Beyond regression : new tools for prediction and analysis in the behavioral sciences / Article · January 1974 Source: OAI
CITATIONS
READS
2,279
11,747
2 authors, including:
Paul Werbos National Science Foundation 164 PUBLICATIONS 11,863 CITATIONS SEE PROFILE
Some of the authors of this publication are also working on these related projects:
New functional explanation of how brains work View project
Hard core artificial neural networks View project
All content following this page was uploaded by Paul Werbos on 22 June 2016. The user has requested enhancement of the downloaded file.
CI)
BEYONn REGRESSION:
NEW TOOLS FOR PREDICTION ANO ANALYSIS IN THE BEHAVIORAL SCIENCES
A thesis presented by
Paul John Werbos to
The Committee on Applied Mathematics
In partial fulfillment of the requirements for the degree of
Doctor of Philosophy In the subject of statistics
Harvard University
Cambridge, Massachusetts August, 1974
Copyright reserved by the author
(ii)
PREFACE The Initial Impetus for this thesis came from a
suggestion by Prof. Karl Oeutsch, my thesis supervisor, that I l ook more cl osel y at the prediction of national
assimil ation and pol itical mobil ization, by use of the DeutschSol ow model ; his co11111ents have been of major
hel p to me with al l the empirical work on national ism,
and In revising the original structures of Chapters (V) and (VI).
The earl ier work reported in section (ii) of
Chapter (VI) was carried out under his supervision and
support, through a research project funded by the Cambrl dge Project.
I have been surprlsed more than
once by the sensitivity and receptiveness of his
Intuition, in suggesting areas of research which l ooked
unworkabl e at first but which l ed In the end to useful
and surprising Innovations.
The opinions expressed In
Chapter (V), however, remain my own responslbll lty,
particul arl y In their more bul l headed aspects.
Professor Mostel l er, In the Harvard Statistics
Department, has hel ped by Introducing me to current
work on robust estimation, by suggesting the use of
simul ation tests, and by monitoring the general content
of the first four chapters.
Professors Anderson and
Bossert, of the Commit tee on Appl ied Mathematics, and
(iii)
Professor Demps ter, of the Harvard S ta tis tics
Depar tmen t, have hel ped me to find a more orderl y way
of presen ting the ma thema tical Ideas here, hel p which
was sorel y needed in the summer of 1973. To the
Cambridge Projec t, and i ts sponsors a t ARPA, mus t be
given thanks, no t onl y for al l the compu ter time used
in this research and in I ts documen ta tion, bu t al so for the oppor tuni ty to transl a te some of these ideas from theory in to opera tional sys tems wi thou t the ex tensive
del ays more common In such research; wi thou t their
suppor t, this research coul d never have been brough t to
a s ta tP. final enough to al l ow i ts con tinua tion.
The personal and financial condl tions which l ed to
the compl e tion of this thesis were ra ther compl ex, and
deb ts are owed to more individual s than shoul d properl y
be ci ted here. Cer tainl y
have a s trong deb t to my
paren ts In this respec t; a deb t to Richard Ney, whose
ideas on the s tock marke t financed a l arge par t of my
ac tivi ty In this period;
a moral deb t to Prof. Nazli
Choucrl, of M. I. T. , who, In a brief discussion in the
spring of 1973, encouraged me to con tinue this work;
deb t to a col l eague, Gopal Krishna, for hel ping me see
more cl earl y tha t the psychol ogical hurdl es I faced a t
firs t were no t to tal l y unique;
a
a deb t to Dr. Karreman,
of the Bockus Research Ins ti tu te, who, In one of those
( i V)
1960 1 s programs to encourage high school students, connected with the Moore School of Electronics at the Unlversl ty of Pennsylvania, got me started asking
questions about the phenomenon of Intelligence,
questions which led me to the dynamic feedback concept,
which was applied only later to formal statistics.
(v)
TABLE OF CONTENTS CHAPTER OR SECTION
PAGE
PREFACE.........................................
Cit)
TABLE OF CONTENTS................................ LIST OF FIGURES............................... LIST OF TABLES..................................
SYNOPSIS.......................................
(I) GENERAL INTRODUCTION AND SUMMARY.............
(v)
(vi11) (Ix)
(xii}
11
(II) DYNAMIC FEEDBACK, STATISTICAL ESTIMATION AND SYSTEMS OPTIMIZATION: THE GENERAL TECHNIQUES.....................
I11
Ct) INTRODUCTION...........................
111
(ii) ORDINARY REGRESSION...................
114
(II I) NONLINEAR REGRESSION AND DYNAMIC FEEDBACK....................
I115
(iv) MODELS WITH MEMORY...................
1128
NOISE, AND THE CONCEPT OF THE TRUTH OF A MODEL, IN STATISTICS.............
I136
(vi) ORDINARY REGRESSION AND THE MAXIMUM LIKELIHOOD APPROACH..........
I143
(v)
(vii) THE NEED FOR SOPHISTICATED
NOISE MODELS••••••••••••••••••••••••
I148
(viii) HOW TO ESTIMATE EXPLICIT SOPHISTICATED NOISE MODELS.•.•..••.
I160
(Ix) PATTERN ANALYSIS .....................
1165
OPTIMIZATION•.••••••••••••..•.••••••••
1172
THE METHOD OF "RELAXATION" WITH MEASUREMENTNOISEONLY MODELS••••••••
1178
(x)
(xi)
(Vi )
PAGE
CHAPTER OR SECTION (xii) THE ORDERED DERIVATIVE ANO DYNAMIC FEEDBACK.••.••••••••••.•••••
I182
APPENDIX: VARIATIONS ON THE STEEPEST ASCENT METHOD FOR EFFICIENT CONVERGENCE •••
I189
FOOTNOTES TO CHAPTER (II) •••••••••••.•••••
I194
Cl II) THE MULTIVARIATE ARMA (l,!) MODEL: ITS SIGNIFICANCE AND ITS ESTIMATION .•.•..
I 111
( I ) INTRODUCTION••••••••..••..•.••••••••••
I 111
(It) THE RECONSTRUCTION OF A WHITE NOISE MODEL FROM A VECTOR ARMA MODEL ...••.•
I I
(tit) THE ESTIMATION OF MULTIVARIATE ARMA PROCESSES ••.•••••••••.••••.••• (Iv)
DESCRIPTION OF COMPUTER ROUTINE TO ESTIMATE ARMA PROCESSES ••.•••••.•
APPENDIX: NUMERICAL EXAMPLES OF THE BEHAVIOR OF DIFFERENT CONVERGENCE
I I
19
119
III 32
PROCEDURES...............................
11143
FOOTNOTES TO CHAPTER (II 1) ...............
II149
(IV) SIMULATION STUDIES OF TECHNIQUES OF TIMESERIES ANALYSIS.......................
IV1
Ct) INTRODUCTION...........................
IV1
(t i ) DEFINITION OF STUDIES CARRI ED OUT...••
I V5
(tit) DESCRIPTION OF RESULTS..............
IV25
(VI
i)
PAGE
CHAPTER OR SECTION (V) GENERAL APPLICATIONS OF THESE IDEAS: PRACTICAL HAZARDS AND NEW POSSIBILITIES......
V1
(1) INTRODUCTION AND SUMMARY.. •.••.•••••••••
V1
(II) THE LIABILITIES OF MATHEMATICAL METHODS IN PRACTICAL DECISIONMAKING...........
V3
(111) PREDICTION: A COMMON GOAL FOR VERBAL AND MATHEMATICAL SOCIAL SCIENCE.......
V6
(Iv) POSSIBILITIES FOR STATISTICS AS AN EMPIRICAL TOOL IN REALWORLD PREDICTION
V23
(v) BEYOND NAIVE EMPIRICISM: ADAPTING OIJR IDEAS TO FILL THE GAP LEFT BY STATISTICS
V38
FOOTNOTES TO CHAPTER (v) .••.•••.••••••••.••
V49
(VI) NATIONALISM AND SOCIAL COMMUNICATIONS: A TEST CASE FOR MATHEMATICAL APPROACHES....
Vl 1
(I) INTRODUCTION AND SUMMARY...............
V 11
(II) INITIAL STUDIES OF THE DEUTSCHSOLOW MODEL..................
Vl 14
(II i) LATER STUDIES OF THE DEUTSCHSOLOW HODEL.................
Vl 49
NATIONALISM, CONFORMITY AND COMMUNICATIONS TERMS: AN EXTENSION OF THE DEUTSCH MODEL.................
Vl 85
Local vs. Regional LanguageDominance as a Dynamic Factor ••••••••••••.•••••
Vl 90
Some Operational Dimensions of Nationalism: Narcissism, Stereotyping and Aggression.......................
Vl 94
Toward More General Models...........
v1qg
(Iv)
(v) ASSIMILATION AND COMMUNICATION: THE CASE OF NORWAY...................
V 1107
FOOTNOTES TO CHAPTER (VI) ................
Vl 133
( viii )
LIST OF FIGURES Figure Number
Title
Page
V1: Pathways of Correlations With Nois y Data...
V18
Legend to Figures Vl1 Through Vl4........
Vl3
Vl1: RMS Average Errors In LongTerm Predictions of As s lrillated Populations , In Percentages
Vl4
Vl2: RMS Average Errors In LongTerm Predictions of Differentiated Populations , In Percentages ............................
Vl5
Vl3: RMS Average Errors In LongTerm Predictions of Mobilized Populations , In Percentages..
Vl6
Vl4: RMS Average Errors In LongTerm Predictions of Underlying Populations , In Percentages.
Vl7
( I X)
LIST OF TABLES Table Number
Page
Tlt 1 e
I11: Table of Operations for EQuatlon ( 2.11 ) •• I 12: Table of Operations for EQuatlons ( 2.14 ) ••••••••••••••••••••
1121
113 1, 32
113 : Hypothetical Example of "Ideal Types " ••••
I15 7
I14: Table of Operations for EQuatlons ( 2.17 ) •••••••••••.•.•••••
to
1161
I15: Table of Operations for EQuations ( 2.18 ) •••••••.•••••••••.•.
63
1174, 75
I111: Example of Convergence Res ults With Early Vers ion of the ARMA Es timation Routine. I1144 I I12: Sample Ses s ion With Convergence Res ults For Final Version of the ARMA II145 to 47 Es timation Routine............... IV1: Average coefficient es timates , and dispers i on errors of es t i mates , for the s i x es timation rout i nes and twelve s imulated process es defined In
section (it) •••••••••••••••••••••••••••••
IV3 4
IV2: Prediction Errors as Defined In Section ( t I ) .....•...••..•.........
IV3 5 to 40
IV3 : Estimates of Growth Factor, "c"•••.
IV41 to 46
IV4: Errors In Prediction & Mis cellany••
IV47 to 70
V I1 : Samp 1 e of Res u 1 ts fr om O F. LTA ••. •••.•.•.••
Vl15
Vl2: Regress ion Statis tics for Mobil i zed and Underlying Populations •••••••••••••••••••
Vl16
Vl3 : Regres s ion Statistics for As s imilated and Differentiated Populations •••••.•••••••••
Vl17
( X)
Table Number
TItle
Page
Vl4: LongTerm Prediction Errors with SERIES In Predicting Moblllzatlon•••••••••••••••
Vl18
Vl5: LongTerm Predictions Errors with SERIES In Predicting the Underlying Population••
Vl19
Vl6: LongTerm Prediction Errors wt th SERIES In Predicting the As s imilated Population.
Vl20
Vl7: LongTerm Prediction Errors with SERIES In Predicting the Differentiated
Population •.•••••••••••••••.•••••••...•.•
Vl8: RMS Average Errors of Predictions of Mobilized and Underlying Populations ,
Vl21
by EXTRAP ••••••••••••••••••••••••••••••••
Vl22
Vl9: RMS Average Errors of Predictions of Ass imilated and Differentiated Populations , by EXTRAP ••••.••••••..••••••
Vl23
Vl10: Indices of "M" ( Moblllzatlon ) and of "A" ( Ass imilation) From the Deuts chKravitz Data Us ed For Runs Reported In Tables Vl1 to Vl9 ••••••••••••••••••.••
Vl24
Vl11: Statis tics Concerning Regres s ion for Mobilized and Underlying Populations .•••
Vl50
Vl12: Statis tics Concerning Regress ion for As s imilated and Differentiated
Po pulation s•....•...••.••....•.•.•..•...
Vl51
Vl13: ARMA Models for Mobilization Proces s es ••
Vl52
Vl14: ARMA Models of the Ass lnilatlon Proces s .
Vl53
Vl15: RHS Averages o f Percentage Errors With LongTerm Predictions of Moblll7.ed and Underlying Populations , Bas ed on
Regression.•••••..•....••.......•..•..••
Vl16: RMS Averages of Percentage Errors With LongTerm Predictions of Mobilized and Underlying Populations , Bas ed on the
ARMA Method •••••••••••••.•••••••••••••••
Vl54
(XI )
Table Number
Title
Page
Vl17: RMS Averages of Percentage Errors WI th LongTerm Predictions of Mobilized and Underlying Populations , Bas ed on the Robus t Method ( GRR) •••••••••••••••••••••
Vl56
Vl18: RMS Averages of Percentage Errors With LongTerm Predictions of As s imilated and Differentiated Populations , Bas ed on Regress I on . . • . . • • • . • • . • . • • . . • • . . . • . . . . . •
Vl5 7
Vl19: RMS Averages of Percentage Errors With LongTerm Predictions of As s imilated and Differentiated Populations , Based on the AR��A Met hod • • • • • • • • • • • • • • • • • • • • • • . • • • • • •
Vl5 8
Vl20: RMS Averages of Percentage Errors With LongTerm Predictions of As s imilated and Differentiated Populations , Bas ed on the Robus t Method ( GRR) ••••••••••••••••••••.
Vl5 9
Vl21: Predictions of Future Mobilized and Underlying Populations , by the Robus t Method ( GRR) ••••••••••.•••.••••••.•••••••
Vl6 0
Vl22: Predictions of Future As s imilated and Differentiated Populations , by the 11 11 ext2 vers ion of the Robus t Method ( GRR)
Vl6 1
Vl23: Definition of Mobilization Variables and Spans of Years Used For Runs Oes criberl In Tables Vl11 through Vl22...••••.•••
Vl6?.
Vl24: Definition of As s imilation Variables and Spans of Years Us ed For Runs Des cribed In Tables Vl11 through Vl22...........
Vl6 3
Vl25 : Gravity Model Correlations .............
Vl125
(Xi f )
SYNOPS I S This thesis provides a broad, coherent exposition of a new mathematical approach to social studies and to
related fields.
This work began as an attempt to apply the
classical techniques of statistics and econometrics to
the DeutschSolow model of nationalism, In order to
turn this model Into a workable tool for predicting the
political future.
I n the course of this effort, It
became clear that the usual statistical methods do a
poor job in fitting dynamic models to realworld data,
If we Judge these models by their ability to make good
predictions across time.
I t also became clear that
newer and better methods would not be feasible,
economically, unless we could Invent less expensive
algorlthms, too.
Thus the goals of this thesis are
fivefold: (l) to describe new ways of fitting models to data;
( It) to define new algorithms which make
these methods feasible; (iii) to Introduce evidence for
the superiority of these methods, both for realworld and for simulated data;
( Iv) to discuss the
applications of these Ideas, in broad terms, to social and even biological sciences;
(v) to discuss the new
work on nationalism which has led us In these
Cxfil)
directions.
Let us begin with the first three goal s.
We have studied not one, but two, new approaches to fitting model s to data.
First, we general ized the
work by Box and Jenkins, on "ARMA" processes, on "mixed Autoftegresslve MovingAverage processes."
Chapter
( I I I) discusses the mathematics of this approach, In
detail.
It shows how an ordinary, multivariate
autoregressive process, observed by way of "noisy" data Cl .e. data measured wtth random measurement errors or
conceptual distortions), becomes a "vector ARMA
process. "
It then shows how to appl y "dynamic
feedback" to estimate the coefficients of such
processes, at much lower costs than were possible
previously;
the resulting computer program Is now
avail able to the public through the M IT Cambridge Project Consistent System.
In Chapter (V), we discuss
why we considered this approach Important for quantitative pol itical science.
In studtes of s Imu 1 a ted. data, the ARMA approach
general ly yiel ded only hal f as much error as regression
did, In estimating the coefficients of a simple model;
It was more efficient In making use of l imited data and It led to l ess systematic bias, both.
However, with
real worl d data, the ARMA approach did littl e better
( xi v)
than regression In making l ongterm predictions;
the
error dlstrtbutton curves for ARMA are onl y about 10%
smal l er than those for regression, and the curves are uniforml y cl ose to each other.
After reexamining these empirical resul ts and the
theory of maximum l ikel ihood Itsel f, we formul ated a
new, more radical and more successful approach to the fitting of model s.
I n essence, the idea ts to maximize
l ongterm predictive power directl y, over the known data, Instead of maximizing formal l tkel Ihood.
Formal l y, this Idea rests upon the aprlort expectation
that many social processes are governed by rel ativel y
deterministic underl ying trends, obscured both by
measurement noise and by transient deviations of great compl exity.
The qual ttattve, pol itical basts of this
idea ts discussed In Chapter (V).
Sections (vii) and
(xi) of Chapter ( I I) discuss the statistical basts.
The "measurementnoiseonl y" approach strongl y
outperformed both ARMA and regression, over both
real worl d and simul ated data. I t outperformed ARMA
most strongl y In our most compl ex simul ated processes, which seem most representative of the real worl d.
According to our error distribution r,raphs, the new
method cuts In hal f the errors In l ongterm predictions of real worl d variabl es;
the biggest reductions occur
(xv}
with those variables, such as national assimilation, and with those cases, near the middle of the
distributions, for which the simple models of the first half of Chapter (VI) can do an adequate job of
prediction.
These empirical results for our new approach came
from special computer programs, which exploited the simplicity of the models under study.
In Chapter (II),
we discuss how the algorithm of "dynamic feedback" can be used to estimate more general
"measurementnoiseonly" models, at minimal cost,
especially for models which are very Intricate,
nonlinear and nonMarkhovlan;
we also discuss how the
algorithm can ftt more conventional models, can
optimize policy, and can perform "pattern analysts"  a dynamic alternative to factor analysts.
"Dynamic
feedback" ts essentially a technique for calculating
derivatives inexpensively, for use with the classic
method of steepest descent. In section (iv) of Chapter
(111 ), and in section Ctt I) of Chapter (VI), we discuss
how our experience here with steepest descent has led
us to new ways of adjusting the "arbitrary convergence
we Ights" of steepest descent;
these methods speeded up
the process of convergence by a large factor.
The
Appendix to Chapter (I I) discusses extensions of these
C xv I )
methods , for the general, nonl i near cas e. From a pract i cal poi nt of v i ew, the appl i cat i ons of s uch mathemat i cal Ideas In the s oc i al s c i ences rema i n controvers i al. The extreme pos i t i ons of "behav i or i s m" and "trad i t i onal i sm" rema i n popular; d i v i s i ons s t i ll ex i s t between quant i tat i ve and verbal s tud i es of s oc i al behav i or.
In Chapter CV ) , we
des cr i be how our mathemat i cal tools m i ght f i t In, i n the broader context of s oc i al s tud i es and pol i t i cal dec i s i onmaki ng.
From a ut i l i tar i an and Bayes i an po i nt
of v i ew, we s ugges t a methodolog i cal approach Intermed i ate between "behav i or i sm" and "trad i t i onal i s m, " i n wh i ch the d i fferent frameworks m i ght be Integrated more clos ely w i th each other.
In
s ketch i ng out the pos s i b i l i t i es for s uch an Integrated framework, we als o po i nt out that the algorithms of Chapter (II ) , taken as part of "cybernet i cs, " have a d i rect value as parad i gms , to help us unders tand the requi rements of the complex Informat i onproces s i ng problems faced by human s oc i et i es and by human bra i ns . We als o ment i on poss i ble appl i cat i ons to other f i elds , Includi ng ecology. F i nally, Chapter (VI ) pres ents our emp i r i cal and analyt i c work on nat i onal i sm. In s ect i ons (t i ) and (t i t ) , we d i s cus s our s ucces s
( xv I I )
In making l ongterm predictions of national assimil ation and mobil ization, by use of the
DeutschSol ow model . Tabl e Vl8, for exampl e, gives the
average errors In predicting the
percentage
of
popul ation assimil ated, over time periods on the order of thir ty years; these errors are uniforml y distributed between 0% and 2%, except for four outl iers (20% of al l
cases) at 2.68%, 3. 0 8%, 3.09% and 6.21%.
The fail ures
of these predictions are al so Informative; they give us
a picture of those external factors which real l y do
have the power to divert the processes of assimil ation and mobll ization from a steady course. We have
tabul ated the predictions of the "robust" method for
the years 1980, 1990 and 2000;
these predictions are
subject to caveats discussed In section ( Ill). In section (Iv) , more compl ex model s of
national ism are synthesized, by drawing together Ideas
from the l iterature on this topic and Ideas from social psychol ogy.
The future possibil ities of these model s,
In verbal and quantitative anal ysts, are sketched ou t
briefl y.
These model s attained high l evel s of
"statistical significance, " and l ed to noticeabl e
Improvements In l ongterm prediction, in empirical
tests described In section (v); however, these tests,
based on cl assical estimation routines, are regarded as
(xviii)
prel iminary. The communicat i ons concepts of section ( Iv), as appl ied In section (v), al so yiel ded an
expl anation of one of the Inconsistencies observed with "gravity model s" In previous research; this expl anation was val idated empiricall y.
The MIT Cambridge Project has begun Impl ementing
the al gorithms of Chapter Cl I).
Page I 1
(I) GENERAL INTRODUCTION AND SUMMARY The original purpose of this research was to apply
the classical techniques of statistics and econometrics to the Deu tschSolow model of nationalism, In order to
turn th Is mode 1 In to a work a b 1 e too 1 f or pre d Ic t In g the
po 1 ItIca 1 fut u re •
I n t he course of thI s res earc h, It
became clearer and clearer that the usual statistical
m ethod s do a poor job In fitting d ynamic models to
rea 1wor 1d data, If we judge these mod e 1s hy the Ir
abll tty to make good predictions across time.
Furthermore, It became clear that newer and better
methods would not be feaslhle, economically, unless we could also develop new, less exnenslve algorithms.
T hus the goals of this thesis have heen fivefold:
(I) to descrthe new ways of fitting models to data; (II) to define the new algorl thms which make these
methods feaslhle; (Iii) to Introduce evidence for the superiority of the methods (see Table IV1, on Page
IV34, and the graphs which start on Page Vl3);
(Iv) to discuss the applications of these Ideas, In
broad terms, to social and even biological sciences;
(v) to discuss the new empirical work on national ism
which has led us In these directions.
Let us begin by discussing the first three gonls.
Page I2
Strictly speaking, we have s tudied not one, but two, new approaches to fitting models to data, In pol ltlcal s cience. The first approach was es sentially an extension of work by Box and Jenkins , on "ARMA"
proces ses , on "mixed Auto.Regres s ive MovingAverage
proces s es."
Chapter (111) dis cus s es the mathem atical
s tatis tics of this approach, In detail.
It begin s by
pointing out that an ordinary, m ultivariate a utoregres s ive proces s , obs erved by way of data which were not meas ured perfectly accurately (I.e. meas ured with random meas urement errors or conceptual dis tortions ), turns Into a "vector mixed autoregressive movingaverage proces s ."
It then proceeds to s how how
the algorithm of "dynamic feedback, " discuss ed In Chapter (II) , can be applied to es timate the coefficients of s uch a process , at a lower cos t than was pos sible with previous methods ;
the res ulting
procedure has been tes ted, and made available to the general us er, as part of the MIT r.amhridge Project Cons is tent Sys tem.
In Chapter CV) , we discus s why this
approach seemed Important to us, In quantitative po 1 It Ica 1 s c I ence . In studies of s imulated data, this approach did quite a bit better than the bes t form of regress ion. In Table IV1, on Page IV34, one can s ee that the
Page I3
average es t Imates ("av") produced by "arma, 11 for the coefficients of a s imple model, were much closer to the true values than were thos e of "reg, 11 In the twelve cases s tudied; this Implies much less sys tematic bias. Als o, the dispers ion of the
11
arma11 es timates was about
half as much as that of "reg, 11 on the whole;
this
Implies les s random error In es timation, or, In other words, greater practical efficiency In making use of 1 I m Ited d ata •
However, In s tud Ies of rea 1 wor 1 d data ,
the longterm predictions by this method were only s lightly better than thos e by regres sion;
for example,
In Figures Vl1 through Vl4, on Pages Vl4 through Vl7, one can s ee that the error distribution curve for "ARMA" Is only about 10% smaller In area than that for "Regres sion, " and that the curves are uniformly close to each other. After reexamining the empirical res ults of this research, and the concepts of maximum likelihood thems elves , we have arrived at a new, more radical and more succes s ful approach to the fitting of models .
In
es sence, the Idea is to maximize longterm predictive power directly, over the known data set, Ins tead of maximizing formal likelihood. Formally, this Idea res ts upon the aprlorl expectation that many s ocial proces s es are governed by relatively deterministic underlying
Page 14
trends, obs cured both by meas urement noise and by transient deviations of a very complex s ort.
The
qualitative, political basts of this Idea Is discus sed tn Chapter CV).
The statistical basis Is discuss ed In
sections (vii ) and (xi ) of Chapter (11) . The "meas urementnoiseonly" approach perf ormed much better than hoth AR�A and regres sion, on both realworld and s imulated data. In Tahle IV1, "ext" Is markedly s uperior to "arma " In es timating coef f icients; In the text of Chapter (IV ) , we note that this s uperiority is greates t for the s imul ated data generated by the more complex proces ses (11 and 12) , process es which may be more repres entative of the real world. In Figures V 11 through VI 4, the "meas urementnoiseonly" approach, des cribed as the "robust" approach, had much lower dis tributions of error than ARMA or Regres sion did, In longterm Prediction.
If one allows for the spread of the
vertical axis In thes e graphs , one can see that the "Rohus t" method cuts the longterm prediction errors In half, roughly;
the bigges t reductions occur with those
variables, s uch as national ass imilation, and with thos e cases , near the middle of the distributions, for which the simple models of the f irs t half of Chapter (VI) can do an adequate job of prediction.
Page I 5
The empirical results for the "robust" meth od were al 1 based on special computer programs, designed to take advantage of the simplicity of the models under s tudy.
In Chapter (I I), we discuss how the general
algorithm of "dynamic feedback" can be used to estimate s uch "measurementnoiseonly" models, at a minimal cost, especially for models which may be very Intricate, nonlinear and nonMarkhovtan;
we also
discuss how the algorithm can be used to fit more conventional models, to optimize policy, and to perform "pattern analysts"  a dynamic alternative to f actor analysis.
The technique of "dynamic feedback" ts
essentially a technique f or calculating derivatives Inexpensively, to be used with the classic method of s teepest descent. In section ( Iv) of Chapter ( 1 1 1) , and In section (t it ) of Chapter (VI) , we point out how practical experience with steepest descent In this context has led us to new ways of adjusting the "arbitrary convergence weights" of steepest descent; these methods appear to have the power, In normal, practical situations, to speed up the process of convergence hy a large f ac tor. In the Appendix to Chapter (II) , we mention a fPw generallzatltons of these methods, which may be helpful In the general, nonlinear case.
Page 1r;
From a practical point of view, the applications of thes e and other mathematical approaches in the social s ciences remain a s uhject of dispute. The extreme pos itions of "behaviorism" and "traditionalism" remain popular;
a division s til l tends to exis t
between quantitative and verbal s tudies of s ocial behavior.
In Chapter (V) , we des cribe the way that our
mathematical tools might fit in, in the broader context of s ocial studies and political decis ionmaking.
From
a utilitarian and Bayes ian point of view, we s ugges t a methodological approach intermediate between "behavioris m" and "traditional ism, 11 in which the different frameworks might he integrated more closely with each other.
In s ketching out what s uch an
integrated framework might l ook like, we als o point out that the algorithms of Chapter (II ) , taken as part of "cybernetics , " may have s ome direct value as paradigms , to help us try to understand the requirements of the complex informationproces s ing problems faced by human societies and by human brains . We als o mention the pos s ibility of applying these approaches to other fields , s uch as ecology . Finally, In Chapter (VI) , a few s ubs tantive conclusions emerge from our emntrical and analytic work on nationalis m.
The relative s uccess of our longterm
Page 17
prediction of national assimilation and mobilization,
as shown tn Table s Vl8 and Vl9 In section (ti) of Chapter (VI), ts of some substantive Interest;
note,
for example, that In predicting the percentage of
population assimilated, over periods of time on the order of thirty to forty years, that the errors are
uniformly distributed be twe en 0% and 2%, except for
four outliers (20% of all cases) at 2.68%, 3.08%', 3.09% and 6. 21%.
T he exact sources of weakness In these
predictions are also of Intere st, Insofar as the y give
us a picture of those external factors which really do
have the power to divert the processes of assimilation and mohtltzatl on from a steady course . In Tables Vl21
and Vl22, In section (tit) of Chapter (VI), we have
listed the predic tions of the "robust" method for the years 1980, 1990 and 2000;
the se pre dictions are
subject to caveats discussed In the text of that section.
Both In sections (ti) and (tit), all
predictions are based on the OeutschSolow model, with
minor modifications.
In section (Iv), more complex mode ls of national
assimilation and mobilization are synthesize d, by
drawing together Ideas from the literature on this topic and Ideas from social psychology.
The future
posst ht 1 tttes of the se mo de 1 s , In v erh a 1 an d
Page I 8
quantitative analys ts , are s ketched out briefly.
In
section (v) , a preliminary tes t of the models ts des cribed. The main methodological conclusion of Chapter (VI) Is that the available tools for timeseries analysis cannot cope adequately with the level of complexity represented by such models ; however, In the prellminary tes ts, the mode ls attained a high level of "s tatis tical s lgnlf lcance,
11
and did
have a noticeable value In Improving longterm prediction. The communications concepts of s ection (Iv), as appl led In s ection (v), als o had the splnof f of s ugges ting a rational ex planation of one of the Incons istencies observed with
11
gravlty mnde ls , 11 In
previous research; this explanation was val !dated em p 1 r Ica 1 1
y•
In concluding this Introduction, It would s eem appropriate to des cribe what might come out of this work, In the f uture. However, thes e pos slh 11lt 1es are dis cus s ed In enough detail In each of the separate chapters .
Still, It s hould be of general Interes t that
the programming of the general algorithm of Chapter (II) ls already underway at MIT, as part of the larges cale "DATATRAN" project on Multics, and Is scheduled to be available to the social scientis t In 1974.
Page II1
(II) DYNAMIC FEEDBACK, STATISTICAL ESTIMATION AND SYSTEMS OPTIMIZATIONs THE GENERAL TECHNIQUES ( i) INTRODUCTION In recent years, social scientists and ecologists have become
interested more and more in the use of mathematical models
to describe the dynamic laws of the systems they study. Karl Deutsch
and Robert Solow, for example, have proposed(l) the following model to predict the size of the assimilated population, A, and the
unassimilated population, U, in a bilingual or bicultural societys dA • dt
aA + bU
dU dt
cu,
(2.1)
where "a", ''b" and "c" may be treated as constants, at least for
medium lengths of time. The constant ''b" represents the rate of
assimilation, as a fraction of the people yet to be assimilated,
per unit of timer "a" and "c" represent the natural growth rates
of the assimilated and unassirnilated populations, respectively. Mathematical models may serve two general purposes in the
social sciences. On the one hand, they may be used as a tool in
verbal reasoning, as a technique for formulating one's assumptions and their consequences very clearly and very coherently;
they may be used to construct paradigms, which, like metaphors,
Page II2
may be very useful but which are not meant to be taken literally.
The "prisoner's dilemna" parad1gm(2) is a good example of such
a model. On the other hand, mathematical models may be used to make actual predictions of variables which can actually be measureds
economists, for example, have long been in the business of predicting the GNP, as a number, from the use of equations()) originated by
Keynes and Samuelson. These equations offer different predictions
for the GNP, depending on one's assumptions about government spending and tax ratesr thus they can be used, not merely in prediction, but
in helping the government to choose a policy for spending and taxation which will maximize the real GNP.
Our major concern in this thesis is with the second type of
model  predictive models, like the DeutschSolow model above.
Given such a model, the social scientist would want to ask three
questions, (1) how likely is it that the model is true, empirically? (11) how can we measure the values of the constants in the mod.el? J
{111) if the model is true, but if certain policymakers could change
some of the constants or even control some of the variables directly,
what should they do in order to get the '"best" results? The first two questions concern the problem of estimation, the core of classical
statistics. The third question falls roughly into the area now called
"control theory". All three questions can be answered, by use of
existing methods, but only for certain restricted classes of models.
Our main objective in this chapter is to present a more general
Page IIJ
method , to allow us to answer these questions for any explicit
model, of any complexity, at a minimal cost in terms of computer
time . More precisely, if a user specifies his model in terms of equations built up out of elementary operations and functions,
known to a standard computer package, then our method could give
this computer package the power te answer the three questions above, at a minimal cost. As more data become available in the social
sciences and in ecology , and as models are developed which reflect
the true complexity of the sacial systems themselves, the need for such a general method may grow greater and greater.
In this chapter, we plan to explain the dynamic feedback method ,
by building up examples of its most important applications , these
examples will grow in complexity until, in section (xii ), we present
the general algorithm explicitly. Thus we will start out in
section (111)
by showing how to reduce the cost of conventional
nonlinear estimation. In section (iv ) , we will show how the dynamic
feedback method can cope with simple models with "memory"r even
simple models of this type are difficult to handle by other methods. In sections (v) through (vii ), we will discuss the basic problem of
induction, as seen by the statistician. This material prepares us for
the discussion of more advanced applications in later sections and
also in Chapter (III ) . In particular, in section (vii), we will
propose a new, "robust" approach to estimation, which, � for
Page I I4
simple models , calls for the use of the dynami c feedback method J
later , in section (xi ) , we will specify exactly which "models with
memory " are used in this approach . and in Chapters ( IV) through ( YI )
w e will di scuss the evidence that this approach i s worthwhi le ,
In section (xiii ) , we will discuss the problem of estimation with
complex noise models , In se ction ( ix ) , we will discuss a radically
new con cept , "pattern analysis, " for dealing with situations where the nonlinearities and complexities of a process defy the use of
straightforward estimation 1 the applications of this concept would include problems now dealt with by factor analysis or by pattern
recognition techniques, Once a person has finished estimating a model , he may then wish to go on to use this model in formulating policy 1
in section ( x ) , we will show how the dynamic feedback method can be
u sed at that stage , too , to help one maximize the ut ility function of one ' s choice . Finally , in the Appendix, we will mention a few
technical procedures , which can help speed up the convergence of a computer routine based on the dynamic feedback method , ( i i ) ORDINARY REX;RESSION Let us begin by discussing the first two questions listed on
page II2 , from the viewpoint of classical maximum likelihood theory . How would we ascertain the "truth " of a model like the
DeutschSolow model , equations (2 , 1 ) , if we were given the values
Page II5
of "A" and ''U " every year for some nation, from 1901 to 19?3 ?
If we were given the values of the constants, a, b, and c, then we
could simply solve these equations, starting from the known values
of A and U in 1 9 01. In order to avoid having to solve a differential equation, we could rewrite the model in a simpler, but equivalent,
form 1
A(t+l ) • k A(t ) + �U(t) 1
U (t+1 ) • kJU(t),
where
(2.2 )
''U( t ) " means the value of U in the year t, and where
k1 , � and k are all constants. In either case , we could predict 3 A and U for 1902 through 19?3, by starting from our knowledge of
A and U in 1 9 01, and using our model. We could compare the predictions of the model against the observed data. And we would discover that
equations (2. 2 ) are simply false , as written , there would always be some difference between our predictions and the data, while the
equations (2. 2 ) do not allow for any such error . F.quations (2 . 2 ) are completely deterministic. This complaint may seem like quibbling,
but it is central to the classic concepts of statistics. In practice,
admittedly, one may be more interested in the predictive power of a
simplified model, rather than its formal statistical truths however,
in section (vii ), we will be able to discuss this possibility as an
extension of the more classical approach discussed here. At any rate, to construct a model which has some hope of being "true ", in the
Page II6
social sciences, we need to express the idea that there will be
a certain amount of unpredictable random noise in the system we are studying. Thus we might rewrite equations (2 .2) to get ,
(2 . Ja) U ( t+l ) • k U ( t ) + c(t ), J
(2. Jb)
where b (t) and c ( t) are random error terms, obeying , 1 p(b) ., J2'ftB p(c)
1
�c
e
) ½ ( £ C
2
(2 . 4 )
In other words, we do not know what b(t) and c(t) will be in advance s the probability that b(t) will equal some particular value, b, is
given by p(b) in the formula. Strictly speaking, since ''b" is a
continuous variable, p(b) is actually a probability density function s
one may think of it as the probability that b(t) lies between ''b",
and a nearby point, "b+db", divided by the size of the interval (4),
db, These functions for the probability of b and c are sirnpJy the
classic bellshaped curve, or "normal distribution, " The constants
in front of the "e" are there, te make sure that the probabilities of
the different values for b add up to one, when the formula is
integrated, The constants ''B" and "C", like the constants "k1 ", "� " and "k ", need to be specified before our model is complete . 3
Page II7
According to this elementary model, the probability of b (or c)
is highest when b (or c) is zero ; in other words, it is highest
when the exponent is zero, instead of a negative number. When b gets
to be a large number, positive or negative , in proportion to B,
the exponent gets to be a large negative number, and the probability falls off very quickly. It should be emphasized that this simple
model of noise , while standard, is far from the only possibility
in this case s in section (vii), we will mention a few other
possibilities,
Once we have decided to formulate such a simple model,
at least to start with, classical statistics can tell us exactly
how to measure its "likelihood of truth" for any combination of
the constants k1 , k2 and k • In sections ( v) and (vi), we will 3 discuss in more detail how it is possible for some statisticians to arrive at such strong statements s for the moment, however,
we will relegate the theoretical abstractions to a footnote( 5) ,
Even on a very concrete level, one can get a feeling for the power of the classical approach ,
Looking back at equations (2 , 3) , we may define ,
•1 ( t+1) " is simply the best prediction one could make for A(t+1),
Page II8
at time t, given our knowledge of A ( t ) and U (t ) , and given our
model. From equations (2 . 3 ) , we get s
b ( t ) • A (t+1 )  i( t+l )
(2 • .5 )
c (t ) • U ( t+l )  'O'( t+1 ) ,
Intuitively , one would expect that a model which gives us "good " predictions ,
1,
would be likelier to be true than
a model which gives u s "bad " predictions r one would expect that
bigger errors , b and c , would imply a lower probability that the model is true . Indeed, when we look back at equation s (2 .4 ) ,
the only probability fun ctions we have with this model ,
we c an see that larger values o f b and c would imply a lower probability . More exactly , as we look at these equations , we
can see that the probabilities of these errors really depend
upon (
b 2 and
B)
(Cc )2  i . e the size of the square of the error,
As part of our model , we assume that the errors at different times
are all independent of each other. Thus in order to combine all the different probabilities fer different times, t ,
into one overall probability , i t i s legitimate to multiply them
all together 1 thi s has the effect of telling us to add up a.11 the exponents, the square error terms,
(Bb ) 2
and
c 2
Cc) '
to get an overall measure of the probability of the model.
Therefore we can measure the total effective size of the errors in equation (2 . 3a ) by ,
11 ...
Z:c � t ) > b
"t
2

.l'age 11�
In order to pick the best values of k and k , in our model, 2 1 we do not have to account for the other part of the error, the c2 term, since our choice of k and � does not affect 1
equation (2 , Jb ) , Indeed , to pick the best values of k and � • 1 in equation (2 . Ja), we do not even have to worry about the value
of B , since B does not appear in that equation; thus we can simply try to minimize s
2 L • � (b(t ) >
(2, 6 )
Similarly , in equation ( 2 . Jb) , we can pick k by minimizing the J analogous function s L' •
y
(c(t ) )
2
Notice that we now seem to have two separate measures of truth, for ( 2 , Ja ) and ( 2 , Jb ) treated as independent equations ,
Formally speaking, we have found that the maximization of
li kelihood for the composite model, (2 . J ) and ( 2 .4),
can be decomposed into the maximization of likelihood for
(2 , Ja) and (2 . J b) as separate equations, attached to the
top and bottom equations of (2 . 4) , respectively.
This decomposition is due to the simplicity of the original models
it would not be valid for many more complex models. In section (vi) ,
we will present more details of this decomposition with ordinary
regression 1 in Chapter (III ) , however, we will focus on a cla.ss of standard statistical models , the "vector ARMA" models, for which
Page II10
such a decomposition is impossible , and for which
an equationbyequation estimation procedure cannot have statistical consistency.
Even in the simple case here , however , we have yet to specify
how to pick k 1 and � to minimize "L" , in equation ( 2 . 6 ) .
Let u s begin by substituting into (2 , 6) the value of b(t ) from
equation (2 . Ja) 1
L •
L
(A(t +1 )  k 1 A(t )  �U (t ) >
2
( 2. ? )
Our problem, again , i s to minimize L as a function of k 1 and k2 ,
while treating the measured data series, A(t ) and U(t ) , as fixed. From basic calculus , we know that a function has its minimum,
for variables k 1 and k2 , only at a point where its derivatives
with respect to k 1 and k both equal zero. In other words, if the 2 derivative of L with respect to k were not zero, but , say, +10, 1 this means that L will change whenever we change k1 , and that the change in L will equal 1 0 times the change in k 1 , roughly ,
for small changes in k 1 thus , if we change k by 1/1 00 , then 1 1
L would change by about 1/1 0, proving that it hadn't yet reached
a minimum at our original choice of k and k • Thus we can try to 2 1 find values for k and k such that the derivatives of L 2 1 with respect to both of these parameters will equal zero.
Page II11
Differentiating , we get 1
•
� 2 (A(t+1)  k A(t)  �U(t)) ( A(t)) 1
t
• 2 ( � (A(t+l)A(t)  k1 (A(t) ) t:
2
 �U(t) A(t)) ) ,
which we will try to set to zero. And we get a similar expression for
!� 1
putting them together, we get two algebraic equations ,
L t
A (t+l)A(t)
L A (t +l)U (t) #:
 kl L t = kl
2
L. U ( t)A(t) 'I:
+ �
2:' 1:'
U (t )A(t)
+ k }::(u ( t ) ) 2
2
t
We can calculate these sums by looking at our data; we can solve
these simple simultaneous equations for the variables k and � 1 exactly , by classical algebra, or by using programs available on
any computer. The procedure above is the procedure of classic
. multiple regression.
All of this reasoning , however , supposes that we decide to look
at a very simple model , like equation ( 2. Ja). It also assumes that
the "errors" , b and c, follow a normal distribution. There is nothing to stop us from using the same calculating procedure in cases where
Page II12
we do not expect the noise to be normal ; from a classical point of
view, this may still be equivalent to accepting the normal
distribution as part of one ' s "simplified model, " but the effects
of such a "simplification " are far from obvious apriori. ( 6 ).
What happens , however, if we move on to consider a more complex
model? What would happen if we decided to change equation (2. Ja )
itself? For example , in equation (2. Ja) , we assume that the rate of
assimilation, k , is constant in any given country. In reality, we 2 know that this is unreasonable. If the "unassimilated " outnumber the
"assimilated " by a large majority, they may feel very little pressure
at all to assimilates on the other hand, if they are a tiny isolated group, dependent on an economic world which is mostly "assimilated " , then their rate of assimilation is likely t o be higher than it
otherwise would be. There are other factors involved, but, holding
those factors constant , our model is likely to be "truer " and better
if it accounts somehow fer the power of percentage dominance.
How could we revise equation (2. 3a ) to express this kind of
effect ? First of all, we need to find some kind of measure of
"percentage dominance. " The simplest and most obvious measure is
simply the difference between the percentage of the population
which is assimilated and the percentage of the population which
is unassirnilated. In order to avoid having to multiply everything
by 1 00 , let us look instead at the difference be tween the fraction
Page II1 J
which is assimilated and the fraction which is not. The fraction
of the population assimilated equals , by definition , the ratio between the number of people assimilated , A(t ) , and the total
number of people , A(t)+u(t) 1 thus it equals A(t)/(A(t)+u(t ) ).
The fraction unassimilated equa.ls U { t)/A(t)+u(t)r the difference between the two equals ( A(t)U(t) )/(A(t)+U(t)). Somehow, we
wish to express the idea that an unassimilated person is more likely to assimilate if the "percentage dominance " of the assimilated
population is larger. If we recall that � was defined as the rate of assimilation per unassimilated population per unit of time, we may simply postulate that � • instead of being constant,
will be larger if "percentage dominance" , as defined above, is larger. For simplicity , we may consider the idea that � is directly
proportional to percentage dominances
(2 . 8 )
This time, � is assumed to be constant. While the actual
relation between � and percentage dominance is not likely to be
quite this simple, this equation still gives us some expression of
the important qualitative idea that there is a strong and consistent
positive connection between the two. To generate an explicit model of assimilation, we may substitute this equation back into (2. Ja ) s
(2 . 9 )
Page II14
The second term on the right is an "interaction term, " nonlinear
in A and U , A great deal of fuss has been made about this kind of
nonlinearity, with terminology such as "curvilinear regression", "polynomial regression", and even "spectral regression" often
used , (? ) , However, this kind of situation is fairly easy to deal with, We can solve for b {t ) , as before, to get , L �
L
(A(t+1 )  ki_A ( t )
't
As a function of k
1
(2 . 10 )
and � • this is really the same kind of
expression as (2 . 7 ) , with "U(t ) " replaced by a more complicated expression which we might call ''U'(t ) " z U'(t ) •
A ( t )U ( t � U(t) A(t )+u (t
The derivatives with respect to k
1
and ½ are the same, with a few
"prime " signs interjected, and we wind up with the same
algebraic equations to solve and almost the same sums to oaloulate ,
(We have to sum up {U'(t ) )
2
and U ' (t )A ( t ) instead of U(t)
2
and
U { t )A(t ) , ) In practice, one would normally begin by calculating
the variable ''U'" f'rom one's ex isting data, and injecting it into
a standard regression package to calculate the sums and solve
the equations , in a computer :package such as TSP, one could compute
U ' f'rom one's previous data. by use of the command "GENR"(generate ), and issue a regression command { OLSQ) with U' and A as independent
Page II15
variables. What is essential in this example is that we
continue to express A(t+1 ) as a linear combination of
other variables, which are defined as specific functions of
the available data..
( 111) NONLINEAR :RID;RESSION AND DYNAMIC FEEDBACK However, if we want to move on to more interesting models of
social phenomena, we will often find that we have to estimate
constants which do not simply multiply an expression we already
Jmow how to calculate , like U' (t) J we will find that there are
constants on the "inside" of the model . For example, in equation (2 . 8 )
we said that k2 is directly proportional to the dominance of A over U as a fraction of the total population. How do we know that it is a matter of direct proportionality? � is the rate of assimilation,
per unassimilated person per unit of time, as originally
defined in equation (2 , 3 ) . In equation (2 . 8 ) , if U is 25% population, then AU will equal ½ I if U is almost O, then A+u
of the AU A+u will
equal 1 . Thus we assume that the rate of assimilation will always be twice as much in the latter case, as compared with the former case.
But how do we Jmow it is only twice as much? It might be four times
as much. After all, the pressures on a tiny community, near 0%, may be
much, much larger than on a community near 25%, which may be large enough to protect its own members, and to give
them economic
Page II16
opportunities almost as great as they would find after assimilation. AU U ) 2 , Or ( AU ) J • So instead of i0u• we might have written ( A A+u A+u Even
without considering more complicated possibilities, it would be
interesting to try to measure just how strong these effects are that
we have been talking about , we may write , � (t) •
kz •
, ( A t u t ) k4 , A ( t +u ( t )
kz
where k4 , like is a constant we would li ke to estimate . To turn this into an explicit model of assimilation, we substitute in to ( 2 . 3 a) 1
k A t U t ( � ( � ) 4u ( t ) + b ( t ) ( A(t+1 ) • k1 A ( t ) + k! z A (t +u ( t
We can solve for b(t) , to get , L
D
L (b(t ) )
2
t
k A t U t ( ) ( ) ) 4U ( t ) k! ( A • ' ( A ( t+1 )  k1A ( t )  z ( t ) +u ( t )
L
( 2 , 11 )
)
2
( 2 . 12 )
When w e differentiate L , and try t o set the derivatives to zero , as before, we find a very unpleasant set of equations emerging ,
k 2 A t U t ) 4 ) ( ) ( k. � k! (t A(t) ) ( ( u(t) A ) + • i lz � A ( t )+u ( t ) k A ( t )U ( t ) 4 U { t )A ( t+1 ) • k1 �A ( t )U ( t ) ( A ) ( t )+u ( t ) A (t) A ( t +1 )
t
A t U t ¾ 2 + � � L (U (t ) ( A (t )+U t ) ) )
Pa.ge II17
(Note that we use the asterisk to indicate multiplication, as in
FORTRAN. ) To solve these three equations as functions of k1 , � and
k4 is not only a difficult exercise in algebra , it would appear to be impossible . There are many equations in algebra for which there simply
exist no "closed" solutions  no solutions which can be expressed in terms ef the ordinary "vocabulary" of mathematics. ( 8 ).
Thus, in order to devise computer routines to handle this
contingency, we must use routines which give numerical approximations
to the constants k1 , � and k4 J we must estimate k , � and k4 by a 1 numerical technique of successive approximations, rather than an exact solution. This is the classic problem of "nonlinear estimation. "
A similar problem can even arise when dealing with more sophisticated linear models.
There are two wellknown methods for dealing with the problem of
nonlinear regression. The simplest, and perhaps the best, is the method of .. steepest descent, .. (9 ) . When we try to maximize L,
as a function of k1 , � and k , we may not be able to solve for 4 lL � • O, � a ak! • 0 and a k • O. However , if we start off with k z 4 1 reasonable guesses for all of the constants, k , � and k4, then we 1 can differentiate (2. 1 2), plug in our guesses, and see if the
Page II 18
derivatives happen to equal zero; if so, chances are good that
we have guessed the best values, lWith classic regression, when we had two simple equations in two unlmowns,
lei and ½ ,
there would
only be one solution in the usual case, and there would always be a
minimum for L s therefore, we could be fairly certain that the
derivatives were zero only at the minimum, With very complex formulas, we simply have no wa:y of being sure about this. )
If the derivatives a.re not zero, then we can guess new values for
our constants, values which will make L smaller , If �� is positive,
1 a L then we can decrease L by decreasing k ; if 0 1 k is negative, then we 1
can decrease L by increasing k 1 • If � is close to zero, then �k 1 k 1 is probably close to its best value s if 1!:_ is far away from zero, o k1 then � is probably further off, Thus we can create a new guess,
k ( n+l), better than the old guess, k (n), by changing k in 1 1 1 proportion to C}k , but in the opposite direction a 1
� (n+l) • k 1 (n) 
c1� ,
1 where C is some positive constant, and where we calculate the
derivative by using our old guesses, k ( n), � (n) and ¾ (n). 1 We also calculate the derivative and then the new guess for � and k4, each. Once we have our new guesses for k l '
Itz and k4,
we can go baek to ( 2. 12), to see if we really have gotten a smaller value for L. If we have, then we can start a.gain from our new
guesses, to check the derivatives, etc. If not , , . then C must be
Page II19
too big. If C is small enough , the definition of the derivative
assures us that L can be predicted as well as we like by looking
at just the first derivative ; therefore , for some C small enough,
we know that our new guesses will have a smaller L than our old
guesses. If we find ourselves making C smaller and smaller , from
guess to guess , then we may eventually quit , when C is so small that
we aren ' t changing the constants very much. Hopefully , this will mean
that our approximations are very close to the ideal values.
In the Appendix to this chapter , we will suggest a few ways to ■peed up the convergence of this classical technique .
As we look back at equation (2. 12) , it is clear what our biggest
problem is in actually doing all this work s we have to calculate the derivatives of a very complicatedlooking expression, L,
and we have to calculate the exact numerical values of these
derivatives for many different values of the constants k , � and k4 • 1 Even worse, there is the question of who the "we " is who will do a.11
the work, in most cases. Classical regression can be done automatically
for the social scientist, at a low cost, by a computer program ; the
social scientist need only load in his data, and specify his choice of variables . Who is to do the differentiating here? The social
scientist? In BMD, one of the biggest computer packages for use in
social science and biology, there is only one "nonlinear regression"
routine , added into the XSupplement (10) available June 1972 ;
Page II20
this routine requires the social scientist to write his own
FORTRAN programs � for the function A(t+1) and for all its
derivatives. It is reasonable to ask a social scientist to
understand the logic behind a formula like (2 , 11); it seems
rather unreasonable to ask him to carry out elaborate
differentiations , and write and debug his own FORTRAN programs,
for every such model he chooses to investigate , Also , this approach
could become expensive in terms of computer time, too, depending on the user ' s ability to devise lowcost ways of calculating his
derivatives, There is a second possibility a the user could be asked to specify a FORTRAN program to caloulate A(t+1) , as a function
of A (t) , U(t) , k , k and k 1 the program would then go on to 1 4 2 calculate the derivatives numerically, by changing k a little bit , 1 and seeing what happens to A(t+1) , At each time, for each constant , the computer would have to carry out calculations as expensive as
calculating A(t+1) 1 with many constants, this could multiply the cost
manyfold, In the two nonlinear regression routines easily available
in Cambridge, besides BMD  TSPCSP(11) and Troll/1(12)  the social
scientist has a more convenient way to get his work done, In these two
systems , he need only specify his model in terms of a. "formula", like t
A (t+1) • k1 *A(t) + k2 *U( t) *( ((A(t)U(t) )/ ( A ( t)+u(t) )) **k4)
This is the same as (2 , 11) , but with FORTRAN conventions used to
make it possible to put everything on one line, and with error terms
Page II2 1
Variable Number ( Address )
Category
Major Source
Minor Source
12
12
sum
product
11
3
1
11
product
10
4
10
product
9
2
9
power
8
5
�U(t � +u ( t
8
ratio
7
6
A ( t )U ( t)
7
1
2
A( t ) +u ( t )
6
1
2
k4
5
parameter
4
parameter
3
para.meter
U(t)
2
given
A( t )
1
A ctual
Variable
13
A ( t+1 )
k A (t )
1
k'U ( t)
2
k f t �U f t � ) 4 (A A t +u t
(t) U ( t ) ( A (t )U ) A ( t ) +u ( t ) ) ( t } /4 A ( ( t U A ( t ) +u ( t ) A(t
A( t
k2
k1
k
4
difference sum
given
•
Table II1 1 Table of Operations for .Equation ( 2 . 1 1 )


Page II22
left implicit. ( Single asterisk means multiplication, double means raising to a power. ) This formula then gets translated by the
computer into a list of simple expressions. 'Ihis list of
expressions would normally look something like Table II1 ,
with the explanation column on the left removed , such a list is called a "Polish string. " The categories of operation allowed
on such a list depend on the arbitrary choices of the systems
programmers. In some systems, there are function names reserved
for usersupplied FORTRAN subroutines , in other systems, there are functions corresponding to model neurons , for use in statistical
pattern recognition , et cetera. It is already possible for a
computer to calculate the symbolic derivatives of a formula by
manipulating formulas which have been broken down like this a
however, that process becomes quite expensive, if we have many parameters to differentiate against.
'Ihe easiest way to calculate these derivatives is by a simple
use of dynamic feedback. Now we know that s L •
� ( b(t) ) 2 t
To calculate � , we need only calculate �( (b( t ) ) ) for each time, 1
1
2
and add up these derivatives over time. We want to know the effect
on the error , (b(t) ) , of changing, say, k , while we keep our data 1 2
Page II2J
(i. e. A(t ),A(t+l ) , U(t) ) constant, and while we keep the other
parameters (k , k4 ) constant. In Table II1, let us define the 2 "ordered derivative " of (b(t ) )
2
with respect to variable number i
to be the change we get in (b(t) )
2
in proportion to the change
in variable #i, � !!! hold all the previous variables constant. For � (b( t) ) , this definition doesn ' t give us anything new 1 1
2
the ordered derivative,
x
ok (b (t ) ) 2, is the same as the ordihary
2 partial derivative, � (btt ) ) • But for the other variables,
it gives us something new to calculate ,
Nows suppose we ask, in changing variable #? by a small amount,
what will the total effect be on (b(t ) ) ? Changing variable #7, 2
we will have a direct effect on only � variable, later in the system, variable #8 , (See Table II1 . ) Thus ifs
X., • Xr,
+
d,
where "d " is a small number, where
is variable #7, and where
x7
"X ' " is the value a:fter our changes. We will produce the following direct effect on later variables s X' • 8
.:z_ X' • 6
• X
s
d + x6
(X6 held constant. )
If we are calculating backwards from the top of the table, we +
2 d aiready know � ((b(t ) ) ) 1 we already know the ratio between 8
(b(t ) • ) (b(t ) ) 2 and 2
x8x8 •
Let us call that ratio, or derivative,
Page II24
11
s
8
2 b1
11
Thus we know, for small values of g , that if X =X8 +g , that 2 = b + s8g , Now we just found out, for small d, that if d d + d , that Xa = X s + X ; thus if we write g = X '
8
•
x?  Xr,
we know that this change in ( b(t ) ) 2 of
d . s8X 6
6
x8
.
6
will lead to a total change in
(Before, when we measured
would be held constant. However , when we va:ry
s8 ,
x7 ,
we assumed that
and hold the
Xr,
earlier variables constant, there is no way that this change can
a.:ffect anything later on , except by way of ss s  
7
x8. )
Thus we deduce s
x6
In more sophisticated language, this is an example of s
_r ( ( b ( t ) )2 )
�X
u 7
+
D
) 2 (q ( ( b ( t ) ) )
o e
f.
�f8
) "X " ? '
Now let us consider = fs ( X , X6 ) • r, 6 a more complicated example . In Table II1 , ¾ has a direct effect on where
r8
is the function
x8
three variables higher in the table 
x10 , Xr,
and x6 • When we start to
vary x2 , we have to account for the total effect of all three of the
changes it introduces directly on these other variables, Thus we get s
Of course,
x2
is simply U(t)r the reader , differentiating ( 2 . 12 )
with respect to U(t) , would also arrive at three terms, equal to
Page II2 5
the three terms here, but the work involved would be rather
tedious. To make it explicit how we begin this downwards calculation,
let me point out how to get
s 13 1
+ 2 � S13 • �(t l) ( { b ( t ) ) ) + + � • 1)f{ t+l) ( ( A( t+1) • 2(A(t+1 )  t(t+1) )
1( t+1 ) ) 2 )
One way to operationalize all this is to start from the top,
and, for every variable, look at all of its direct connections to variables higher up. An easier way, in practice is to pass gQ!.!l
from the top all the information to variables below them and also
directly connected to them 1 the effect is exactly the same, but the
order of computations is easier to deal with. We can start out,
in our example, by setting
s1 3 s13
as above. At
to
s 12
and to
s13,
s1 1 ,
s1
through s1 2 to zero, and plugging in
we note that we have a "sum"; thus we add to account for the direct effect of
x1 2
and of
on ( b t t ) ) • Then we are done; we go down to s • At s , we know 11 12 12 that all the later effects of x 2 have already been added into s 2 , 1 1 x
and that our value for
we encounter a product,
s1 2
has been completely calculated. At
x3x1 •
Thus we add
s1 2 x 3
to
s1 ,
and
s1 2 ,
s 12 x 1
to
s • We go down to s • We encounter another product. We add s x to 1 1 10 11 3 s4, and s1 1 x4 to s 1 0 • We go down to s10 • And so on. At the end, we really look only at s , s4 and s , the derivatives we wanted for the 3 5
Page II26
steepest descent method. The mathematical basis of these operations
is the theorem, for a set of ordered functional relations
·�txn
?, Xi

!:
d X df ___n. , � oX j dxi +
'
j: l + I a theorem to be discussed in section ( xii ) . For each line
( 2 . 1 3)
of the list, as we go down, we have only two calculations to perform at most, one for the "major source N and one for the "minor source" ,
thus the total number of calculations needed for each time, t ,
will equal only 2n x ' where nx is the tota.l number o f variables
on the list. The total cost will be 2nxT• across all times, to get all of the derivatives we want, regardless of how many parameters
there are. Notice that to go !!E. the list , starting from U(t) and
A(t), requires one calculation per line of the list , thus the total
cost, merely to compute all the �(t +1 ) for a given model, will equal nxT ' the same order of magnitude. I assumed, above, that we had
already carried out this latter calculation, so that the values of
the "X " were already known , given that we have to find L for each i guess, not just the derivatives of L, there is no extra cost in
first calculating the "Xi" and L.
In practice, one can imagine three ways that a systems
programmer might want to use the generalized form Gf the dynamic
feedback method. First of all, he might simply write a subroutine,
Pa.ge I I27
to do the calculations specified above directly , on the table of
operations fer some model, Second of all, he might write one
subroutine to look at one table of operations, and to specify the calculations required by the dynamic feedback method for this
table, in another table of operationsr he might write a second
subroutine to prune away all unnecessary and redundant operations from this table. The relative advantages of these methods would
depend heavily on the characteristics of the model being studied ,
on the number of time periods and calculations of derivatives to be
performed, and even on machine characteristics. Finally , one might imagine the possibility that the operations on a table like
Table II1 will someday be grouped into "strata , " groups of
operations that can be performed in parallel , on a computer
capable of parallel processing. On such a machine , one could perform the operations at a given set of "S1 II • in parallel , using the
same procedures as above , so long as none of the corresponding " X1 11
depend directly on each other as input sources J in short, one could
use any system of stratification which was adequate for calculating the
x1 •
This possibility is restricted , however, by the requirement
that several processors would have to be able to add something to the same machine word (S
1
for " 1 " on the next lower stratum ) , at the
same time, with the result that this word would be increased by the sum of all the numbers added.
Page II28
(iv ) MODELS WITH MEMORY Nows with a firm mathematical basis for these procedures,
equation (2. 13), we can extend tham still further , The models we have
discussed so far have all been rather conventional "Markhovian"
models , in other words, they give us a prediction of A(t+l ) as a function of A(t ) and U(t ) , We could add in A(t1) and U(t1 ) as
dependent variables, without changing much, because we would still
have a distinct table for every time "t" giving 1(t+1 ) a.s a function
of a manageable number of variables , Suppose, however, that we have
a model with "memory, " In economics, for example, there is a model
of consumer behavior which states that consumers spend money, not in proportion to their current income, but in proportion to the
permanent income ( 13) which they expect to average in their lifetimes 1 the model states that the perceived permanent income is adjusted slightly, from year to year, in response to actual income ,
Thus we get a models
C(t ) = k1 Yp( t ) + b(t)
Yp( t ) • (1� )Yp(t1) + � YA (t),
( 2 , 14 )
where "C" is consumption, "Y " is permanent income, "YA " is actual p
annual income, and ''b(t ) '' is an error term , Note that statistics will normally be available here for "C" and for "YA " , not for Y However, this is still what we would call an "explicit" or
p'
Page II29
"phenomenological" model. Given estimates for Y (t ), and data P for YA (t), the model tells us exactly how to calculate Y (t+1 ) p and how to predict C{t) . To calculate the Y (t ), for all times t, p and to make predictions for the C(t), we need to start off,
at time t•1, with some estimate of Y (O); this estimate we can treat p as an external constant ef the model, like k and � • to be estimated 1 by the statistician ( us). ( From Y ( O ) and Y (1), equation (2. 1 4b ) p A tells u s how to calculate Y (1 ) ; then we can calculate Y (2 ) from p p Y ( 1 ) and YA(2), then Y (J) from Y (2 ) and Y ( J ), etc. ) A p p p To minimize the sum of the errors squared, L, is much harder
in this case than with our complicatedlooking model in equation {2. 1 1). To calculate �
1
here, it is not enough to set up separate tables,
like Table II1, for each time t, and add up the
2 _a_ ( b( t ) ) . k
o l F.g_uation (2 . 14b) establishes a connection between the unknown
variables, Y , at all different times. However, we .£!!!_ set up a p large table to include all the different values of Yp ( t ) and C(t) across different timesr this will be like taking the separate tables
for each time t, tables like those implied by Table II1, and putting
them together into one large table . In this large table, we can show the relations that exist across time . Suppose that we have data
for "C" and "YA " from time 1 to time 4. We get a big table,
as shown in Table II2, on the next page.
With a given set of constants  Y ( O ), k and �  and with a p 1
Page IIJO
Actual Variable L
{b {4))
'3?
2
b (4)aC(4)k Y (�) 1 p C(4)
klYp (4)
Yp (4) ( see
Variable Number
2 . 14b)
� YA (4) Y (4) A
J6
35
Operation Category sum
product
difference
34
input
32
sum
33 31
30
product
4
23
product
� YA ( J ) YA (J)
14b)
(1k2)Yn( 2)
24
22
21
1

24
product
2.
32

product
input
25
Y ( 3) ( see p
33
2
k Yp (J) 1
26
34
JO
27
C(3)
35
product
b ( 3)•C(J)  k Y ( J) 1 p
28
35
2 8 .20. 1 2
29
29
2
36
Minor Source ( s)
31
( 1 �)Y (3) p ( b ( 3))
Major Source
product
difference input sum
input
product

27

27
26
25
24
1


23
21
22
2
16
4

Table II2 1 Table of Operations for F.quations ( 2 . 14). ( top section)

Page II.31
Actual Variable
Variable Number
Operation Category
(b(2 ) ) 2
20
C (2 )
18
input
Y p (2 )
16
sum
b ( 2 ) • C ( 2 )k 1 Y p ( 2 ) k 1Y p ( 2 )
19
17
product
difference product
15
product
yA (2 )
14
input
( 1k2 )Y p ( 1 )
13
product
12
product
k2 YA ( 2 )
(b( 1 ) ) 2
b ( 1 ) •C ( 1 ) k 1 Y p ( 1 ) C{l )
k 1 Yp ( l )
11
10 9
difference input
product
Major Source
Minor Source
19
19


18
17
16
1
15
13
14
2
12
4


11
11
10
9
8
1


Table II2 1 Table of Operations for :Equations ( 2 . 14 ) . (middle section )
Page IIJ2
Actua.l Variable Yp ( 1 )
k Y (1) 2 A
Y (1 ) A
( 1k2 )Yp ( O ) 1
y (0) �
k1 1
8
7 6
5 4
�
p
Variable Number
J 2
1 0
Major Source
Minor Source
7
5
product
6
2
product
J
4
0
2


Operation Category sum
input
difference
parameter
parameter parameter input

Table II2 1 Table of Operations for F.quations ( 2 . 14 ) . ( bottom section )

1'age
ll))
given set of data, we can calculate "forwards in time", or upwards
in this table, to calculate every one of the "actual variables, " including L, the total error. To calculate c)L ' we can calculate �kl backwards, just as we did before with Table II1, from the top of the table to the bottom. This time, however, it is easier to see where to start s
8
'.3?
ai.  �1 �  a½ 7
 1
2 This time, with L itself on top, instead of (blt)) ,
we get a simpler result at the end s
exactly the quantities we need to apply the steepest descent method . For each line of the table, except for the . top, there are only two
sources, or no sources J thus to go back from the top line to the
bottom line requires only two operations per line, at the most.
For a very large table, with nX lines for each time t, and with T periods of time, this amounts to n T lines, and 2n T operations X
X
in all, to get all the derivatives of L. Remember that to go up
Page II34
the table, to calculate L, we had to carry out one operation
per line  n T operations in all. No matter how complex the model, X
if the functional relations across time a.re explicit enough that they can be put into formulas which the computer can translate
into a table , like Table II2, then "dynamic feedback" can be used to calculate all the derivatives, in one pass.
As a practical matter, one may wonder just how explicit is
"explicit enough". In general, the procedure above allows us to
calculate the derivatives backwards down any ordered table of
operations, so long as the operations correspond to differentiable
functions. In order for us to use this method , then, the primary
requirement is that we be able to specify the model well enough
to construct such a table, This is the � requirement that applies
when we wish to use a model forwards in time, to make a prediction
of the future, without having to solve a complex set of nonlinear
algebraic equations in every time period , In general, in the existing computer packages (including FORTRAN compilers), any formula
expressed in the following form can be parsed into a table of operations
(a "Polish string " ) generating the variable Xi ( t ) from operations performed on the arguments 1
where "fi " is a function made up by nesting basic operations known to the computer package 1 for examples
X ( t) • W ( t1 )*Y(t1) + k + sin ( Z (t1 ) ) . i
Page II35
In order f�r a set of such formulas to be converted into a table
of operations, we need only find an ordering of the variables to be
computed, "X (t)", such that the arguments used in calculating X (t) i i
are calculated before X (t) itself is ; the table of operations to i calculate X (t) can simply be inserted on top of the table already i built up to calculate variables earlier in the causal ordering.
If the arguments of "fi " included only constants, parameters and
values of variables at "tn", for all f , with "n" always greater i than zero for endogenous variables, then this requirement would be satisfied automatically . otherwise, an ordering of the variables
X (t) would have to exist, with the later expressed as functions of i the earlier . Global things to be calculated, such as the sum of
&
utility function or a loss function over time, can always be inserted on top of the table of calculations, so long as we specify formulas
for calculating them as a function of sums across time or the like.
Indeed, even if one had a set of implicit equations, so that one had
to use algebraic solution methods instead of explicit calculation
in order to carry out a simple prediction of the future from given
para.meters, then one could easily calculate the matrix of partial
derivatives for those equations, to be used in conjunction with the
algebraic solutions generated for prediction, to allow one to carry out
dynamic feedback estimate ; however, simulations of this sort are both expensive, and outside the major realm of interest here .
Page II36
Parenthetically, one might note that there is a certain
difference between the operations needed to specify the generation of a random number, and the operations needed to calculate the
associated lo ss function. Estimation by dynamic feedback requires the specification of loss functions. In the case where the
unobserved random numbers are generated by a rather complex process, the translation between the two forms of specification may not be
easy. However, if the losses one is concerned with are the
discrepancies between the actual and predicted values of Imown
variables, the specification of an explicit loss function should
present no problem to the user of a computer package.
The
corresponding model would be suitable for predicting the future,
but may not be quite as suitable for stochastic simulations of
the future, in some eases. In such cases, however, the method of pattern analysis, to be discussed in section (ix), may help
reduce the distance between the two forms of specification.
(v) NOISE, AND THE CONCEPI' OF THE TRUTH OF A MODEL, IN STATISTICS Up until now, we have avoided one other aspect of statistical
estimation s the problem of noise models. In our old model, in
equations (2. J ) and (2.4), we assumed a simple equation to predict
A(t+1), and a simple bellshaped curve for the distribution of the
Page IIJ7
errors , b ( t ) . In the la.st thirty pages or so, we have considered
more and more complex models to predict things. However, we have stayed with the old idea of minimizing the square of the error,
an idea based upon the old bellshaped normal distribution.
We have begun in this way only with great reluctance, and only
for the sake of exposition. In fact, if we admit that most processes in human society and ecology do contain important elements of
randomness, then we must admit that equation (2 .4) is just as much a part of our original model as equation (2 . J ) . Equation (2 . 4 ) is
not an "assumption which must be proven true before we can use
classical techniques"; like equation ( 2 . J ) itself, it is part of a
simple , approximate model, to be evaluated for its predictive power.
Unfortunately, there has sometimes been a tendency in social science
and ecology to formulate ever more complicated models to predict
things, without an explicit model of the random element; the "errors "
are sometimes regarded as something unpleasant, that one faces up to
at the end of one's research, after one has formulated a model of
what is interesting.
Statisticians, on the other hand, have long since passed
the stage of ..minimizing least 'squares " or of "minimizing error" in general. We mentioned, earlier, the idea of measuring the
"probability of truth " of a model. We mentioned the problem of
how to estimate the probability of truth of a model, given
Page II38
a set of data observations. The traditional "maximum likelihood" school of statistics , as represented by Jeffreys and Carnap(14 ),
and the more recent Bayesian schools , both agree that this is
simply a problem in conditional probabilities, how do we estimate
the probability of the truth of a model, conditional upon our
having made a certain set of observations? Formally, the conditional probability of A given B, p(A( B ) , is defined to equal p(A and B )/p(B ) ; from this follows Bayes ' Law s P(A\ B )
a
P(B\A)P(A) P(B)
Statisticians have applied this law to deduce s p(model( data )
) ) = E{ data podel p ( model p(data )
(2 . 1 .5 )
This equation does not say that the "data" and the "model " have to be expressed in purely mathematical terms ; as a result , the equation
has led to enormous controversy both among statisticians and among
philosophers. It is a general equation telling us how to determine
the probability of truth of any sort of theory; thus, its relevance
to social and natural science goes well beyond the question of
statistical methods proper. The calculations on the right involve
two terms of real interest  p(data\model ) and p(model ) . The term
p(data) is the same for all models , and does not help us to evaluate the relative probabilities of truth of different models, except
perhaps indirectly(15 ). The term p(data\ model) represents what
Page II39
statisticians have traditionally focused on : how well the data "fit "
the model, defined as the probability that these particular data
would have been generated if the model were true. The term p(model ) refers to the probability of the model before any data have �
observed at all ; it is our apriori probability distribution, The philosopher Immanuel Kant long ago asserted that
0
empirical
induction " is impossible, without some system of apriori assumptions
(the "apriori synthetic " ) with real information content to them(16 );
the choice of "p (model ) " would constitute such a system of assumptions. More recent philosophers, such as Carnap and Jeffreys(17),
have tried to preserve the more popular attitudes of pure empiricism and positivism, by suggesting that p(model) should be "equal"
(apriori) for all different models . Thus p(datal model) would be the
only term left to consider, in measuring probabilities of truth.
Their suggestions have been carried over to the field of statistics,
where they are now orthodo x practice(18). This approach is normally
referred to as the "maximum likelihood approach, " In more recent years, however, many members of a new school of statisticians, the Bayesians,
have grown in their opposition to this orthodo x procedure. They have pointed out that "p ( model)•k", with the same "k" for all models,
is a very strong assumption, just as strong as any of the alternatives.
In most practical problems , the social scientist would have some
reason to expect some models to be likelier than others, even before
Page II40
he runs his statistical analysis , Thus they suggest that a user of statistical programs should be asked to specify his apriori
probability distribution as the first step of any statistical
analysis(1 9 ) 1 then the computer program can account for both
p(model) and p(data model), in picking out the model with the
highest probability of truth, From a broader perspective,
one might say that the Bayesians are proposing a procedure for
allowing the social scientist to account for two different kinds
of data  statistical data, and verbal data he has from other
sources, This still leaves open the question of where his initial
p( model ) should come from, a question which we can avoid in this context, (20 ) ,
The Bayesians may be right in principle , but in practice
the orthodox procedures may remain a sensible way to design
computer statistical packages. The social scientist, when he reads the output of a computer , would normally expect that this output
reflects only the ability of different models to fit the actual data;
in deciding what he finally believes about the world, he can � account for his verbal data, This does require that he understand
what "standard errors " mean, in ordinary regression, so that he can
get some idea of the variety of models consistent with the statistical
data , It also suggests that a direct printout of the relative
probabilities of truth of different models , over the given data,
Page II41
would be a u seful feature to have. In brief, it requires the
development of an intuition regarding the relation of mathematics
to social processes, an intuition strong enough to sustain the
balanced assessment of probabilities. This does place a burden on
the social scientist. On the other hand, the extreme Bayesian
alternative  to ask a social scientist to encode his intuition
into a few normal distributions, and to ask for a more complete faith in what cemes out of a computer  would seem to place a
much heavier burden on the social scientist. It would tend to deemphasize the learning experience which usually occurs at
the end of a statistical analysis, when the social scientist tries to relate all the things which crune out of the computer to what
he knows in the real world; if this experience is what develops
a balanced intuition in the first place, it should not be given
a diminished role. In Chapter (V), we will discuss in detail the
importance of this type of experience to the actual application of
statistical methods in the social sciences. Furthermore, the verbal
knowledge of a social scientist will not normally fit a simple
distribution. Even if it did, few "intuitive decisionmakers" can
express their intuition at all reasonably in terms of probability
distributions, even in simple cases, without extensive training
in that task. (21 ). While there is more that could be said on both sides
of this particular issue, the orthodox approach would seem quite adequate for the purposes of the present context.
Page II42
The dynamic feedback algorithm, which we are discussing in this
paper, can actually be applied to Bayesian estimation a$ easily
as to conventional estimation , In our examples and in our applications
we will follow the more orthodox procedures, However, we will refer
back, on occasion, to the concept of "prior probabilities" when this is appropriate ,
In concluding this section , we might note , for the sake of the
mathematician , that equation (2. 15 ) , when used in statistics, is
normally used to give the probability distributions of a continuous
family of models , rather than a discrete probability , Strictly speaking, such distributions
a.re
not even functionsr they
a.re
actually "measures", and they would normally be written as the
product of a function times an explicit measure , like "d9" , where "9" is a parameter of the model, In equation (2 , 15 ) , however,
the choices of measure used for the data cancel out; thus it is
not of intrinsic importance , so long as we are consistent in the
units we use to record the data, The choice of measure for the
model is one more aspect of the problem of specifying p(model ),
an aspect discussed by the sources referred to above ,
In general,
however, one would not expect the maximum likelihood choice of
parameters to be affected very strongly by the choice of measure,
unless the standard error for these parameters indicated a large
uncertainty and probably a low statistical significance in any case,
Paga II4)
(vi ) ORDINARY RECRESSION AND THE MAXIMU M LIKELIHOOD APPROACH Back in section (ii ), when we discussed how to measure
the "probability of truth" of the DeutschSolow model,
we glossed over the basic questions discussed in section (v). In section (ii ) , we found ourselves discussing two different
"measures" of the probability of truth , one for equation (2. Ja)
and another for equation (2. 3b) 1 all the rest of our discussion fo cused on equation (2. 3a ) , an equation to predict a single
variable, A(t+l). When discussing a single equation, to predict
a single variable from known data, it makes some kind of sense
simply to add up all the square errors ( b (t )) 2 across time, and use
the sum as a measure of how good the equation is. However, what do we do if there are two equations and two sets of errors?
How do we combine the two different error terms to measure the
validity of the model as a whole? With the Deutsch Solow model,
we could pick out the best values for k and k without answering 2 1 these questions, because the two equations were essentially
independent, and because we assumed that the two error terms,
"b" and "c" , each had their own probability distributions
independent of each other. ( Equation (2. 4 ). ) But we had to
avoid the question of how to measure the validity of the model
as a whole. Now , using the concepts of section (v) , we can come back
Page II44
to answer this question. Let us define the relative probability
of truth, P, of any model, as z
P • p(data l model),
the probability that we would have observed the data we have
observed, if the model were true. More precisely, this
"probability" is actually a probability density, as are the
other "probabilities " in this section. For simplicity,
let us assume, with the DeutschSolow model, that we only have
data for three years, 1958, 1959 and 1960, in one country.
Writing out the data explicitly, we are trying to measure s
p  p(A( 1960), U(1960),A(1959), U (1959),A(1958), u(1958) 1 model),
which, by classic probability theory, equals the product z
P = p ( A(1960),U ( 1 96 0), A(1959),U(1959),A( 1958), U(1958 ), model)
* * * *
p ( U(1960 ) 1 A( 1959),U(1959),A(1958),U(1958),model)
p (A(19 59 ) lu(1959),A(19 58 ), U (1958), model )
p (U {1959) I A ( 1958), U (1958 ), model) p (A(1958), U { 1 958)l model).
Now our model, equation (2. J), predicts A(1960) as a function of
A(1959) and U { 1959 ) , once these data are given, the other data. will · not affect the probability of A(1960) as given by the model.
Page II45
Similarly for U(1960 ), etc , 1 thus we can simplify our expression s P = p(A(1960 ) 1 A(1959), U(1959 ), model )
* * * *
p( U ( 1 960 ) (u (1 959) , model )
p (A( 1 959 ) lA ( 1958 ) , U(1958 ) , model) p( U(l959) l u(1 958), model )
p(A(1 958) , u (1 958)l model).
Once we are given values for A(1 959) and U(1959 ) , how do we
determine the probabilities of the possible values for A(1960) ? The � likely value of A(1960 ), according to equations ( 2. 3 )
and ( 2 , 4), is the one with b ( 1 960)0, i. e. A(1 9 60) = A(1 960 ),
which equals k A(1959 ) +� U(l959 ). But any value for A(1960 ) would be 1 consistent with equations (2. 3 ), for some value of b , Values of
A(1960) far away from A(1 9 60 ), however, would imply large values of b,
which, according to equations (1,. 4 ), are not as likely as small values of b. To determine the probability of any given A(1960 ), given
A(1 959 ) and U(1 959 ), we need only look at the probability of the value of b needed to generate the combination,
b(1 959 )=A(1960 )k A(1959 ) � U(1959 ) , Thus s 1
p(A(1960 ) 1A(1959 ), U(1959 ) , model) = p(b(1959 )f model)
= 
½( b( 1E59) 2 J
Page II46
And similarly for A(1959 ), 0(1 960 ) and 0 ( 1959). Thus we get, in summary 1
P = p ( b(1 959 ) \ model ) * p(b(1958 ) ,model ) * p(c(1 959 )\ model ) * p(c(1 958 ) \ model ) * p (A( 1 958 ) , U(1958 )\ model ),
which, by equation (2. 4 ), equals 1 _L e ( fflB
..¼( b( 1958 ) )2 __a,.(b( 1959 ) )2 B 12 B 1e 2 )( )( �C JzilB
1 * ( e
ffi\c
½< c(1B5s > >2
) * p ( A ( 1958 ), U(1958 ) , model)
What do we do about the last term, representing our earliest
data point, 1 958 ? The usual practice is simply to ignore the final
term on grounds that it is difficult to compute, and contributes
only one timepoint worth of information; for long series of data, the importance of one e xtra point of information grows very small. In the social sciences, the argument for eliminating this term
grows even stronger. This term, as usually interpreted(22 ),
requires us to compute the probability that we would have started off
at a data point equal to (A(1958),U(1 958 ) ) , if this initial data had
been generated by the DeutschSolow model operating for an infinite length of time before the start of the available data. Normally,
in the social sciences, one picks the start of one's data series for
one of two reasons 1 (i) one is trying to find a model to describe events in a given historical period, and one does not expect the
Page II47
mod.el to be valid before the start of one ' s data series 1
(11) the data are not available before a certain time , usually
implying that some aspects of the social system were different beforehand. furthermore , if one ' s model is not "stationary",
as few stationary processes are, then the usual procedure for computing this term breaks down in any case.
On this basis , we get a relative probability density ,  e
.me
½tc( 1t59 ) ) 2
1 e 
½ ( c( 1 f58) )2
12f\c
' )
The interesting part of this formula is the exponent, the part which
depends en b and c, In order to maximize p with respect to k , k2 1 and k4 , we try to bring the negative number in the exponent as close to zero as possible , This number is essentially the � of
the errors squared, exactly what we tried to maximize before .
Once we have done this, it is wellknown that we can maximize P
by picking B and C to equal the rootmeansquare average of b
and c, respectively,
Page I I48
(vii ) THE NEED FOR SOPHISTI CATED NOISE MODELS In general , there is little reason to believe that the classic
normal distribution , of equations (2 . 4 ) , will be a good model of
the noise element in all social :processes. Mo steller, for example , has pointed out that "fluke s" occur fairly often in real social
data. (23 ) . There may be many processes which normally plod along in a predictable sort of way , governed by a noise process b ( t )
which fits a normal distribution and which never gets to be very
large ; every once in a while, however, the process may be hit by
a fluke , which leads to changes much larger than one would have expected in the normal course of events. Suppose that
11
p
1
11
is the
probability , at any time , of getting a fluke, Then the probability
distribution for b ( t ) may actually fit thi s kind of equation s p (b ) • ( 1 p ) e 1 .,JznB 1
where B
1
2 _ !. 2 t E. ) \B
e
2 ½ (; >
1
i s much larger than B . This equation states that most of
the time  ( 1 p ) of the time , to be precise  b will fit the same 1 bellshaped curve as before ; however, when a "fluke " occurs ,
b will fit a much broader bellshaped curve , leading to much larger
values for b. One way to account for these effects i s to use this
probability formula explicitly , instead of the usual normal
distribution , in one ' s noise model s it may be impossible to
Page II49
e stimate "B i " accurately, but, if flukes are a serious problem,
it may still be possible to e stimate k and k2 more accurately , 1
and t o show when "P" i s larger for this kind o f model.
Another source of noise, rarely handled explicitly in social
science , is "measurement noise . " In our discussion above , we
talked about "A(1 959 ) " and "A(1960 ) " as if we had available exact
data for the true levels of assimilation in those years. We may have
data, but there is good reason to believe that errors of different
sorts occurred in the collection of this data . Even if the data were "perfect " , in the sense of giving us a perfect measure of who speaks
what language when, for example, they may still not be giving us a
perfect measure of the underlying concept of assimilation, as
governed by equations (2 , J ) . Let us define ''U( t ) " as the true size
of the unassimilated population, and ''U' ( t ) " as the measured size of
the unassimilated model. Then we might modify equation (2. Jb)
by writing :
U(t+l )
a
k U ( t ) + c(t ) 3
U ' ( t+l ) • U( t+1 ) + d( t+l ),
(2 . 16 )
where "c" is the noise going c,n in the process itself, and where
"d" is the measurement noise , The "process noise, " "c ", is a random
factor in the actual process ( top equation ) which determines the
objective evolution of the real variable we are interested in, U,
Page II50
through time. The "measurement noise ", "d 0 , does not affect the
objective reality, U, but only adds a factor of distortion to our
measurement of U, our U ' ; U'U, the difference between the measured value of U and the true value of Even i f
u•
u,
equals the measurement error, d .
did represent an objective variable in its own right, but
a variable different from the one we really postulate to govern the
dynamics, then this mathematical structure would still apply,
Given that we do not know the true value of U(t) at any times t,
this model is not an "explicit " model ; it does not tell us directly
how to estimate U'(t+1 ) from earlier data available. (Note that the
noise term, "c(t ) ", makes it impossible to calculate later values of
U(t ) from an estimate of U(O). ) However, Box and Jenkins(24 ) have
shown that this model is equivalent to the explicit model , U ' (t+1 ) = 1JU'(t ) + f(t+1 )  k4f(t ) ,
where "f" is a noise process fitting a normal distribution.
This model has a kind of "memory term " in it, k4f(t), and may be
estimated by use of the dynamic feedback method, as discussed in
section (iv) . In Chapter (III ) , we will describe how this method can be specialized to deal with models of this general sort, with
any number of dependent variables. Economists, like Cochrane and Orcutt, have developed techniques to deal with some of the
secondary consequences of measurement noise, like the problem of
Page II51
serial correlation; however, in their original article {2 5 ), these
authors have made it quite clear that the general problem of measurement noise is beyond the scope of their techniques ,
This idea can be taken even further, if we wipe out the
term for "process noise, " and allow for the possibility of
measurement ncise only. In equation (2 , 16), this would mean
eliminating the term c(t), while retaining d(t+l ) and the other
terms of the model; this would make (2 . 1 6) an "explicit" model,
with memory U(t) , similar to our example of section ( iv).
At first glance, this procedure sounds both unrealistic, and totally inferior to the procedures of the paragraph above, Process noise
does exist in most social and ecological processes ; as long as we can account for process noise and measurement noise both, why should we
limit ourselves to the second possibility only?
Let us begin by seeing what this process really entails .
If we assume that there is no process noise at all, then we can
start out from our initial estimates (or data) for our variables,
and solve our equations exactlz to yield a stream of predictions for later data, right up to the end of our data set , these
predictions account onlx for the data in the initial timeperiod. Notice that this is exactly what we were thinking of doing,
early in section {ii), before we introduced the more "sophisticated "
concept of ordinary regression , Also, note that this is what
Jay Forrester ' s techniques(26) for "dynamic systems analysis "
Page II52
tend to involve, even though he prefers to judge the final fit of
his models by eye rather than by computer. Above all, note that
the practical value of social and ecological models usually lies in their ability to predict the situation at distant times, without
requiring knowledge of intervening times in the future . (This includes, of course, the ability to predict the results of different policies, )
U sing our new procedure, we evaluate models by their ability to
yield good predictions across long periods of time, not by their
ability to predict across the smallest possible period of times
therefore, we will generate models and coefficients better suited to
the practical demands which will be placed on them. In order to estimate such models, we will have to resort to the dynamic feedback
techniques of section (iv) above , The prior unavailability of such
techniques offers at least some justification for the apparent
disregard of empirical data in some of Forrester's more interesting
work{27 ) ; however, if these estimation routines should become
available soon at the Cambridge Project Consistent System, a more empirical analysis of these issues will become possi ble.
In short, the practical reasons for disregarding process noise
can be very strong, at times. From a theoretical point of view,
however, the reasons are not so obviou s. If we start with a given statistical process, governed by a given set of equations, then
those equations themselves are the best possible basis for prediction,
Page
II53
whether across one period of time or many . Thus "truth " implies
predictive power. When we have a given set of data , the maximum likelihood method allows us to make use of all the information
available  not just one measure , like longterm predictive power over the given data  to find the parameters and model closest to being "true . "
The difficulty in this argument i s that "closeness to truth , "
unli ke "truth " itself, can be measured in many different ways.
The Bayesian school of thought has begun to argue that this point ,
too , should be accounted for in practical statistical routines (28 ) r
however , their concepts of "loss fun ction " do not fully encompass the con cept of "longterm predictive power " here . As a practical
matter , most models in the social sciences and in ecology are
simplified , approximate models , which we do not expect to be "true " in any absolute sense ; we only expect tharn to approach truth , or,
more reali stically , we expect them to give us predictions similar
in a broad way to what we would predict if we knew the full truth ,
Even when these models contain a hundred variables or so , they will still be hundreds of times simpler than the complete systems which
they represent . If there were an infinite quantity of representative
data available , and if we had to choose between a limited set of
models, � of which � "true" in !!l absolute sen se , then the model
which performs best , on , say , predicting across ten years of time ,
Page II_54
over this data, can also be expected to give us the best predictions
of the future ten years hence. In order to estimate such a model ,
we should indeed try to minimize the errors across ten years , in stead of following the conventional likelihood approach.
In other words , if we wish to carry out an estimation which is
"robust "  an estimation which will give us good predictions despite
the oversimplifications of one ' s model  then a direct maximization
of predictive power is appropriate ; that is exactly what our "measurement noise only " approach entails .
In reality , we will have to accept limits both upon our choice of
models and upon the size of our data . We will have two sorts of
information to use in evaluating the predi ctive powers of our models , ( i ) the longterm predi ctive power !.§. measured directly over the
available data J (ii ) general information about the "truth " of our
model, as given by the maximum likelihood formulas . The first
information is a direct measure of what we want to know . On the other
hand , we only have a certain limited amount of this kind of
information , in our data . The second information does tell us
something about how close our model is to "truth " , which in turn
tells us something about predictive power. When our total information
is limited , statistical theory recommends that we make use of all the
information at our disposal , including both the "hard " and the "soft . " Our problem, then , is one of a more practical nature s
Page
II55
which of the two sources of information should we emphasize , when we want to build a model suitable for mediumterm and longterm prediction?
General guidelines for dealing with this problem will have to
come from experience , experience with both ordinary procedures
and with the new procedure suggested here. It should be clear, however , that the relative importance of process noi se and
measurement noise will vary from case to case . A direct comparison of the methods , say, in predicting the second half of one ' s data
from the first half, would probably be desirable , in most cases.
When the major flaw in one ' s exi sting model lies in its inability to describe measurement noi se accurately , then one would suspect the possibility that the unexplained portion of that measurement
noise would be organized enough to be partly "predi ctable " from
one ' s process variables and noi se , this would lead to di stortion of the parameters of the proce ss proper . For models which have
thi s problem , the best way to improve predictive power may be to
avoid this di stortion , by making sure that measurement noise i s not
falsely attributed to the process equation s (i. e. to process noise ) ; by falling back on a "measurement noise only " model , in which process noise does not exist at all , one can eliminate this
distortion entirely. Once again , as noted o.n the previous page •
the "measurement noise only " approach involves no distortion at all ,
Page I I56
insofar as it maximizes longterm predictive power , directly ;
its weakness lies in the lack of formal statistical efficiency .
When process noise is very large , and the neglect of process noise
would appear to seriously weaken one ' s ability to make full use of on e ' s data , then it would still be possible to compromise ,
by the relaxation methods to be discussed in section ( xi ) ;
these methods , by allowing process noise , but by making it
much more "expensive " to attribute randomness to process noise than to measurement noise , may reduce the false attribution
of the latter to the former , while preserving an adequate level of statistical efficiency. It is conceivable that in social science,
as in hard scien ce , there will someday be a viable distinction
between "practical " statistical work , where prediction is most
important , and "theoretical " statistical work, where "truth " as such
turn s out to be a more effective guide to finding new variables and
terms to use in one ' s models. However, once again , the practical
values of these techniques will have to emerge from empirical work. The empirical work of this thesis, in Chapters ( IV) and ( VI ) , does
provide a strong indication that the "measurement noise only "
approach i s superior to the pure maximum likelihood approach, in the social sciences ; this indication has been strong enough to totally
reverse our own initial bias in favor of the classical approach .
Still , the empirical studies here are only the beginning of a long
process.
Page I I.5 7
Finally , let us con sider one other situation where the
conventional approach to noise is inadequate 1 the case of
"ideal types . " Very often , in social science , we run across
variables like "Republican President " and "Democratic President "
which do not tend to va:ry across a continuous spectrum , they
tend to be simply "true " or "false . " The error in predicting
such variables would not follow a normal distribution , but the
problem n eed not be overwhelmingly difficult . On the other hand ,
we often find societies falling into certain distinct
"ideal types " (29 ) , such as "traditional " , "developed " and
"tran sitional " J we may find that a whole collection of other
variables  political stability ( JO ) , aggressiveness , economic
growth , etc .  depend heavily on which ideal type a society falls into . As an extreme example , let us imagine that there are three "ideal types " a nation might fall into , and that we have been
studying four so cial variable s which are all really determined
by the current "ideal type " 1 Variable 1
Variable 2
Variable 3
Variable 4
Type 1
Type 2
Type 3
1
0
1
0
1
1
0
1 0
0
1
0
Table II3 : Hypothetical Example of "Ideal Types " .
Page I I58
If we predict that variable one will equal one , and discover that
it actually equals zero , then thi s last piece of information tells us
exactly what to expect for variables two , three and four. If we
already made predictions for variables two , three and four , then
we would also know now exactly what errors to expect in these
predictions. Thus there is a connection , though complex , between
the "errors " in predicting different variables . If our example
had been somewhat more complex , with a lot of nonlinearity , and
a certain amount of freedom to deviate from one's ideal type ,
it is clear that the correlations between the prediction errors
of different variables could become hopelessly complex.
According to maximum likelihood theory , as sketched briefly in
section (v ) , it is important to minimi z e the "right " measure of
error, even when we estimate the coefficients we intend to use in
making predictions of this process. The "right " measure is suppo sed to correspond to the actual noise process going on . If it does become
important, in practice, that we do have such an accurate measure of
error, and if the actual noise process is as complex as above ,
then we face serious difficulties in estimating any parameters at all . In our simple example , we could escape from these difficulties ,
by carrying out a simple factor analysis to detect the ideal types ; then we could go on to study the ideal types only , and disregard
Page II.59
the original four variables. However , in the general cas e , a linear
technique like factor analysis may not be enough s also , we may still
want to consider the original variables, to account for whatever
independent variation they have . In any case , it i s clear that the conventional model of independent errors , following a normal
distribution , cannot deal effectively with this kind of situation . The "measurement noise only" technique could con ceivably reduce
the difficulties here , but one would still expect a better model to emerge , if one could account for the complex interrelations
of the process variables more explicitly .
In summary, i n order t o produce a "true " model o f a social
process , which is also capable of yielding good prediction s ,
one must have an accurate model both of the "predictive part "
( like equations ( 2 . J ) ) , and of the "noise part " ( like equations (2 . 4 ) ) 1
otherwise , the standard techniques of statistical estimation may
yield unrealistic estimates of both . If one pursues an unbalanced approach , giving more weight to the "predictive " part than to the "noise " part , one may soon find oneself 1n a situation where the
inaccuracies 1n one ' s noise model are so large that any improvements in the "predictive " part are reflected by little improvement ,
i f any, i n the statistical likelihood o f one ' s model. The "models
without process noise " di scussed above can , at the very least , serve as detectors for this kind of difficulty .
Page II60
(viii) HOW TO ESTIMATE EXPLI CIT
SOPHISTICATED NOISE MODELS
Suppose that we had decided to make the DeutschSolow model
for assimilation more sophisticated, not by working on the
"predictive " part , but by working on the "noise " part; suppose that
we decided to account for the possibility of "flukes, " as discussed
above , Then we might write the model 1
A(t+l ) • k A(t ) + k U(t ) + b(t+l ) 2 1 1
8
½t E.B >
2
+ p * 1
1
fzn Bl
(2 . 1 7 )
Our problem is to try to maximize "P", which, as in the case of
ordinary regression , will equal the product :
P • p(b(2 ) ) p ( b(J ))p(b(4 ) ) • • • p ( b (T ) ),
where T is the last time period for which we have data , An easier way
to approach this is by trying to maximize the logarithm of P 1
L = log P = log p( b (2 ) ) + log p(b(J ) ) + • • • + log p ( b(T ) ) .
We are trying to pick out the best possible values for the parameters k1 ,
I1ges
in f\ will lead to enormous derlvatives from Q21 again , destabilizing 2 the system again before there is a chance for "w " to build up enough to allow a large increase ln 912 • 11hus , at least when the scaling
problem i.s severe, the hope of imbalanced growth is not an answer to the danger of slow convergence.
The solution of this problem was rather straightforward for
ARMA estimation ; we simply scaled the variables of the problem
according to a common scale . More precisely, we achieved the same effect by setting :
'
Page III35
where " d" " refers to the standal'.'d deviation . On a more sophisticated level, what we are doing here is trying to maximize the expected
progress per iteration, :i.n light of our prior probabilistic
knowledge about L. We do not expect the units of measurement
of the variables to tell u s anything about their relative influence
on each other ; therefore, we demand a choice of "g " which insures k that a change of units will have no effect on our algorithm.
More generally, these variances give us an idea of the expected
order of size of a coefficient, and we set "gk " to keep the changes in line with the expected ratios between sizes and derivatives.
To handle the case of Bk
e
but reasonable expression ,
a l , r ' therefore, we write a rough
g = max( T ( 1P )A , A ) rr rr rr r
in By changing a 1,r by a certain amount, we are chang:i:ng at, r units proportional to 1 � thus our formula for g i s like our formula r for the g· k· with 9rs·• except that ..,i__s .. i s replaced by " 1 ". If Prr equals one, then this effect will tak e place on all the at,r '
and the analogy i s exact , Otherwise, if P ls smalle:::�, the der1.va:tive rr with respect to a1 , r of L will be much smaller, even when the optimal is still just as large ; thus we propose a large g ' 1,r r (O) ., is recalculated in every major in that case. Note, as "A
size o f a
iteration, the formulas above encourage us to recalculate the "g .. k
at the same time. In Chapters ( IV ) a.'1 d ( VI ) we have included
a brief discussion of the success of this general procedure with
Page III36
the estimat.ions we have carried out ; in the Appendix to Chapter ( II )
w e have suggested ways of generalizing the procedure for use with general , nonlinear models.
The choi ce of "w" requires a similar exercise in prior
estimation . At each step, our program looks at three essential
pi eces of data  L(A (
o)
,1
(0)
) , L(A' ,1, ) and � (B kB �O ) )t I(
k
•
Assuming that L is e ssentially quadratic , and that the current choice of
w
is
"right" ( i . e . that L(A ' ,'7!!) is the maximum of the quadratic
d:lstribution ) , the program " expects " that L ( A • ;rt') wi ll be better than L(A ( O )
;f. O ) ) by exactly half what the gradient would appear to
ind i cate . If thi s expectation is co:rrect. then the program concludes that B ' is not only acceptable , but also that there probably i s
little point i n exploring further i n the same direction ; i t sets
t( o )c�9
A(
O)
to A ' , and begins a new major iteration. If L ( A ' ,]"t )
i s worse , then, by the quadratic assumption , we have overshot by
at least a factor of two ; w should be cut in half, ari.d a new B..., tried
accordingly . In o:cd er t,o be a bit more conservative have specified in our program that w will be reduced by 40%, if the actual gain
i s less than 2516 of what i s indicated by the gradient. If L ( A ' , �) i s
better than 75'f'/, o f what is indicated by the grad.lent , our quadxatic
assumption tells us to double w . In a.n earlier version , we were more conservative here , and required 100%; however , convergence was slow
in some cases, and w e reduced the requirement to 8756, which has proven
adequ.at e 9 In the intermediate range , when B ' i s d eemed acceptable for
Page I IIJ7
the start of a new major iteration, w is still changed somewhat,
for the sake of the next iteration ; w is multl.plied ·by twice the
actual gain, L(A' ,1• )

L(A ( O ) ;i ( O ) ), divided by the gain predicted
by the gradi ent. At the other extr�,me, if w appears to be far off, the program will multiply w by
or by . 3 in each minor iteration ; ( ' _. more precisely, if w appears to be too small to let us set ➔B O 1 �B ' , Lf
even after w has just been doubled within the same major iteration,
or if w appears to be too large after having just been cut to 60%, then a larger change will be tried in the m,1xt minor :i teration.
Flags are set in the program, to force it to stop changing w,
as soon as it starts changing w in opposite directions within the

same major i teration ; in such cases, our procedures above insure that
..
either the last B ' or the one before it gave an L ( A ' , B ' ) much better than L ( A ( O ) ,B ( O ) ), and our program will set B ( O ) to this new
B'
the next major i teration. For reasons similar to those mentioned
for
in the previous paragraph, w i s in1 tialized at 1/'1',
At each step, the program prints out L, as defined in equation
(3 . 38 ) , and the direction of change of w . After five major iterations,
or after L appears to have stabilized to within , 01, whichever
comes first, the program stops, and as k s the user :i f he wishes to
continue ; if not, it prints out the analysis so far, and transfers to another program to ca:rry out s:lmulation studies of his model.
In the current version described above, five major iterations have
usually been enough for a close approximation, for analyses of actual
Page III38
social science data ; for safety, however, we have generally used ten in our own analyses . ( Once L is about 0, 1 away .from its
:maximum, then the current set of coefficient estimates has almost as high a. probabi li ty of exact truth  90% as high  as the
maximum li kelihood set itself ; thus 0 . 1 is a conservative upper limit to how much accuracy it makes sense to ask for. ) Unfortunately, the changes above were made piecemeal over a number of runs on·
different data, with the final improvements ex isting only in the
basic subroutine incorporated into the MIT version. Thi s routine has
a more effective procedure for generating initial estimates than
we u sed with our earliest test data ; thus the direct comparison , before and after, would overstate somewhat the relative merits
of the current system, In the Appendix to this Chapter , we have
provided a numerical example of convergence results before and after these :procedures for convergence were introdu ced .
'U1e fundamental purpose in using ARMA estimation, as we have
described it, i s to improve upon classical multiple regression. Thus we have decided to use multiple regression i tself, to provide the
initial estimates, i ( o ) . Not only are these li k ely to be reasonable
estimates, in terms of their general order of size and in terms of
the size of the biggest terms ; they also provide us with an assurance that our ARMA model will either represent an improvement upon
multiple regression, or, in some cases, will confirm multiple regression.
Page III39
Our subroutine has been written to allow other initial estimates,
but the main program now available at TSP does not make use of
this option. Originally, we used the regression coefficients that come from a standard model including regression constants ;
our results on Norway, i n section ( v ) of Chapter ( VI ) , were based on that system. With the constants , one introduces a greater degree of
freedom into the regression models, to offer a more interesting ( though perhaps axtificial ) comparison against the ARMA models,
which do not include that degree of freedom. However, in order to
insure convergence u�der all circumstances, we have eliminated the
regression constants in the MIT version ; users of that system still
have the freedom , in any case, to obtain regression constants based
on constant terms by using other modules in the same system or even by another run of the ARMA command, Both the MIT version, and our private version used on the Norway data, print out all the ARMA estimates and regression coefficients, along w:.tth the standard
deviations of the variables ( to assist in interpretation ) and
the likelihood values for both models. 1he significance of the
various coefficients can be estimated by looking at the likelihood of the models which result when the coef':ficients are removed from the model.
Several other options have been added, to extend this algorithm
somewhat, First of all, there is now provision for "exogenous
Page III 40
variables . " In the di.scu ssion above , the expression "fl�\ i "
could have been replaced systematicall.,v by "By  i " , where G is t
now a rectangular matrix , and where y t i includes both z  i t
and a few other components ; none of the equations above would have
had to be changed in form. Our program, in its current form , allows both endogenous and exogenous variables ,
Second , there i s provision to allow the user to dictate
apriori that certain components of 9 will be constrained to equal z ero . 'I'.his is done simply enough , by setting thei:r initial values
to zero , and constraining ( 3 . 4 1 ) to apply only to the o ther
components of
e.
Thus L is rrJ.aJdmized as a function of the other
components , subject to tM.s constraint. The basic subroutine allows
this for any coeffi cient, but , in the MIT version , we have limited this to those
. for which y . i s an exogenous variable. J J Th.ird , there i s provision to give the user some ability , Q
1
at least, to handle nonstationary processes . Box and Jenkins ( 19)
discuss at great length the prominence of nonstationary processes in practical statistics ; they point out the value of introducing some kind of careful procedure for dealing with nonstationazy
processes, even if the procedure must have a less rigorous
foundation than the u sual statistical processes , in order to give
the social scientist confronted with such :processes an alternative
other than either giv:lng up or using an inappropriate tool, However ,
the procedures they i ntroduce (20 ) involve processes which tend to grow
Page III41
k as t , for some constant k. By contrast, the commonest processes of
growth in the social sciences would appear to be those of exponential growth, processes which may come out of a dynamic relation like that of equation ( 3 . 5 ), but with a choice of "G" large enough to allow
growth. The concept of maximum likelihood, as discussed in
Chapter ( II ) , does not require a "fl" that generates a stationary
process; thus at first glance, the special procedures suggested by Box and ,Tenkins might appear irrelevant , However, our estimation
procedure has depended on equation ( 3 . 6 ) , not ju st on ( 3 • .5 ) and the
likelihood concept. Equation ( 3. 6) implies that the average size of the random component of our process remains the same across time ,
If w e w ere analy z ing a two...hundredyeax series of data on the US GNP,
for example, thi s would imply that a $10 billion error in our
predictions for 1 790 from 1789 should be treated as a smaller mat·ter than an $H billion error in our predictions of 1973 from 1972 ;
a $ 1 0 billion error would always be regarded as less significant
than an $ 1 1 billion error, regardless of the year in which the error
occurred. In practice, the measurement errors and random fluctuations both are likelier to be a fixed percentage of the variable itself GNP  than to be a fixed independent process. To handle this kind
of situation, we have introduced an option, ''AfillAWT", to deal with
a model o:f error slightly different from ( 3 . 6 ) :
1(21l)fi det A
)
,
Page III42
which is simP,lY the normal distribution for the ndimensional t vector e' , � , and which requires u s to calculate A 4 as the 1� lzt , 1 covariance of this vector. In practice, however, the simulation studies of Chapter ( IV) suggest that the ordinary ARMA co1111T1and
generally performs at least as well as ARJi'1A\iT, even for most of the nonstationary processes studied.
F"lnally , provi sion has been made for the possib:llity 
mentioned in Chapter ( II )  that the available data would consist, not of one string of observations across time
:for
our variable s ,
but of a. whole set of such strings ; a general model of the process of population growth , for example , might encourage us to develop
a :mod el for application to data,series involving the same variables
across many dlfferent countries , In order to handle thi s possibility,
\
we can use ( 3 . l} l ) as before , but must note : (. i ) aG .lk.. and ap _iL • across rs rs all the data strings, will simply equal their sum across all the
individual datastrings ; ( ii ) each datastring, s, will require its own 'tl s ) for initialization ; ( iii ) �(s) will , of course , equz.l the aa�� �L value of ..i, calculated in string s. ( Note that this example 1 ,r might be a good candidate for the use of "AHMAWT" , to prevent the analysis f'rom being dominated by nations of large population • • •
unless such weighting is ac tually d.esin.x1 . )
Page
APPEN DIX :
III43
NU MERI CAL EXAMPLE OF THE BEHAVI OR
OF DIFFERENT CONVERGENCE PROCEDU RES
All of the ARMA es t imations reported in Chapters ( I V) and ( VI )
were based upon the final form of our algo:ri thm, mak:i.ng use of the special convergence procedures described in section ( iv ) . However,
before we installed these procedures , we carri ed out a number of tests on the preliminary version of the ARMA routine, on simple
madeup data se ts , in order to check out the accuracy of the routine . One of these sbnple test series was u sed before we introduced the
possibili t y of different '' gk " for different "B k " , as in equation
( 3 . 4 1 ) ; thus we can see the effect of adding our new convergence
procedures by comparing the old test results against a new analys:i.s
o:f the same series. 'I'h e data series in question is a simple
univariate series of length seven  1 . 0 , 1 , 2 , 1 . 2 ,
1 . 3 , 1 . 5, 1 . 4 , 1 , 0 
fit to the model z +l = Gz + a + Pa l " This series does not fit t t t t
w ell to a simple ari thmet.ic progression ; thus the "distance " from
the regression model to the ARMA mod el turns out to be fairly great ;
the series i s a relatively severe test of convergence possibilities.
( The cost per iteration i s low , because the series i s short, but the
progress per iteration in log probability , as a percentage of the gap in log probabili ty , i s extremely slow . ) Our initial test output
Page III44
was a string of numbers, which we may arrange in a table : Major Iteration Number 1
UH
u"
2
,, u
""
J
4
5 "" r t "'
nu
uu uu
Theta i* o) B
LogF
,
( and 1 0 ) : P ( t ) = . OSA ; M( t ) = . l S R
P r oc e s s 5 ( a n rl 1 1 ) : P ( t ) = . 0 5 A ; M ( t )
P ro c e s s
=
( a n rl
12) :
P( t )
=
. OSR;
,� ( t )
=
=
. osc . o s r,
F i v e a n d f i f t e e n p e r c e n t e r r o r s w e re c h o s e n o n g r o u n d s
P a p.; e I V  9
t h a t t h e y s e er, " t y p i c a l " ;
compu t e r t i me w a s no t
a v a i l a b l e t o r e p l t c a t e t h i s s t u d y fo r d i f f e r e n t v a l u e s
o f t h e s e pa rame t e r s . a ppea r a nce of
11
I t s h ou l rl b e n o t e d t h a t t h e
!\ 1 1 tw i �e w i t h P roce s s 2 , a b o ve , d o e s no t
me a n t h a t t he s ame r a n dor, n u�h e r , A ( t ) , wa s u s e d i n bo t h processe s ;
i n ge ne r a l , e ve r y t i me t h a t we n e e d e d
a r a n dom n umbe r f o r a n e w a p p l i c a t i o n , w e i n vo k e d a new c a l l t o " r a n dom " , t h e r a n dom n umh e r �e n e r a t o r o f t h e P roj e c t C amb r i d g e T S P  C S P s y s t e m .
I n e q ua t i on ( 4 . 2 ) ,
one s ho u l d a l s o no te t h a t a n u n r e a l i s t i c c h a n g e o f s i g n cou l d o c c u r i f P ( t ) s h ou l rl e v e r e q u a l  1 o r l e s s ; t h i s
wo u l d b e a ve r y r a re e ve n t , w i t h t h e s ys t e ms we h a v e s pe c i f i e d, b u t eve n o n e s u c h i n c i d e n t wo u l rl pe r s i s t
t h ro u g ho u t a n e n t i r e s i mu l a t e d t i me  s e r i e s, m a k i n g i t to t a l l y u n r e a l i s t i c a s a r e p r e s e n t a t i v e of
so c i a l  s c i e n c e t i rne  se r i e s .
S i m i l a r l y , wh t l e
me a s u r eme n t e r ro r s a r e o c c a s i o n a l l y q u i t e g r o s s f o r
so c i a l s c i e n c e va r i a b l e s , i t i s u n r e a l i s t i c to i ma g i n e
s orieo ne g e t t i n � t h e s i g n w ro n g fo r s u c h v a r i a b l e s a s S N P o r pop u l a t i o n .
T h u s fo r P ( t ) a n rl M ( t ) bo t h we
i n s t i t u te d a c u t o f f of  . 7 5 ; va l u e s l e s s t h a n t h i s we r e
s e t e q ua l t o t he c u to f f ,
( No s i mu l a t i o n s we r e r u n
w i t ho u t a c u to f f ; t h u s i t i s pos s i b l e t h a t t h e c u to f f
P a r; e I 1 1 1 0
wa s n e v e r a c t ua l l y i n vo k e rl . ) � mo r e e l e R a n t p ro c e d u r e , ma t hema t i c a l l y , ri i r; h t h n v e b e e n t o u s e � a n d
of
( l+ P )
and
( l+M)
I n e c:i u a t i o n
(4.2) .
l1
1 fov,e v e r ,
i n s tead
ma j o r
a n d e n d u r i n g c r a s h e s do som e t i me s o c c u r i n t h o s e so c i a l s c i e n c e va r i a b l e s s ub j e c t t o e r r � t i c b e h a v i o r ; f o r
e x a mp l e , p h e n ome n a s u c h a s z � r o o r n e � a t i v e pop u l a t i o n s eem t o b e a vo i d e d a s a r e s u l t o f e x t r ao r rl i n a r y
p ro c e s s e s d i f f e r e n t f rom t h o s e o ne r a t i n g o n n o r ma l
pop u l a t i o n s . T h u s , o n g r o u n d s o f r e a l i s m, we d e c i d e d to use c u t of f s i n s t e a d o f a mo r A e l e R a n t a p p ro a c h . A l so , a
c u to f f o f  1 5 . wa s u s e d f o r t h e v a l ue o f B ( t ) l n s e r te rl
i n to e q ua t i o n ( 4 . 5 ) , o n g ro u n d s t h a t t h i s wo u l d p r e ven t
t h e po s s i b i l i t y o f i n vo k i n g a c u t o � f s e v e r a l t i me s i n a row o n M ( t )
.
T h e c h o i c e s a b o v e , i n b r i e f , g i v e u s a c h a n c e to
l oo k at t h e fo u r pos s i b i l i t l e s o f n o me a s u r eme n t n o i s e ,
of s i m p l e me a s u r eme n t n o i se , o f me d i um  c omp l e x
me a s u r e me n t no i s e , a n rl o f c omp l e x mea s u r e men t no i s e , a l l i n t h e p r e s e nc e o f s i mr, l e p r oc e s s n o i se ;
t h e y a l so
l e t u s l oo k a t s i mp l e m ea s u r eme n t n o i se a n d com p l e x
me a s u r eme n t no i s e , i n t h e p r e s e n c e o f me d i um comp l e x
p ro c e s s n o i s e . P ro c e s s e s 1 a n rl 2 c l os e l y re s emb l e t h e
p ro c e s s e s fo r wh i c h s i mp l e r e g re s s i o n a n d A RM A morl e l s ,
P a ge I V  1 1
re s pe c t i ve l y , s h o u l d h e i d e a l , i n t h e o r y .
T h e e x t r eme
c a s e o f z e ro p r o c e s s n o i s e , t h P. c a s e wh i c h s h ou l d b e
mo s t f a vo r a b l e t o t h e " ro h us t a r p ro a c h " , wa s n o t i n c l ud e d , o n g r ou n ds t h a t we a r e i n t e r e s t e d i n
e va l u a t i n g t h a t a p p r o a c h u n d e r mo r e no r ma l , m i x e d con d i t i o n s .
I n o r rl e r t o a c c o u n t f o r t h e po s s i h i l i t y o f
mo r e c om p l ex p ro ce s s e s , w i t h o u t v i o l a t i n g h omoge n e i t y ,
v,1e i n t ro d u c e d P roc e s s e s 7 t h r o u g h 1 2 h a s e d o n t h e
fo l l ow i n g e q ua t i o n s , w h i c h y i e l d a g r ow t h r a t e o f 1 . 6 % ( f r om t h e 1 i n e a r i z e d d i f f e r e n ce e q ua t i o n )
X ( t + l ) = ( . 3 8 X ( t ) + . 3 5 X ( t  1 ) + . 3 X ( t  2 ) ) ( 1+ P ( t ) ) Z( t ) = X ( t ) ( l + M( t ) )
(4.7)
T he c h o i c e s o f P ( t ) a n d M ( t ) he re we re i d e n t i c a l to t h o s e w i t h P ro ce s s e s 1 t h ro u � h 6, a s i n d i c a t e d i n
e q u a t i o n s ( 4 . 6 ) , a n d t h e s a me c u t of f s we r e u s e d .
F o r e a c h o f t h e twe l ve p ro c e s se s d e f i n e d a bo v e ,
t e n s a m[) l e t i me  s e r i e s , " Z l " t h r o u g h " Z l O " , we r e
g en e r a t e d . T h e n , i n o r d e r t o e s t i ma t e " c " i n e q u a t i o n ( 4 . 1 ) , f o r e a c h of t h o s e s a m p l e t i me  s e r i e s , we u s e d
t h e t h re e gene r a l te ch n i q u e s d i s c u s s e d t h r o u � ho u t t h i s
t h es i s : ( i ) c l a s s i c a l r e R re s s i o n ;
( i i ) t he II A R M .i\ 1 1
Pap.:e I V  1 2
;::i p p r o a c h ; ( i i i ) t h e
conven t i o na l
W i'l. Y
II
roh u s t a p p ro a c h " . T h e mo s t
of e s t i ria t i n g " c " , f o r t h e mo d e l
( 4 . 1 ) , i s to u s e a s t a n d a r d r e g r e s s i o n p r o g r am to do a
ma x i mum l i k e l i hood e s t i ma t i o n o f t h e r e l a t e d mo d e l : Z ( t+ l ) = cZ ( t ) + k + a ( t ) ,
(4. 8)
\'I/ h e r e " k " i s a co n s t an t to h e e s t i ma t e d , a n d "a ( t ) " i s a no rma l r a n dom no i s e p r o c e s s .
I n p r a c t i ce , t h i s
a mo u n t s to do i n g a l e as t  s q u ;::i r e s e s t i r1a t i o n , a s i n s e c t i o n ( i i ) o f r. h ape r ( I I ) ; o n e hope s t h a t
11
k" , wh i c h
i s e x p e c t e d to be ze ro , w i l l he e s t i ma t e d a s sor1e t h i n g
c l o s e to ze ro .
con s t a n t t e rm,
T h l s te c h n i q ue , r e g r e s s i o n w i t h a
i s a b b r e v i a t e d a s " r e g + k " i n o u r ta b l e s .
A h e t t e r wa y of u s i n g c l a s s i c a l r e g r e s s i on , t o
e s t i r, a t e
II
c I I i n ( 4 • 1 ) , i s t o u s e a s i mp 1 e r mo d e l ,
w i t ho u t t h e me a n i n g l e s s c o n s t a n t te rm : Z( t+l) = cZ( t ) + a ( t)
( 4. 9)
T h i s k i n d o f r e g r e s s i o n c a n he pe r fo r me d a u toma t i c a l l y i n t h e T i me  Se r i e s  P ro c e s so r s y s t e m . T h i s t e c h n i q u e ,
r e g r e s s i o n w i t h ou t a co n s t a n t te rm, i s a b b r e v i a t e d a s
" r e g " i n o u r t ab l e s .
r o r r e s po n d i n g to t h e s e two s i mp l e r e g r e s s i o n
P a p; e I V  1 3
mod e l s a r e two si m p l e A RMA mo d e l s .
Acco r d i n g to o u r
r e s u l t i n t h e b e gi n n i n g of C h a p t � r ( I 1 1 ) , t h e
c o r r e s po n de n ce i s mo r e t h a n j u s t o n e o f s i mi l a r i t y ; t h e ARMA mod e l s b e l ow a r e t h e gen e r a l i z a t i o n s o f ( 4 . 8 ) a n d ( 4. 9 )
t o ac cou n t fo r t h e po s s i b i l i t y o f "whi t e n o i s e "
i n t he p r o ce s s o f d a t a me a s u r e me n t . Mo d e l ( 4 . 8 )
co r r e s po n d s to :
Z ( t+ l ) = c Z ( t ) +a ( t ) + Pa ( t  l ) + k ,
( Li .
I O)
w h i c h c a n b e e s ti m a t e d d i r e c t l y i n t h e P ro j e c t
C amb r i d g e Ti me  S e ri e s P roc e s s o r , h y u s e o f t h e comm a n d "ARMA" , d e s c ri b e d t n t h e l a t e r p a r t o f C h a p te r ( I 1 1 ) .
T hi s t e c h n i q ue fo r e s t i m;i t i n � " c " we a b b r e vi a t e a s 11
a rm a + k 1 1 i n t he t a b l e s a t th e en rl o f t h i s c h a p te r .
Mod e l
( 4. 9 )
co r r e s po n d s to : Z ( t+ l )
=
(4. 11)
cZ( t ) +a ( t ) + Pa ( t 1 ) ,
w h i c h c a n a l so be es t i ma te d b y t h e comm a n d
11
/\ R M l\ " ; t h i s
te c h n i q ue fo r e s t i ma ti n g "c " we w i l l a b b r e vi a t e a s
" a rm a " i n t h e t a b 1 e s a t t h e e n d o f t h i s c h a p t e r .
F i n a l l y , t he e s t i ma ti o n a l go r i t h m d e s c r i b e d i n C h a p t e r ( I I I ) a l l owed u s to w r i t e a no t h e r comma n d , ARM AWT , to
e s t i ma t e t h e mod e l :
Pa ge I v  1 1�
Z( t+ l )
=
cZ( t ) { l+a ( t ) ) +Pa ( t l ) z ( t 1 )
( 4 . 12 )
T h i s comma n d , me n t i o n e d b r i e f l y i n C h a p t e r ( I l l ) , i s
e s s e n t i a l l y eq u i v a l e n t t o e s t i ma t i n � ( 4 . 1 1 ) , w i t h t h e
a s s um p t i o n t h a t t h e no i s e p r o ce s s , " a ( t ) " i n ( 4 . 1 1 ) , i s de t e r m i n e d a s a ne r c e n tage of t h e a c t u a l v a r i a b l e , a s i n ( 4 . 2 ) , r a t h e r th a n a p ro c e s s o f co n s t a n t Me a n a n d
va r i a nc e .
T h i s t e c h n i ci ue f o r e s t i M.3 t i n g " c " , we
a b b r e v i a t e as " a rrv'lw t " i n th e t a h l e s a t t h e en d o f t h i s
c h a p te r .
F i n a l l y , we h a d t o f i n d a " r o h u s t p r o ce d u r e " fo r
e s t i ma t i n g c , d r a wn f rom t h e d i s c u s s i o n o f s e c t i o n ( v i i ) o f C ha p te r ( I I ) .
I n t h e pu r e c a s e of z e ro p r o ce s s
no i s e , t h e s e p r o c e d u r e s r e q u i r e u s t o e s t i ma te t h e
i n i t i a l " u n d e r l y i n g " v a l ue s o f t h e v a r i a b l e s me a s u r e d ,
a n d t h e coe f f i c i e n t s o f t h e mo d e l , b y d i r e c t l y
m i n i m i z i n g t he a ve r a ge e r ro r s o f l on g  t e rm p r e d i c t i o n s
ma d e \'J i t h t h i s mo d e l .
I n o t h e r wo rd s , f o r t h e e s t i ma t e d
i n i t i a l v a l ue s , a n d a g i v e n s e t o f coe f f i c i e n t s , o n e
ma k e s a f u l l s e t o f p r e d i c t i o n s f o r t h e v a r i a b l e s o f i n t e r e s t , w i t hout e v e r m a k i n � u s e o f t h e me a s u r e d
v a l u e s o f t h e v a r i a b l e s a t i n t e r me d i a t e t i me s ; u s e s t h e a ve r a g e e r r o r i n t he s e p r e rl i c t l o n s ,
on e
p r e d i c t i o n s w h i c h a r e ge n e r a l l y l on �  t e rm p r e d i c t i o n s ,
Pa ge I V  1 5
a s o n e ' s c r i t e r i o n o f f i t ; o n e u s e s t h e me t h o rl o f
s t e e pe s t d e s c en t , o r a r e l a te d p ro c e d u r e , i n o r d e r t o
p i c k c oe f f i c i e n t s a n d i n i t i a l e s t i ma t e s t o m i n i m i z e t h e to t a l e r ro r i n t h e s e p r e d i c t i o n s . ( A " r e l a xe d " ve r s i o n
of t he se p r o c e d u r e s wou l d a l l ow a l i t t l e h i t o f
a l l ow a n c e t o b e m a d e fo r i n t e r med i a t e mea s u r e d v a l u e s , i n p r e d i c t i n g mo r e d i s t an t t i me pe r i od s . )
T h e f u l l mu l t i v a r i a t e , no n l i n e a r ve r s i o n o f t h i s
p ro c e d u r e , b a s e d o n t h e d y n am i c f e e d b a c k a l go r i t h m o f
C ha p te r ( I I ) , w a s no t a v a i l a b l e a t t h e t i me t h e s e s i mu l a t i o n s we r e ca r r i e d o u t .
Howe v e r , fo r e q u a t i on
{ 4 . 1 ) , t h e r e i s a me a s u r e me n t  n o i s e  on l y mo d e l wh i c h i s
muc h e a s i e r to e s t i ma te t h a n i s u s u a l l y t h e c a s e : X { t+ l ) = cX { t ) Z { t ) = X { t ) ea ) � (t
{ 4 . 13 )
X{ t) { l
+
a{ t) )
I f t h e me a s u r emen t n o i s e , a { t ) , i s o n t h e o r d e r o f 1 0 %
o r l e s s , t h e a pp r o x i m a t e e q u a l l t y h e re w i l l b e ve r y goo d , a c co r d i n g t o t he T a y l o r e x p a n s i o n o f ea_
(t)
. In
o r d e r t o e s t i ma te X { O ) a n d c i n t h i s mod e l , o n e c a n t r a n s fo r m { 4 . 1 3 ) a l ge h r a i c a l l y t o d e d uce :
P a ge I V  1 6
l o g Z ( t)
=
t l o� c
+
l o g X ( O)
I n o r d e r to p i c k t h e con s t a n t s , l og c
( 4 . 1 4) +
a ( t)
a n d l og X ( O) , to
ma x i m i z e t h e l i k e l i h ood of t h i s mo d P. l , a c c o r d i n g to s ta n d a r d ma x i mu m l i ke l i h ood t h e o r y , o ne n e e d o n l y pe r f o rm a s i mp l e r e g r e s s i o n o f
l og Z ( t) a ga i n s t t h e
i n d e pe n de n t va r i a h l e " t " a n d a con s t a n t t e rm . A s pe c i a l
ro u t i n e to pe r f o r m t h � s o r,e n� t i n n , ca l l e d " G R R " ( G Row t h
Ra t e) , wa s a d d e d t o t h e P ro j e c t C a mb r i dge T i me  S e r i e s P ro c e s so r i n J a n u a r y 1 9 7 4 . N o t e t h a t t h i s r o u t i n e
e s t i ma t e s " c " a n d X ( O ) i n e x a c t l y t h e s a me w a y a s t h e o l d ro u t i n e
C ha p t e r
11
1: X T R.t\ P" r:l i d , i n t h e wo r k r e po r t e d i n
( 1/ 1 ) .
F i n a l l y , a f t e r t h e s i mu l a t i o n p r oce s s e s a n d t h e
e s t i ma t i o n t e c h n i q ue s a r e d e f i n e d , we f a ce t h e p ro b l em of me a s u r i n g how we l l t h e e s t i ma t i o n t e c h n i q u e s
a c t u a l l y pe r f o rm . W e h a ve u s e d two d i f f e r e n t c r i t e r i a . F i r s t , th e r e i s t h e c r i t e r i o n o f p r e d i c t i v e powe r ,
me a s u r e d ex p l i c i t l y .
Fo r e a c h c om b i n a t i o n o f
t i me  s e r i e s ( o u t o f 1 2 X 1 0 = 1 2 0 s amp l e t i me  s e r i e s)
a n d o f e s t i m a t i o n t e c h n i q w� , we u s e d t h e v a l ue of " c " ,
a s e s t i ma te d fo r t h e f i r s t 1 0 0 t i me  pe r i o d s , t o t r y to
P ap.; e I V  1 7
p r e d i c t t he v a l ue s o f t h e v a r i a h l e Z ov e r t h e r e ma i n i n g 1 0 0 pe r i od s . F o r e a c h s uc h se t o f p r e d i c t i o n s we
c a 1 cu 1 a t e d f o u r me a s u r e s of e r r o r :
( i ) r . m . s . ( r oo t
me a n s q ua r e ) a ve r a ge pe r c e n t a ge e r ro r I n p r e d i c t i n g
pe r i od s 1 0 1 t h ro u g h 1 1 0 ; ( i i ) r . m . s . a v e r age pe r c e n t a g e
e r r o r i n p r e d i c t i n g pe r i o d s 1 0 1 t h r o u g h 1 2 5 ; ( i i i )
r . m . s . a v e r ag e pe r c e n t a ge e r ro r i n p r e rl i c t i n g p e r i o d s
1 0 1 t h ro u g h 1 5 0 ; ( i v ) r . r, . s . a v e r a ge pe r c e n t a ge e r ro r
i n p r e d i c t i n g p e r i od s 1 0 1 t h rour;h 2 0 0 . ( l\ l s o , a s e t of
p r e d i c t i o n s w a s ma d e , f rom pe r i o d 1, to pe r i o d s 1
t h r o u g h 1 0 0 . ) T he ex a c t r e s u l t s o f t h e s e te s t s a r e
s hown i n T ab l e I V  4, fo r e v e r y s amp l e t i me  s e r i e s . L e t u s d e f i n e i n a h i t mo r e d e t a i l h ow t h e s e
p r e d i c t i o n s we r e a r r i ve d a t . F o r t h e r e g r e s s i on mo d e l s ,
( 4 . 8 ) a n d ( 4 . 9 ) , we i n se r t e d t h e k n own v a l u e o f z ( l O O ) ,
a n d t h e e s t i m a t e s o f c a n d k , a n rl t h e mo s t p r o b a b l e
v a l ue fo r a ( l O O ) ( i . e . z e r o ) , i n o r d e r to p r e d i c t
z ( l O l ) ; t h i s p r e d i c t i o n fo r Z ( l O l ) w a s r e i n s e r t e d ,
a l ong w i th a ( l O l ) = O, t o g i ve u s a p r ed i c t i on of Z ( l 0 2 ) ,
wh i c h i n t u r n wa s r e i n s e r t e d , e t c . F o r e q u a t i on ( 4 . 9 ) , t h i s h a s t h e s ame e f f e c t a s i n s e r t i n g t h e e s t i ma t e o f
11
c 1 1 i n to ( IL l ) , an d u s i n g ( 4 . U t o ma ke t h e f o r e c a s t s .
l ! i t h t h e AHMA mode l s , e11 u a t i o n s ( 4 . 1 0 ) , ( 4 . 1 1 ) a n d 1
Pa r;e I V  1 8
( 4 . 1 2 ) , a l mo s t t h e s ame p r o c e rl u r e wa s u s e d . T he
me a s u r e d v a l ue o f Z ( l O O ) wa s i n s e r te d to g i v e t h e f i r s t p r ed i c t i o n, the p r e d i c t i on o f 2 ( 1 0 1 ) ; a ( l O O ) t h rough a ( l 9 9 ) we n? s e t t o z e r o .
HO\\le V e r , t h e A R H /\ e , it was decided that priority should be given to the simpler formul ation at this stage.
At any rate, the model written out above 
(G. 2 9 )  did not perform especially wel l , with our
initial Norway data, according to the usual statistical
tests based on shortterm prediction .
In terms of
statistical 1 ikelihood, the regression model in this
run did no better than the regression model of our
earlier run, without communications terms . (Gaps in likelihood l ess than one point, as discussed at the
base of Table Vl11, were not recorded, due to the
impl ication of no significance in such differences in
apparent performance . ) In other words, the extra terms did not appear to add anything. The estimate of
11
c3 1 1
here, as in all the other runs carried out on this
model and its analogues, was too smal l for the computer output formats to cope with .
11
c 1 1 1 , however, was on the
order of 1 %. Again, the regression model was superior
to the ARMA model , with a gap of likel ihood scores of
Page Vl 1 19
5 . 6 5 , ind icating odds of 2 8 0 to 1 favoring the
regression mode l . The ARMA communications model had a slight l y higher like l ihood  b y 1. 5  than the simpl e
mode l (6. 2 8 ), indicating odds of 4. 5 to 1 in its favor;
however, these are not e xact l y overwhe l ming od ds. The R
of the ARMA mode l was . 994 8 , versus . 9949 for the
regression mode l , just as before.
In l ongterm prediction, however, from 1939 to
1967, the ARMA mode l d id increase its margin of
superiority; its R. M. S . average percentage errors were 1 3. 4 %, versus 14. 7 % for regression, whil e its absolute
errors were 27% versus 39%.
™ li.
The communications term,
poor l y estimated, c l ear l y added something ..tQ
l onger te rm pred iction.
With a different estimation
approach, oriented towards predictive power instead of maximum like l ihood, the gain provided b y the
communication terms might have been considerab l y l arger.
Al so, with the original model, (6. 2 1 ), the
intermed iate provinces of Norway, instead of the
minimum Nynorsk provinces, might have been sing l ed out more effective l y as likely areas of l arge  sca l e
assimilation; again, the pred ictive power of the mode l might have b een enhanced.
In the third run, as an a l ternative hypothesis, we
considered the possibi l ity of using real income as a
Page Vl 120
variable to predict l anguage: A(t + l )
=
c1 A(t) + c�Y (t),
(6. 3 0 )
where Y (t) represents the real income in a region, and
where we have not written out the noise terms.
In
terms of likelihood theory, the performance of this model was exactly the same as that of (6. 29), as
described above. In l ongterm prediction, however, it did not do quite as wel l . The ARMA R. M. S.
average
percentage errors were 1 4. 1 %, and absol ute errors 2 8 %;
the regression percentage errors were 1 4 . 9%, absol ute errors 39%. These resul ts are cl oser to those of the
univariate models, (6. 27) and (6. 2 8 ), in qual i ty , t han
to the resul ts with (6. 29). In principle, however, the rel ative potential of the two model s in l ongterm
prediction wil l not be cl ear until a new type of
estimation system is avail abl e.
In the remaining runs on our original data, we
decided to expl ore communications indices other than that of simpl e migration. Two other measures of
communication  tel ephone cal l s and vol ume of mail 
were avail able; however, these were onl y avail abl e on a provincebyprovince basis.
We faced the problem of
how to reconstruct the matrix of communications from
each province ..tQ each other province. This probl em has
Page Vl 12 1
often b een faced el sewhere in regional science (3 5 ) and
in sociol ogy, and resol ved b y way of a "gravity" model . The original grav i ty model, proposed by Stewart, woul d
estimate provincetoprovince tel ephone communications,
for exampl e, as follows : C•• C..,)
= C
T, T • £ J
,
o ·r••
(6. 3 1 )
l�
where Tt and � are the total volumes of telephone
communication (or o ther communications variables, such
as migration) in each province, and where r4j is the distance b etween provinces.
A
modified version, studied
by Gal l e and Taueber (36), and discovered to have a
multipl e correlation of between 89% and 93% b etween prediction and real ity, is as fol l ows: C• .
t."
(6. 32)
where k is an unknown exponent to be estimated. (Note that the correl ation here, with a crosssec tional
model , is stronger in its impl ications than a 90% wou l d be in a predictive timeseries model , insofar as a
cross sectional study makes sense here . )
Curiously
enough, whil e equation (6 . 32) has been successful in
empirical tests, the parameter "k" has varied a great
Pa ge V1122
deal i n i ts est i mated valu e; Galle and Ta ueber report
that k was equal to . 62 for i nterurban m i grat i on i n the US i n 19351940, but only . 42 for the same data as
measured i n 19 5 5 1960.
I n order to est i mate the l i kel i est value of "k"
for i nterreg i onal commun i cat i ons i n the case of Norway, i t was necessary to f i t the model (6. 32) to the only
reg i ontoregi on commun i cat i ons data a va i lable  a ga i n,
the m i grat i on data . A d i rect f i t of ( 6 . 32) would have requ i red the use of nonl i near regress i on; however, equat i on (6. 32) can be transformed as fol lows : 1 og C · ·  1 og T •  1 oab TJ• C.,) I,
=
a  k 1 og rC.J ·• ' (6. 3 3 )
where the ent i re left s i de of the equat i on forms the
dependent var i abl e, and where "a" and "k" can be est i mated by mul t i ple regress i on.
As long as we were carry i ng out such a regress i on,
however, i t seemed appropr i ate to test out a new
expl anat i on for the reduct i on i n "k" from . 62 to . 42 as
measured i n the Uni ted States by Gall e and Ta ueber. I t i s fundamental to the commun i ca t i ons theory of
nat i onal i sm, a s descr i bed i n sect i on ( i v), that there
has been a h i stor i c r i se i n the strength of
longd i stance commun i ca t i ons, rela t i ve to
Pa ge Vl 1 2 3
shorterdistance communications, a t least for the a vera ge man . Using equation (6. 32), we may compute the
ra tio of communication across a l ong distance, R 1 , to the communications across a shorter distance, R 0 ,
between regions of equal siz e (� the same for al l
regions):
1
(R1) k 1
For given distances, R 0 and R , the terms invol ving c0 1 a nd Ti , etc. , cancel out; thus � .Q.Il]_y wa y � ratio
� .&!1.t l a rger, for a given comparison of distances,
lf
k g� smal l er.
ll
(e. g. A smal l varia bl e to the
zeroth power wil l equal 1, which is the maximum this
ratio can a pproach under the stated conditions . ) Thus
it is critical to our communications theory that k
shoul d tend to decrease in time, as the resul t of some aspect of "moderniz ation"; the most obvious aspect of
"moderniza tion" to consider is the economic factor, the
increasing income of peo pl e rel ative to the cost of
communication .
Thus we decided to test the model :
k( t)
= c
1

c Y ( t) , l.
(6. 3 5 )
Page V1124
where Y represents the real income of a region . Al so, we decided to consider the possibil ity that a term
representing social proximity (urban vs . rural
simil arity of regions) shoul d be accounted for .
Thus,
in the final regression equation, we decided to test: l og M i,i
=
l og M .;;  l og
s� 
l og Sj
r, . + 4.)
where
u '. l•
is defined to equal one if both regions are
rural or both urban, but zero if they differ ; a measure of distance was obtained from the Worl d A tl as (37);
s '.
is defined as with (6 . 29) . Note that Y (t)  real income
in the subregion from which migration occurs  was .D.Q.t
a surrogate for time in this regression, since the
regression was based on a combination of two 36 X 3 6 matrices of total migration
and 196 8 ;
in the cl oseby years 1967
the primary variation in real income was
between subregions . The covariance matrix produced is
Page V112 5
shown in Table Vl2 5:
u C,J. .
U•.
l,J
.25 1
')
y l og r ' • &.l • l og r '
. 01 7
. 0 12
"l
. 072
l og M '. • 'J
y log r • .
. 017
1. 2 9 9 . 286
. 13 5
.• log r1J
1 og M ,j
. 0 12
. 072
.87 5
. 54 9
. 286
. 5 4 9
. 135
1. 062
Table Vl2 5: Gravity Model Correl ations Inverting the threebythree matrix in the upper l eft of this table, and multipl ying the inverse by the
vector formed by the three upper numbers of the
rightmost col umn, we can compute the standard
regression coefficients for (6. 3 6 ): C3 C 1,
= =
. 24
.26
c , = . 7 0,
al l with the expected signs, and all clearl y very
significant for the large N we have considered and for the variances d isplayed in Tabl e Vl2 5 .
Thus our
income hypothesis appears to have been val idated rather
strongl y. This same regression anal ysis was al so used
to construct an approximate measure of communications
for one of the other communications variabl es availabl e in Norway, te 1 ephone communications ; however, c 2. was
Page Vl126
del eted from the regression equation, and c \ thereby
reduced to . 67, due to the computational difficul ty of
cal cul ating the ful l index, as based on different val ues of the income variabl e across time .
In the fourth run on Norwegian l anguage data,
equation (6 . 31) was used to reconstruct the matrix of
tel ephone communications, C i j , to repl ace M .: ; in equation (6 . 2 9 ) .
The years 19 3 9 to 19 57 were used as a
d atabase . Once again, the regression model did better than the ARMA model in terms of l og l ikel ihood, with a gap of 3 points, impl ying odds of 2 0 to 1 in favor of
the regression model . The R"l. of the regression model
was . 9 941, versus . 9 9 4 0 for the ARMA model ; this was
substantial l y worse than our earl ier runs . On the other hand, this was substantial l y worse than our earl ier
univaria te runs, encompassing a subset of the
independent variabl es here; this signal s us that the
d ata in the period 19 3 9 to 19 5 7 average out to be more
d ifficul t to pred ict than the previous database, 1 9 3 9
to 1967 . Indeed, these years contain al l of the wartime "outl iers" mentioned above, in Finnmark . In l ight of
these d ifficul ties, the model did rel ativel y wel l in l ongterm prediction .
The R . M . S . average percentage
errors were 1 3 . 7 % and 14 . 1 % for the ARMA and regression
model s, respectivel y; the R . M . S . average absol ute
Page V l 127
errors were 3 5 % and 40 % for the two model s,
respectivel y.
Perhaps a l ater run without outl iers
woul d h ave given a much b etter picture.
In the fifth run on Norwegian l anguage data, we
studied the same model as in the fourt h run; however,
this time we used equation (6. 3 6 ), adapted to predict
tel ephone communications, to reconstruct a matrix of tel ephone communications. The R� and t he l ikel ihood
scores turned out to be the same as in the fourth run, except that the ARMA model gained very sl ightl y in
likel ihood  b y one point; the odds against this being
a coincidence are onl y 3 to 1, according to l ikel ihood theory  not a substantial confirmation .
of this run seemed sufficientl y bad, with
that simul ations were not carried out.
The resul ts
rt
stil l l ow,
In the next run on Norwegian l anguage data, postal
data were used, with equation (6. 3 6 ), to construct an index of communications, to repl ace M ,i in equation
(6. 2 9 ). The regression model performed better than the
ARMA model , with a gap in l ikel ihood of 9, impl ying
odds of eq = 810 0 to 1 in favor of regression. The R� of the ARMA model was . 9958, versus . 995 9 for the
regresson model .
At first, these h igh val ues of R�
seemed rather encouraging.
However, the data period
used for this anal ysis was 1 9 4 5 to 1 9 67, due to the
Page V 1 128
absence of postal data in three of the war years; this impl ied that the outl iers in Finnmark were avoided . In our seventh run, as a corrective, we reeval uated the
simpl e univariate model , equations (6 . 27) and (6 . 2 8 ),
over the same timeperiod ; the results were the same as
with the postal model  implying that nothing was
gained by adding these communications terms  except
for an insignificant onepoint decrease in the likelihood of the ARMA model.
After these seven runs were compl eted, a careful
review was carried out, first of the ARMA models, and
then of the communications models. Another run was
carried out on a diffe r ent set of data  on births,
deaths and marriages as a single set of variables . In
that case, the ARMA model outperformed the regression
model by 3 3 9 points, by the usual l ikelihood measures, impl ying an astronomical ly high probability of its
superiority. In conventional l anguage, this gap of 3 3 9
points impl ies that, 1 1 the ARMA model was confirmed with  100
a p less than 10 �
an R
. " Concretely, the ARMA model had
of . 975 in predicting the marriage rate, versus
. 8 5 for regression; also, the variance of the errors in
predicting the death rate was reduced by 1 0 %.
Unfortunately, the computer refused to calculate a full
table of predictions for this case, because the table
Page Vl12 9
was too long; however, the ARMA mode l did not seem
notab l y superior to the regression mode l in that
portion of the predictions which the computer did print O Ut .
In order to exp l ain the mediocre performance of
the communications models, we looked very closely,
province by province, at the direction and s hape of the
errors made by the ARMA communication models (from run
number 2 ) in longterm prediction.
There did not seem
to be a notable tendency for errors to be biased in one direction in any special group of provinces, except for
the "intermediate province" group which the
communications terms should have been able to
distinguish; however, there did seem to be a very strong tendency for the predictions to be
systematically low, everywhere. With constant terms, of
course, this wou l d not have been expected. Stil l , the
independent variables in equation (6. 2 9 ) were close enough to being able to represent constant trends,
upwards, that it seemed very strange that such a bias
would develop. A series of intuitive arguments
convinced us that the outliers in Finnmark, extending for a handful of years in both urban and rural
Finnmark, could add a degree of apparent randomness,
enough to bias the coefficients substantially . Given
Page Vl1 30
that Nynorsk had never been used at a 1 1 in schoo 1 s in Finnmark, we fel t it woul d be best to change the data,
so that in tl1 years the varia bl e "A" (percentage assimilated) woul d equal 100% in Finnmark.
A fter these changes, two runs were carried out,
both highly successful.
The first run dupl icated our
original first run, based on equations (6. 2 7) a nd (6. 28).
This time, the R4 for the A RMA model was
. 9988, versus . 998 7 for the regression model.
The A RMA
model had a l ikel ihood score of 3 5 . 46 points higher
than the regression model , implying odds of 2. 5 mill ion
bil l ion to 1 aga inst its superiority being a coincidence.
Both model s, of course, were doing
astronomical l y better than any of the model s discussed above. In simul ation, however, the picture was a bit
mixed, though stil l improved on the whole. The R. M. S.
a verage percentage errors were 9. 9 % for the A RMA model
a nd 9. 4% for the regression model ; the absol ute errors
a veraged to 20 % a nd 2 7 % , respectivel y.
The second new run dupl icated the second old run,
in using equa tion (6.29) a s a model .
The superiority
of the A RMA model grew larger, when a more compl ete
substantive model was used , just as we ha d hoped
earl ier ; the gap in l ikel ihood grew to 4 2 . 5 1, impl ying 1
odds of 3 X 10 ' to 1 in f avor of the ARMA model . With
Page Vl1 31
the regression models, the addition of communications
terms produced only a slight gain in likelihood  2
points  but with the ARMA model, the gain in
likelihood was 9. 0 3 points, implying a very significant improvement •
(S ign if icant at the 1 eve 1 of
. 0 0011, " in conventional terminology . )
II
p
=
One may note
that the coef ficient of the main communications term was . 0 067 for the ARMA model, versus . 00 54 for the
regression model, both about right for a recurrent
feedback term. With a better substantive model, the
ARMA model improved much more in its longterm
predictive power, too, than the regression model did. The average percentage errors were 8 . 6% for the ARMA
model, versus 9 . 0 % for regression;
the absolute errors
averaged to 17% for the ARMA model, versus 25 % for
regression . Between the two measures, it is reasonable
to say that the ARMA model here, as elsewhere, displays on the order of 101 5 % less error in longterm
prediction than the regression model does . Also, as we pointed out at the beginning of this section, when
ou tliers are removed, the ARMA models are very much
superior to the regression models in terms of formal statistical likelikood. These statements remain true
despite the "ratchet" effects  similar to outliers, but persistent  which we mentioned earlier in this
Pa g e V 1  1 3 2
s ec t i on .
Page Vl1 3 3
FOOTNOTE S TO CHAPTER (VI)
(1) Deutsch, Karl W. , Na t i onal i sm .9.ll.Q. Soc i al Commun i ca t i ons, MIT Press, Cambr i dge, ·Mass . , 19 66, rev i sed second ed i t i on . Chapter 6 conta i ns the ma i n argument l ead i ng up to the mathemat i cal model ; Append i x V conta i ns the mathemat i cal model , and a verbal descr i pt i on of i t. (2 )
ibid•
(3) Hopk i ns, Raymond, "Project i ons of Popul at i on Change by Mob i l i zat i on and Ass i m i l at i on, " Behav i oral S c i ence .. 1972 , p. 2 5 4 . The programs were made ava i l abl e to us by Prof. Deutsch at the Harvard Department of Government .
(4) Deutsch, Karl W. , op . c i t . , Append i x V . Note that several vers i ons of th i s model have appeared i n pr i nt. The vers i on here, i n al l fa i rness, was actual l y taken d i rectl y from Hopk i ns, Raymond and Carol , "A D i fference E quat i on Model for Mob i l i zat i on and Ass i m i l at i on Processes", 1 9 6 9, unpubl i shed ; a copy of th i s paper was prov i ded to us by Prof. Deutsch, and descr i bed by h i m as conta i n i ng the f i nal rev i s i on of the model . Th i s rev i s i on appears, i n d i fference equat i on form, i n Hopk i ns, Raymond, 1 1 Project i ons of Popu 1 at i on Change by Mob i l i zat i on and Ass i m i l at i on", Behav i oral Sc i ence, 1 972, p. 2 5 4 . The reasons for the rev i s i ons to earl i er vers i ons are descr i bed i n Hopk i ns, Raymond, "Mathemat i cal Model l i ng of Mob i l i zat i on and Ass i m i l at i on Processes", i n Ma thema t i ca l Approaches .t.Q. Pol i t i cs, ed i ted by Hayward Al ker, Karl Deutsch and Ant i one Stoetzel , El sev i er Publ i sh i ng Co. , New York, 1973, p. 381. (5) S ee note 1.
(6) Hopk i ns, Raymond, "Mathemat i cal Model l i ng of Mob i l i zat i on and Ass i m i l at i on Processes", i n Ma thema t i cal Approaches .tQ Pol i t i cs .. ed i ted by Hayward Al ker, Karl Deutsch and Ant i one Stoetzel , El sev i er Publ i sh i ng Co. , New York, 1 973, espec i al l y p . 381.
(7) See note 3 . More prec i sel y, we used the Hopk i ns rout i nes d i rectl y, on sampl e cases suggested to us by Prof. Hopk i ns and on a few others.
Page Vl  1 34
(8) I n Chapter (V), we noted that the correl at i on across n i nterval s of t i me, when both process no i se and measurement no i se m i ght be present, woul d equal � 2r" , where � i s the correl at i on i nvol ved w i th measurement no i se, and r i s the true correl at i on across t i me of the process underneath. When th i s f i gure changes very l i ttl e w i th i ncreases i n "n", but i s substant i al l y d i fferent from 1 for var i ous val ues of n, then i t woul d seem that r i s very cl ose to 1 and that � i s not.
(9) The approx i mat i on here i s that exp (l +a) i s approx i matel y equal to l +a , w i th onl y about . 5 % error when "a" i s about 10 % . More prec i sel y, i f we take the exponent i al funct i on of both s i des of the equat i ons (6 . 10), and subst i tute i n from (6. 12) , the approx i mat i on rul e c i ted here bri ngs us back to (6 . 9 ) •
(10) Str i ctl y speak i ng , there i s one maj or qual i f i cat i on one m i ght make to th i s statement . When mak i ng a pred i ct i on, one usual l y starts from a g i ven base year, and appl i es the d i fferent i al equat i ons to that year as an i n i t i al cond i t i on . Th i s woul d correspond to ad i ust i ng k 1 and ks here, to f i t a g i ven year exactl y. One can expect to do better , i f one somehow averages d i fferent base years to get an est i mate of the underl y i ng real i ty, and uses that est i mate to make pred i ct i ons from . Adm i ttedl y, part of the advantage i n our "ext l " (E XT RA P) extrapol at i on probabl y l i es i n do i ng just that . I f there were a cons i stent change i n the rate of growth of these var i abl es , through t i me, and i f one were us i ng extrapol at i on models to pred i ct the same per i od of t i me as the one they were f i tted to, th i s woul d l ead to an unfa i r advantage for the extrapol at i on model s; the extrapol at i on model s woul d be centered at the m i ddl e of the process , but the ord i nary model s woul d be centered at the i n i t i al extreme. However , every one of the extrapol at i on runs here was accompan i ed by a run test i ng the pred i ct i ve power of the hypothes i s of a tsquared term i n (6 . 1 0); these runs gave no support to the i dea that factors i nvol v i ng a s i mpl e second der i vat i ve coul d be respons i bl e for the advantages of extrapol at i on . The ab i l i ty of extrapol at i on to average out extreme val ues measured i n the same, earl y per i ods of t i me i s not "unfa i r, " i nsofar as i t refl ects an advantage ava i l abl e to those try i ng
Page V l  1 3 5
to pred i ct the future from an extens i ve databank from the present and past.
(1 1 ) The computer pr i ntout here was l eft i n the custody of Prof. Karl Deutsch, Harvard Dept . of Government . The computer pr i ntouts from sect i on ( i i ) were al so l eft i n h i s custody, i n 1971 . These two groups of output are approx i matel y one foot th i ck, put together . For every run reported here, they i ncl ude pred i ct i ons and real i ty for every year, past and future, for wh i ch pred i ct i ons were made .
0 2 ) Deutsch, Karl W. , op. c i t . , Chapter 6.
{ 1 3 ) L i eberson, Stanl ey, Language .a..rui Ethn i c Rel at i ons in Canada, W i l ey, New York, 1970. See p. 47, 4 8 , 1 8 3 1 87 for rel at i vel y smooth graphs emerg i ng from seatterpl ots.
(14 ) i b i d . L i eberson focuses strongl y on the i ssues of l anguage "retent i on, " by those brought up i n one l anguage, as opposed to "demograph i c factors . " On p . 3 5 , he states that, 1 1 1 t i s far more correct to descr i be the Canad i an scene as an eq u i l i br i um based on counterbal anc i ng forces . " On p . 5 0 and p . 5 1 he emphas i zes, f i rst, that Engl i sh has been dom i nant i n terms of "retent i on" or ass i m i l at i on, but then, that the "revenge of the cradl e" has been central to French l anguage ma i ntenance. On p. 22 5 , he def i nes a var i abl e, "commun i cat i ons advantage, " qu i te s i m i l ar i n sp i r i t to the "l anguage pressu re i n commun i cat i ons" d i scussed here ; i n the subsequent verbal d i scuss i on, he i mpl i es that th i s var i abl e i s central to "retent i on" phenomena .
0 5 ) Deutsch, Karl W . , "Mathemat i cs of the Tower of Babel ", i n "Nat i on and Worl d", i n Contemporary Pol i t i cal Sc i ence: Toward Emp i r i cal Theory � McGraw H i l l , 1967 .
(16 ) Str i ctl y speak i n g, one must al so try to expl a i n the or i g i ns and convergences of such d i al ects, i nstead of merel y the dec i s i on by i nd i v i dual s to jump from one d i al ect to another . E quat i on { 6 . 26 ) , wh i ch can deal w i th the i dea of d i al ects gett i ng cl oser or further away as a resul t of commun i cat i on, i s conceptual l y q u i te cl ose to the model here . Yet one i s st i l l faced w i th the
Page Vl1 3 6
problem of explaining how divergence in dialect can come about. If the speech in each region were subject to random drift, or to systematic pressures based on interregional dif ferences in speech equipment, then a low level of interregional communications, in our model, would imply little damping of such drift. An increase in communications would imply a greater pressure for convergence, and a damping out of future drift. Note that such phenomena would also apply if there were a dramatic increase in communications between two regions with enough internal communications to resist assimilation as such; the growth of "franglais" is an interesting example.
(17) Schelling, Thomas C. , The S t ra tegy of Conflic t � Oxford U . Press, New York, 19 6 3. (Copyright 19 6 0. ) p. 1 04: "But where do the patterns (of potential compromise) come from? They are not very visibly provided by the mathematical structure of the game, particularly since we have purposely made each player ' s value system too uncertain to the other to make considerations of symmetry, equality, and so forth, of any great help . (i. e. of help in analyzing Schelling ' s parad igms for games of mixed conflict and common interest . ) Presumably, they find their patterns in such things as natural boundaries, familiar political groupings, the characteristics of states that might enter their value systems, gestalt psychology, and any cliches or traditions that they can work out for themselves in the process of play • • • " p. 1 51: 1 1 • • • the introduction of uninhibited speech ma y not greatly alter the character of the game, even though the particular outcome is different • • • " In short, tacit norms, before the introduction of explicit bargaining, are crucial to the existence of possible "patterns of convergence. " On p. 11 3114, Schelling hammers home the point that mathematical "solutions" to nonzerosum games do not provide a realistic alternative to his own theory of tacit norms, discussed on p. 9 9 1 11. In a sense, one might argue that the idea of "solving" for a unique or optimal static equilibrium may apply only to games similar to those originally discussed in such terms by Von Neumann and Morgenstern (note 3 9 of Chapter (V)); as in economics, there may be situations where d ynamic factors cannot be easily encompassed within such a static description. At any rate,
Page V11 3 7
Schelling ' s discussion, applied elsewhere by him to real political analysis, strikes us as fairly convincing.
(18) Deutsch, Karl W . , National ism and S ocial Communications, M IT Press , Cambridge , Mass. , 1966, r evised second edition, especially p. 969 7. Also, Deutsch et al , " Political Community and the North Atlantic A rea" , in In ternational Poli tical Communities, Anchor Books , New York , 1966 , p . 1 7 . In the latter reference in particular , the conce pt of "communit y " is defined in terms of an ongoing ability to communicate and res pond in decisionmaking processes of a continuous sort.
(19) S ee note 1 7 .
(20 ) See note 18 , particularly the second re ference.
(21) See note 16.
(22) Deutsch, Karl W. , Nationalism and Social Communications, M IT Press , Cambrid ge , Mass. , 1 966, revised second edition, p. 26.
(2 3 ) Feierabend , lvo K. and Rosalind L. , and Gurr, Ted R. , eds. , Anger, Violence and Politics: Theories fillQ Research, PrenticeHall, Englewood Cliffs, N. J . , 19 7 2 .
(24 ) K ravitz, Sheldon, A Theore tical Model .E.Q.r. .t.h.e. Analysis and Comparison of Ideologies, Ph. D. dissertation, May 19 72. Available c/o Widener Library , Harvard U. , Cambrid ge , Mass.
(2 5 ) Minsky , Marvin and Selfrid ge , Oliver G. , " Learning in Random Nets " , in Information Theory , Fourth London S ymposium published by Butterworths, 88 Kingsway , London W. C. 2. , U. K. , p. 3 39. In discussing this formula, Minsky and Selfrid ge consider only the cases E = 1 or E = O per e pisode, but the generalization d oes not ap pear very diff icult . These authors, in turn , refer to Bush, R . R . and Mosteller, F. , S tochas tic Models .f.ru:. Learning ., Wile y , New York, 19 5 5 , as a basic source.
(26) " Groupthink" as a smallgroup phenomenon has been widely d iscussed as a result of J anis, Irving , Victims of Groupthink, HoughtonMifflin, New York, 19 7 3 .
Page V l  1 3 8
( 2 7 ) H a u ge n , E i n a r , L a nguage C o n f l i c t a n d L a n gu age P l a n n i ng : � Ca se of Mode rn N o rweg i a n , Ha r v a r d P re s s , C a m b r i d ge , Ma s s . , 1 9 6 6 . Map on p . 2 2 9 .
u.
( 2 8 ) T he p r i ma r y s o u rce f o r t h i s d a t a , a s w i t h t h e da t a r e po r t e d be l ow , wa s t h e N o rweg i a n O ff i c i a l S t a t i s t i c s s e r i e s , common l y a v a i I a b l e i n U . S . 1 i b r a r i e s . T he N o rwe g i an n ame i s "N o rge s O ff i s i e l l e S t a t i s t i k k 1 1 ; w he n au t h o r d e s i gn a t i o n s a r e r e q u i r e d , t h e " C e n t ra l B u r e a u o f S t a t i s t i c s " o r " Se n t r a l . • • 1 1 i s u s u a l l y a pp r o p r i a t e . S c hoo l d a t a f o r J a n . 19 7 0 m ay be f o u n d i n t h e 1 9 7 2 11 11A r bok ( Y e a rboo k ) , wh i c h a l s o a p pe a r s a s Re k ke  X I I , n um b e r 2 7 4 . Da ta o n t h e u s e o f l a n g u a ge s i n e l eme n t a r y e d u c a t i on w e r e u s e d , a s on p . 3 3 5 of t h a t copy of t he A r b o k . T he l a n g u age u s e d a t a fo r e a r l i e r ye a r s we r e t a ke n f rom t h e e a r l i e r A r b o ks , b a c k t o 1 9 3 9 . I n s om e y e a r s , w h e n t h e u r b an / r u r a l b r e a k d own wa s n o t a v a i I a b l e , we u s e d t h e S ko l e s t a t i s t i k i s s u e s o f t h e N . O . S . E ve r y n umbe r ( i s s ue ) i n t h e N . O . S . s e r i e s i n c l u de s a l i s t o f t h e n umb e r s a n d t o p i c s o f o t h e r r e c e n t i ss u e s ; a l s o , o n t he f ro n t o r b a c k cove r a r e l i s t ed t h e n u mbe r s of p r e v i o u s i s s u e s on t he s ame t o p i c . ( T h u s , i n t h e 1 9 7 2 A r b o k a re I i s t ed t h e Re k k e a n d n um be r o f a l l p r ev i o u s A r b o ks . ) L a n g u ag e u s e i n e l e me n t a r y s c hoo l s i s e s s e n t i a l l y a ma t t e r of l oc a l c ho i c e ; H a u ge n , op . c i t . , g i v e s a few d e t a i I s o f t h e p r oc e s s o f l a n g u a g e c h o i ce . We d e c i de d , i n ou r c omp u t e r r un s , t o r e c a l i b r a te t he t i me pe r i od s o f t he da t a ; t h u s , l a n g u age u s e i n fo r c e i n s c hoo l s i n J a n u a r y 1 9 5 0 wa s t a ke n to b e a n i n d e x o f ac t u a l l a n gu age u s e i n 1 9 4 9 , g i v e n t h e l a gs i n v o l v e d i n c h a n g i n g po l i c y i n t h e s c h oo l s . ( 29 ) Sou rce s : N . O . S . , op . c i t . , Re k ke X I I , No . 2 3 3 ; R e k ke A , N o . 2 4 4 ; Re k ke A , N o . 2 9 2 . O r i g i n a l s t a t i s t i cs we r e f u r t h e r b r o k e n down b y sex , b u t ag g r �ga t e d fo r t h i s s t u d y .
( 3 0 ) N . O . S . , o p . c i t . , Re k ke X I , N o . 2 9 8 f o r t h e mo s t r e c e n t d a t a , a n d p r ev i ou s i t ems i n t he s ame t o p i c s e r i e s . N o te t h a t R e k ke X I I , N o . 2 3 2 , wh i l e n o t con t a i n i n g t h e a p p ro p r i a t e b r e a k down s b y u r b a n a n d r u ra l , does p ro v i de d e f i n i t i o n s i n E n g l i s h . ( 3 1 ) IJ . O . S . , b a c k wa r d s f rom R e k ke X I I , N o . 1 9 8 . A g g r ega t e d a c c o r d i n g t o t h e u r b a n / r u r a l de f i n i t i on s o f t own s h i ps s pe l l e d o u t i n t h e S ko l es t a t i s t i k k s e r i es .
Page V l  1 3 9
( 3 2 ) !J . O . S . , op . c i t . , b a c kwa r ds f rom R e k ke X I I , N o . 2 2 0 . Some a gg r e g a t i on r e q u i r ed i n e a r l i e r ye a r s ov e r s e x , e t c . I n 1 9 G 7 , R e k ke X I I , N o . 2 4 4 was used .
( 3 3 ) N . O . S . , op . c i t . F o r i n co me , R e k k e A , N o . 3 6 3 , u s e d f o r 1 9 6 8 o n b a c k . ( 1 1 Mun i c i pa I Tot a 1 I n co me , 1 1 f rom T a b l e I A . I n e a r l y ye a r s , some a g g r e g a t i o n req u i r e d ; howe v e r , t h e co l umn n a mes we r e u n i f o rm e n o u g h t h a t i t wa s n o t too d i f f i c u l t to rec on s t r u c t t h e s a me a gg r e g a t i o n s a s u s e d b y t h e N . O . S . au t ho r s i n l a t e r yea r s . ) I n Re k ke X I I , N o s . 2 4 5 a n d 2 5 2 , a n i n d e x o f con s ume r p r i c e s w a s f o u n d , fo r t h e e n t i r e pe r i o d . ( i . e . A n h i s t o r i c a l t a b l e w a s a v a i I a b l e . ) T he a ve r a g e s i z e o f t h e r a t i o , no rma l i z e d to u n i t s a p p ro p r i a t e fo r s t a t i s t i c s , wa s on t he o r d e r o f u n i t y .
( 3 4 ) T h e s e d a t a i n c l u d e d a t a on c r i m i n a l con v i c t i o n s e a s i l y a va i l a b l e I n t h i s pe r i o d , s t a r t i n g b a c k f r om t he A r b o k ; d a ta on he a d s of h o u s e h o l d s o n we l f a re , c on t i n u e d b a c k i n t h e S t a t i s t i c a l Mon t h l y ove r t h e e n t i r e pe r i o d ; d a ta on p o p u l a t i o n , n o t r,e ne r a l l y a v a i l a b l e i n r e c e n t ye a r s w i t h t h e d e s i r e d b re a k down , b u t i n t h e A rb o k w h e n av a i l a b l e . ( 3 5 ) l s a r d , Wa l te r , M e t h o d s o f Reg i on a l A n a l ys i s : A n I n t ro d uct i on t o Reg i on a l Sc i ence1 M I T P re s s , C a m b r i d g e , M a s s . , 1 9 6 6 , C h a p te r 1 1 . O n p . 5 0 0 , re fe r en ce i s m a d e to S t ewa r t a n d Z i p f , t h e t wo f a t h e r s o f t h e i d e a ; on p . 5 0 6 , a co n c e p t o f s oc i a l d i s t a n c e i s men t i o n e d , s l m i l a r i n s p i r i t to " U .,j " ; o n p . 5 0 7  5 1 0 , emp i r i c a l r e s u l t s a r e d i s c u s se d . See a l s o De u t s c h , K a r l W . a n d l s a r d , W a l te r , "Towa rd a Ge n e r a I i z e d C on ce p t o f D i s t a n c e " , B e h a v i o ra l S c i ence , N o v . 1 9 6 1 .
( 3 6 ) G a l l e , Owe n R . a n d T a u e be r , Ka r l E . , "Me t r opo l i t a n M i g ra t i on a n d I n t e r ven i n g Oppo r t u n i t i e s " , Ame r i can S oc i o l og i c a l Rev i ew, N o . 3 1 , F e b . 1 9 6 6 , t a b l e on p . 8 . N ote t h a t t h e s e a u t ho r s a r e e s s e n t i a l l y c r i t i c s o f t h e g r a v i t y mode l ; t h u s t h e i r r es u l t s a re p a r t i c u l a r l y i n t e r e s t i n g . F o r o t h e r wor k i n t h i s a re a , s e e n o te 3 5 .
( 3 7 ) Wo r l d A t l a s , Mo scow, 1 9 6 5 , p . 5 7 . A l a r ge m a p o f N o rwa y , w i t h ma j o r r o a d s i n d i ca t e d , w a s u s e d . D i s t a n c e was me a s u re d w i t h a c e n t i m e t e r r u l e r , fo r t h e mo s t d i r e c t maj o r ro u t e b y r o a d ; howeve r , i f
Page Vl 140
this s hou l d exceed the abs o l ute distance by 40% or more, then the direct dis tance p l us 40% was used. For distances from an urban area, either there wa s one maj or city, or severa l which 'cou l d be averaged. For rura l di stances, it was as sumed that popu l ation den sity was even throughout each region ; averages were es timated on that bas is. A l l of the data here was punched on cards, and read Into the M IT Mul tics machine ; the punched cards and code s heets may be made avai t ab l e to future us ers through the office of Prof. Deut sch, if there Is interes t J n s o doing.
Page 1/ 1
CV) G E N ERA L A P PLICATIONS OF TH ES E IDEAS: PRACTIC A L HAZA RDS AND N E W P OS SI BI LITI E S ( i ) I NTROO IJCTI ON ANO S UMMA RY Fierce debates continue to rage between those w ho
would stud y beh avior "with mathematics" and those w ho would stu dy it "by traditional means. " T hese debates,
by drawing attention to the extremes, h ave o bscured many of the serious h azards and many of the most
important application s of mathematical approaches in government and in psycholog y. In ex tending the
mat hematical approaches further, we h ave a special
responsibility to discuss the new application s and the
continuing h azards w hich may result.
We will begin, in section ( i i ) , by presenting the
viewpoint of the practical decision Maker, w ho h as not
used mathematical met hods so far, for good reasons. All of this chapter will be organized around the
difficulties w hich h e faces; oth er possible u sers of
our ideas  th e social scientist, the psychologist and
the ecologist  will be mentioned within more limited
contexts. In section Ci i i ) , we wi 1 1 su g gest a common
framework for evaluating verbal and mathematical tools,
both, based upon the co�Mon goal of prediction; within
Page V 2
this framework, the mathematical methods h ave a role to
pla y, at least in principle, both for what they tell us directly and for what the y tell us a bout common a b uses
of verb al methods .
For those w ho pre fer to d e al in
concrete e x amples, rather than a bstract generaliz a tions a bout methodo 1ogy, 1t1e have I nc 1 u de d a number of
relevant e x amples, mostly in the footnotes.
Also, at
the end of this section, we will d e scribe in d etail how this framework has motivated the development of new
mathematical proce dures d e s cribed in the other chapters
of this thesis .
In section ( iv ) , we will go f rom principle to
practice ; we wi 1 1 d iscus s specific w a y s in which
statistical methods may be used, in close relation with
ver b al me thod s, and be of signif icant value in re al
prediction e f forts . This discussion will not be b ased upon the well known philoso phy of logical positivism, but on the more recent philosophy of Ba yesian
utilitarianisri ( see section ( v ) of Chapter
( I I )),
philosophy which underlies the actual mathematical
developments we have d iscussed ;
the
at any rate, the
utilitari an approach helps keep us focuse d on the value
of our methods to seriou s policy ma kers .
In this
section, we h ave also trie d to crystallize out our own
experience with the numerous w a y s in which statistical
Page V 3
research can turn ou t to be usel ess and misl eading for the pol Icymaker, if i t is done in a caval ier manner ;
our suggestions are no t a definitive answer to al l of t hese d ifficul ties, bu t at l east they may hel p.
Final l y, in section (v ) , we point ou t th at the
central rol e of h uman psychol ogy in pol itics seriousl y l imits the possibil ities of naive empiricism , bo th
verbal and math ema tical .
Statistical studies, l ike
verbal research of th e purel y empirical variety, may be unabl e to transcend these l inits. However, the
mat hem at i ca 1 id eas
d i s c u s se d
in C h a p ter
(I I ),
a 1 on g
with o th er offsh oo ts of the Bayesian a p proach , can be applied in a d ifferent way, to h el p overcome these
l imitation s in a way wh ich words al one canno t ;
il l ustrate this point, we wil l mention specific
to
possibil ities fo r using these ideas in the fu ture to
cope wi th and ex pl ain the p henomenon of intel l igence, whether in h uman socie ties o r in h uman brains .
(ii ) THE L IA B I L IT IE S O F MATH E MA TIC A L ME THODS IN PRA CT ICA L D E CIS IO N  MA K IN G
Let u s star t ou t b y reconsidering o u r tacit
assump tion th at rnethematical methods do h ave some use, af te r al l , in pol itical science and in pol itical
Page V 4
decis ionmaking.
Many peop l e have ques tioned this idea
in the past few decades, and many more may have fe l t
strong private reservations about the idea. Whi l e we
c l earl y can be ex pected to reaf f irm the va l ue of
mathematica l methods, on the who l e, we a l so bel ieve
that the traditiona l com pl aints against mathematical
methods do contain rea l information which the user of such methods s hou l d not i gnore.
I n particul ar, l et u s try to ex pres s the
reservations about mathematical methods which might be
hel d by the active po l it ical actor. Practicing
dipl omats and pol iticians have often found that their
margin of succes s depends on their abi l ity to seize
upon uniq ue twists in the po l itica l or p sycho l ogical
environment, twists which a l l ow the individua l to
escape the seemingl y uncontro l l ab l e tide of events that
one woul d expect a mathematica l model to extrapol ate. Sometimes this invol ves the abil ity to establ ish
channel s of serious communications between dif f erent
pol itical group s, channe l s which can grow in importance
once they have been es tab l is hed.
Sometimes this
invol ves the abi l i t y to seiz e u pon an economic or
mil itary advantage .
Caes ar ' s G al l ic Wars are a c l a s s ic
exampl e of the l atter sort of imagination, evading
Lanchester 1 s Laws at every turn (l ) ; Lid de l l Hart, in
Page V 5
his clas sic text on military strate gy (2 ), h as
emph asized that such imaginative approac hes h ave bee n
d e cisive i n wars thro u g hout h is tory . I n both cases, one
achieve s a greater "be nefit" within a given "cos t
con straint", not by being tig ht an d pre cis e about
bud getin g one ' s re sources, but rather by pre serving the detachment an d the free dom one will need i n order to
seiz e upon w hole new option s, whic h may ope n up a whole new fron tier of pos sibilitie s .
Political creativity in
this form is d ifficult e nough to e n compas s within an y s c holastic conte x t, let alone the context of
mathematical mod els ; therefore, politic al s cie nti s ts
who h ave a stron g attachment to this proce s s would
n aturally ten d to be s ke ptical of mathematical models .
More ge nerally, succes sful political actors, like mos t succes sful profe s sion als, would te nd to b elieve that
they stretch their min d s to the limit, f n order to
arrive at their pol icy d e cis io n s ;
they may con clude
t hat the s heer complexity of their own de cisio n  makin g
militate s again st the prediction of its outcome by
math ematic al s ys tems which acco u nt for f ar le s s information conte nt.
Furthermore, f t is al so likel y
that a l arge part of this information, even when
ac ce s sible to the pol itical s cie ntist, may be e n cod ed
in a verbal form whic h militate s against its bein g
Page V 6
accounted for by mathematical mod els. Ciii) P R E D I C T I O N : A C OMMON G OA L F O R V E R R A L A rm M A T H EMAT I C A L S O C I A L S C I E N C E
T h e s e dif ficultie s can b e dealt with o n several
dif f erent levels . Le t u s be gin on the simples t lev el .
T h e dif f iculties above point to th e impos sibility
of constructing math ematical models w h ich will predic t
exactly what will h anpen in politics, in detail, in the s hor t term and in th e me dium term .
However, the s e
dif ficu l ties h ave also been enou gh to make i t
impos s ible for any h uman being, political actor or
oth erwis e, to pre dict ex actly what will h appen in all
of politics, in th e s hor t term or medium term. A
traditional political s cientis t or math ematician mig h t
deduce at this point that " true pre diction", in the
sens e of ex act pre diction, is impos s ible in political
science. T h er efore, in ord er to as s ure h im self that h e
is involved in s eriou s work, h e may res trict his
attention to pro posi tions w h ich meet an Aris totelian tes t of " tru th ", s uch as s tatements abou t h is torical
documents (3 ) or abs tract theorems w h ich he can prove by
rigorou s deduc tion.
Very few political actors,
h owever, feel that they WO LJ ld want to turn away from
Page V7
the d i ff i cult but pr i nary q uest i on of pre d i ct i ng the
d i fferences i n outcome b etw e en the d i fferent act i ons
the y could take. These pre d i ct i ons may always i nclude
factors of uncerta i nty , but the pol i t i cal actors would
f i nd i t i nterest i ng enou �h to reduce th i s uncerta i nty
as much as poss i bl e , i n an y poss i bl e way. Thus, i nsofar as pol i t i cal sc i ent i sts are concerned w i th develop i ng
object i ve i ns i ghts of the max i mum poss i bl e value to
the i r consumers, the pol i t i cal dec i s i onmakers at all
levels, the i r
ul t i mate concern would be w i th the
de vel opment of effect i ve probab i l i st i c theor i es to
pred i ct pol i t i cal and soc i al systems.
There are f i ve po i nts worth not i c i ng about our
emphas i s on pred i ct i on here. F i rst of all, th i s
emphas i s does not restr i ct i tsel f to the overtly
mathemat i cal phases of pol i t i cal sc i ence.
The
dev el opment of "pre d i ct i ve model s"  math er,at i cal QL
verbal , or even anal ogue for that matter  i s a general
concept, wh i ch can be used to gu i de h i stor i cal research
as eas i ly as f t gu i des stat i st i cs . Many of the "grand t heor i es" of pol i t i cal h i story, i nclud i ng especi ally
the theor i es of Spengl er (4 ), Toynbee (S ) , Turner (6), H e gel (7 ) , and Marx , were desi gned to hel p people
"understand" h i story i n terr,s of a verbal dynam i c model
wh i ch coul d al so be used to pred i ct the future.
Page V 8
T raditio n al ap p roaches to r esearch in political
science, h0\1eve r , mi ,r,ht te n d to
11
de ve 1o p" such theo r ies
by addin g com ple x strings of qualificat f on s, an d b y
fo r ci n g an elabo r ate , pe r f ect A ristotelian fit o f the
we ake r , more specialize d p ro oositio n s which eme r ge . Ou r own app roach would ask that political scie ntists
contin ually retur n to the main questio n , to the abi l ity of their theo ries, with the dis claime rs remove d, to
p re dict the b ro ad fi r s to rde r tre n ds r n the maj o r , most
o b vious variables of political h isto r y.
Seco n d, our emphasis o n p redictio n can be
j ustified on dee pe r ,r, ro tJ nds than those of satisfyin g
those who p ay fo r the bulk o f the po l itical research .
Followi n g the philoso phy of utilitarianism , o ne might
simpl y regard political science itself as o ne
particular phase of po litica l activity; one might even
suggest that its maj o r justif icatio n fo r existe nce , in the lo n g te rm, is its ability to co n tribute to
co nstr uctive political
activity.
to the p rimar y nee d of the
This takes us b ack
de cisio nmake r to p r e dict
the results of his actio n s , at least o n a p ro b abilistic
b asis.
On the othe r han d , even if o ne we re willin g to
accept the ethical p ri n ciple that truth should be
pursued fo r its own sake, as an ultimate goal equal to
o r highe r than the go al that o f human welfare , o ne
Page V9
still f aces the p robl em of d e fining what this "ultimate
truth" would consist of .
One might s ug � e s t that
ultimate tr uth, if it does exist in political science, l ie s not in the changeable f acts of cu r r ent
happens tance, b ut r athe r i n the l e s s changeable d ynamic law s which l ead f rom one s et of circumstance s to
anothe r ; in phy sics, f o r exampl e, the d ynamic f ie ld
e quations are consid e r ed the highe s t scientific t r u th,
whil e the codif ication of the wavef u n ction of the unive r s e is not an o b j ect of s e riou s s tud y.
Admittedly, our k nowledge of the d ynamic law s, unlik e
the laws them s el ve s, is l ikel y to b e changeab l e for a long time, in pol itical science as in phy sics ;
howeve r , it woul d b e meanin�l e s s to speak about the advancement of knowl edge as a wo rthwhile goal, we r e
the r e not such a pos sibility f o r change and expansion in the state of knowledge .
The r e ar e thos e who woul d
que s tion, in var ying d e g r e e s , the p rimacy of the mo s t
ab stract d ynamical e q u ations e v en i n a field l ik e
phy sics ; howeve r, even the "phenome nological approach", in that field, involve s the construction of powe r ful, gene r aliz ed p r edictiv e s tatements, s tatements about
what to expect af te r s etting up expe riMents of
dif f e r ent type s (8 ) .
Many times in political s cience, the concepts of
P a ge V 1 0
"c ausa tion" and "ex planation" h ave been cited a s fo rms
o f truth worth pursuing (9 ) . The sta tement th at
"A caused R" r1a y be translated roughly into the
sta tement th at, "A occurred before B, and th e dynamics
of the system were such th at B would not h ave occurred if A h ad not occu r r ed w h en it did, ceteris paribus. "
Once a gain, the critical q uestion to answer is th at o f
th e dynamic laws w h ich govern political systems.
Beyond the goals o f so cial utility and "ultimate
truth", the pol itical scientist might also pursue the
go al s of cultura l enrichment and entertainment. T hese
goal s are o f ten cited as a jus tification for extreme
traditional ism in pol itical science. Whether th ese
goal s are now being pursued effectively by all o f those
who cite them is a dif f icul t m atter to judge, wel l
beyond the ran�e o f the present discussion . However, a
l arge part of the "cultural enrichment" involved woul d appear to involve the l earning o f l essons abo u t h uman
psychol o gy, about w h at patterns o f thoug h t and beh avior one migh t predict on the part o f human beings or h uman groups in unusu al circumstances, in other cultures.
T h ird o f al l, one should note th at our emph asis on
prediction as the ul timate goal o f political science
does no t imply th a t wor k o f a more descriptive na ture
Page V1 1
should simply be abandone d .
Using s tatis tical method s ,
for example, one must f ir s t collect a s et o f d ata,
b e fo re one can f it or te st a mod el . After one has fit a verbal or statistical mo d el to a given s et o f
fir s torder d ata, one can then g o b ack to the original source s of inf o rmation to s pecif y the s trengths and
weakne s s e s of one ' s mo del J n more d etail, with greater
accuracy ; even if one cannot modif y one ' s mod el e asily
to handle the exce ptions, one can try to expre s s the information embod ie d in the exception s in a more
compact, more ab stract f orm, to mak e lif e easier for
tho s e who wis h to make pre dictions o r to modif y the
current mo d els in the f u tu r e .
In brie f , w e are s u � g e s ting that d escriptiv e
re s earch b e viewe d as a means to an end, with the end being pre diction . A direct and total as s ault on the
objective of pre diction may ind e e d b e a poor s trate g y
for achie ving this end . Howe v e r , o u r s ucces s i s likely
to be even le s s, if we do not keep the basic objectiv e fixed firmly in our mind s .
Every once in a while, it
is im portant to bring to�ether the vario u s
pro positions, mathematical and verb al, which one b elieve s to be u s e f u l in pre diction, and s e e how
e f f ective (and consis tent ) the y really are in co ping
with the overall picture. When there are major new
Pag e V  1 2
defects o r pos sibilit ies appa r ent at the gene r al level, it is import ant to ta ke note of then, at that level, so
that they can be u sed a s a g uide fo r mo re specializ ed resea rch in different b r anches of po l itical s cience. Descriptive wo r k may not only help provide the
ba sis fo r evaluating d ynamic models of politics ; it ma y also help those who wish to p redict polit ics, b y
telling them w h at the cu r r ent s tates of the s y stems a re to which they would lik e to a p ply the d ynamic law s.
The longer the policy horizon, howeve r, the mo re
impo rtant it is to u se mo r e gene ral d ynamic models,
instead of ass uming some sort of simple ex tension of
p resent trend s and condi t ions a s g a u ged b y descriptive stu dies.
Finally, while p rediction may be advocated a s
the p r im a r y goal of o b j ective political science,
normative political science r emains another m atte r .
One may note, in this connection, that the attempt
to max imiz e accu r acy in description, b y itself, lea d s nat u r all y to a numbe r o f uncoo rdinated, special i zed
effo r ts, focu sed in depth on d ifferent p r im a r y so u rces of info rm a tion ( 1 0 ). In p red ic tin g comp l ex d y n amic
s ystems, in contr a st, one finds oneself le d to focu s
fir s t of all on the inter actions between the p rima r y s ub s y s tems, at a n agg regate level. Thu s In or der to
ma ke "inter dis ciplina r y r esea rch" a r eality in the
Pa�e V 1 3
social sciences, it is es sential that the ideal o f
predictive power gain a t lea s t a s much credence a s the
ideal o f descriptive f ines se, In the detailed conduct
of actu al s tu dies .
Fo urth o f al l, one should n ote that o ur emphasis
on prediction does not req uire a reduction in the rigor of tho u ght, even if it do es require that we go beyond
the Ari s totel ian conce pts o f truth vers us f a lseho o d a s
a scertained b y traditional u ses o f dedu ctive lo gic . The
mathematical theory o f pro b a bil ity , and the B ayesia n theory o f
inductive lo gic, have long provided a
rigoro u s ba sis for handling model s which do predict the f u ture bu t which a vo i d the determinis t ' s pretense o f absolute certainty .
Aris totelian s tatements , which
tell u s that a pro po sition is s im ply true (probability
one ) or f alse (pro b a bility z ero ) are simpl y a s ubset o f the statements which can be ex pres sed in rigoro u s
probabilis tic f a shion. In either case, the s tatements
that we ma ke may well be Inaccurate, If they are fo unded on f a ulty information; the l ang u a ge o f
probability, however, a t l eas t l ets u s expres s
precisel y how
much confidence we do have in a
pro po sition, ins tea d o f forcing u s to s ay nothing or to
e xclude totally a real b ut les s pro bable contingency.
P a ge V14
When socia l trends wil l d epend on longl asting but uncertain and novel phenomena, the language of
prob ab il ity encourages u s to escape the fal l acy that
definite optimistic or pessimis tic predictions are
s omehow more infornative than the truth . (See, for
e x am pl e, footnote (6 ) . )
To the statis tician, all these concepts have l ong
b een ob viou s .
In verb al rese arch, however, the
cl assical Aristotel ian procedures have remained dor.1inant .
Onl y in recent years has the "B a yesian
school " begun to e d ucate ver b al de cisionma kers in the
use of pro b a bil ity theory as a general ize d l anguage of
thought ( l l ) . In section (v ) of Chapter (11 ) , we have
emphasiz ed the point that the B a yes i an approach to inductive re asoning can be a ppl ie d to ind uctive
rea soning as a whol e, not merel y to re asoning a bout quantitative variabl es.
When, in verb al research, one
finds onesel f deal ing with the behavior of quantitative
variabl es, such as the degree of po p ular d iscontent,
etc . , one m a y even go so far as to d iscuss the "de gree of fit 1 1 , the "de grees of free dor, 11 and the "exogenous
variabl e s" of one ' s verb al model , on the understanding
that one is e xpressing one ' s model in verb al terms onl y beca u se of the l ac k of hard d ata; even then, one m a y
want to draw together the el ements of one ' s model , and
Page V 1 5
e xpre s s them in incre a s ingly mathematical term s , even if the parame ters canno t be e as ily meas ure d, in order to make i t s meaning mor e and more e xplicit , and in
order to impro ve i t s "coh erence", i. e. I t s comp 1 e tene s s and i t s consis t ency.
Finally, and mo s t impor t an tly,
.Q.YL
empha s is .Q!l
Qredic tion � been the driving force behind bo th
.Q...Y.L
empirical rese arch, and Q!!L conclus ion tha t
conventional ro u tines f or timeseries analysis � inadequate.
The empirical wor k on poli tical s cience
in this the sis was mo t ivat e d almo s t entirely b y the
at temp t to conver t the Deu t s chSolow equations ,
mentione d in Chap ter
(I
I), int o a u s e f u l tool for the
pre dic tion of national as s imilat ion and polit ical
mo biliz a tion. We s tar t e d o u t years a go b y t e s t ing o u t the Hop kins ro u tine s ( 1 2 ), which try to e s timate the
coe ff icien t s o f the Deu t s chSolow mo del f rom only three da t a poin t s , on the as s um p tion that t h e Deut s ch
e q uations are to tally "true" in the Aris t o telian s ens e ; it came as no s ur pris e to u s that the re s ul t ing
predic tions we re rather poor.
Our ne x t s te p was to try o u t times erie s mul tiple
regre s sion, the mains t ay o f
11
econorie trics 1 1 ( 1 3 ), of
"path analysis 1 1 ( 1 4 ), and o f "cau s al analysis 1 1 ( 1 5 ) and
Page V1 6
so on .
It was eas y enough to meas ure as similation and
mobiliz ation coefficients w h ich were significantly different from z ero, and multiple correl ation
co efficients larger th an ninety percent; these res ults were wel 1 with in the ran�e of w h at are regarded as
"succes sful conclusion s " in mo s t quantitative pol itical
research tod ay ( l 6 ) .
However, our emp h asis on
prediction led us to loo k a bit more clo sely at these
resul ts;
we wrote a new pro gran, S E RI E S , to estimate
the regres sion nodels, and then to test their ahill ty
to predict d ata acro s s long intervals of tine,
interval s comparable to tho se tested w i th the Ho pkins pro gram . T he error s , w h ile l es s than tho s e of the
Ho pkins programs, we re s tl l 1 unacceptably lar �e . A
sim ple curvefitting procedure, b y contras t, was able
to make predictions with les s th an h al f as much erro r,
averaging to about 4 % error over perio d s of time on the
ord er of a century ;
this average encompas ses a number
of cases w herein the mo del was fitted to d ata in one perio d of time, and used to predict d ata in later period s .
\falter lsard, in h is cl as s ical s tud y of
methodol o g y I n re�ional s cience ( 1 7 ) , h as made s trong
statements against the ability of regres sion mo dels to
predict th e futur e ; while h is argument is p hrased in
Pag e V17
t h eore tical t erm s , t h e wide cov erage of his s tudies
v✓ould impl y an empirical b asis for h is con clusion s . T h e Broo kin g s I n s ti t u t e h as al so repor t e d major
difficul tie s in t h e u s e of regres sio n i n forecas tin g ;
t h e y found t h at t h e s e difficul tie s cou l d b e reduce d b y t h e u s e of an "ad jus t111 e n t" factor not too diff ere n t in
spiri t f rom our simple curv efi t tin g procedure (l R ) .
Thus t h e empirical basis of t h e s e con clusio n s goe s well beyo n d our ow n e x am ple s.
In our recen t p h as e of empirical poli tical
re s e arch, we b egan wi t h t h e hope t h at t his weakn e s s of
re g r e s sion, in e s t f matin � pre dictive models, could b e un ders tood wi t h in t h e clas sical an d elegan t framework
of nax imum likelihood t h eor y, as d e s crib e d in s e c t ion (v) of C h ap t er
( 1 1 ).
In s tead of que s tio n in g t h e
clas sical procedures of s ta t is tic s, w e hope d to apply t h e s e proce dure s to more sophis t icat e d models.
In
Chap t e r (III), we h av e not ed t h at "wh i t e nois e" in t h e
proce s s of measuring d at a can tu r n an ordinary
"au to r egre s siv e proce s s " in to a "mix e d au t o r e gre s sive
movingaverage proce s s. "
Accor ding to s t atis tical
t h eory, mul t iple regre s s ion Is a good way to e s t imat e
t h e former proce s s, b u t a b ad way to s tud y t h e lat t er . A simple diagram can s how how bad t his problem
migh t b e come, in practice . Giv e n a sin gle variab le, z,
Page V 1 8
which has a true correlation of ¢ with itself
across
time (i . e. z (t) with z (t+ I )), a n d a correlation of r
with the me asu r emen ts, x, that are m a de for z, we find
that the correlatio n be twee n x (t) a n d x (t+ l ) l s due to a n in dire ct path of correlation s:
F igure V 1: Pathways of Correlation Wit h Noisy Data If we m a ke the simplif yin g a s sumntion (a big one) that
the process is not a n y worse, that there is n o
cor relation between x (t) a n d x (t+ l) in depen dent of this pathway, the n classic al theo r y tells us that the
correlation between x { t) a n d x (t+ l ) will equal r times
w t i nes r, A.
•
•
1
nt , If regressio n were use d to predict . e . rl.'fJ
x (t+ l) from x (t) (the obseryed d ata), the regression coe f ficie n t wo uld e q u a l the simp l e correlation
coe f f icie n t, r�¢, inste a d of the n umber ¢ ; yet whe n predictions are made over lon ger i n terv als of time,
the n ¢, the coe f f icie nt of the u n derlying process I n
the real world, is the proper basis for pred iction (1 9 ) . This e x ample wou l d also a ppear to point to the idea
that sim ple "path coe f ficients" which are not e f f ective
in prediction are not like l y to re present the true
Page \/1 9
underlying relations, either.
If r  the correla tion between the true v ariable
and the measurenents o f the varia ble  were a bout 9 5 % ,
then the o b serve d regre s sion coe f f icient (r"l. ¢ =
( . 9 5 ) ( . 9 5 )) would be a bou t 1 0 � snaller in size than
the right re gression coe f f icient (¢) for use in
longrange pre dictio n . Furthernore , since this 1 0 %
error wo uld re present a general shif t In the value o f a coeff icient, one wo uld ex pect that the use o f the
regression model wo tJld le a d to errors which accumulate at the rate o f ten percent pe r time perio d;
it is easy
to see how this phenomenon alone co uld v itiate the predictive power o f regression.
In the case o f a
single v aria b le , this 1 0 % error a p plies to a single
large correlation coe f ficient; therefore , one can ho pe
that re gression will at least preserve the sign o f this co e ff ic Ien t intact . In the case o f many v ari ab les,
howe ver, the 1 0 % error wo uld a p ply to a cor relation
matrix ; small b u t critical crossterms, on the order o f
�5 % , might conceiv a bly have their signs reverse d , d ue
to the spurio us e f fects relate d to other, larger terms in the s ame matrix .
A fter all , the importance (and
much o f the detectability ) o f such "fee d b ack terms"
lies precisely in their a bility to accumulate and determine the longterm behavior of the system;
it is
Page V 2 0
precisely the lon gterm behavior which we find poorly
accounted for h y regre s sion .
In order to accou nt for s uch e f fects , within a
clas sical statis tical f ramework , we have devised a new
algorithm to e s tiMate mixed autoregre s sive
movin g  average proce s se s ("A RMA" proce s se s ) at a
manage able cost. We have applie d this new algorithm to the old Deutsch  Kravitz data on as siMilation and
mobiliz ation in a dozen or so nations , and we have also applied it to new d ata on ling uis tic as similation in IJorway . In both case s , s tatis tical theo r y indicated
that the A RMA model was better than the old regres sion
model ; it indicated onl y a sMall prob abilit y , f ar le s s than 1 % in almo st every run, that the improvement was due to coincidence .
However, when lcl.e. went
.Q.!1
to a pply
� .t,g_il of predictio n , we were q uite dis appointe d. The A R MA model d id indee d re d uce pre dict ion errors , in
comparison with regre s sion, by about 1 0 % of the
original root mean s q u are average of the errors , in the
case of o ur large st data s ample ; yet this is still far le s s than the 5 0 % re d uction achieve d earlier with
extrapolation.
The succe s s of extrapolation would appear to
indicate that the underlyin� proce s ses are still more deterninistic than either the re .r;re s sion or the A RMA
Page V  2 1
mo d e l s w e r e a b l e to d i s co v e r . A p p a r e n t l y, t h e
me a s u r eme n t no i s e a n d t r a n s i e n t f l u c t u a t i o n s w e r e too comp l i ca t e d f o r a "wh i t e n o i s e " mo d e l t o cope w i t h .
In
r e t ro s pec t, i t s ee ns c l e a r t h a t on e s h o u l d h ave
expected preci s e l y s uch a s i tuat i o n , i n the case of
b o t h t h i s a n d mo s t o t h e r d a t a i n po l i t i c a l s c i e n c e . T h e
so l u t i o n to s u c h d i f f i c u l t i e s, i n t h e c l a s s i c a l
p h i l o so p h y o f ma x i mum l i k e l i h oo d, i s f o r u s to po s e
eve r mo r e conp l i c a t e d h i g h e r  o r rl e r mo d e l s o f p r o ce s s no i s e a n d me a s u r eme n t no i s e . (W i t h some o t h e r
d a t a  s e r i e s, h owe v e r, t h e A R MA mo d e l , o r eve n t h e u s u a l r e g r e s s i o n mod e l , m i gh t b e a d e q u a t e . )
How ev e r, t h e
mu l t i v a r i a t e A R MA mo d e l a l r e a d y con t a i n s a l a r ge e n o u g h n umb e r o f d e g r e e s o f f r e e dom ; to d o u b l e o r t r i p l e t h e n um b e r o f coe ff i c i e n t s t o e s t i ma t e wo u l d p u t a h e avy b u r d e n o n a l l b u t a f ew ve r y l a r g e d a t a s e t s, wh i l e
s t i l l compe n s a t i n g fo r o n l )t mo de r a t e con p l i c a t i o n i n t h e no i s e p r o ce s s .
T h e s ucc e s s o f s i mp l e ex t r a po l a t i o n po i n t s to
!ll.Q.C_g_
.a.
pr a c t i ca l app ro ach .tQ. p r ed i c t i o n . I t po i n t s to t h e
p o s s i b i l i t y o f " ro b u s t " es t i ma t i o n, o f e s t i na t l o n t e c h n i q u e s w h i c h ca n p e r f o r m w e l l d e s p i t e a n y
ove r s i n p l i f i c a t i o n s i n o n e ' s o r i R i n a l mo rl e 1 (2 0) ;
good pe r f o r m a n c e, i n t h i s co n t e x t, me an s t h a t t h e
a
co e f f i c i e n t s o f t h e mo d e l a r e e s t i ma te d i n s u c h a w a y
Pa � e V22
that the mo del wi 1 1 have maximum pre dictive power .
In
s ection (vii ) of Chapter ( I I ) , we have suggested that
simple e x trapolation Is "ro bu st"  more ro bust than
A R MA estimation, even a priori  beca use it is base d on a sinple "ne asurement noi s e only mo del, " a s tatistica l
model huilt to e xtra ct the deterministic underlying
trends (if they e x is t) from a process a ff licte d by a
comple x pattern o f trans i ent noise and me asurement
error ; we have pointe d out that the general dynamic f eedback pro ce dure o f Cha p t er ( I I) can be u sed to
estimate mor e general mo d els o f this t ype ,
econo mically . ( It is also possible to m a k e some
allowance for process nois e in an a d hoc w a y (2 1 ) , but
the best way to ma ke such allow ance while preserving
ro bustne ss is uncle a r ; there Mi�ht be no general
theoretical answer to this q u e s tion . ) We have also
discussed another new techniq u e in Cha pter ( ! I),
"pattern ana lysis", to draw o ut more direct
me asurements of the unde r lying dynamic variables.
In brief : Ql1L emphasis on prediction has led .Y.1i to
� conclusion .that. sta tistical metho ds based
Q.O.
..tb_g_
concent of maximum lik elihood alone are inadequ a te 1n. pract i cal emnirical research . .Lt. � led � .t.Q. the
theore tica l conclusion .that. predictiv e power its elf
needs to be maximize d more e xplicitly 1n. the model
Page V 2 3
P.Stima tion techniques avail a bl e to beha vior al
scientis ts. In Chapter (II ) , we have su ggested wa ys to
build new s ys tems on this principle. Thes e s ys tems are
schedul ed to be av ail a b le as part of the TimeSeries
Processor pac kage, designed for social s cientis ts, in
1 9 74 at Proj ect C ar,bri rl .�e, M. I. T.
(iv) POS SIBI L ITI E S F O R S TATISTIC S A S A N E M PIRIC A L TOO L I N
R EA LWO R L D P R E DICTIO rl
Now l et us come back a n d l oo k more closely at the
questions we started from in sec tion (I i );
let us
reconsider the worries of the political actor about the
use of mathematical approaches.
We have deal t with
these worries so f a r o n a very basic l evel, o n the
level of defending the concept of predictive theories in the social sciences .
We have emphasized the point
that the explicit s tatistical techniq ues we have
proposed are merely one tool among many in cons tructing
such theories. We have implied that the choice between
these techniq ues for con structin � theories, a n d the
verbal a n d B a yesian techniq ues, s hould be decided o n a
case b ycase a n d even s tudyb y  s tudy b asis, based o n
Page V24
the ideal of pre dictive power, rather than d e cided by
any apriori fiat in f avor of one approach or the o t her;
we have implie d that statistic al analyses should merge into other analyse s, mathematical and nonmathematical, in s u ch a way that political science is divided up
ac cordinR to interf aces b etween substantiv e issu es
rather than interfaces hetween methodolo gical schools
of thou�h t.
All of these comments, while controversial within
the domain of political science, would se em r ather
bland and basic to many r e al political actors. Most
political actors would b e q uite willing to try any methodology that
11
\,mr ks", at any time, withou t getting
too committed to one methodolo gy or another. What
worries them is a question on another lev el: can we
expec t statistical methods to "work" very of ten, In
practice ?
In principle, this question can only b e answered
af ter the f act, in e ach c ase . Howev er, there are a
num b er of reasonable guesses one might mak e, b ase d on past experience, as to the most lik ely areas of
fruitf ul statistical r esear ch in the f uture, in
political science .
First of all, w e may e x p e ct to b e surprised in the
f u tur e, by statistical methods having a large r range of
Pap;e V2 5
app l ication than one might ex pect apriori.
Given that
our pre sent emotiona l ex pec tations are b ti i l t u pon
ver b a l techniq ues which have been thorough l y u sed and
thorou gh l y dev e l oped, whi l e our statis tica l techniq ues
are on l y now b eing perf ected and have bare l y begun to
b e u sed to cons truct predictive mode l s , we may expect
that the deve l opment of s tatis tics wi l l demonstrate a
much greater va l ue than one ' s intuition wou l d indicate today.
Second, we may expect statis tics to provide the
basic "real ity testing" for o perationa l po l itical
theorie s of the q uantitative type or the verb al
anal ytic type. A sim p l e re�res sion ana l ysis may make a poor te st of a verbal "hypothes is ", if interpreted
naive l y. However, a f u l l s tatis tical ana l y sis of a
given set of v ariabl es wi l l give a m uch l ar ger q uantity
of inforMation, information which the ana l y st shou l d
not g l o s s over, eithe r in his wo r k or in his written repor ts ;
if , in f act, it is dif f icu l t to make a
connection b etween one ' s ver h a l theorie s and the
aggregate, s tatistica l behav io r of the variab l e s these
theories pretend to exp l ain, then one has much to l earn in trying to exp l ain the d if ficu l ty .
Oftentimes , an
"obviou s " verbal theory wi 1 1 turn out, thou �h true in the ab stract, to req uire maj or q u a l ification in terms
Page V26
of what it tel l s us to ex pect to find in concrete d ata.
When governme nt pol icy is concerned with the concrete
resul ts themsel ves, these qual ifications m a y turn out
to be of cen tr al inporta nce.
The cl assical theor y of f ul l empl oymen t
equil ibrium, f or e x anpl e, depended o n the ide a that the su ppl y of savings woul d increase with hig her interest
rates, just as the sup pl y of a n y other commodity
incre ases whe n a higher price is of fered ( 2 2 ) . Empirical
studies h a ve not ref uted this Ide a;
however, the y have
shown that the predictive power of interest r ates in
predictin g s avin gs is extrenel y smal l , whil e the ef fect
of income variabl es, cited b y Key nes, has turned out to be very l ar ge ( 2 3 ).
Ke y nes himsel f w a s a bl e to observe
these e f fects b y a n al ytic method s al o ne, but major
governme n ts were ver y sl ow to chan ge their establ ished
viewpoints de spite his argume nts ( 24 ); the statistical
studies, b y con frontin g peopl e d irectl y with the trends
which had persisted up to the curre nt d a y, may have
been a crucial form o f "real ity testing" on this issue. Third, one ma y hope that statistics wil l hel p
il lumin ate the sl ow and stub born trends which underly
so cial phe nomen a. It h as of ten bee n suggested that the
most visibl e varia bl es i n pol i t ics  the turmoil , the
yearl y ups a n d down s in economies, al l ia nces and w ars 
Page V27
a re a ll s u r> e r f i c i al r i p p l es r i d i n g on a dee pe r cu r re nt • Econom i s ts have often d i scuss e d a "technolog i cal"
i ncreas e i n product i on ca pac i ty per cap i ta, an i ncrease
wh i ch cont i nues dur i ng rece s s i on and boom at v i rtu ally
th e same rate, an Incre ase wh i ch i s not f i x e d but wh i ch
var i e s slowl y accord i ng to uncer t a i n cau ses over long per i ods of t i me . Almost any d i scuss i on of the
"product i on f unct i on" i nclu des r e f erence to th i s
"au tonomous" or "technolog i cal" term (2 5 ).
Soc i olog i sts
l i k e Max Weber (2 G ), ph i loso r,h ers l i k e H e gel (2 7 ) or
Marx, and h i stor i ans l i k e S p engler (2 8 ), Toynbee (2 9) and
even MctJ e i l ( 3 0 ) and E i sensta rlt (3 1 ) hav e all d i scu ssed s u ch trenrls.
E v en i n our s i mulat i on stu d i es i n Chapter
( IV) , we found that errors i n shorttern pred i ct i ons
were r e duced f ar less by soph i st i cate d anal y s i s than
were errors i n med i umterm and longterm pre d i ct i on .
Pol i t i cal actors, i n the shortterm, are com pelled
to i mmer s e th emselv es i n deta i ls too small and too unpred i cta ble to be d ealt w i th e f f ect i v ely b y
stat i st i c i ans . f3 u t th e e f f ect i v eness o f pol i t i cal
actors also dep ends on the i r "h i s tor i c al v i s i on", on
the i r ab i l i ty to ju dge the res u l ts of the i r l i f e ' s wor k on the su bsequ ent t i d e of e v ents ; lon�term trends may be more determ i n i st i c, and mor e s usce pt i ble to
mathema t i cal analys i s . In ord er to sort out these
Page V2 8
underl ying trend s , one mu s t soMehow ad j us t f or the
exis tence o f a great deal of shor tterm fl uctu a tion .
This s hort term f l uctu a tion woul d ty pica l l y have a very
com pl ex and changeab l e p a ttern of a u tocorrel a tion; thus a compl ete model o f the "measurement noise" proce s s is
doubl y infeasibl e, and one mu s t f ace up to the need for
"robu s t" proced u res, a s des cribed in s ection (iii) above and in s ection (vii) o f Cha p ter ( I I ) .
Af ter a "robu s t" anal y sis, one m a y find tha t some
v ariabl es tend to fo l l ow de terminis tic l aws , over time,
bu t tha t o thers s til l invol v e a grea t deal o f a p p arent randomnes s. I n this ca se, one migh t ex pect tha t the
pol itical actor woul d have his grea tes t personal ef f ect
on his tory b y trying to change the l a t ter v a riabl e s 
1,\/hich � be changed  and by air1ing onl y indirectl y a t the former v ariabl es. Al s o, by knowing where events
woul d be hea ded if he acted 1 ike the average pol i tical actor, an informed po l itical actor may jud ge the
impor tance of ta king unu s u al l y intens e actions to break
out o f the exis ting trend s.
Fur t h ermore, if there
shoul d turn out to be a "cro s sro a d s " o f po s sibil itie s
ahead ("bif urca tion", in m a thematical l anguage), such tha t the choice o f p o s sibil ities ahea d wo ul d produce
very l ongl ived ef f ects , whil e o ther impl ica tions o f
Pa g e V 2 9
present policy wo ul d b e w ash e d o u t in time b y random
noise, one might well choose to or�anize one ' s entire policy aro und the go al of moving down the right ro ad,
even if this mean s f o cusing one ' s attention on v ariables which are har d er to aff ect .
In practice, there are serio us dif f icul ties in
using statistics b y themselv es in d e al in� with this
third obj ectiv e . Statistics mi�h t d o well in predicting
the stress which will pull at the f abric of various societies; it wi 1 1 not do as well in predicting the ab i 1it y of 1o ca 1
stress.
po 1 itica 1
1 ea de rs to cope w ith the
Still, to know the causes and the m a gnitude of
the stress would be l nterestinB in any case .
But there
d e e per his torical trends can h.e. anal yzed best
lf.
is a bigger dif f iculty with using statistics he n�.
ma ke � of
.tb.g_
we
The
l onges� possible relevant d a ta series ;
yet much of our historical d ata base, a s d escribed b y
Toynbee and McNeil, involves informa tion about
civiliz ations which have not left us a large suppl y of statistical d ata series.
In sone cases, a l arge su pply o f recent d ata m a y
b e enough.
It m a y even seem superior to historical
d ata, on gro unds th at it re flects exclu sively mo dern
phenomena which one wo uld ex p ect to continue in th e future ;
for ex ampl e, certain asp ects o f po pulation
Pag e V3 0
d ynamics m a y b e d ealt with reasonably enough from
recent d ata, e specially insof ar as the f u ture wi 1 1 d epend heavily on the impact of recen t phenomena such as largescale f emale literacy. On the other hand, if
there is any tru th at all in the finding s of Toynbee
and Spengler, then the central trend s of his tory may includ e re gular ris e s and d ecline s (3 2 ) , r egular
curvatur e s, w h ich would b e much harder to observe in short d ataseries  even s hort representative dataseries  than in very lon� d atas eries .
Thus the
long dataseries do much more than increas e the number of observation s and improve the accuracy of our
estimate s of parameters ; the y give us the power to d e al
with important qualitative e ff ects, with whole new
terms in the mod el, which might otherwise be mis s e d . F urthermore , onP. might e x p ect that the f u ture
would represent a d ynariic domain j ust as dif f erent from the pre s ent, as the pre s ent is from the past ; in ord er
to pre dict this domain, it may be more rational to loo k for regularitie s which hav e e x ten d e d from the distant
past to the presen t , and extrapolate them, rather than extrapolate mod els specific to the d ynamic domains of
the pre s ent.
If, in some cas P. s , his tor y were dominated
by lar ge, infre quent and apparently irreversible
changes, as in technolog y , then a longer d atabase
Page V3 1
would h e eve n more e s s e ntial, to give us an adequate
s ampl e to r e pre s e nt the varietie s o f such chan g e s and
co n f iguratio n s in the pas t;
o nly in this way can we
ho pe to d eal w ith the f urth er chan g e s which, u n d er this as sum ptio n, would rloninate the f uture .
(This do e s .!lQ..t.,
o f cour s e, e ntail buildin g mo d els tailored to s ho r ter,
wel l d el imited perio d s in an cie n t his tory, as
unre pre s en tativ e an d r e s tric t e d as recent his tory. )
Also, to d eal wi th the pos s ibility that the human race might be e nteri n g a n ew d omain of experie n ce, totally
dif f erent from an y o f its pas t his tory, o n e might
simply exte n d the bas ic context o f o n e ' s an aly sis s till
further, to in clud e the more g e n era l his tory of s pe cie s
on this plan et an d the patter n s o f evolutio n revealed therein. The biolo gical example of the trilo bite s ,
which b e came totally extin ct after overs p e cializ atio n,
do e s not have a f ull s c ale parallel I n the human pas t ;
this particular example has be e n mentio n e d o f ten i n the popular pre s s, but there may be other as pects o f
biolo gical his tory more g e n eral an d eve n more r elevan t
to our own f uture (3 3 ) . In g e n eral, it wo uld appear
futile to tr y to predic t the f in e d etails o f a compl ex, n atural s y s tem such as human s o ciety be fore we c an
co n s truct f irs tord er mo d els which c an co p e with a
ge n eral review o f th e aggre gate behavior of this s y s tem
Pa�e V32
across the whole of the avai lable datab ase .
All of these v i s i ons of h i stor y emphas i ze that
rap i d per i ods of expan s i on, last i n g for deca des or
m i lle n i a, have e x i sted often e nou gh i n the past, and
have b ee n terM l n ated ofte n e nou �h by the growth of
coun ter tre n ds; the y empha s i ze that an anal ysi s of
soc i al dynam i cs, base d o nly o n the data ava i lable i n
rece nt decades, m a y l ea d to a totally f alse p i cture of
the poss i b i l i t i es w h i ch l i e ahe a rl.
Th i s d i ff i culty
certa i n l y appl i es to verbal theor i z i n g, just as much as to sta t i st i cs;
w i th verbal theor i z i n g, howeve r, the
h i stor i cal data are far more e x ten s i ve for those who
are w i ll i n g to e xam i ne them. Wh i le we wo11ld not agree
w i th the exact de ta i ls of the theor i es o f S pe n gler a n d
Toy nbee, we wott ld con s i der i t all the more i m portant to
descr i be and expla i n the phe nome n a the y have d i scussed . F i n all y, stat i st i cal methods may hel p o n another
level  as a parad i gm to gu i de verbal rese arch, both i n gen eral a nd i n spec i f i c cases. We h ave emphas i z e d
throughout th i s chapter the value of verbal research, conce i ved as an attempt to do w i th verhal data what
stat i st i cal research doe s w i th mathemat i cal data. Yet
eve n i n stat i st i cs, where the methods use d are spel l ed out expl i c i tl y i n adva nce , we h ave seen that the
Pa P, e V  3 3
po pu l a r m e t ho d s o f a n a l y s i s c a n b e q u i t e u n r e a l i s t i c
and decep t i ve .
I t wo u l d s e em u n r e a so n a b l e to e x p e c t a n y
rio r e , a p r i o r i , f r or, v e r b a l r e s e a r c h .
I n d e e d , v✓ e h a v e
b e e n a b l e to com p a r e t h e two f o rr,s of r e s e a r c h i n
r e s pe c t to t h r e e i s s u e s : t h e emph a s i s o n p r e d i c t i o n ,
t h e w i l l i n g n e s s to d e a l w i t h t h e l o n g e s t po s s i b l e
t i me  s e r i e s , a n d t h e w i l l i n g n e s s to a c c e p t no i s e a s
pa r t o f p r ed i c t i ve t h eor i es ; i n a l l t h ree cas es ,
e s pe c i a l l y t h e f i r s t a n d mo s t b as i c , c l a s s i c v e r b a l po l i t i ca l s c i e n ce  w i t h a h a n d f u l o f no t a b l e
e x c e p t i o n s  d o e s no t a p pe a r t o h a v e g ro u n d s to c l a i m
s u p e r i o r i t y i n i t s a t t i t u d e s , a t l e a s t no t i n t h e p a r t s
of t h e l i t e r a t u r e w i t h w h i c h w e a r e f am i l i a r (3 �) .
I n t h i s s i t u a t i o n , t h e me t h o do l o � i c a l a d v a n ce s i n
s t a t i s t i c s  wh i c h a r e we l l  d e f i n e d , a n d w h i c h c a n b e
co n so l i d a t e d  c a n b e o f r, a j o r v a l u e i n e d u c a t i n g t h e v e r b a l l y  o r i e n t e d po l i t i c a l s c i e n t i s t .
T h e co n c e p t o f
s to c h a s t i c p r e d i c t i v e t h eo r i e s m a y s e em r e a s o n a b l e i n t h e a b s t r a c t to t h e v e r b a l po l i t i c a l s c i e n t i s t ;
howe v e r , w h e n h e t r i e s to t r a n s l a t e t h i s i d e a i n to a s t r a t e gy f o r h i s own r e s ea r c h , i t wo u l d no t b e
s u r p r i s i n g i f t h e d i f f i c u l t y o f do i n e so b ro u g h t h i m to
w i t h d r aw b a c k to A r i s to t e l i a n p ro c e d u r e s .
I n dee d , t h e
c l a s s i c a t t emp t to t r c'! c k down t h e l o n g t e rm " ca u s e s " of h i s to r i c a l e v e n t s i s , a s we h av e rien t i o n e d a bo v e , v e r y
Pa ge V 3 4
c l o sel y l in ked to the search for dy n amic mo dels. However, the concept o f
11cau se 11
is a \'leak enou gh
paradigm that historians have o f ten fo u n d themselves
forced to admit
11
riulticau sality 11 (3 5 ), and then to
withdraw into more descri ptive, more "o b j ective"
q uestion s. Furthermore, historians have o f ten adriitted
themselves to be intrigued by the "great ' what I f ' q uestio n s o f history", such as, "What \1ould have
hap pen ed if the S panish Arriada had wo n in
15 8 8 ?" ;
yet
such q uestio n s  which clear l y call for the u se of some kind o f predictive mode l  h av e been dismissed as
speculative (3 6 ).
I n short, there would ap pear to be a n eed for a
more durab l e paradi�m in analytic verbal research.
verbal social scientists can becone more and more
If
f amiliar, o n an I ntuitive level, with the co ncrete
methods o f statistics, in cor1in g to grips with co ncrete data, then they may be able to develo p a clearer and
clearer picture o f what it mean s to search for ro bust predictive models, mather,atica l or verb a l . A l so , they
may be ex pected to learn to appreciate the va l ue o f
treatin g q uantitative variables as such , even i n verbal discussio n , rather than re d u cing them to such possibilities as "high" and
1 1 low 1 1 ( 3 7 ).
The value o f statistica l rietho rl s as a paradigri f or
Pa ge V 3 5
v erbal re s earch m a y b e e s pecially gre at in those ca s e s
where s tatistics can d eal ex pl icitl y with so�e, but not all , of the his torical d ata of int ere s t . One can
imagine two w a y s in which this v al ue woul d b e felt, a s
s tatistics are bro u ght to b e ar on thos e d ata which are
a vail a bl e.
F irst of all , tho s e people doing th e s tati s tical
research woul d h a v e to choos e a s et of varia bl e s to
use, in d e vel oping pre dictive models .
When predicting
a s y stem l ik e a mis s ! l e, made u p of fiv e ma j or
sub s y stems or so, one ' s pr i mary concern in ma king
mediumterm pre dictions is with the "overall s y stem, "
with the s y stem made u p of the In teractions b etween the
five major sub s y s tems. Similarl y, when a s king for
l ongterm prediction of a s tatistical system, m a d e u p
o f fiv e cl usters of heavil y intercorrel a t ed s y stems of
v ariabl e s, one would normal ly s tart out b y a ggregating e ach of the cl usters, b y u s e of f actor anal y sis,
pattern anal y sis or o ther proce dure s, a nd then s tud ying the rel ations b etween the a g gregate varia ble s .
To try
to pre dict wh ere a mis sil e will go, b y pre dicting what
each of the s ubsystems woul d d o In total isol a tion from e ach other, is to ignore the mo st Important functional
rel ations.
Statistical analy s is, b y drawing us aw ay in
concrete ca s e s from a fix ation on the internal d ynamics
Page V 36
o f special ize d subsystems, may hel o bring us b ack to
stud ying the bro ad structure o f interf aces and mul tipl e
sub systems which is crucial to even a f irs t order prediction o f h uman societies .
Then, when researchers
go b ack to l oo k inside the subsystems, we m ay hope th at
they wil l f o cus mo r e attention on those questions about t he interf aces w h ich appeare d important at the gl o b al
l evel .
The "unity of science" may b e a deb atabl e
pro position when appl ie d to predictive model s o f, say, biol o gical s ystems and astronomical systems. However,
when one is trying to pre dict a singl e , h ighl y
integrate d system, th e need for interd iscipl inar y unity becomes overw hel ming . When smal l f ee d b ack terms from
one sub s ystem to ano t her can h ave o verw hel ming e ffects
in de termining system b e h av io r, it is essential to try to measure the aggregate be h av ior directl y.
In the next p h ase, af ter the statistics h ave
pointe d to concrete interdiscipl inary e f fects, human
verbal knowl e d ge can go o n to expl ain and to qual if y t hese concl usions.
One migh t, on verbal grounds,
regard the concl usions as misl eading, as
oversimpl if ie d, or as o ne  side d
in their emphasis; in
any case, h owever, even to d iscuss these co ncl usions
intel l igentl y , one must try to discuss, on the b asis of
ve r bal knowl e d ge, w h y one woul d expect certain
Pa g e V 3 7
co r r e l a t i o n s a n d ce r t a i n d y n am i c pa t t e r n s to vJO r k o u t a s t h e y do . O n e wo u l d h ave t o d i s c u s s t h e " t r u e " d y n am i c r e l a t i o n s b e tvve e n d i f f e r e n t s u b s y s t er,s ,
in
o r d e r t o e x p r e s s w h a t o n e f e e l s a r e w ro n � w i t h t h e s t a t i s t i c s . One wo u l d b e t emp t e d t o
11
h y po t h e t i c a l
II
or
II
e n ga g e i n
an a l y t i c" mo d e l l i n �  to s u g g e s t
wh a t wo u l d h a v e h a p p e n e d to t h e s t a t i s t i c s i f o n e h a d i n c l u d e d , a s h a r d d a t a , c e r t a i n v a r i a b l e s fo r w h i c h
h a r d ma t h ema t i c a l d a t a d o n o t h a p p e n t o e x i s t .
\"IO u
One
l d f o c u s o n e ' s a t t en t i o n o n t h e v a r i a b l e s of c en t r a l
i n t e r e s t , r a t h e r t h a n l o s e o n e s e l f i n a mo r a s s of
u n r e l a t ed h i g he r o r d e r v i c i s s i t u d e s .
I n b r i ef , o ne
m i g h t a cq u i r e t h e rnone n t ur, n e ce s s a r y to l a u n c h i n t o a f u l l  s ca l e ve r b a l d y n am i c a n a l y s i s , w i t ho u t c r a s h i n g
back unde r t h e we i gh t of p u r e c l a s s i ca l t r ad i t i on s .
Befo r e we c l o s e t h i s d i s c u s s i o n o f t h e va l u e o f
s t a t i s t i c s t o po l i t i c a l a n a l y s i s , i t m a y b e wo r t h
n o t i n g t h a t i m po r t a n t a pp l i c a t i o n s m a y a l so ex i s t i n
po l i t i c s p r o ne r .
E a r l y i n C h a p t e r ( I I ) , w e me n t i o n ed
t h e po s s i b i l i t y o f a g row t h i n eco l o g i c a l a n d
s o c i o l og i c a l mod e l s , b a s e d o n t h e v a s t acc ur,u l a t i o n o f
d a t a b y e a r t h s a t e l l i t e s . Ma n y o f t h e d i f f i c u l t i e s
c i t e d a bove  pa r t i c u l a r l y t h e dom i n a n c e of ve r b a l d a t a
ove r q u a n t i t a t i ve d a t a  wou l d n o t a pp l y i n t h i s c a s e ; a l so , o u r emp h a s i s o n f ee d b a c k ef f e c t s a n d
Pa.P; e V38
interdis ciplinary r e s earch wou l d apply more than ever.
Given the im portance o f world b alance s o f agricultural pro duction and po pulation, given the dangers o f
ecolo gical catas tro r>he thro 11gh high s tre s s and
imb al ance in s u ch s y stems, and given the pos sibil itie s
here f or treatie s to set up a coordinate d glo bal s y stem to monitor and help control the s e s y s t em s from s pace,
the practical political scientis t wo u l d have goo d
reason to think abo u t thes e ap pl ications . This is dou bly true, Inso f ar as the develo pne nt o f the s e
ap plications may b e f ar f ror, automatic. ( v)
B E Y O N D NAIVE E M PIRICISM : A DA PTIN G O lJ R I D E A S TO FIL L T H E G A P L E F T R Y S TATISTIC S
Now let us return once more to o ur s tarting point,
to the worries o f the political actor about u s ing mathematics .
(S ee s ection
( i i ).)
the s e worrie s on two l eve l s :
We have dealt with
Ci ) the leve l o f
def ending the no tion o f prediction ;
( i i )
the l evel o f
d e s cribing the practical applications o f s tat is tical
model ling to politic a l pre diction . We hav e emphasiz ed
the empirical ap pro ach, in bo t h verb al and s tatistical
Page V3 9
research.
On another level, however, the political actor
might q u e s tion the idea that e mpir ical approaches are eno u gh, when the s u b j ects o f one ' s inve s tigat i on are
intellig ent human b eings. In particular, the political actor wo uld h a v e to reconcile the u s e of o b j ective
metho ds to pre dict other actor s, while pres erving the s en s e of free wi 1 1 in making his own d ecisions. On a
primitive level, this paradox pos e s no difficu l tie s at all to the political actor ; it is eas y to conj ure up
the image o f a f as t d ealing political hack, working for a city machine, gle e f ully pushing people aro und as if
they were b uttons on a pinball machine. A t a more
advanced level, politicians f ind that they can pre dict
people better and influ ence their actions more
constructiv ely by exploiting emr,athy, by u sing their
own reaction patterns as a kind o f analo g u e mod el to
pre dict the reactions o f others . Thu s there are the
old, persis tent ad ages : "lf yo u want to predict what a man \>Jill do, try to put yo urs elf in his s ho es, " and,
"If you w ere a • • • what vmuld
YQl!.
do ?"
This proce d ure
is particularly ef f ective when the political actors
come froP1 the s ame b ackgro und as the people they are
pre dicting, when their b �c k gro u nd is cosmopolitan, or
when their reaction patterns are d ef ined at a general
Pap;e V4 0
enou g h level to make it easier to imagine how another person would re ally re s pond to a s itu ation very
dif f e r ent from th at of the political actor h im self. (Empath y may al so be u se d , o f cours e , as a tool in
t h inking of approac he s one might borrow from others in
coping with one ' s own prohlems . ) W hen a political actor
os cillat e s between thinking about o t h ers
" s ubj e ctivel y", in terms of empath y , and th inking about th er, "ob j ectively", in terrr,s of pre dictive empiricis m, a conflict emerge s , long be fore th e u se of s tatistics
as s u c h aris e s .
Rationality, and th e ac knowle d gement
of oth er s ' capacity for rationality , wou ld ap pear to
allow no e s cape f ron this conflict; as long as we h ave two distinct source s of information from w h ic h to
predict the b e h avior of people , we mu s t live as be st we
can with the conflictin� pre dictions , wh ile tr:x:ing to
reconcile ..t.bJ1m 1Dl concre te improvements .lo. ..t.b.!:t concepts
Yl..ft use on both s ides .
Pre dictive s tatis tical morl e l s , like empirical
v erbal models , cannot dire ctl y expre s s the ins ig h ts
derived fror,
11
emf)ath y 1 1 ; in particular, the y cannot
e x pre s s the ins ights derived from ac knowle dging th e
intelligence of oth er h uman being s (3 8 ) . This limitation may be of enorrm u s importance to the practical
nolitical acto r . On the othe r h and, the re late d
PaP;e V  li l
mathematica l con ce pt of max imiz in g a cardin al utility
f u nctio n expre s s e s the idea of human Intel l igence, more
vividly an d more precis ely than the u s u al verbal
formulation s . The argw'lents of Von N e uri a nn C3 9) an d of
Raif f a (4 0) in f avor of this con cept req uire little more than lor.;ical con sis ten cy on the whole, I n the ul timate
value s that the in divid u al purs u e s ; while the s eriou s
political actor would normally admi t that he sonetime s
acts s tu pid, an d some times acts at cro s s  purpo s e s
again s t him s elf, e s p e cially when limitation s on time
an d on kn owledge con strain his detailed
d ecis ionmakin g, he would rarely con sider s uch mis tak e s
a s a matter of f ixed
QL
d eliberate policy.
Often, when
the political an aly s t would accu s e him of in d ul r.;in g in
irration ality, he wo uld hav e a counterargument of his
own, b as e d on the knowle d g e an d concepts available to him at the time of his d ecision.
If we agree with Raif f a, then, that the
max iriiz ation of cardin al u tility is "va l id" as a
fou n d ation for mos t political d ecisionmaking, w e find
our s elve s le d to important conclu sions about political an al y sis too. First of all, we fin d ourselve s
reemphasizing the po i n t that verb al r e s e arch may h e regard e d as an attempt to perform valid statistical
in f ere n c e, accou nting for d ata which is le s s s tructure d
Page V4 2
and l ess managea hl e than the u su al statisti c al
timeseri es.
Within Raiffa ' s framewor k, the has l c
q uestions one asks are q u antitative in nat ure ; e . g. 
" If 1;\l e carry out action A , how � wil l it cost, .b.rui
DJ.Yj'ji
do we gain, and \l'J hat are the proba b 1 1 i ti es that we
wil l su cceed ? " ask, "lf
\"le
More generall y, Raiffa woul d have u s
carry o u t action A , startinr, from situation
B, wha t is the distrib ution of probabil i ties attached to the different possibl e level s of cost and to
different possibl e outcomes?
How much do we e xpect to
gain from each of the possible outcomes, If our
subseq u ent strate gy is optirial ? "
In each case, we d o the best we can to estimate
these q u a nti ties on the basis of the
avail able verb al
Information ; thus the rese arch carri e d out on that
information, is carri e d out for the pur pose of
extracting the most acc urate possible statisti cal
information.
We also ac count for the intell l gence of
other actors . We al so ac count for more direct
q uantitative evid ence, whenever we c an find it .
most so urces of information, w e e x pect to get probabilisti c indi cations of various kinds, certainties.
From
never
Through practice, we may hope to learn
more and more the art of formulating accurately the
Inter r elated patterns of statistical Im pli cations of
Page
v � 3
our verb al knowled ge, and to reduce the lo s s e s in
transl ation which al ways Intervene in going from r aw
obs erv ation to decis ion.
Second  and more important  the concept of
util ity maximiz ation of f ers us an ideal iz e d model of
intel l igent decisionma king, for usn In the prediction
of other pol itical actors .
At fir s t gl ance, this
concept may sound r ather culture bound.
However, the
concept of utility function i s very g eneraliz e d in the range of concrete behavior it can includ e . One can
ima gine all sorts of different utility functions. One can imagine many different l ev els of knowle dge and
aptitude brought to b e ar in maximiz ing utility
functions . One can ev en imagine differ ent levels of
b a sic cognitive s tructure, a s s u gg e sted b y Piaget { � l )
and b y ego p s ychiatri s t s { � 2 ), levels which one may hope
either to rememb er or to a dvance to . One can ima gine states of short term p sychological diseq uilibrium,
where a pol itical actor doe s not yet take the actions b e s t suited to m aximiz inr, his utility function,
because, on some l evel , he has not y et b ecome aware of
the po s sibi 1 i ties. { Th e detection of s uch dis equi 1 ibria
is particularly important to pol itical actors whose job is to persuade othe r s to change thei r course of a ction. )
T hus, s tarting from th e concept of utility
Pa g e V 4 4
ma ximiz ation, one can approa ch empirical re ality bit by bit, by u sing both empathy and empirical d ata to a d d
qualifications to one ' s view of o ther actors a s i d e al
d eci sionm a k ers . Even a s one a d d s qualif ications ,
however, one can continue to insis t th a t all
interpersonal differences in pers onality be analyz e d in terms of the current s tate parameters of a s y s tem which
obe y s the s ame gen eral d ynamic laws a s one ' s own mind , and which Is capable of chang i ng its s tate parameters
a s a result of the general le arning capability s hared
by all humans ( 4 3 ) .
From a theoretical point of view, the choice of
s tarting point is not a matter of m ere boo k k eepping ; it
defines one ' s implicit
11
prior probability"
distribution, a s d e s cribed in s ection ( v ) of Chapter Cl I). From a practical point of view, this proced ure
can help us avoid rigid s tereotypes of other political actors ; it can help us rememb er that the y , too, have a
capacity for change , and that the likely directions of
change are not entirely random. In any ca s e , this
proce dure allow s us to m a ke u s e of both major source s of information , infornation d eriv e d from empathy and information d erived from more obje ctive d ata. This
proced ure also sugge sts that the procedur e s mentioned
in section ( x ) of Cha pter (II) for utility ma ximiz ation
Page V 4 5
m i ght b e u s e d, not j u st as a techniq u e fo r analyz i ng decis i onmak i ng s y stems, b ut as the b as is fo r s u b stantive mod e l s of s u ch s y stems .
The u s e of u t i lity max i m i z ation as a model of
dec i s i onmak i ng has al r e ady le d to a numb e r of
p ract i cal applicat i ons, notably J n game theo r y and i n
microeconom i cs.
The conce pt of i deal u t i l i ty
max i m i zat i on p r ed i cts the b ehav i o r of an actor
cond i t i onal u pon the i nfo rmat i on that he has ava i lable
to !l.J..m.; thu s i t may lead to an l mplicl t morl el, a model d ef i ned i n te rms of variable s which a r e not d i r ect l y
o b s ervable to othe r acto r s.
In te rms of b ehavio r i s t
att i tudes, th i s i s a maj o r l i ab i lity, i n sofar a s i t
make s i t much mo r e d i f f i ct1lt to p r e d i ct b ehav i o r
conc r etely ; on the oth e r hand, s u ch i mpl i cit mod e l s may
a l low u s to i nf e r someth i n � about the h i d den variables
from the ove r t b ehav i or.
In the cas e of m i croeconom i cs, one doe s not
attempt to p r e d i ct the actu al levels of steel
p rodu ct i on, etc . , at least not i n the earl y s tag e s of
r e s earch ; i nstead, one de f ends th e p ropos i t i o n that the
levels of steel p rod uct i on w i ll b e equal to whate ver level i s nece s sa r y in o r d e r to max i m i z e soMe k i n d of
utility funct i on, i f the decis i on I s mad e b y an economy
wh i ch enj oys p e rfect compet i t i o n (4 4 ) .
(Str i ctly
Pa.r.;e Vli 6
spe aki n g , howeve r , e conomis ts now te n d to avoirl the
con ce pt of "so cial utili t y fu n c tio n ", o n g ro u n ds th at
th ey can dedu c e sinila r co n clu sio n s f r om we ake r
ve r sio n s of the s ame c r ite r io n. )
O n e might well h ave
atta c k e d th is th eo r y , in its e a rly s tage s , a s an
un s cie ntif ic  tho u g h mathema tical  exe r cis e in
p ro pa g an da. Howe ve r , a s th e t h eo r y was develo p e d, it
tu r n ed o ut to be a powe r f ul f r amewo r k f o r evaluating
the .inef ficie n cie s p r o d u ce d by situ atio n s o f lm.pe r f e ct
competitio n in t h e r e al wo rld (4 5 ) ; it h a s b e e n u s e d to a n alyz e the eff e cts of taxes a n d labo r laws (4 G ) o n
econonic eff icie n cy ; it h a s le d to t h e develo p me n t o f Liebe rman ' s p rin ciple s o f e co n omic o r g aniz atio n (4 7 ),
n ow a main stay of th e Soviet e c o n omy (4 8 ).
Mic r o e co n omic s, initially a n isolate d a n d e s s entially u nempiric al th eo r y , has tu r n e d into a powe r f u l
mathematical f r a mewo r k f o r a n aly sis , a f r amewo r k allowin g th e u s e f u l b r i n �in R toge t h e r o f vast
q u antitie s of empirical data, a f r a mewo r k impo rtant to bo th th e p r e di c t io n a n d th e comp r ehen sio n o f e co nomic p h e nomena.
Yet all o f this s uc ce s s was b a s e d o n a s tatic
co n c e pt of u tility maximiz a tio n , a con c e pt of o ptimal
equilib rium, r elated to th e cla s s ic co n ce pts o f
l a g r a n ge (4 n ). 3in ce th e n, No rbe rt t\lie n e r (5 0 ) h a s
Page V 47
dis cus s e d t h e more mod ern, more powerful d ynamic
t h eorie s of m ax imiz ation, w h ich h e would consider a
s ubs tud y of "cybernetics . " H e h as suggested that this
bod y of th eory be ap plied to t h e h uman brain (51). Karl
Deutsch (5 2 ) h as gone on to s 11 g g e s t that cybernetics may
also b e applie d to po l itical science . Cons idering t h e
power that a primitiv e, s tatic conce pt o f maximization h as h ad, over d ecades, in economics, th ese s ugges tions
would appear to make a great deal of s ens e .
Unfortunately, th es e id eas are caug h t between th e
"mighty op posite s " of mod ern methodology  t h e
be havioris ts, w ho would d e mand q 11antitativ e empirical
proof that th e initial mod el predicts all t h e variables
in d etail, and the traditionalists, who would not have patience with t h e math e�atlcs. Also, in th e last f ew
years, t h e relevant p h a s e of "cybernetics " h as been
renam e d "control theory." He h ave d is cus s e d th e value
of "control th eor y" (i . e . of o ptimization techniques)
in C h apter (11) as a tool in analyzing social s ys tems ; however, control theor y may also be us e d its elf as a
normativ e mod el, of th e proce s s e s w h ich allow h uman societies  or even t h e h uman brain its elf  to
function .
(See note (5 3 ) for mor e concrete
pos sibilities . ) As with microeconomics, one will e xpect to f ind that th e real s y s t ems inv olv e imperf ections and
Page V4 8
approximations to the optimum. (5 4 ).
However, one may
also f ind It intere sting to be abl e to see where these imperf ections are , and to appreciate the capacities
that human beings and political societie s  like
e conomies  do have , to co pe with d ata on a scale far
beyond the capacity of pre s ent d ay computers .
I nso far
as one agree s with the traditional ist that the human
being is stil l the mos t releva n t uni t of analysis in politics , one can try, in the future, to e xpand the
interf ace b etwe en cybernet i cs in p s ychology and
c ybernetics in political science.
I n summary , we have conclud e d that the
mathematical method s o utlined in Chapter (I I ) can
ind e e d be applied to political science , but that the y s hould always b e consid ere d as only o n e branch of a
more comple x , inte grate d system of analysis , oriented toward s the goal o f prediction;
the Raye sian
phil o s o ph y of util ity max imiz ation and condit ional
probabil itie s could play a central role in organizing
this s y stem of analy sis , but the behaviorist philoso phy
of total empiricism doe s not hav e the pow er to account for major parts o f this sy stem, parts which appear
e ssential in th e l as t part o f this chapter.
Page V  4 9
FOOT NOT ES T O CHA PT E R (V)
C l) One might take up considerabl e space discussing this particul ar point. Traditional Lanchester ' s Laws for ancient warfare indicate an equal number of deaths on both sides, regardl ess of concentration ; thus, they tend to impl y no possibil ity of strategy in such warfare. On the other hand, the "Laws" usual ly have the discl aimer "ceteris pad b us" attached, impl ying equal l evel s of material and social "technol ogy. " I f "social technol ogy" includ es superior strategic abil ity on the part of a commander, l ike Caesar, then the l aws become a poor guide for the wouldbe superior strateg l st. We have l imited our statement here to the cl aim that Caesar won his victories by evading Lanchester 1 s Laws, by avoiding the necessity for attrition or perhaps by expl oiting the l oopholes in the laws, rather than inval idating them; this much, at l east, seems fairl y clear to us from . Caesar ' s account itsel f. Liddel l Hart, in his classic mil itary textbook, S tra tegy ( Praeger, NY, Second Edition, 1967), cites (p. 3 3 8 ) Caesar ' s llerda campaign, Cromwel l ' s Preston campaign, and a few others, as the cl assic bl oodless victories; he goes on to write, on p. 3 3 9: "Whil e such b loodl ess victories have been exceptions, their rarity enhances rather than detracts from their value  as an indicator of latent possibil ities, in strategy, and grand strategy. Despite many centuries ' experience of w ar, we have hardl y begun to expl ore the fiel d of psychol ogical warfare. From a deep study of war, Cl ausewitz was l ed to the concl usion that  ' Al l mil itary action is perme ated by intel l igent forces and their effects. ' 1 1 ( 2 ) LiddellHart general l y prefers to talk about the "indirect approach" and the "unexpected" more than the use of imagination, b ut clearly the former require the l atter. Hart writes (ibid) on p. 3 42 : "A more profound appreciation of how the psychol ogical permeates and dominates the physical sphere has an indirect val ue. For it warns illi Q..f � f al l acy .aru!. shal l owness tl a tt emp t ing .t.Q. ana l yze a.!151 theoriz e a bou t s t ra t egy 1n terms of Ma them a t ics. To treat it quantitatively, as f f the Issue turned onl y on a superior concentration of forces at a selected pl ace, is as f aul ty as to treat it geometrical l y • • . " Also, on p. 162, In the
Pap;e V 5 0
sec tio n " Co n c l u sio n s f rori Twe nty  F ive Centu rie s " , Lid d e l 1 H a rt discu s s e s the c h a r acte ri s tie s o f the rr10 re u s u a l , b l oo d y victo ri e s : " • • • sca n ning, in tu r n , the decisive batt l e s of h is to r y , we f i n d th at in a l mo s t a l l the vic to r h a d h is o ppone nt a t a p s ycho l o�ica l d is a rlva ntap;e be f o re t h e c l a s h took p l ace • • • mo st o f the examp l e s f a l l into o ne of two catego rie s • • • de scribe d in the wo r d s ' l u re ' a n d ' t r a p ' . " '.';ee a l so n ote 1. T he f u l l we l �ht o f these o bj ectio n s w i l l n ot be dea l t with u n ti l sectio n (v) o f o u r tex t.
( 3 ) In wr iting thi s , a n d remembe rin g some of the inte re stin g gene r a l izatio n s we h a d h e a r d f rom p u r e ve rba l po l itica l science , it w a s dif f icu l t to overcome the se l ective mem� r y a n d f a ce u p to the o ve r a l l rnethodo l o �ica l views sti l l p romine n t in the f ie l d . But a quick r eview soon set o u r riemo r y s tr aig h t. F o r e x am p l e : "It s ho u l d be noted that the err1 D h a s i s he re I s on deductio n , not on i n d u ctio n. In the wo r d s o f a nothe r p a r tici p a nt in the semi n a r , P rofe s s o r S. E. Fine r , we a re makin g a n attempt a t ' de sc ribin g the po l itica l po s sibi l itie s. 1 C o n side r a h l e emp h asis s h ou l d be put o n the wo r d ' desc ribing ' : we remain in the h um h l e s ph e r e o f de s c riptio n a n d do not attempt to r ise to the m o n� l of ty o ne of s pecu l atio n. " (p . 4 0 of " Ge ne r a l Metho do l op;ica l P r o b l en s " , b y G u n n a r Hecksche r , in C omQa rative Po l itic s , Eckstein , H a r r y a n d A pte r , navid E. , e d s. , F ree P re s s o f G l e ncoe , 1 9 6 3 . ) I n his to rica l re sea rch , the p ro b l em is no re s e r io u s , a s : " My p rincip l e s a n d metho d s of resea rch a n d w r i tin g we re l a r ge l y 11o r ke d o u t u nco n s cio u s l y , th r o u g h l is tening to exce l l e n t te ache r� a n d f o l l owin g the be s t mode l s • • • T h e h is torian h a s both t he right a nd the d uty to make mo r a l j ud �enents. He s ho u l d not attempt to p ro p he s y , but he may o f fe r ca ut f o n s a nd is s ue wa r n i n e s. " ( p. 1� 1�  h S , Vista s of Hi sto ry, S amue l E l iot Mo ris o n , Kno p f , NY, 1 9 6 4 . ) O ne m a y a sk w h at the wa r ni n g s a r e s u p posed to b e b as e d o n , if not o n p ro b abi l it i e s o f u n de s ir abl e even ts co n ditio n a l upon ce rtain po l icy decisio n s ; a l s o o ne may q ue stio n w hethe r methodo l o gica l decisio n s o u gh t to be b a sed o n u nco n s ciou s f acto r s. See a l s o n ote s 1 0 a nd 3 6. ( 4 ) S pen g l e r , Os\f1a l d , The Dee l ine o f the \Jest, K n o p f , N Y , 1 9 2 6 , t r a n s l ate d f rom the 1 9 1 8 o rigin a l by C h a r l e s Atki n so n , p. 1 0 6  1 0 7 : " T he aim o nce
Page V  51
attained  the idea, the en t ire content of inner possibilities, f u lfilled and made ex ternal ly actual  the C u lture suddenly hardens, it mortifies, its blood congeals • • • This  the inward and outward f u l fillrient, the f inality, that awaits every living Cu lture  is the pur port of ;:il l the historic "decl ines", amongst ther, that d ecline of the Classical w h ich we know so well and f u ll y, and ano ther d ecline, ent i rely comparable to it in course and d irection, which will occu p y the f irst centuries of the coming millenium b ut is heralded alread y and sensible in and around us tod ay  the decline of the Wes t . " p. 10 9 110 : '' Every Cu lture, every adolescence and maturing and decay of a Culture, ever y one of its intrinsical ly necessary stages and perio ds, has a d efinite d uration, always the same, always recurring with the emphasis of a sym bol."
(5 ) Toynbee, Arnold ,J . , A S tudy of History. abrid gement of Voluries I VI, O x ford U . Press, N Y, First American Edition, Four th Printing, 1 9 4 7, p . 2 L� 4: 1 1 The prob lem of the breakdowns of civiliz ations is more o bviou s than the prob lem of their growths. Indeed it is almost as obviou s as the prob lem of their geneses . The geneses of civiliz ations call for ex planation in view of the mere f act that this species has come I nto existence and that we are able to enumerate twent y  six representatives of it  including in that number the f ive arrested civiliz ations and ignoring the abortive civil iz ations. We may go on to observe that, o f these twent y  s ix, no less than sixteen are now d ead and b uried. " p. 2 4 5 : "lf we accept this phenorienon as a universal to ken of d ecline, we shal l conclu de that all the six nonWestern civiliz a tions alive today had broken down internall y before they were bro ken in u pon b y the im pact of Western civiliz ation f rom o u t side • • • For our present p ur poses it is enough to observe that of the l ivin� civiliz at ions every one has al read y bro ken dmm and is in process of disintegration except our own. " p. 2 5 3  2 5 4: " The metaphor of the wheel in itself o f fers an il lu s tration of recurrence being concurrent with pro gress • • • Thus the d etection o f periodic r epetitive �overnent s does not impl y that the process itsel f is of the same cyclic order as they are. On the contrary, if any inference can historicall y b e drawn from the period icit y of
Page V 5 2
these minor movements (such as the rise and decl ine of GraecoRoman civil iz ation), we may rather infer that the major movement which they bear in mind is not recurrent but progressiv e . " (Comments in parentheses inserted b y us . ) Al so, in variou s pl aces, Toynbee emphasizes both schol astic rigidity and corruption as s ymptoms of decaying civil iz ations; he hints at a dif f erent, l ess charitabl e expl anation of the common methodol ogical dif ficul ties we have mentioned In the text. However, even if Toynbee ' s expl anation has sor,e tr uth in it, a redu ction in the cost of adhering to good methodol ogy shoul d stil l f acil itate nore worthwhil e research .
(G ) Turner ' s theory is wel l known as an attempt to articul ate the f actor s w h ich caused American progress in the l ast f e\J centuries; hoHever, it is not onl y interesting in its own right, but l s an exampl e how such ideas can be usef ul sometimes to those trying to decipher more general l aws of histor y, which, in turn, may be useful to present pol icymakers . Wal ter Prescott Web b, in "The Frontier and the 4 0 0  Year Boom", p . 1 3 6, tJri tes in cor,ment on Turner ' s ideas : "Assuming that the f rontier cl osed a bout 1 8 9 0, it may be said that the boom (in i.!lJ.. of Western civil iz ation) l asted approx imatel y four hundred years . It l asted so l ong that it came to be considered the normal state, a f al l acious ass um ption for any boom. It is conceivabl e that this boom has given the pecul iar character to modern history, to what we cal l Western civil iz atio n. 11 (Art i cl e b y Web b l ocated in Tayl or, George R. , ed . , The Turner Thesis : Concerning the EQ.u:. of � Frontier 1D. American History 1 Heath Co . , Lex ington, Mass., Third Edition, 1 9 7 2. Web b goes on to suggest that the search for "ne,v frontiers" is essential l y an irrational , desperate attempt to preserve a dying enterprise ; however , his assunptio n that new foci o f economic devel opment cannot be found does not al l ow for some of the possib il ities o f technol ogical progress over the nex t f ew decades . Over centuries, the l imits of the earth itsel f may be expected to prevent unl imited growth; on the other hand, when one speaks in terms of centuries, one cannot entirel y rul e out the possibil ity of devel oping economic activities b eyond the pl anet earth itsel f. Certain aspects of economic and technol ogical growth depend on l arge numbers of
Page V5 3
indegend ent random d isturhances, which, on the whole, may accumul ate and be subject to accurate prediction by statis tical procedures or the verbal equivalent ; however, the historic develo pment of nucl ear power, fo r exampl e, o r the future possibilities of elementary p articl e physics and nonequil f hrium (nonl ocal) thermod ynamics (see note 5 1) , involve a more sweepin� form of prio r ignorance, which translates into probabil ities f ar from one or zero and whose values may change according to government or even individual decisions . At any rate, it is possibl e that the ideas mentioned b y Webh m ay have appl ication to other parts of the historical data base, beyond the West.
( 7 } Hegel , "The Philoso phy of Histo r y", excerpted in The Phi l osophy of Hegel , Carl Friedrich , ed . , Modern library, N Y , 1 9 5 1i , p. 2 1  22: "The Principle of development contains further the notion that an inner destiny or determination , some kind of presup position, is at the base of it and is brought into existence . This final d etermination is essential . The spirit which has worl d history as its stage, its prope rty and its field of actual ization is not such as vrnuld move carelessly about in a game of external accidents, but is instead the absolute d etermining factor. " p.2 3: "Horld history presents therefore the stages in the devel opment of the principl e whose memory is the consciousnes s of free dor,1 • . • 1 1 S tages then l isted .
( 8 ) Schwinger, Julian, Particl es, Sources and Fields # AddisonWesley, Reading , Mass. , 1 9 7 0 , Preface, especiall y paragraph two of preface . The most phenomenol ogical approach in basic physics, the "Smatrix" approach, restricts its attention to describinF, the S matrix . This matrix is defined as the matrix which predicts .all scatterin g resul ts , fo r al l possible scattering ex periments in highenergy physics ; these, in turn, constitute the vast majority of highener�y data . (9) In q uantitative political science, the obvious reference here is to Blal ock , Hubert M . J r . , Caysal I nference .in Nonex perimental Research, u . of North Carolina Pre s s, Chapel Hil l, 1 9 6 4 . Hayward Alker also recommend s : Simon, Herbert, 11o,jel s of Man# l1iley, MY, 1 9 5 7 , Part I, and Dahl ,
Page V  5 4
R • A • , 1 1 C a u s e a n d E f f e c t i n t h e S t u cl y of Po 1 i t i c s a n d D i s c u s s i o n " , i n C a u s e a n d E f f ec t , O . L e r n e r e d • , F r e e P re s s , N ew Yo r k , 1 9 6 5 •
( 1 0 ) H . R . T r e vo r  Ro pe r , t h e n o t e d t r a d i t i o n a l h i s t o r i a n ( an d a n t agon i s t o f To y n b e e ) , w r i t e s , o n p . v i . of H i s to r i ca l E s say s � Ha r pe r a n d Row , N Y , 1 9 6 6 : " I t i s p e r h a p s a n a c h ro n i s t i c t o w r i t e of a h i s to r i a n ' s p h i l o s o p h y . To rl a y mo s t p ro f e s s i o n a l h i s to r i a n s ' s pe c i a l i z e ' . T h e y c h oo s e a p e r i o d , sone t i me s a ve r y b r i ef p e r i od , a n d w i t h i n t h a t per i o d t h e y s t r i v e , i n  d e s p e r a t e con p e t i t i o n w i t h e v e r  ex pa n d i n g e v i d e n c e , t o k now a l l t h e f a c t s . T h u s a rn e d , t h e y c a n comfo r t a b l y s hoo t down a n y ama t e u r s w ho b l u n d e r o r r i v a l s w h o s t r a y i n to t h e i r h e av i l y f o r t i f i e d f i e l d ; a n d , o f cou r s e , k n ow i n g t h e s t r e n g t h o f mo�d e r n d e f en s i v e we a po n s , t h e y t h ens e l v e s k e e p p r u d e n t l y w i t h i n t h e i r own f ro n t i e r . T he i r s i s a s t a t i c wo r l d , a M a g i no t l i n e , a n d l a rge r e se r v e s wh i c h t h e y s e l dom u s e ; h u t t h ey ha v e no p h i l o s o p h y . Fo r a h i s to r i c a l p h i l o so p h y i s i n com pa t i b l e w i t h s u c h n a r row f ro n t i e r s . " N o t e 3 6 a n d n o t e 3 a r e a l so i n t e re s t i n g i n t h i s co n n e c t i o n ; a l s o , Pa g e s V 3 4 a n d V 3 5 o f o u r t e x t co n s i d e r i n t e r d i s c i p l i n a r y e f f e c t s i n s onewh a t mo r e d e t a i l .
( 1 1 ) R a i f f a , How a r d , Dec i s i o n A n a l ys i s : I n t ro d u c to r y L e c t u r e s Q.O. Ma k i ng C h o i c e s Under Uns;e r t a i nty, A d d i s o n  We s l e y , R e a d i n � , Ma s s . , 1 9 6 8 . T h i s boo k , a t l ea s t , i s c l e a r l y i n t en d e d t o conm u n i c a t e to a b ro a d e r comm u n i t y . T h e p h i l o s o p h i c a l f o u n d a t i o n o f t h i s v i ew o f p r o b a b i l i t y i s d e s c r i b e d i n Ky b u r g , H e n r y E . J r . , a n d Smo k l e r , H . E . , e d s . , S t u d i e s l.n S u h i ec t i ye Pro b a b i l i ty W i l ey , N Y , 1 9 6 4 . A n a t o l R a p o po r t h a s c r i t i c i z ed t h e a b u s e o f p ro b ab i l i s t i c co n c e p t s b y d e c i s i o n ma ke r s w h o do n o t f u l l y u n d e r s t an d t h e m ; s ee h i s S t r a tegy a n d Co n s c i enc e, H a r pe r a nd Row , N ew Yo r k , 1 9 6 4 , e s pe c i a l l y C h a p t e r , 1 0 . Neve r t h e l e s s , a f a l s e e s t i ma t e o f u n ce r t a i n t y m a y b e l e s s d a n g e ro u s t h a n a fo r ce d c ho i ce o f a b s o l u t e ce r t a i n t i e s ; Ra i f f a , i n a m emo co  a u t ho r e d w i t h Ma r c A l p e r t , h a s d i s c u s s e d i n d e t a i l t h e p ro b l em o f e d u c a t i n g a n d " c a l i b r a t i n g 1 1 d ec i s i o n  ma k e r s ( i . e . com p e n s a t i n g f or t h e i r o v e r co n f i d e n ce ) , to e s t i m a t e p ro b ab i l i t i e s mo r e r e a l i s t i c a l 1 y• ( S e e 1 1 /\ Pro � r e s s R e po r t o n t h e T r a i n i n g o f P r o b a b i l i t y A s s e s s o r s , " av a i l ab l e i n 1 9 7 1 a s a n u n pu h l i s h e d m a n u s c r i p t f r om t h e o f f i c e o f Prof . Ra i f f a i n t h e L i t t au e r B u i l d i n g , H a r v a r d
PaP,e V 5 5
u. ) Stil l , Rapoport ' s comme nts o n the hazar d s of m athematical appro aches are well worth not i n g, for those vs1ho \'JO U 1 d want to u s e s uch approaches .
(1 2 ) The programs E V A L, D I F F a n d D E L TA o f R aymond Hopk i n s were mad e available to us by Prof. Deutsch, Dept. of Government, Harvard. They have been des cribed in Ho n k i ns, R a yrio nd, " Pro j ectio n s o f Population Chanp;e b y Mob i l i z atio n and As sim i lat i o n, " R ehayioral Sci ence, 1 9 7 2, p. 2 s 1�. ( 1 3) Johnston, J. , Econometric Method s, McGrawHill , rJ Y, 1 9 7 2, Secon d Editio n, p . i x: "The purpos e of this boo k is to provide a f airly s elf  con t ained d evel opment and explorat i on of econometr i c method s • • • I t i s d i vided into two parts . Part 1 contains a f ull ex pos i tion of the norm al regre s s i o n model. Th i s serve s as an e s sen t i al bas i s f or the theory of econometr i cs in Part 2 . 1 1
(14) I t '.\l a s surpri s ing to f i nd the phras e 1 1 pat h anal y s i s " s o rare in boo ks up to 1 9 7 3. I n 1 9 7 1, we d i s cus sed the s u b j ect at length with Prof. Alker, at M I T, one of the ma i n expone nts of this approach, with Prof. Raymo n d Tanter then of the Center for Res earch in Conflict Resolutio n at the Univers i ty o f Mich i gan, and w i th s tudents ta k i n g "path a n aly s i s " a s a sub j ect in the I nter Univers ity Consortium for Pol i tical Research . I n a 1 1 cases, it w a s c 1 ear that "pa th analy s t s " w a s intended as a k i nd of ref i ned "cau s al an al y si s ", using regre s sion coe ff icients (or time  ser i e s re gre s sion coef f i cients, i n the soph i s ticate d versions ? ) a s indices of the size and d i re ction of inf luence. Simon, Bl al ock and Boudon h ave also been as soc i ated with "path a n alysis, " in dis cus sions at var i o u s universities . (1 5 ) See Bl al ock, note 9 . Sinple cor relation w a s u sually u s e d i n 1 1 caus 21l anal y sis " b a sed on Bl al ock ; this i s the un i var i ate s pecial ca se of regres sion analy si s .
( 1 6 ) As an arbitrary ex ample, p i c ked from a good antholog y of p a p ers i n this field, consider Sin ger, .J. David, ed . , Quanti t a t i ve I nternat i onal Pol l tics, lns il'!:hts lill.d Evidence . Free Pres s, NY, 1 9 6 8, tables of re sults on p . 2 7 8  2 8 1, p . 1 1 2, p . 1 5 2  1 5 3 , p. 1 9 9, p . :r n s , p . 6 5, p. 2 3 2. Statis tical si g n ificance score s (1 1 p 1 1 ) he r e often run to 1 1 . 0 5 1 1 ,
Page V5 6
or even to as poor as
( 1 7)
11
.10 1 1 •
lsard, Walter, Methods of B,eg i onal Analys i s : An I ntroduc t i on ..t.Q. R�glonal Sc i ence, MIT Press, Cambr i dge, Mass . , 196 3, Th i rd Pr i nt i ng, n . 2 2 .
(18 ) nuesenberry , .J . S., Frornr,, G . , Kle i n, L . R . , and Kuh, E . H . , e ds. , .I.he. Broo k i ngs Model : Some Further Results, RandMc Nally, Ch i cago, 1 9 6 9, p.2 9 6297. T h e full regressi on mod el , just i f i e d b y sol i d s i gn i f i cance i nd i cat i ons, d i d not perform as well as a " condensed "  dramat i cally reduced  rnod el, at f i r s t. Then "adjustr,en ts" were appl i ed, wh i c h d ra mat i cally i r,proved the f i t; these adjustments seer,ed to en ta i l mul t i ply i ng eac h coeff i c i en t by a constant, su i table to adjust the pred i c t i ons of eac h var i able to th e r i g h t level i n the f i rsth alf test d ata. On p.2 9 8 th ey c au t i on, even sti 1 1, : " If the mod el does i ndeed suffer from om i ss i on of i mpor tan t but slowlyc h an� i ng var i ables, then i t i s probably not very useful for longrun analys i s or projec t i on . " E st i r,at i ng or adjust i ng coeff i c i en ts to max i m i z e pred i c t i ve power d i rectly. over th e tr i al d ata, i s th e essence of our proposal i n sec t i ons (v i i ) and (x i ) of Chapter ( I I ) of th i s thesi s. (1 9 ) From Chapter (I I I) , "0 " i s s i rnply the m atr i x 1 1 8 1 1 i n th e un i var i ate case ; g i ven past error levels, a (tk ), i t i s the bes t bas i s even for pred i c t i ng x (t + l ) . However, loo k i ng at x (t+n) from x (t) only, we ge t a long �a,!.h of correlat i ons mult i ply i ng. out to �. • • thus to g e t the op t i mal pre d i c t i on of x (t+n) , one rnul t i pl i es one ' s pred i ct i on of x (t+n1) by ¢ .
(2 0 ) See note 1 9 of Chapter (I I ) of th i s thes i s .
(2 1) See sec t i on (x i ) of Chapter ( I I ) of th i s thesi s, for a way to i ntroduce a k i nd of " i nterest rate " or "d i scount fac tor ", to pre d i c t i ons of more d i stan t t i mes i n the future. Suc h procedures may be unavo i d able w h en a small amount of process no i se does ex i st, an d does accumu late through ti r,e .
(22) Mc Crac ken, Harlan L . , Keynes i an Econom i cs l.o. .t.1::1.g_ S tream of Ec;onom i c Though t, Lou i s i ana S tate U n i vers i ty Press, 1 9 6 1, p.5 1 : " Perh aps one of the f i nest con tr i but i ons Keynes mad e to econom i c
Page
V 5 7
theory and economic pol icy has been on the s ubject o f inve s tment. According to previo u s clas sical analy sis, s av ings, inv e s tment, and the rate o f intere s t all f itted in to the s tandard pattern o f d emand, s up�ly and price. A high rate o f intere st increa s e d s avings and d ecreas ed inve s tment, while a low rate o f intere s t d e creas e d s av ing s and increa s e d inve s tment, so it was a f unction o f price  the rate o f interest  to gravitate to the equil ibrium point w h ere s aving e q ualled inve s tment. There would be no s uch thing as oversaving or und e rinve s tment { i. e. d epre s sion ), as the y were continuo u sly being bro u ght into bal ance by an automatic regula tor. For Keyne s cl as s ical intere s t theory was in error at two basic points. Firs t, while a priori reasoning l eads to the natural conclus ion that a high rate o f intere s t stimulate s s aving and a low rate reduce s s aving, a pos teriori evid ence • • • " Comments in parenthe s e s our own. Gard n er Ackley d e s crib e s e q uivalent ideas in Ma croeco nom i c Theo ry, McMillan, r� Y , 1 9 6 1 , (Twel f th Print ing 1 9 6 7 ) p. 1 5 4 l S S : "Wick s ell ' s anal y si s { the clas sical anal y s is > . • • gav e u s, as has been stre s s ed, a rudimentary theory o f the aggre gate d emand for goo d s. This d emand consists of two main division s : cons 11r1er d emand and inv e stment d emand. Each o f the se d emand s was conceive d to be intere st elas tic : the l ower the intere s trate, the gre ate r the inve s tment demand ; and the greater the con s umer d emand, too { the l atter idea is, o f cours e, m erely a re statement o f the i d e a that s aving d epend s negativel y on the intere st rate ) . • • If either type o f d eriand d ecl ine d, the r e s ulting f al l in the rate o f in t ere s t woul rl s timulate them both, and s hif t re s o urce s to the one which had not rlecline d . If, hm1 cv er, f or any reason { particularl y expansion or contraction o f the money s uppl y by the ban k s ) the rate o f intere st were prevente d from performing this regul atory f unction, aggregate d emand • • • woul d be altered • • . B u t if wag e s and price s should not d ecl ine (eno u gh ) • • • \vor kers wo uld become uneriplo y e d, and real a s \.\l ell a s mone y income woul d be cut. "
{ 2 3 ) The history o f the s e studie s is rather complex. The majo r initial s tu d y, by Simon Kuznets, Na tional Income : A S urve� Q.f. Findings, rr n E R, NY, 1 9 4 6 , u s e s technical language dif f icult to s ummariz e h ere. Elizabeth W. Gil bey, the E conomics
Pag e V5 8
of Consumption, Randor, Hou se, NY, 1 � 6 8 , p. 2 5 : " l n attempting to test this hypothes i s, contradictory resul ts arose from the use of tine series and crosssectional data. Simon Ku znet ' s study of data going al l the way back to 1 8 7 0 showed that the percentage of aggre gate incone saved had in fact remained constant in the United States. " (The contradiction invol ved the distribution of saving across hous ehol ds. ) A more recent survey is Patinkin, Money, Interes t i!lli! Prices, Harper and Row, N Y, Second E dition, 1 9 6 5, p. fl 5 1 6 6 1i. Patin kin discusses l argel y the "weal th eff ect", based on the " Pigou effect", a more recent attempt to re surrect cl assical ideas ; on p. 6 5 6 and 6 5 7 Patinkin cites numerous studies which measure w eal th as real assets times interest rates. On p. 6 6 3 he describes his own resul ts for "beta" and "al pha", the forner w h ich he e quates ..,, i th 1 1 Y L "(income), and the l atter, at the top of p. 6 5 9, defined as beta tines interest rate. Thus the l atter resul ts expl icitl y measure the hypothesis that interest rates affect the pe rcent §ge of incor,e save d, whil e the former do measure something cl osel y rel ated; al so, the studies of Gol dsmith cited by Patinkin r eaffirmed the idea of a "const ant savingincome ratio. " Patinkin describ es his own studies, anrl some of the previous studies, as showing l arge and significant effects by var i abl es d er i ved from interest rates ; howev er, the actual regression coefficients of al pha ran to . 0 4. 0 8, at the most, much smal l er than the coeff icients of beta, which was al ready a l arger number to b e �in with.
(2 1J) Many woul d identif y the "l ib eral " Roosevel t with the "l iber al I I Keynes. However, U S G N P d ata indicate rather strongl y that the main recovery from the Depre ssion coincid e d with major mil itary spending induced, not by econonic theory ( though Keynes ' theory might hav e recommend e d it, given no al ternative spending options on the same scal e ), but by Worl d War II. Keynes i an theory, in many respects, was not ful l y acce pte d in the US until John Kennedy became presid ent . Schl esinger, Arthur F. , A Thousand Davs, HoughtonMiff l in, Roston, 1 9 6 5 , p • 1 0 0 5 : "The ( tax cu t ) bil l made sl ow progre ss through Congress. Pu bl ic reaction at first was muted. Kennedy used to inquire of the profe ssors of the Council what had happened to the several mil l i on col l e g e stud ents who had
Pa,P,;e V5 9
presumably been taug ht the new eco nomics • • • Still • • . o n Septemb er 2 5, 1 9 6 3, the worst was over • • • The Yale speech h ad not been in vain ; and the American government, a generation after General Theory, h ad accepted the Keynesian revolutio n . " Reearrling Key nes an d Roo s evelt, Sch lesinger h as w r itten, in The Age of Roosevelt, HoughtonMif flin, Ros ton, 1 9 6 6, Vol. Ill, p. 2 3 6 : "T h e First New Deal, in the main, d istru s ted s pen din g . Its co nse r v atives, like Jo h n so n and Moley, were orthodox in their f iscal vie\\/S and wanted a balanced bud get; and its liberals, like Tugwell and La Follette, d isliked spen din g as a drug which gave the patient a f alse sense of wellbein g before s urgery could be completed. " p . li 0 3: " S h ortly after Roo sevelt ' s inauguratio n (1 9 3 3) Keyn e s spo ke o n ce again in a brilliant pam phlet called ' The Means to Prosp e r ity . ' Here h e argued with new f o rce an d d etail f o r public spen ding as th e way out o f the depressio n . Em plo yin g the concept of the ' multiplier ' , i n t r o d u c e d b y h i s s t u r.l e n t • . • 1 1 p • 11 0 4 : " ' U n f ortun ately, ' Key nes wrote in A pril 1 9 3 3, ' It see ms impossible in the world o f today to find an ythin g between a government which does nothing at all and o n e which goes right o f f the deep en d ! ' " p. 4 0 5 : " • • • on May 2 8 , 1 9 3 4, Key nes came to tea at the Hhite House . T h e meetin g d oes not seem to h ave been a success . " p , l.f 0 6 : " • . • to Frances Perkins Roosevelt comnlained stran gely, ' He lef t a who l e rigamarole o f figures . He rrius t be a matheriatician rather than a political economist. ' "
(2 5 ) Solow, Robert M. , "Tech n ical Change an d the Aggregate Productio n Functio n ", Reyiew Q.f. Economic statist ics, 1 9 5 7, p . 3 1 2  3 2 o . So 1 ow wr ites : 1 1 Mot only is delta A ove r A (the percentage in crease in the autonomous terr,) uncorrelated with K/ L, but one migh t almost conclude from the grap h that d elta A over A is essentially a con s t ant in time, ex hibiting more or less ran doM f luctuations about a f ixed mean . " Loo kin g at f igure 3, o n e notes a pos sible excentio n to this, which Solow admits is a very tentativ e co nclusio n : the growth of th is term might h ave actually b e e n f aster, slightly, during the depths o f the depressio n (actually, laggin g it by three years in the graph ), th an un der normal con ditio ns .
Pa.�e V 6 0
(2 6 ) E ssays in Sociolog�. f rom Max Weber, introduce d b y Talcott Parsons . Weber ' s con ce pt o f "ratio n aliz ation", as a tn�nd extendi n g from the 11 1 Conce pt I of Pl ato 11 to the "cage" o f modern machine civiliz ation, doe s amou nt to a longterm vision of history .
(2 7) See note 7 . { 2 8 ) See note 4 .
(2 9 ) See note 5 .
(3 0) McNeil, W . H . , The Rise of the \'JesL Mentor, NY, 1 9 6 5 . A s ide from the title, the themes are a bit too compl ex to summarize here . Man y of them are reminiscent of Turner, note 6 . Throu ghout the boo k, however, McNeil does kee p returning to the theme of human societie s adapte d to the p astoral niche as providing the soil on which new civil iz atio ns may develop or to which old civiliz ation s may spre ad, as the old heartl and decays . (3 1 ) E isenstadt, S . N . , The Poli tical Svstems Qf. f:npires, Free Pre ss of Gl encoe, 1 9 6 3 . Chapter 2 attempts to e xpl ain the "un ivers al states" of the Toyn bee and S pengler theories, al most the same societies . (3 2 ) See notes 4 and 5 .
{ 3 3) Lorenz, Konrad, On Aggression, H arcourt Rrace and Worl d, N Y, 1 9 6 6 , transl ate d b y M . W il s on • This source is al read y pop ular among some political scientists . d u s t as rel evant may b e Sim p son, George Gaylord, The Ma i or Features of Evolu tio n, Columbia University Pres s, N Y, 1 9 5 3, p . 3 9 1 : "The po pul ation s making a quantun shift (e . g. evol ution of human intelligence) do not l ose adaptation altogether ; to do so is to become extinct . I t is al so clear that .th.a direction Qf. change is adaptive, unless il � rn beginning • • • Yet the ve r y f act that selection pre s su re Is strong can onl y be a conconittant of movement from a more poorl y to a better adapte d status. Sele c tion LlQ.t. l inear h.Y..t. cent r ipe tal l:Lb,e.n adaptation perfected . I t is the ' stabilizing sel ection ' • • • The quantum chan�e is a b reakthrough from one port i on of stabilizi n g sel ection to another . " The
uu
Par,;e V6 1
Indian caste systen, or the e arly C aribbean system of C arib pre d ators and Arawaks, are interesting e x am ple s of centripe t al d evelopment among humans in relatively static ecolo�ie s / economies. p. 3 9 2 : " Quantum evolution usually is and at some lev el it m ay always be involved in the o pening or socalled ' explosive ' ph ase of adaptive rad iation. T h e relative rapidity with which a variety of ad aptation zone s are t h en occu pied s eems q uite ine x plicable e xce pt by a series of (? ) and also, often, successive quantum s hifts into the varied zones. The rates thereafter slow dovm. 1 1
(34 ) See notes 3 7 , 3 and 1 0 . Also: Aron, R aymond , Main Sociological Though t, Vol. I , Basic C urrent� Rooks, 1 9 6 5 , trans lated by Harold Weaver, p . 4 : " American sociologists, in my own e xp erience, never tal k about laws of h is tory , first of all because th e y are not acquainte d with th em, and nex t because t h e y do not b elieve in th e i r e xistence. "
.Lo.
(3 5 ) T h e "multicausal ap proach " h as appeared t n h istorical rese arch , if our memory is correct, but th e sociologists  w ho find it h arder to retreat into simple narrativ e  h ave spoken much riore about the idea. See , for e x ample , Vernon, Glenn M. , Human Interaction: An Introduction ..:tQ. Sociology , Ronald Press , N Y , 1 9 6 5 , p. 3 0 and p. 8 08 1 especially . See also Mac i ver, Robert M. , Social Causation, Ginn and Co. , Boston, Mass. , 194 2.
( 3 6 ) Trevelyan, G.M . , Lill 8utob i ogra12hy gOQ Q ther Ess ay s , Longman, Green and Co. , 1 9 4 9 , London, p. 9 1: " T h e endles sly attractive game of speculating on th e mighthavebeens of history can ne ver take us very far with sense or safety. For if one thing h ad been different, everything would th enceforth h av e been different  and in what way we cannot tell. • • As serious stud ents of history , ru.J. tl.e. m do is to watch and ..:tQ. inve s tigate, how t hing led ..:tQ. anoth er course act u ally ta ke n . T his pursuit is rend ered all th e more f ascinating and ronantic because w e know how very nearly it was all completely different. E x cept perh aps in terms of ph ilosoph y , no event \I/ as ' in e v itab le • ' 1 1 H is tor ians h ave of ten d is cu ss e d th e "turning points, " tiries w h en t h e susequent course of events vmuld h ave be en very d ifferent il
.Lo. � ™
.Lo. �
Page V6 2
small events h ad wor k e d out dif f erently ; see, for example, Handlin, O s car, Choice QL Des t i ny : Turning Points ln American Histo ry,. A tlantic Monthly Pre s s, Boston, Mas s . , 1 9 5 4 . However, as th e las t two chapters of this example make clear, it is us ually not consid ered acceptable to imagine j us t how the s ubs e q uent ev ents mi�ht h ave been dif f erent, concretely .
(3 7 ) It has been s ugg e s ted that diff erences in be h avior of " high " and "low " situations may make such a treatment valuahle, or that an actual division of the 1;1Jorld into "hig h " and "lov.r" make s it d esirable. However, j us t because o ur s ample i s weak in the mirl