Beyond regression : new tools for prediction and analysis in the behavioral sciences

PhD thesis with early description of backpropagation algorithm for neural networks, also known as reverse mode automatic

341 48 36MB

English Pages 454 Year 1974

Report DMCA / Copyright


Polecaj historie

Beyond regression : new tools for prediction and analysis in the behavioral sciences

Table of contents :
PJWthesis_best_front_to_xi......Page 2
PJWthesis_best_xiui_to_7_original......Page 13
PJWthesis_IIp1_to_45_best_original......Page 28
PJWthesis_IIp46_to_99end_best_original......Page 73
PJWthesis_chIII_best_original......Page 127
PJWthesis_chIII_p44_to_end_a_fixbest_original......Page 170
PJWthesis_chIV_best_original......Page 179
PJWthesis_chVI_p1_to_70_best_original......Page 249
PJWthesis_chVI_p71_to_end_best_original......Page 319
PJWthesis_chV_best_original......Page 389

Citation preview

See discussions, stats, and author profiles for this publication at:

Beyond regression : new tools for prediction and analysis in the behavioral sciences / Article · January 1974 Source: OAI





2 authors, including:

Paul Werbos National Science Foundation 164 PUBLICATIONS   11,863 CITATIONS    SEE PROFILE

Some of the authors of this publication are also working on these related projects:

New functional explanation of how brains work View project

Hard core artificial neural networks View project

All content following this page was uploaded by Paul Werbos on 22 June 2016. The user has requested enhancement of the downloaded file.




A thesis presented by

Paul John Werbos to

The Committee on Applied Mathematics

In partial fulfillment of the requirements for the degree of

Doctor of Philosophy In the subject of statistics

Harvard University

Cambridge, Massachusetts August, 1974

Copyright reserved by the author


PREFACE The Initial Impetus for this thesis came from a

suggestion by Prof. Karl Oeutsch, my thesis supervisor, that I l ook more cl osel y at the prediction of national

assimil ation and pol itical mobil ization, by use of the Deutsch-Sol ow model ; his co11111ents have been of major

hel p to me with al l the empirical work on national ism,

and In revising the original structures of Chapters (V) and (VI).

The earl ier work reported in section (ii) of

Chapter (VI) was carried out under his supervision and

support, through a research project funded by the Cambrl dge Project.

I have been surprlsed more than

once by the sensitivity and receptiveness of his

Intuition, in suggesting areas of research which l ooked

unworkabl e at first but which l ed In the end to useful

and surprising Innovations.

The opinions expressed In

Chapter (V), however, remain my own responslbll lty,

particul arl y In their more bul l -headed aspects.

Professor Mostel l er, In the Harvard Statistics

Department, has hel ped by Introducing me to current

work on robust estimation, by suggesting the use of

simul ation tests, and by monitoring the general content

of the first four chapters.

Professors Anderson and

Bossert, of the Commit tee on Appl ied Mathematics, and


Professor Demps ter, of the Harvard S ta tis tics

Depar tmen t, have hel ped me to find a more orderl y way

of presen ting the ma thema tical Ideas here, hel p which

was sorel y needed in the summer of 1973. To the

Cambridge Projec t, and i ts sponsors a t ARPA, mus t be

given thanks, no t onl y for al l the compu ter time used

in this research and in I ts documen ta tion, bu t al so for the oppor tuni ty to transl a te some of these ideas from theory in to opera tional sys tems wi thou t the ex tensive

del ays more common In such research; wi thou t their

suppor t, this research coul d never have been brough t to

a s ta tP. final enough to al l ow i ts con tinua tion.

The personal and financial condl tions which l ed to

the compl e tion of this thesis were ra ther compl ex, and

deb ts are owed to more individual s than shoul d properl y

be ci ted here. Cer tainl y

have a s trong deb t to my

paren ts In this respec t; a deb t to Richard Ney, whose

ideas on the s tock marke t financed a l arge par t of my

ac tivi ty In this period;

a moral deb t to Prof. Nazli

Choucrl, of M. I. T. , who, In a brief discussion in the

spring of 1973, encouraged me to con tinue this work;

deb t to a col l eague, Gopal Krishna, for hel ping me see

more cl earl y tha t the psychol ogical hurdl es I faced a t

firs t were no t to tal l y unique;


a deb t to Dr. Karreman,

of the Bockus Research Ins ti tu te, who, In one of those

( i V)

1960 1 s programs to encourage high school students, connected with the Moore School of Electronics at the Unlversl ty of Pennsylvania, got me started asking

questions about the phenomenon of Intelligence,

questions which led me to the dynamic feedback concept,

which was applied only later to formal statistics.






TABLE OF CONTENTS................................ LIST OF FIGURES............................... LIST OF TABLES..................................




(vi11) (Ix)





Ct) INTRODUCTION...........................


(ii) ORDINARY REGRESSION...................




(iv) MODELS WITH MEMORY...................








NOISE MODELS••••••••••••••••••••••••




(Ix) PATTERN ANALYSIS .....................








(Vi )






FOOTNOTES TO CHAPTER (II) •••••••••••.•••••



I 11-1

( I ) INTRODUCTION••••••••..••..•.••••••••••

I 11-1









III -32



FOOTNOTES TO CHAPTER (II 1) ...............




Ct) INTRODUCTION...........................



I V-5

(tit) DESCRIPTION OF RESULTS..............







(1) INTRODUCTION AND SUMMARY.. •.••.•••••••••










FOOTNOTES TO CHAPTER (v) .••.•••.••••••••.••



Vl -1


V 1-1


Vl -14


Vl -49


Vl -85

Local vs. Regional Language-Dominance as a Dynamic Factor ••••••••••••.•••••

Vl -90

Some Operational Dimensions of Nationalism: Narcissism, Stereotyping and Aggression.......................

Vl -94

Toward More General Models...........




V 1-107

FOOTNOTES TO CHAPTER (VI) ................

Vl -133

( viii )




V-1: Pathways of Correlations With Nois y Data...


Legend to Figures Vl-1 Through Vl-4........


Vl-1: RMS Average Errors In Long-Term Predictions of As s lrillated Populations , In Percentages


Vl-2: RMS Average Errors In Long-Term Predictions of Differentiated Populations , In Percentages ............................


Vl-3: RMS Average Errors In Long-Term Predictions of Mobilized Populations , In Percentages..


Vl-4: RMS Average Errors In Long-Term Predictions of Underlying Populations , In Percentages.


( I X)



Tlt 1 e

I1-1: Table of Operations for EQuatlon ( 2.11 ) •• I 1-2: Table of Operations for EQuatlons ( 2.14 ) ••••••••••••••••••••


11-3 1, -32

11-3 : Hypothetical Example of "Ideal Types " ••••

I1-5 7

I1-4: Table of Operations for EQuatlons ( 2.17 ) •••••••••••.•.•••••



I1-5: Table of Operations for EQuations ( 2.18 ) •••••••.•••••••••.•.


11-74, -75

I11-1: Example of Convergence Res ults With Early Vers ion of the ARMA Es timation Routine. I11-44 I I1-2: Sample Ses s ion With Convergence Res ults For Final Version of the ARMA II1-45 to 47 Es timation Routine............... IV-1: Average coefficient es timates , and dispers i on errors of es t i mates , for the s i x es timation rout i nes and twelve s imulated process es defined In

section (it) •••••••••••••••••••••••••••••

IV-3 4

IV-2: Prediction Errors as Defined In Section ( t I ) .....•...••..•.........

IV-3 5 to 40

IV-3 : Estimates of Growth Factor, "c"•••.

IV-41 to 46

IV-4: Errors In Prediction & Mis cellany••

IV-47 to 70

V I-1 : Samp 1 e of Res u 1 ts fr om O F. LTA ••. •••.•.•.••


Vl-2: Regress ion Statis tics for Mobil i zed and Underlying Populations •••••••••••••••••••


Vl-3 : Regres s ion Statistics for As s imilated and Differentiated Populations •••••.•••••••••


( X)

Table Number



Vl-4: Long-Term Prediction Errors with SERIES In Predicting Moblllzatlon•••••••••••••••


Vl-5: Long-Term Predictions Errors with SERIES In Predicting the Underlying Population••


Vl-6: Long-Term Prediction Errors wt th SERIES In Predicting the As s imilated Population.


Vl-7: Long-Term Prediction Errors with SERIES In Predicting the Differentiated

Population •.•••••••••••••••.•••••••...•.•

Vl-8: RMS Average Errors of Predictions of Mobilized and Underlying Populations ,


by EXTRAP ••••••••••••••••••••••••••••••••


Vl-9: RMS Average Errors of Predictions of Ass imilated and Differentiated Populations , by EXTRAP ••••.••••••..••••••


Vl-10: Indices of "M" ( Moblllzatlon ) and of "A" ( Ass imilation) From the Deuts ch-Kravitz Data Us ed For Runs Reported In Tables Vl-1 to Vl-9 ••••••••••••••••••.••


Vl-11: Statis tics Concerning Regres s ion for Mobilized and Underlying Populations .•••


Vl-12: Statis tics Concerning Regress ion for As s imilated and Differentiated

Po pulation s•....•...••.••....•.•.•..•...


Vl-13: ARMA Models for Mobilization Proces s es ••


Vl-14: ARMA Models of the Ass lnilatlon Proces s .


Vl-15: RHS Averages o f Percentage Errors With Long-Term Predictions of Moblll7.ed and Underlying Populations , Bas ed on


Vl-16: RMS Averages of Percentage Errors With Long-Term Predictions of Mobilized and Underlying Populations , Bas ed on the

ARMA Method •••••••••••••.•••••••••••••••


(XI )

Table Number



Vl-17: RMS Averages of Percentage Errors WI th Long-Term Predictions of Mobilized and Underlying Populations , Bas ed on the Robus t Method ( GRR) •••••••••••••••••••••


Vl-18: RMS Averages of Percentage Errors With Long-Term Predictions of As s imilated and Differentiated Populations , Bas ed on Regress I on . . • . . • • • . • • . • . • • . . • • . . . • . . . . . •

Vl-5 7

Vl-19: RMS Averages of Percentage Errors With Long-Term Predictions of As s imilated and Differentiated Populations , Based on the AR��A Met hod • • • • • • • • • • • • • • • • • • • • • • . • • • • • •

Vl-5 8

Vl-20: RMS Averages of Percentage Errors With Long-Term Predictions of As s imilated and Differentiated Populations , Bas ed on the Robus t Method ( GRR) ••••••••••••••••••••.

Vl-5 9

Vl-21: Predictions of Future Mobilized and Underlying Populations , by the Robus t Method ( GRR) ••••••••••.•••.••••••.•••••••

Vl-6 0

Vl-22: Predictions of Future As s imilated and Differentiated Populations , by the 11 11 ext2 vers ion of the Robus t Method ( GRR)

Vl-6 1

Vl-23: Definition of Mobilization Variables and Spans of Years Used For Runs Oes criberl In Tables Vl-11 through Vl-22...••••.•••


Vl-24: Definition of As s imilation Variables and Spans of Years Us ed For Runs Des cribed In Tables Vl-11 through Vl-22...........

Vl-6 3

Vl-25 : Gravity Model Correlations .............


(Xi f )

SYNOPS I S This thesis provides a broad, coherent exposition of a new mathematical approach to social studies and to

related fields.

This work began as an attempt to apply the

classical techniques of statistics and econometrics to

the Deutsch-Solow model of nationalism, In order to

turn this model Into a workable tool for predicting the

political future.

I n the course of this effort, It

became clear that the usual statistical methods do a

poor job in fitting dynamic models to real-world data,

If we Judge these models by their ability to make good

predictions across time.

I t also became clear that

newer and better methods would not be feasible,

economically, unless we could Invent less expensive

algorlthms, too.

Thus the goals of this thesis are

five-fold: (l) to describe new ways of fitting models to data;

( It) to define new algorithms which make

these methods feasible; (iii) to Introduce evidence for

the superiority of these methods, both for real-world and for simulated data;

( Iv) to discuss the

applications of these Ideas, in broad terms, to social and even biological sciences;

(v) to discuss the new

work on nationalism which has led us In these



Let us begin with the first three goal s.

We have studied not one, but two, new approaches to fitting model s to data.

First, we general ized the

work by Box and Jenkins, on "ARMA" processes, on "mixed Autoftegresslve Moving-Average processes."


( I I I) discusses the mathematics of this approach, In


It shows how an ordinary, multivariate

autoregressive process, observed by way of "noisy" data Cl .e. data measured wtth random measurement errors or

conceptual distortions), becomes a "vector ARMA

process. "

It then shows how to appl y "dynamic

feedback" to estimate the coefficients of such

processes, at much lower costs than were possible


the resulting computer program Is now

avail able to the public through the M IT Cambridge Project Consistent System.

In Chapter (V), we discuss

why we considered this approach Important for quantitative pol itical science.

In studtes of s Imu 1 a ted. data, the ARMA approach

general ly yiel ded only hal f as much error as regression

did, In estimating the coefficients of a simple model;

It was more efficient In making use of l imited data and It led to l ess systematic bias, both.

However, with

real -worl d data, the ARMA approach did littl e better

( xi v)

than regression In making l ong-term predictions;


error dlstrtbutton curves for ARMA are onl y about 10%

smal l er than those for regression, and the curves are uniforml y cl ose to each other.

After re-examining these empirical resul ts and the

theory of maximum l ikel ihood Itsel f, we formul ated a

new, more radical and more successful approach to the fitting of model s.

I n essence, the idea ts to maximize

l ong-term predictive power directl y, over the known data, Instead of maximizing formal l tkel Ihood.

Formal l y, this Idea rests upon the aprlort expectation

that many social processes are governed by rel ativel y

deterministic underl ying trends, obscured both by

measurement noise and by transient deviations of great compl exity.

The qual ttattve, pol itical basts of this

idea ts discussed In Chapter (V).

Sections (vii) and

(xi) of Chapter ( I I) discuss the statistical basts.

The "measurement-noise-onl y" approach strongl y

outperformed both ARMA and regression, over both

real -worl d and simul ated data. I t outperformed ARMA

most strongl y In our most compl ex simul ated processes, which seem most representative of the real worl d.

According to our error distribution r,raphs, the new

method cuts In hal f the errors In l ong-term predictions of real -worl d variabl es;

the biggest reductions occur


with those variables, such as national assimilation, and with those cases, near the middle of the

distributions, for which the simple models of the first half of Chapter (VI) can do an adequate job of


These empirical results for our new approach came

from special computer programs, which exploited the simplicity of the models under study.

In Chapter (II),

we discuss how the algorithm of "dynamic feedback" can be used to estimate more general

"measurement-noise-only" models, at minimal cost,

especially for models which are very Intricate,

nonlinear and nonMarkhovlan;

we also discuss how the

algorithm can ftt more conventional models, can

optimize policy, and can perform "pattern analysts" - a dynamic alternative to factor analysts.


feedback" ts essentially a technique for calculating

derivatives inexpensively, for use with the classic

method of steepest descent. In section (iv) of Chapter

(111 ), and in section Ctt I) of Chapter (VI), we discuss

how our experience here with steepest descent has led

us to new ways of adjusting the "arbitrary convergence

we Ights" of steepest descent;

these methods speeded up

the process of convergence by a large factor.


Appendix to Chapter (I I) discusses extensions of these

C xv I )

methods , for the general, nonl i near cas e. From a pract i cal poi nt of v i ew, the appl i cat i ons of s uch mathemat i cal Ideas In the s oc i al s c i ences rema i n controvers i al. The extreme pos i t i ons of "behav i or i s m" and "trad i t i onal i sm" rema i n popular; d i v i s i ons s t i ll ex i s t between quant i tat i ve and verbal s tud i es of s oc i al behav i or.

In Chapter CV ) , we

des cr i be how our mathemat i cal tools m i ght f i t In, i n the broader context of s oc i al s tud i es and pol i t i cal dec i s i on-maki ng.

From a ut i l i tar i an and Bayes i an po i nt

of v i ew, we s ugges t a methodolog i cal approach Intermed i ate between "behav i or i sm" and "trad i t i onal i s m, " i n wh i ch the d i fferent frameworks m i ght be Integrated more clos ely w i th each other.


s ketch i ng out the pos s i b i l i t i es for s uch an Integrated framework, we als o po i nt out that the algorithms of Chapter (II ) , taken as part of "cybernet i cs, " have a d i rect value as parad i gms , to help us unders tand the requi rements of the complex Informat i on-proces s i ng problems faced by human s oc i et i es and by human bra i ns . We als o ment i on poss i ble appl i cat i ons to other f i elds , Includi ng ecology. F i nally, Chapter (VI ) pres ents our emp i r i cal and analyt i c work on nat i onal i sm. In s ect i ons (t i ) and (t i t ) , we d i s cus s our s ucces s

( xv I I )

In making l ong-term predictions of national assimil ation and mobil ization, by use of the

Deutsch-Sol ow model . Tabl e Vl-8, for exampl e, gives the

average errors In predicting the



popul ation assimil ated, over time periods on the order of thir ty years; these errors are uniforml y distributed between 0% and 2%, except for four outl iers (20% of al l

cases) at 2.68%, 3. 0 8%, 3.09% and 6.21%.

The fail ures

of these predictions are al so Informative; they give us

a picture of those external factors which real l y do

have the power to divert the processes of assimil ation and mobll ization from a steady course. We have

tabul ated the predictions of the "robust" method for

the years 1980, 1990 and 2000;

these predictions are

subject to caveats discussed In section ( Ill). In section (Iv) , more compl ex model s of

national ism are synthesized, by drawing together Ideas

from the l iterature on this topic and Ideas from social psychol ogy.

The future possibil ities of these model s,

In verbal and quantitative anal ysts, are sketched ou t

briefl y.

These model s attained high l evel s of

"statistical significance, " and l ed to noticeabl e

Improvements In l ong-term prediction, in empirical

tests described In section (v); however, these tests,

based on cl assical estimation routines, are regarded as


prel iminary. The communicat i ons concepts of section ( Iv), as appl ied In section (v), al so yiel ded an

expl anation of one of the Inconsistencies observed with "gravity model s" In previous research; this expl anation was val idated empiricall y.

The MIT Cambridge Project has begun Impl ementing

the al gorithms of Chapter Cl I).

Page I -1

(I) GENERAL INTRODUCTION AND SUMMARY The original purpose of this research was to apply

the classical techniques of statistics and econometrics to the Deu tsch-Solow model of nationalism, In order to

turn th Is mode 1 In to a work a b 1 e too 1 f or pre d Ic t In g the

po 1 ItIca 1 fut u re •

I n t he course of thI s res earc h, It

became clearer and clearer that the usual statistical

m ethod s do a poor job In fitting d ynamic models to

rea 1-wor 1d data, If we judge these mod e 1s hy the Ir

abll tty to make good predictions across time.

Furthermore, It became clear that newer and better

methods would not be feaslhle, economically, unless we could also develop new, less exnenslve algorithms.

T hus the goals of this thesis have heen five-fold:

(I) to descrthe new ways of fitting models to data; (II) to define the new algorl thms which make these

methods feaslhle; (Iii) to Introduce evidence for the superiority of the methods (see Table IV-1, on Page

IV-34, and the graphs which start on Page Vl-3);

(Iv) to discuss the applications of these Ideas, In

broad terms, to social and even biological sciences;

(v) to discuss the new empirical work on national ism

which has led us In these directions.

Let us begin by discussing the first three gonls.

Page I-2

Strictly speaking, we have s tudied not one, but two, new approaches to fitting models to data, In pol ltlcal s cience. The first approach was es sentially an extension of work by Box and Jenkins , on "ARMA"

proces ses , on "mixed Auto.Regres s ive Moving-Average

proces s es."

Chapter (111) dis cus s es the mathem atical

s tatis tics of this approach, In detail.

It begin s by

pointing out that an ordinary, m ultivariate a utoregres s ive proces s , obs erved by way of data which were not meas ured perfectly accurately (I.e. meas ured with random meas urement errors or conceptual dis tortions ), turns Into a "vector mixed autoregressive moving-average proces s ."

It then proceeds to s how how

the algorithm of "dynamic feedback, " discuss ed In Chapter (II) , can be applied to es timate the coefficients of s uch a process , at a lower cos t than was pos sible with previous methods ;

the res ulting

procedure has been tes ted, and made available to the general us er, as part of the MIT r.amhridge Project Cons is tent Sys tem.

In Chapter CV) , we discus s why this

approach seemed Important to us, In quantitative po 1 It Ica 1 s c I ence . In studies of s imulated data, this approach did quite a bit better than the bes t form of regress ion. In Table IV-1, on Page IV-34, one can s ee that the

Page I-3

average es t Imates ("av") produced by "arma, 11 for the coefficients of a s imple model, were much closer to the true values than were thos e of "reg, 11 In the twelve cases s tudied; this Implies much less sys tematic bias. Als o, the dispers ion of the


arma11 es timates was about

half as much as that of "reg, 11 on the whole;


Implies les s random error In es timation, or, In other words, greater practical efficiency In making use of 1 I m Ited d ata •

However, In s tud Ies of rea 1 -wor 1 d data ,

the long-term predictions by this method were only s lightly better than thos e by regres sion;

for example,

In Figures Vl-1 through Vl-4, on Pages Vl-4 through Vl-7, one can s ee that the error distribution curve for "ARMA" Is only about 10% smaller In area than that for "Regres sion, " and that the curves are uniformly close to each other. After re-examining the empirical res ults of this research, and the concepts of maximum likelihood thems elves , we have arrived at a new, more radical and more succes s ful approach to the fitting of models .


es sence, the Idea is to maximize long-term predictive power directly, over the known data set, Ins tead of maximizing formal likelihood. Formally, this Idea res ts upon the aprlorl expectation that many s ocial proces s es are governed by relatively deterministic underlying

Page 1-4

trends, obs cured both by meas urement noise and by transient deviations of a very complex s ort.


qualitative, political basts of this Idea Is discus sed tn Chapter CV).

The statistical basis Is discuss ed In

sections (vii ) and (xi ) of Chapter (11) . The "meas urement-noise-only" approach perf ormed much better than hoth AR�A and regres sion, on both real-world and s imulated data. In Tahle IV-1, "ext" Is markedly s uperior to "arma " In es timating coef f icients; In the text of Chapter (IV ) , we note that this s uperiority is greates t for the s imul ated data generated by the more complex proces ses (11 and 12) , process es which may be more repres entative of the real world. In Figures V 1-1 through VI -4, the "meas urement-noise-only" approach, des cribed as the "robust" approach, had much lower dis tributions of error than ARMA or Regres sion did, In long-term Prediction.

If one allows for the spread of the

vertical axis In thes e graphs , one can see that the "Rohus t" method cuts the long-term prediction errors In half, roughly;

the bigges t reductions occur with those

variables, s uch as national ass imilation, and with thos e cases , near the middle of the distributions, for which the simple models of the f irs t half of Chapter (VI) can do an adequate job of prediction.

Page I- 5

The empirical results for the "robust" meth od were al 1 based on special computer programs, designed to take advantage of the simplicity of the models under s tudy.

In Chapter (I I), we discuss how the general

algorithm of "dynamic feedback" can be used to estimate s uch "measurement-noise-only" models, at a minimal cost, especially for models which may be very Intricate, nonlinear and nonMarkhovtan;

we also

discuss how the algorithm can be used to fit more conventional models, to optimize policy, and to perform "pattern analysts" - a dynamic alternative to f actor analysis.

The technique of "dynamic feedback" ts

essentially a technique f or calculating derivatives Inexpensively, to be used with the classic method of s teepest descent. In section ( Iv) of Chapter ( 1 1 1) , and In section (t it ) of Chapter (VI) , we point out how practical experience with steepest descent In this context has led us to new ways of adjusting the "arbitrary convergence weights" of steepest descent; these methods appear to have the power, In normal, practical situations, to speed up the process of convergence hy a large f ac tor. In the Appendix to Chapter (II) , we mention a fP-w generallzatltons of these methods, which may be helpful In the general, nonlinear case.

Page 1-r;

From a practical point of view, the applications of thes e and other mathematical approaches in the social s ciences remain a s uhject of dispute. The extreme pos itions of "behaviorism" and "traditionalism" remain popular;

a division s til l tends to exis t

between quantitative and verbal s tudies of s ocial behavior.

In Chapter (V) , we des cribe the way that our

mathematical tools might fit in, in the broader context of s ocial studies and political decis ion-making.


a utilitarian and Bayes ian point of view, we s ugges t a methodological approach intermediate between "behavioris m" and "traditional ism, 11 in which the different frameworks might he integrated more closely with each other.

In s ketching out what s uch an

integrated framework might l ook like, we als o point out that the algorithms of Chapter (II ) , taken as part of "cybernetics , " may have s ome direct value as paradigms , to help us try to understand the requirements of the complex information-proces s ing problems faced by human societies and by human brains . We als o mention the pos s ibility of applying these approaches to other fields , s uch as ecology . Finally, In Chapter (VI) , a few s ubs tantive conclusions emerge from our emntrical and analytic work on nationalis m.

The relative s uccess of our long-term

Page 1-7

prediction of national assimilation and mobilization,

as shown tn Table s Vl-8 and Vl-9 In section (ti) of Chapter (VI), ts of some substantive Interest;


for example, that In predicting the percentage of

population assimilated, over periods of time on the order of thirty to forty years, that the errors are

uniformly distributed be twe en 0% and 2%, except for

four outliers (20% of all cases) at 2.68%, 3.08%', 3.09% and 6. 21%.

T he exact sources of weakness In these

predictions are also of Intere st, Insofar as the y give

us a picture of those external factors which really do

have the power to divert the processes of assimilation and mohtltzatl on from a steady course . In Tables Vl-21

and Vl-22, In section (tit) of Chapter (VI), we have

listed the predic tions of the "robust" method for the years 1980, 1990 and 2000;

the se pre dictions are

subject to caveats discussed In the text of that section.

Both In sections (ti) and (tit), all

predictions are based on the Oeutsch-Solow model, with

minor modifications.

In section (Iv), more complex mode ls of national

assimilation and mobilization are synthesize d, by

drawing together Ideas from the literature on this topic and Ideas from social psychology.

The future

posst ht 1 tttes of the se mo de 1 s , In v erh a 1 an d

Page I- 8

quantitative analys ts , are s ketched out briefly.


section (v) , a preliminary tes t of the models ts des cribed. The main methodological conclusion of Chapter (VI) Is that the available tools for time-series analysis cannot cope adequately with the level of complexity represented by such models ; however, In the prellminary tes ts, the mode ls attained a high level of "s tatis tical s lgnlf lcance,


and did

have a noticeable value In Improving long-term prediction. The communications concepts of s ection (Iv), as appl led In s ection (v), als o had the splnof f of s ugges ting a rational ex planation of one of the Incons istencies observed with


gravlty mnde ls , 11 In

previous research; this explanation was val !dated em p 1 r Ica 1 1


In concluding this Introduction, It would s eem appropriate to des cribe what might come out of this work, In the f uture. However, thes e pos slh 11lt 1es are dis cus s ed In enough detail In each of the separate chapters .

Still, It s hould be of general Interes t that

the programming of the general algorithm of Chapter (II) ls already underway at MIT, as part of the large-s cale "DATATRAN" project on Multics, and Is scheduled to be available to the social scientis t In 1974.

Page II-1


interested more and more in the use of mathematical models

to describe the dynamic laws of the systems they study. Karl Deutsch

and Robert Solow, for example, have proposed(l) the following model to predict the size of the assimilated population, A, and the

unassimilated population, U, in a bilingual or bicultural societys dA • dt

aA + bU

dU dt



where "a", ''b" and "c" may be treated as constants, at least for

medium lengths of time. The constant ''b" represents the rate of

assimilation, as a fraction of the people yet to be assimilated,

per unit of timer "a" and "c" represent the natural growth rates

of the assimilated and unassirnilated populations, respectively. Mathematical models may serve two general purposes in the

social sciences. On the one hand, they may be used as a tool in

verbal reasoning, as a technique for formulating one's assumptions and their consequences very clearly and very coherently;

they may be used to construct paradigms, which, like metaphors,

Page II-2

may be very useful but which are not meant to be taken literally.

The "prisoner's dilemna" parad1gm(2) is a good example of such

a model. On the other hand, mathematical models may be used to make actual predictions of variables which can actually be measureds

economists, for example, have long been in the business of predicting the GNP, as a number, from the use of equations()) originated by

Keynes and Samuelson. These equations offer different predictions

for the GNP, depending on one's assumptions about government spending and tax ratesr thus they can be used, not merely in prediction, but

in helping the government to choose a policy for spending and taxation which will maximize the real GNP.

Our major concern in this thesis is with the second type of

model - predictive models, like the Deutsch-Solow model above.

Given such a model, the social scientist would want to ask three

questions, (1) how likely is it that the model is true, empirically? (11) how can we measure the values of the constants in the mod.el? J

{111) if the model is true, but if certain policy-makers could change

some of the constants or even control some of the variables directly,

what should they do in order to get the '"best" results? The first two questions concern the problem of estimation, the core of classical

statistics. The third question falls roughly into the area now called

"control theory". All three questions can be answered, by use of

existing methods, but only for certain restricted classes of models.

Our main objective in this chapter is to present a more general

Page II-J

method , to allow us to answer these questions for any explicit

model, of any complexity, at a minimal cost in terms of computer

time . More precisely, if a user specifies his model in terms of equations built up out of elementary operations and functions,

known to a standard computer package, then our method could give

this computer package the power te answer the three questions above, at a minimal cost. As more data become available in the social

sciences and in ecology , and as models are developed which reflect

the true complexity of the sacial systems themselves, the need for such a general method may grow greater and greater.

In this chapter, we plan to explain the dynamic feedback method ,

by building up examples of its most important applications , these

examples will grow in complexity until, in section (xii ), we present

the general algorithm explicitly. Thus we will start out in

section (111)

by showing how to reduce the cost of conventional

nonlinear estimation. In section (iv ) , we will show how the dynamic

feedback method can cope with simple models with "memory"r even

simple models of this type are difficult to handle by other methods. In sections (v) through (vii ), we will discuss the basic problem of

induction, as seen by the statistician. This material prepares us for

the discussion of more advanced applications in later sections and

also in Chapter (III ) . In particular, in section (vii), we will

propose a new, "robust" approach to estimation, which, � for

Page I I-4

simple models , calls for the use of the dynami c feedback method J

later , in section (xi ) , we will specify exactly which "models with

memory " are used in this approach . and in Chapters ( IV) through ( YI )

w e will di scuss the evidence that this approach i s worthwhi le ,

In section (xiii ) , we will discuss the problem of estimation with

complex noise models , In se ction ( ix ) , we will discuss a radically

new con cept , "pattern analysis, " for dealing with situations where the nonlinearities and complexities of a process defy the use of

straightforward estimation 1 the applications of this concept would include problems now dealt with by factor analysis or by pattern

recognition techniques, Once a person has finished estimating a model , he may then wish to go on to use this model in formulating policy 1

in section ( x ) , we will show how the dynamic feedback method can be

u sed at that stage , too , to help one maximize the ut ility function of one ' s choice . Finally , in the Appendix, we will mention a few

technical procedures , which can help speed up the convergence of a computer routine based on the dynamic feedback method , ( i i ) ORDINARY REX;RESSION Let us begin by discussing the first two questions listed on

page II-2 , from the viewpoint of classical maximum likelihood theory . How would we ascertain the "truth " of a model like the

Deutsch-Solow model , equations (2 , 1 ) , if we were given the values

Page II-5

of "A" and ''U " every year for some nation, from 1901 to 19?3 ?

If we were given the values of the constants, a, b, and c, then we

could simply solve these equations, starting from the known values

of A and U in 1 9 01. In order to avoid having to solve a differential equation, we could rewrite the model in a simpler, but equivalent,

form 1

A(t+l ) • k A(t ) + �U(t) 1

U (t+1 ) • kJU(t),


(2.2 )

''U( t ) " means the value of U in the year t, and where

k1 , � and k are all constants. In either case , we could predict 3 A and U for 1902 through 19?3, by starting from our knowledge of

A and U in 1 9 01, and using our model. We could compare the predictions of the model against the observed data. And we would discover that

equations (2. 2 ) are simply false , as written , there would always be some difference between our predictions and the data, while the

equations (2. 2 ) do not allow for any such error . F.quations (2 . 2 ) are completely deterministic. This complaint may seem like quibbling,

but it is central to the classic concepts of statistics. In practice,

admittedly, one may be more interested in the predictive power of a

simplified model, rather than its formal statistical truths however,

in section (vii ), we will be able to discuss this possibility as an

extension of the more classical approach discussed here. At any rate, to construct a model which has some hope of being "true ", in the

Page II-6

social sciences, we need to express the idea that there will be

a certain amount of unpredictable random noise in the system we are studying. Thus we might rewrite equations (2 .2) to get ,

(2 . Ja) U ( t+l ) • k U ( t ) + c(t ), J

(2. Jb)

where b (t) and c ( t) are random error terms, obeying , 1 p(b) ., -J2'ftB p(c)




) -½ ( £ C


(2 . 4 )

In other words, we do not know what b(t) and c(t) will be in advance s the probability that b(t) will equal some particular value, b, is

given by p(b) in the formula. Strictly speaking, since ''b" is a

continuous variable, p(b) is actually a probability density function s

one may think of it as the probability that b(t) lies between ''b",

and a nearby point, "b+db", divided by the size of the interval (4),

db, These functions for the probability of b and c are sirnpJy the

classic bell-shaped curve, or "normal distribution, " The constants

in front of the "e" are there, te make sure that the probabilities of

the different values for b add up to one, when the formula is

integrated, The constants ''B" and "C", like the constants "k1 ", "� " and "k ", need to be specified before our model is complete . 3

Page II-7

According to this elementary model, the probability of b (or c)

is highest when b (or c) is zero ; in other words, it is highest

when the exponent is zero, instead of a negative number. When b gets

to be a large number, positive or negative , in proportion to B,

the exponent gets to be a large negative number, and the probability falls off very quickly. It should be emphasized that this simple

model of noise , while standard, is far from the only possibility

in this case s in section (vii), we will mention a few other


Once we have decided to formulate such a simple model,

at least to start with, classical statistics can tell us exactly

how to measure its "likelihood of truth" for any combination of

the constants k1 , k2 and k • In sections ( v) and (vi), we will 3 discuss in more detail how it is possible for some statisticians to arrive at such strong statements s for the moment, however,

we will relegate the theoretical abstractions to a footnote( 5) ,

Even on a very concrete level, one can get a feeling for the power of the classical approach ,

Looking back at equations (2 , 3) , we may define ,

•1 ( t+1) " is simply the best prediction one could make for A(t+1),

Page II-8

at time t, given our knowledge of A ( t ) and U (t ) , and given our

model. From equations (2 . 3 ) , we get s

b ( t ) • A (t+1 ) - i( t+l )

(2 • .5 )

c (t ) • U ( t+l ) - 'O'( t+1 ) ,

Intuitively , one would expect that a model which gives us "good " predictions ,


would be likelier to be true than

a model which gives u s "bad " predictions r one would expect that

bigger errors , b and c , would imply a lower probability that the model is true . Indeed, when we look back at equation s (2 .4 ) ,

the only probability fun ctions we have with this model ,

we c an see that larger values o f b and c would imply a lower probability . More exactly , as we look at these equations , we

can see that the probabilities of these errors really depend

upon (

b 2 and


(Cc )2 - i . e the size of the square of the error,

As part of our model , we assume that the errors at different times

are all independent of each other. Thus in order to combine all the different probabilities fer different times, t ,

into one overall probability , i t i s legitimate to multiply them

all together 1 thi s has the effect of telling us to add up a.11 the exponents, the square error terms,

(Bb ) 2


c 2

Cc) '

to get an overall measure of the probability of the model.

Therefore we can measure the total effective size of the errors in equation (2 . 3a ) by ,

11 ...

Z:c � t ) > b




.l'age 11-�

In order to pick the best values of k and k , in our model, 2 1 we do not have to account for the other part of the error, the c2 term, since our choice of k and � does not affect 1

equation (2 , Jb ) , Indeed , to pick the best values of k and � • 1 in equation (2 . Ja), we do not even have to worry about the value

of B , since B does not appear in that equation; thus we can simply try to minimize s

2 L • � (b(t ) >

(2, 6 )

Similarly , in equation ( 2 . Jb) , we can pick k by minimizing the J analogous function s L' •


(c(t ) )


Notice that we now seem to have two separate measures of truth, for ( 2 , Ja ) and ( 2 , Jb ) treated as independent equations ,

Formally speaking, we have found that the maximization of

li kelihood for the composite model, (2 . J ) and ( 2 .4),

can be decomposed into the maximization of likelihood for

(2 , Ja) and (2 . J b) as separate equations, attached to the

top and bottom equations of (2 . 4) , respectively.

This decomposition is due to the simplicity of the original models

it would not be valid for many more complex models. In section (vi) ,

we will present more details of this decomposition with ordinary

regression 1 in Chapter (III ) , however, we will focus on a of standard statistical models , the "vector ARMA" models, for which

Page II-10

such a decomposition is impossible , and for which

an equation-by-equation estimation procedure cannot have statistical consistency.

Even in the simple case here , however , we have yet to specify

how to pick k 1 and � to minimize "L" , in equation ( 2 . 6 ) .

Let u s begin by substituting into (2 , 6) the value of b(t ) from

equation (2 . Ja) 1

L •


(A(t +1 ) - k 1 A(t ) - �U (t ) >


( 2. ? )

Our problem, again , i s to minimize L as a function of k 1 and k2 ,

while treating the measured data series, A(t ) and U(t ) , as fixed. From basic calculus , we know that a function has its minimum,

for variables k 1 and k2 , only at a point where its derivatives

with respect to k 1 and k both equal zero. In other words, if the 2 derivative of L with respect to k were not zero, but , say, +10, 1 this means that L will change whenever we change k1 , and that the change in L will equal 1 0 times the change in k 1 , roughly ,

for small changes in k 1 thus , if we change k by -1/1 00 , then 1 1

L would change by about -1/1 0, proving that it hadn't yet reached

a minimum at our original choice of k and k • Thus we can try to 2 1 find values for k and k such that the derivatives of L 2 1 with respect to both of these parameters will equal zero.

Page II-11

Differentiating , we get 1

� 2 (A(t+1) - k A(t) - �U(t)) ( -A(t)) 1


• -2 ( � (A(t+l)A(t) - k1 (A(t) ) t:


- �U(t) A(t)) ) ,

which we will try to set to zero. And we get a similar expression for

!� 1

putting them together, we get two algebraic equations ,

L t

A (t+l)A(t)

L A (t +l)U (t) -#:

- kl L t = kl


L. U ( t)A(t) 'I:

+ �

2:' 1:'

U (t )A(t)

+ k }::(u ( t ) ) 2



We can calculate these sums by looking at our data; we can solve

these simple simultaneous equations for the variables k and � 1 exactly , by classical algebra, or by using programs available on

any computer. The procedure above is the procedure of classic

. multiple regression.

All of this reasoning , however , supposes that we decide to look

at a very simple model , like equation ( 2. Ja). It also assumes that

the "errors" , b and c, follow a normal distribution. There is nothing to stop us from using the same calculating procedure in cases where

Page II-12

we do not expect the noise to be normal ; from a classical point of

view, this may still be equivalent to accepting the normal

distribution as part of one ' s "simplified model, " but the effects

of such a "simplification " are far from obvious apriori. ( 6 ).

What happens , however, if we move on to consider a more complex

model? What would happen if we decided to change equation (2. Ja )

itself? For example , in equation (2. Ja) , we assume that the rate of

assimilation, k , is constant in any given country. In reality, we 2 know that this is unreasonable. If the "unassimilated " outnumber the

"assimilated " by a large majority, they may feel very little pressure

at all to assimilates on the other hand, if they are a tiny isolated group, dependent on an economic world which is mostly "assimilated " , then their rate of assimilation is likely t o be higher than it

otherwise would be. There are other factors involved, but, holding

those factors constant , our model is likely to be "truer " and better

if it accounts somehow fer the power of percentage dominance.

How could we revise equation (2. 3a ) to express this kind of

effect ? First of all, we need to find some kind of measure of

"percentage dominance. " The simplest and most obvious measure is

simply the difference between the percentage of the population

which is assimilated and the percentage of the population which

is unassirnilated. In order to avoid having to multiply everything

by 1 00 , let us look instead at the difference be tween the fraction

Page II-1 J

which is assimilated and the fraction which is not. The fraction

of the population assimilated equals , by definition , the ratio between the number of people assimilated , A(t ) , and the total

number of people , A(t)+u(t) 1 thus it equals A(t)/(A(t)+u(t ) ).

The fraction unassimilated U { t)/A(t)+u(t)r the difference between the two equals ( A(t)-U(t) )/(A(t)-+U(t)). Somehow, we

wish to express the idea that an unassimilated person is more likely to assimilate if the "percentage dominance " of the assimilated

population is larger. If we recall that � was defined as the rate of assimilation per unassimilated population per unit of time, we may simply postulate that � • instead of being constant,

will be larger if "percentage dominance" , as defined above, is larger. For simplicity , we may consider the idea that � is directly

proportional to percentage dominances

(2 . 8 )

This time, � is assumed to be constant. While the actual

relation between � and percentage dominance is not likely to be

quite this simple, this equation still gives us some expression of

the important qualitative idea that there is a strong and consistent

positive connection between the two. To generate an explicit model of assimilation, we may substitute this equation back into (2. Ja ) s

(2 . 9 )

Page II-14

The second term on the right is an "interaction term, " nonlinear

in A and U , A great deal of fuss has been made about this kind of

nonlinearity, with terminology such as "curvilinear regression", "polynomial regression", and even "spectral regression" often

used , (? ) , However, this kind of situation is fairly easy to deal with, We can solve for b {t ) , as before, to get , L �


(A(t+1 ) - ki_A ( t )


As a function of k


(2 . 10 )

and � • this is really the same kind of

expression as (2 . 7 ) , with "U(t ) " replaced by a more complicated expression which we might call ''U'(t ) " z U'(t ) •

A ( t )-U ( t � U(t) A(t )+u (t

The derivatives with respect to k


and ½ are the same, with a few

"prime " signs interjected, and we wind up with the same

algebraic equations to solve and almost the same sums to oaloulate ,

(We have to sum up {U'(t ) )


and U ' (t )A ( t ) instead of U(t)



U { t )A(t ) , ) In practice, one would normally begin by calculating

the variable ''U'" f'rom one's ex isting data, and injecting it into

a standard regression package to calculate the sums and solve

the equations , in a computer :package such as TSP, one could compute

U ' f'rom one's previous data. by use of the command "GENR"(generate ), and issue a regression command { OLSQ) with U' and A as independent

Page II-15

variables. What is essential in this example is that we

continue to express A(t+1 ) as a linear combination of

other variables, which are defined as specific functions of

the available data..

( 111) NONLINEAR :RID;RESSION AND DYNAMIC FEEDBACK However, if we want to move on to more interesting models of

social phenomena, we will often find that we have to estimate

constants which do not simply multiply an expression we already

Jmow how to calculate , like U' (t) J we will find that there are

constants on the "inside" of the model . For example, in equation (2 . 8 )

we said that k2 is directly proportional to the dominance of A over U as a fraction of the total population. How do we know that it is a matter of direct proportionality? � is the rate of assimilation,

per unassimilated person per unit of time, as originally

defined in equation (2 , 3 ) . In equation (2 . 8 ) , if U is 25% population, then A-U will equal ½ I if U is almost O, then A+u

of the A-U A+u will

equal 1 . Thus we assume that the rate of assimilation will always be twice as much in the latter case, as compared with the former case.

But how do we Jmow it is only twice as much? It might be four times

as much. After all, the pressures on a tiny community, near 0%, may be

much, much larger than on a community near 25%, which may be large enough to protect its own members, and to give

them economic

Page II-16

opportunities almost as great as they would find after assimilation. A-U -U ) 2 , Or ( A-U ) J • So instead of i0u• we might have written ( A A+u A+u Even

without considering more complicated possibilities, it would be

interesting to try to measure just how strong these effects are that

we have been talking about , we may write , � (t) •

kz •

, ( A t -u t ) k4 , A ( t +u ( t )


where k4 , like is a constant we would li ke to estimate . To turn this into an explicit model of assimilation, we substitute in to ( 2 . 3 a) 1

k A t -U t ( � ( � ) 4u ( t ) + b ( t ) ( A(t+1 ) • k1 A ( t ) + k! --z A (t +u ( t

We can solve for b(t) , to get , L


L (b(t ) )



k A t -U t ( ) ( ) ) 4U ( t ) k! ( A • ' ( A ( t+1 ) - k1A ( t ) - --z ( t ) +u ( t )


( 2 , 11 )



( 2 . 12 )

When w e differentiate L , and try t o set the derivatives to zero , as before, we find a very unpleasant set of equations emerging ,

k 2 A t -U t ) 4 ) ( ) ( k. � k! (t A(t) ) ( ( u(t) A ) + • --i l--z � A ( t )+u ( t ) k A ( t )-U ( t ) 4 U { t )A ( t+1 ) • k1 �A ( t )U ( t ) ( A ) ( t )+u ( t ) A (t) A ( t +1 )


A t -U t ¾ 2 + � � L (U (t ) ( A (t )+U t ) ) ) II-17

(Note that we use the asterisk to indicate multiplication, as in

FORTRAN. ) To solve these three equations as functions of k1 , � and

k4 is not only a difficult exercise in algebra , it would appear to be impossible . There are many equations in algebra for which there simply

exist no "closed" solutions - no solutions which can be expressed in terms ef the ordinary "vocabulary" of mathematics. ( 8 ).

Thus, in order to devise computer routines to handle this

contingency, we must use routines which give numerical approximations

to the constants k1 , � and k4 J we must estimate k , � and k4 by a 1 numerical technique of successive approximations, rather than an exact solution. This is the classic problem of "nonlinear estimation. "

A similar problem can even arise when dealing with more sophisticated linear models.

There are two well-known methods for dealing with the problem of

nonlinear regression. The simplest, and perhaps the best, is the method of .. steepest descent, .. (9 ) . When we try to maximize L,

as a function of k1 , � and k , we may not be able to solve for 4 lL � • O, � a ak! • 0 and a k • O. However , if we start off with k --z 4 1 reasonable guesses for all of the constants, k , � and k4, then we 1 can differentiate (2. 1 2), plug in our guesses, and see if the

Page II- 18

derivatives happen to equal zero; if so, chances are good that

we have guessed the best values, lWith classic regression, when we had two simple equations in two unlmowns,

lei and ½ ,

there would

only be one solution in the usual case, and there would always be a

minimum for L s therefore, we could be fairly certain that the

derivatives were zero only at the minimum, With very complex formulas, we simply have no wa:y of being sure about this. )

If the derivatives not zero, then we can guess new values for

our constants, values which will make L smaller , If �� is positive,

1 a L then we can decrease L by decreasing k ; if 0 1 k is negative, then we 1

can decrease L by increasing k 1 • If � is close to zero, then �k 1 k 1 is probably close to its best value s if 1!:_ is far away from zero, o k1 then � is probably further off, Thus we can create a new guess,

k ( n+l), better than the old guess, k (n), by changing k in 1 1 1 proportion to C}k , but in the opposite direction a 1

� (n+l) • k 1 (n) -

c1� ,

1 where C is some positive constant, and where we calculate the

derivative by using our old guesses, k ( n), � (n) and ¾ (n). 1 We also calculate the derivative and then the new guess for � and k4, each. Once we have our new guesses for k l '

Itz and k4,

we can go baek to ( 2. 12), to see if we really have gotten a smaller value for L. If we have, then we can start a.gain from our new

guesses, to check the derivatives, etc. If not , , . then C must be

Page II-19

too big. If C is small enough , the definition of the derivative

assures us that L can be predicted as well as we like by looking

at just the first derivative ; therefore , for some C small enough,

we know that our new guesses will have a smaller L than our old

guesses. If we find ourselves making C smaller and smaller , from

guess to guess , then we may eventually quit , when C is so small that

we aren ' t changing the constants very much. Hopefully , this will mean

that our approximations are very close to the ideal values.

In the Appendix to this chapter , we will suggest a few ways to ■peed up the convergence of this classical technique .

As we look back at equation (2. 12) , it is clear what our biggest

problem is in actually doing all this work s we have to calculate the derivatives of a very complicated-looking expression, L,

and we have to calculate the exact numerical values of these

derivatives for many different values of the constants k , � and k4 • 1 Even worse, there is the question of who the "we " is who will do a.11

the work, in most cases. Classical regression can be done automatically

for the social scientist, at a low cost, by a computer program ; the

social scientist need only load in his data, and specify his choice of variables . Who is to do the differentiating here? The social

scientist? In BMD, one of the biggest computer packages for use in

social science and biology, there is only one "nonlinear regression"

routine , added into the X-Supplement (10) available June 1972 ;

Page II-20

this routine requires the social scientist to write his own

FORTRAN programs � for the function A(t+1) and for all its

derivatives. It is reasonable to ask a social scientist to

understand the logic behind a formula like (2 , 11); it seems

rather unreasonable to ask him to carry out elaborate

differentiations , and write and debug his own FORTRAN programs,

for every such model he chooses to investigate , Also , this approach

could become expensive in terms of computer time, too, depending on the user ' s ability to devise low-cost ways of calculating his

derivatives, There is a second possibility a the user could be asked to specify a FORTRAN program to caloulate A(t+1) , as a function

of A (t) , U(t) , k , k and k 1 the program would then go on to 1 4 2 calculate the derivatives numerically, by changing k a little bit , 1 and seeing what happens to A(t+1) , At each time, for each constant , the computer would have to carry out calculations as expensive as

calculating A(t+1) 1 with many constants, this could multiply the cost

many-fold, In the two nonlinear regression routines easily available

in Cambridge, besides BMD - TSP-CSP(11) and Troll/1(12) - the social

scientist has a more convenient way to get his work done, In these two

systems , he need only specify his model in terms of a. "formula", like t

A (t+1) • k1 *A(t) + k2 *U( t) *( ((A(t)-U(t) )/ ( A ( t)+u(t) )) **k4)

This is the same as (2 , 11) , but with FORTRAN conventions used to

make it possible to put everything on one line, and with error terms

Page II-2 1

Variable Number ( Address )


Major Source

Minor Source




















�-U(t � +u ( t





A ( t )-U ( t)




A( t ) +u ( t )














A( t )


A ctual



A ( t+1 )

k A (t )


k'U ( t)


k f t �-U f t � ) 4 (A A t +u t

(t) U ( t ) ( A (t )-U ) A ( t ) +u ( t ) ) ( t } /4 A ( ( t -U A ( t ) +u ( t ) A(t

A( t





difference sum


Table II-1 1 Table of Operations for .Equation ( 2 . 1 1 )



Page II-22

left implicit. ( Single asterisk means multiplication, double means raising to a power. ) This formula then gets translated by the

computer into a list of simple expressions. 'Ihis list of

expressions would normally look something like Table II-1 ,

with the explanation column on the left removed , such a list is called a "Polish string. " The categories of operation allowed

on such a list depend on the arbitrary choices of the systems

programmers. In some systems, there are function names reserved

for user-supplied FORTRAN subroutines , in other systems, there are functions corresponding to model neurons , for use in statistical

pattern recognition , et cetera. It is already possible for a

computer to calculate the symbolic derivatives of a formula by

manipulating formulas which have been broken down like this a

however, that process becomes quite expensive, if we have many parameters to differentiate against.

'Ihe easiest way to calculate these derivatives is by a simple

use of dynamic feedback. Now we know that s L •

� ( b(t) ) 2 t

To calculate � , we need only calculate �( (b( t ) ) ) for each time, 1



and add up these derivatives over time. We want to know the effect

on the error , (b(t) ) , of changing, say, k , while we keep our data 1 2

Page II-2J

(i. e. A(t ),A(t+l ) , U(t) ) constant, and while we keep the other

parameters (k , k4 ) constant. In Table II-1, let us define the 2 "ordered derivative " of (b(t ) )


with respect to variable number i

to be the change we get in (b(t) )


in proportion to the change

in variable #i, � !!! hold all the previous variables constant. For � (b( t) ) , this definition doesn ' t give us anything new 1 1


the ordered derivative,


ok (b (t ) ) 2, is the same as the ordihary

2 partial derivative, � (btt ) ) • But for the other variables,

it gives us something new to calculate ,

Nows suppose we ask, in changing variable #? by a small amount,

what will the total effect be on (b(t ) ) ? Changing variable #7, 2

we will have a direct effect on only � variable, later in the system, variable #8 , (See Table II-1 . ) Thus ifs

X., • Xr,



where "d " is a small number, where

is variable #7, and where


"X ' " is the value a:fter our changes. We will produce the following direct effect on later variables s X' • 8

.:z_ X' • 6

• X


d + x6

(X6 held constant. )

If we are calculating backwards from the top of the table, we +

2 d aiready know � ((b(t ) ) ) 1 we already know the ratio between 8

(b(t ) • ) -(b(t ) ) 2 and 2

x8-x8 •

Let us call that ratio, or derivative,

Page II-24




2 b1


Thus we know, for small values of g , that if X =X8 +g , that 2 = b + s8g , Now we just found out, for small d, that if d d + d , that Xa = X s + X ; thus if we write g = X '


x? - Xr,

we know that this change in ( b(t ) ) 2 of

d . s8X 6





will lead to a total change in

(Before, when we measured

would be held constant. However , when we va:ry

s8 ,

x7 ,

we assumed that

and hold the


earlier variables constant, there is no way that this change can

a.:ffect anything later on , except by way of ss s - -


x8. )

Thus we deduce s


In more sophisticated language, this is an example of s

_r ( ( b ( t ) )2 )


u 7



) 2 (q- ( ( b ( t ) ) )

o e



) "X " ? '

Now let us consider = fs ( X , X6 ) • r, 6 a more complicated example . In Table II-1 , ¾ has a direct effect on where


is the function


three variables higher in the table -

x10 , Xr,

and x6 • When we start to

vary x2 , we have to account for the total effect of all three of the

changes it introduces directly on these other variables, Thus we get s

Of course,


is simply U(t)r the reader , differentiating ( 2 . 12 )

with respect to U(t) , would also arrive at three terms, equal to

Page II-2 5

the three terms here, but the work involved would be rather

tedious. To make it explicit how we begin this downwards calculation,

let me point out how to get

s 13 1

+ 2 � S13 • �(t l) ( { b ( t ) ) ) + + � • 1)-f{ t+l) ( ( A( t+1) • -2(A(t+1 ) - t(t+1) )

1( t+1 ) ) 2 )

One way to operationalize all this is to start from the top,

and, for every variable, look at all of its direct connections to variables higher up. An easier way, in practice is to pass gQ!.!l

from the top all the information to variables below them and also

directly connected to them 1 the effect is exactly the same, but the

order of computations is easier to deal with. We can start out,

in our example, by setting

s1 3 s13

as above. At


s 12

and to


s1 1 ,


through s1 2 to zero, and plugging in

we note that we have a "sum"; thus we add to account for the direct effect of

x1 2

and of

on ( b t t ) ) • Then we are done; we go down to s • At s , we know 11 12 12 that all the later effects of x 2 have already been added into s 2 , 1 1 x

and that our value for

we encounter a product,

s1 2

has been completely calculated. At

x3x1 •

Thus we add

s1 2 x 3


s1 ,


s1 2 ,

s 12 x 1


s • We go down to s • We encounter another product. We add s x to 1 1 10 11 3 s4, and s1 1 x4 to s 1 0 • We go down to s10 • And so on. At the end, we really look only at s , s4 and s , the derivatives we wanted for the 3 5

Page II-26

steepest descent method. The mathematical basis of these operations

is the theorem, for a set of ordered functional relations


?, Xi



d X df ___n. , � oX j dxi +


j-: l + I a theorem to be discussed in section ( xii ) . For each line

( 2 . 1 3)

of the list, as we go down, we have only two calculations to perform at most, one for the "major source N and one for the "minor source" ,

thus the total number of calculations needed for each time, t ,

will equal only 2n x ' where nx is the tota.l number o f variables

on the list. The total cost will be 2nxT• across all times, to get all of the derivatives we want, regardless of how many parameters

there are. Notice that to go !!E. the list , starting from U(t) and

A(t), requires one calculation per line of the list , thus the total

cost, merely to compute all the �(t +1 ) for a given model, will equal nxT ' the same order of magnitude. I assumed, above, that we had

already carried out this latter calculation, so that the values of

the "X " were already known , given that we have to find L for each i guess, not just the derivatives of L, there is no extra cost in

first calculating the "Xi" and L.

In practice, one can imagine three ways that a systems

programmer might want to use the generalized form Gf the dynamic

feedback method. First of all, he might simply write a subroutine, I I-27

to do the calculations specified above directly , on the table of

operations fer some model, Second of all, he might write one

subroutine to look at one table of operations, and to specify the calculations required by the dynamic feedback method for this

table, in another table of operationsr he might write a second

subroutine to prune away all unnecessary and redundant operations from this table. The relative advantages of these methods would

depend heavily on the characteristics of the model being studied ,

on the number of time periods and calculations of derivatives to be

performed, and even on machine characteristics. Finally , one might imagine the possibility that the operations on a table like

Table II-1 will someday be grouped into "strata , " groups of

operations that can be performed in parallel , on a computer

capable of parallel processing. On such a machine , one could perform the operations at a given set of "S1 II • in parallel , using the

same procedures as above , so long as none of the corresponding " X1 11

depend directly on each other as input sources J in short, one could

use any system of stratification which was adequate for calculating the

x1 •

This possibility is restricted , however, by the requirement

that several processors would have to be able to add something to the same machine word (S


for " 1 " on the next lower stratum ) , at the

same time, with the result that this word would be increased by the sum of all the numbers added.

Page II-28

(iv ) MODELS WITH MEMORY Nows with a firm mathematical basis for these procedures,

equation (2. 13), we can extend tham still further , The models we have

discussed so far have all been rather conventional "Markhovian"

models , in other words, they give us a prediction of A(t+l ) as a function of A(t ) and U(t ) , We could add in A(t-1) and U(t-1 ) as

dependent variables, without changing much, because we would still

have a distinct table for every time "t" giving 1(t+1 ) a.s a function

of a manageable number of variables , Suppose, however, that we have

a model with "memory, " In economics, for example, there is a model

of consumer behavior which states that consumers spend money, not in proportion to their current income, but in proportion to the

permanent income ( 13) which they expect to average in their lifetimes 1 the model states that the perceived permanent income is adjusted slightly, from year to year, in response to actual income ,

Thus we get a models

C(t ) = k1 Yp( t ) + b(t)

Yp( t ) • (1-� )Yp(t-1) + � YA (t),

( 2 , 14 )

where "C" is consumption, "Y " is permanent income, "YA " is actual p

annual income, and ''b(t ) '' is an error term , Note that statistics will normally be available here for "C" and for "YA " , not for Y However, this is still what we would call an "explicit" or


Page II-29

"phenomenological" model. Given estimates for Y (t ), and data P for YA (t), the model tells us exactly how to calculate Y (t+1 ) p and how to predict C{t) . To calculate the Y (t ), for all times t, p and to make predictions for the C(t), we need to start off,

at time t•1, with some estimate of Y (O); this estimate we can treat p as an external constant ef the model, like k and � • to be estimated 1 by the statistician ( us). ( From Y ( O ) and Y (1), equation (2. 1 4b ) p A tells u s how to calculate Y (1 ) ; then we can calculate Y (2 ) from p p Y ( 1 ) and YA(2), then Y (J) from Y (2 ) and Y ( J ), etc. ) A p p p To minimize the sum of the errors squared, L, is much harder

in this case than with our complicated-looking model in equation {2. 1 1). To calculate �


here, it is not enough to set up separate tables,

like Table II-1, for each time t, and add up the

2 _a_ ( b( t ) ) . k

o l F.g_uation (2 . 14b) establishes a connection between the unknown

variables, Y , at all different times. However, we .£!!!_ set up a p large table to include all the different values of Yp ( t ) and C(t) across different timesr this will be like taking the separate tables

for each time t, tables like those implied by Table II-1, and putting

them together into one large table . In this large table, we can show the relations that exist across time . Suppose that we have data

for "C" and "YA " from time 1 to time 4. We get a big table,

as shown in Table II-2, on the next page.

With a given set of constants - Y ( O ), k and � - and with a p 1

Page II-JO

Actual Variable L

{b {4))



b (4)aC(4)-k Y (�) 1 p C(4)

klYp (4)

Yp (4) ( see

Variable Number

2 . 14b)

� YA (4) Y (4) A



Operation Category sum







33 31






� YA ( J ) YA (J)


(1-k2)Yn( 2)














Y ( 3) ( see p



k Yp (J) 1








b ( 3)•C(J) - k Y ( J) 1 p



2 8 .20. 1 2





Minor Source ( s)


( 1 -�)Y (3) p ( b ( 3))

Major Source


difference input sum




















Table II-2 1 Table of Operations for F.quations ( 2 . 14). ( top section)


Page II-.31

Actual Variable

Variable Number

Operation Category

(b(2 ) ) 2


C (2 )



Y p (2 )



b ( 2 ) • C ( 2 )-k 1 Y p ( 2 ) k 1Y p ( 2 )




difference product



yA (2 )



( 1-k2 )Y p ( 1 )





k2 YA ( 2 )

(b( 1 ) ) 2

b ( 1 ) •C ( 1 ) -k 1 Y p ( 1 ) C{l )

k 1 Yp ( l )


10 9

difference input


Major Source

Minor Source

























Table II-2 1 Table of Operations for :Equations ( 2 . 14 ) . (middle section )

Page II-J2

Actua.l Variable Yp ( 1 )

k Y (1) 2 A

Y (1 ) A

( 1-k2 )Yp ( O ) 1-

y (0) �

k1 1


7 6

5 4


Variable Number

J 2

1 0

Major Source

Minor Source













Operation Category sum




parameter parameter input


Table II-2 1 Table of Operations for F.quations ( 2 . 14 ) . ( bottom section )




given set of data, we can calculate "forwards in time", or upwards

in this table, to calculate every one of the "actual variables, " including L, the total error. To calculate c)L ' we can calculate �kl backwards, just as we did before with Table II-1, from the top of the table to the bottom. This time, however, it is easier to see where to start s



ai. - �1 � - a½ 7

- 1

2 This time, with L itself on top, instead of (blt)) ,

we get a simpler result at the end s

exactly the quantities we need to apply the steepest descent method . For each line of the table, except for the . top, there are only two

sources, or no sources J thus to go back from the top line to the

bottom line requires only two operations per line, at the most.

For a very large table, with nX lines for each time t, and with T periods of time, this amounts to n T lines, and 2n T operations X


in all, to get all the derivatives of L. Remember that to go up

Page II-34

the table, to calculate L, we had to carry out one operation

per line - n T operations in all. No matter how complex the model, X

if the functional relations across time explicit enough that they can be put into formulas which the computer can translate

into a table , like Table II-2, then "dynamic feedback" can be used to calculate all the derivatives, in one pass.

As a practical matter, one may wonder just how explicit is

"explicit enough". In general, the procedure above allows us to

calculate the derivatives backwards down any ordered table of

operations, so long as the operations correspond to differentiable

functions. In order for us to use this method , then, the primary

requirement is that we be able to specify the model well enough

to construct such a table, This is the � requirement that applies

when we wish to use a model forwards in time, to make a prediction

of the future, without having to solve a complex set of nonlinear

algebraic equations in every time period , In general, in the existing computer packages (including FORTRAN compilers), any formula

expressed in the following form can be parsed into a table of operations

(a "Polish string " ) generating the variable Xi ( t ) from operations performed on the arguments 1

where "fi " is a function made up by nesting basic operations known to the computer package 1 for examples

X ( t) • W ( t-1 )*Y(t-1) + k + sin ( Z (t-1 ) ) . i

Page II-35

In order f�r a set of such formulas to be converted into a table

of operations, we need only find an ordering of the variables to be

computed, "X (t)", such that the arguments used in calculating X (t) i i

are calculated before X (t) itself is ; the table of operations to i calculate X (t) can simply be inserted on top of the table already i built up to calculate variables earlier in the causal ordering.

If the arguments of "fi " included only constants, parameters and

values of variables at "t-n", for all f , with "n" always greater i than zero for endogenous variables, then this requirement would be satisfied automatically . otherwise, an ordering of the variables

X (t) would have to exist, with the later expressed as functions of i the earlier . Global things to be calculated, such as the sum of


utility function or a loss function over time, can always be inserted on top of the table of calculations, so long as we specify formulas

for calculating them as a function of sums across time or the like.

Indeed, even if one had a set of implicit equations, so that one had

to use algebraic solution methods instead of explicit calculation

in order to carry out a simple prediction of the future from given

para.meters, then one could easily calculate the matrix of partial

derivatives for those equations, to be used in conjunction with the

algebraic solutions generated for prediction, to allow one to carry out

dynamic feedback estimate ; however, simulations of this sort are both expensive, and outside the major realm of interest here .

Page II-36

Parenthetically, one might note that there is a certain

difference between the operations needed to specify the generation of a random number, and the operations needed to calculate the

associated lo ss function. Estimation by dynamic feedback requires the specification of loss functions. In the case where the

unobserved random numbers are generated by a rather complex process, the translation between the two forms of specification may not be

easy. However, if the losses one is concerned with are the

discrepancies between the actual and predicted values of Imown

variables, the specification of an explicit loss function should

present no problem to the user of a computer package.


corresponding model would be suitable for predicting the future,

but may not be quite as suitable for stochastic simulations of

the future, in some eases. In such cases, however, the method of pattern analysis, to be discussed in section (ix), may help

reduce the distance between the two forms of specification.

(v) NOISE, AND THE CONCEPI' OF THE TRUTH OF A MODEL, IN STATISTICS Up until now, we have avoided one other aspect of statistical

estimation s the problem of noise models. In our old model, in

equations (2. J ) and (2.4), we assumed a simple equation to predict

A(t+1), and a simple bell-shaped curve for the distribution of the

Page II-J7

errors , b ( t ) . In the thirty pages or so, we have considered

more and more complex models to predict things. However, we have stayed with the old idea of minimizing the square of the error,

an idea based upon the old bell-shaped normal distribution.

We have begun in this way only with great reluctance, and only

for the sake of exposition. In fact, if we admit that most processes in human society and ecology do contain important elements of

randomness, then we must admit that equation (2 .4) is just as much a part of our original model as equation (2 . J ) . Equation (2 . 4 ) is

not an "assumption which must be proven true before we can use

classical techniques"; like equation ( 2 . J ) itself, it is part of a

simple , approximate model, to be evaluated for its predictive power.

Unfortunately, there has sometimes been a tendency in social science

and ecology to formulate ever more complicated models to predict

things, without an explicit model of the random element; the "errors "

are sometimes regarded as something unpleasant, that one faces up to

at the end of one's research, after one has formulated a model of

what is interesting.

Statisticians, on the other hand, have long since passed

the stage of ..minimizing least 'squares " or of "minimizing error" in general. We mentioned, earlier, the idea of measuring the

"probability of truth " of a model. We mentioned the problem of

how to estimate the probability of truth of a model, given

Page II-38

a set of data observations. The traditional "maximum likelihood" school of statistics , as represented by Jeffreys and Carnap(14 ),

and the more recent Bayesian schools , both agree that this is

simply a problem in conditional probabilities, how do we estimate

the probability of the truth of a model, conditional upon our

having made a certain set of observations? Formally, the conditional probability of A given B, p(A( B ) , is defined to equal p(A and B )/p(B ) ; from this follows Bayes ' Law s P(A\ B )


P(B\A)P(A) P(B)

Statisticians have applied this law to deduce s p(model( data )

) ) = E{ data podel p ( model p(data )

(2 . 1 .5 )

This equation does not say that the "data" and the "model " have to be expressed in purely mathematical terms ; as a result , the equation

has led to enormous controversy both among statisticians and among

philosophers. It is a general equation telling us how to determine

the probability of truth of any sort of theory; thus, its relevance

to social and natural science goes well beyond the question of

statistical methods proper. The calculations on the right involve

two terms of real interest - p(data\model ) and p(model ) . The term

p(data) is the same for all models , and does not help us to evaluate the relative probabilities of truth of different models, except

perhaps indirectly(15 ). The term p(data\ model) represents what

Page II-39

statisticians have traditionally focused on : how well the data "fit "

the model, defined as the probability that these particular data

would have been generated if the model were true. The term p(model ) refers to the probability of the model before any data have �

observed at all ; it is our apriori probability distribution, The philosopher Immanuel Kant long ago asserted that



induction " is impossible, without some system of apriori assumptions

(the "apriori synthetic " ) with real information content to them(16 );

the choice of "p (model ) " would constitute such a system of assumptions. More recent philosophers, such as Carnap and Jeffreys(17),

have tried to preserve the more popular attitudes of pure empiricism and positivism, by suggesting that p(model) should be "equal"

(apriori) for all different models . Thus p(datal model) would be the

only term left to consider, in measuring probabilities of truth.

Their suggestions have been carried over to the field of statistics,

where they are now orthodo x practice(18). This approach is normally

referred to as the "maximum likelihood approach, " In more recent years, however, many members of a new school of statisticians, the Bayesians,

have grown in their opposition to this orthodo x procedure. They have pointed out that "p ( model)•k", with the same "k" for all models,

is a very strong assumption, just as strong as any of the alternatives.

In most practical problems , the social scientist would have some

reason to expect some models to be likelier than others, even before

Page II-40

he runs his statistical analysis , Thus they suggest that a user of statistical programs should be asked to specify his apriori

probability distribution as the first step of any statistical

analysis(1 9 ) 1 then the computer program can account for both

p(model) and p(data model), in picking out the model with the

highest probability of truth, From a broader perspective,

one might say that the Bayesians are proposing a procedure for

allowing the social scientist to account for two different kinds

of data - statistical data, and verbal data he has from other

sources, This still leaves open the question of where his initial

p( model ) should come from, a question which we can avoid in this context, (20 ) ,

The Bayesians may be right in principle , but in practice

the orthodox procedures may remain a sensible way to design

computer statistical packages. The social scientist, when he reads the output of a computer , would normally expect that this output

reflects only the ability of different models to fit the actual data;

in deciding what he finally believes about the world, he can � account for his verbal data, This does require that he understand

what "standard errors " mean, in ordinary regression, so that he can

get some idea of the variety of models consistent with the statistical

data , It also suggests that a direct printout of the relative

probabilities of truth of different models , over the given data,

Page II-41

would be a u seful feature to have. In brief, it requires the

development of an intuition regarding the relation of mathematics

to social processes, an intuition strong enough to sustain the

balanced assessment of probabilities. This does place a burden on

the social scientist. On the other hand, the extreme Bayesian

alternative - to ask a social scientist to encode his intuition

into a few normal distributions, and to ask for a more complete faith in what cemes out of a computer - would seem to place a

much heavier burden on the social scientist. It would tend to de-emphasize the learning experience which usually occurs at

the end of a statistical analysis, when the social scientist tries to relate all the things which crune out of the computer to what

he knows in the real world; if this experience is what develops

a balanced intuition in the first place, it should not be given

a diminished role. In Chapter (V), we will discuss in detail the

importance of this type of experience to the actual application of

statistical methods in the social sciences. Furthermore, the verbal

knowledge of a social scientist will not normally fit a simple

distribution. Even if it did, few "intuitive decision-makers" can

express their intuition at all reasonably in terms of probability

distributions, even in simple cases, without extensive training

in that task. (21 ). While there is more that could be said on both sides

of this particular issue, the orthodox approach would seem quite adequate for the purposes of the present context.

Page II-42

The dynamic feedback algorithm, which we are discussing in this

paper, can actually be applied to Bayesian estimation a$ easily

as to conventional estimation , In our examples and in our applications

we will follow the more orthodox procedures, However, we will refer

back, on occasion, to the concept of "prior probabilities" when this is appropriate ,

In concluding this section , we might note , for the sake of the

mathematician , that equation (2. 15 ) , when used in statistics, is

normally used to give the probability distributions of a continuous

family of models , rather than a discrete probability , Strictly speaking, such distributions

not even functionsr they

actually "measures", and they would normally be written as the

product of a function times an explicit measure , like "d9" , where "9" is a parameter of the model, In equation (2 , 15 ) , however,

the choices of measure used for the data cancel out; thus it is

not of intrinsic importance , so long as we are consistent in the

units we use to record the data, The choice of measure for the

model is one more aspect of the problem of specifying p(model ),

an aspect discussed by the sources referred to above ,

In general,

however, one would not expect the maximum likelihood choice of

parameters to be affected very strongly by the choice of measure,

unless the standard error for these parameters indicated a large

uncertainty and probably a low statistical significance in any case,

Paga II-4)

(vi ) ORDINARY RECRESSION AND THE MAXIMU M LIKELIHOOD APPROACH Back in section (ii ), when we discussed how to measure

the "probability of truth" of the Deutsch-Solow model,

we glossed over the basic questions discussed in section (v). In section (ii ) , we found ourselves discussing two different

"measures" of the probability of truth , one for equation (2. Ja)

and another for equation (2. 3b) 1 all the rest of our discussion fo cused on equation (2. 3a ) , an equation to predict a single

variable, A(t+l). When discussing a single equation, to predict

a single variable from known data, it makes some kind of sense

simply to add up all the square errors ( b (t )) 2 across time, and use

the sum as a measure of how good the equation is. However, what do we do if there are two equations and two sets of errors?

How do we combine the two different error terms to measure the

validity of the model as a whole? With the Deutsch- Solow model,

we could pick out the best values for k and k without answering 2 1 these questions, because the two equations were essentially

independent, and because we assumed that the two error terms,

"b" and "c" , each had their own probability distributions

independent of each other. ( Equation (2. 4 ). ) But we had to

avoid the question of how to measure the validity of the model

as a whole. Now , using the concepts of section (v) , we can come back

Page II-44

to answer this question. Let us define the relative probability

of truth, P, of any model, as z

P • p(data l model),

the probability that we would have observed the data we have

observed, if the model were true. More precisely, this

"probability" is actually a probability density, as are the

other "probabilities " in this section. For simplicity,

let us assume, with the Deutsch-Solow model, that we only have

data for three years, 1958, 1959 and 1960, in one country.

Writing out the data explicitly, we are trying to measure s

p - p(A( 1960), U(1960),A(1959), U (1959),A(1958), u(1958) 1 model),

which, by classic probability theory, equals the product z

P = p ( A(1960),U ( 1 96 0), A(1959),U(1959),A( 1958), U(1958 ), model)

* * * *

p ( U(1960 ) 1 A( 1959),U(1959),A(1958),U(1958),model)

p (A(19 59 ) lu(1959),A(19 58 ), U (1958), model )

p (U {1959) I A ( 1958), U (1958 ), model) p (A(1958), U { 1 958)l model).

Now our model, equation (2. J), predicts A(1960) as a function of

A(1959) and U { 1959 ) , once these data are given, the other data. will · not affect the probability of A(1960) as given by the model.

Page II-45

Similarly for U(1960 ), etc , 1 thus we can simplify our expression s P = p(A(1960 ) 1 A(1959), U(1959 ), model )

* * * *

p( U ( 1 960 ) (u (1 959) , model )

p (A( 1 959 ) lA ( 1958 ) , U(1958 ) , model) p( U(l959) l u(1 958), model )

p(A(1 958) , u (1 958)l model).

Once we are given values for A(1 959) and U(1959 ) , how do we

determine the probabilities of the possible values for A(1960) ? The � likely value of A(1960 ), according to equations ( 2. 3 )

and ( 2 , 4), is the one with b ( 1 960)-0, i. e. A(1 9 60) = A(1 960 ),

which equals k A(1959 ) +� U(l959 ). But any value for A(1960 ) would be 1 consistent with equations (2. 3 ), for some value of b , Values of

A(1960) far away from A(1 9 60 ), however, would imply large values of b,

which, according to equations (1,. 4 ), are not as likely as small values of b. To determine the probability of any given A(1960 ), given

A(1 959 ) and U(1 959 ), we need only look at the probability of the value of b needed to generate the combination,

b(1 959 )=A(1960 )-k A(1959 ) -� U(1959 ) , Thus s 1

p(A(1960 ) 1A(1959 ), U(1959 ) , model) = p(b(1959 )f model)

= --

-½( b( 1E59) 2 J

Page II-46

And similarly for A(1959 ), 0(1 960 ) and 0 ( 1959). Thus we get, in summary 1

P = p ( b(1 959 ) \ model ) * p(b(1958 ) ,model ) * p(c(1 959 )\ model ) * p(c(1 958 ) \ model ) * p (A( 1 958 ) , U(1958 )\ model ),

which, by equation (2. 4 ), equals 1 _L e ( fflB

..-¼-( b( 1958 ) )2 __a,.(b( 1959 ) )2 B -12 B 1e 2 )( )( �C JzilB

1 * (-- e


-½< c(1B5s > >2

) * p ( A ( 1958 ), U(1958 ) , model)

What do we do about the last term, representing our earliest

data point, 1 958 ? The usual practice is simply to ignore the final

term on grounds that it is difficult to compute, and contributes

only one time-point worth of information; for long series of data, the importance of one e xtra point of information grows very small. In the social sciences, the argument for eliminating this term

grows even stronger. This term, as usually interpreted(22 ),

requires us to compute the probability that we would have started off

at a data point equal to (A(1958),U(1 958 ) ) , if this initial data had

been generated by the Deutsch-Solow model operating for an infinite length of time before the start of the available data. Normally,

in the social sciences, one picks the start of one's data series for

one of two reasons 1 (i) one is trying to find a model to describe events in a given historical period, and one does not expect the

Page II-47

mod.el to be valid before the start of one ' s data series 1

(11) the data are not available before a certain time , usually

implying that some aspects of the social system were different beforehand. furthermore , if one ' s model is not "stationary",

as few stationary processes are, then the usual procedure for computing this term breaks down in any case.

On this basis , we get a relative probability density , - e


-½tc( 1t59 ) ) 2

1 e -

-½ ( c( 1 f58) )2


' )

The interesting part of this formula is the exponent, the part which

depends en b and c, In order to maximize p with respect to k , k2 1 and k4 , we try to bring the negative number in the exponent as close to zero as possible , This number is essentially the � of

the errors squared, exactly what we tried to maximize before .

Once we have done this, it is well-known that we can maximize P

by picking B and C to equal the root-mean-square average of b

and c, respectively,

Page I I-48

(vii ) THE NEED FOR SOPHISTI CATED NOISE MODELS In general , there is little reason to believe that the classic

normal distribution , of equations (2 . 4 ) , will be a good model of

the noise element in all social :processes. Mo steller, for example , has pointed out that "fluke s" occur fairly often in real social

data. (23 ) . There may be many processes which normally plod along in a predictable sort of way , governed by a noise process b ( t )

which fits a normal distribution and which never gets to be very

large ; every once in a while, however, the process may be hit by

a fluke , which leads to changes much larger than one would have expected in the normal course of events. Suppose that





is the

probability , at any time , of getting a fluke, Then the probability

distribution for b ( t ) may actually fit thi s kind of equation s p (b ) • ( 1 -p )-- e 1 .,JznB 1

where B


2 _ !. 2 t E. ) \B


2 -½ (; >


i s much larger than B . This equation states that most of

the time - ( 1 -p ) of the time , to be precise - b will fit the same 1 bell-shaped curve as before ; however, when a "fluke " occurs ,

b will fit a much broader bell-shaped curve , leading to much larger

values for b. One way to account for these effects i s to use this

probability formula explicitly , instead of the usual normal

distribution , in one ' s noise model s it may be impossible to

Page II-49

e stimate "B i " accurately, but, if flukes are a serious problem,

it may still be possible to e stimate k and k2 more accurately , 1

and t o show when "P" i s larger for this kind o f model.

Another source of noise, rarely handled explicitly in social

science , is "measurement noise . " In our discussion above , we

talked about "A(1 959 ) " and "A(1960 ) " as if we had available exact

data for the true levels of assimilation in those years. We may have

data, but there is good reason to believe that errors of different

sorts occurred in the collection of this data . Even if the data were "perfect " , in the sense of giving us a perfect measure of who speaks

what language when, for example, they may still not be giving us a

perfect measure of the underlying concept of assimilation, as

governed by equations (2 , J ) . Let us define ''U( t ) " as the true size

of the unassimilated population, and ''U' ( t ) " as the measured size of

the unassimilated model. Then we might modify equation (2. Jb)

by writing :

U(t+l )


k U ( t ) + c(t ) 3

U ' ( t+l ) • U( t+1 ) + d( t+l ),

(2 . 16 )

where "c" is the noise going c,n in the process itself, and where

"d" is the measurement noise , The "process noise, " "c ", is a random

factor in the actual process ( top equation ) which determines the

objective evolution of the real variable we are interested in, U,

Page II-50

through time. The "measurement noise ", "d 0 , does not affect the

objective reality, U, but only adds a factor of distortion to our

measurement of U, our U ' ; U'-U, the difference between the measured value of U and the true value of Even i f



equals the measurement error, d .

did represent an objective variable in its own right, but

a variable different from the one we really postulate to govern the

dynamics, then this mathematical structure would still apply,

Given that we do not know the true value of U(t) at any times t,

this model is not an "explicit " model ; it does not tell us directly

how to estimate U'(t+1 ) from earlier data available. (Note that the

noise term, "c(t ) ", makes it impossible to calculate later values of

U(t ) from an estimate of U(O). ) However, Box and Jenkins(24 ) have

shown that this model is equivalent to the explicit model , U ' (t+1 ) = 1JU'(t ) + f(t+1 ) - k4f(t ) ,

where "f" is a noise process fitting a normal distribution.

This model has a kind of "memory term " in it, k4f(t), and may be

estimated by use of the dynamic feedback method, as discussed in

section (iv) . In Chapter (III ) , we will describe how this method can be specialized to deal with models of this general sort, with

any number of dependent variables. Economists, like Cochrane and Orcutt, have developed techniques to deal with some of the

secondary consequences of measurement noise, like the problem of

Page II-51

serial correlation; however, in their original article {2 5 ), these

authors have made it quite clear that the general problem of measurement noise is beyond the scope of their techniques ,

This idea can be taken even further, if we wipe out the

term for "process noise, " and allow for the possibility of

measurement ncise only. In equation (2 , 16), this would mean

eliminating the term c(t), while retaining d(t+l ) and the other

terms of the model; this would make (2 . 1 6) an "explicit" model,

with memory U(t) , similar to our example of section ( iv).

At first glance, this procedure sounds both unrealistic, and totally inferior to the procedures of the paragraph above, Process noise

does exist in most social and ecological processes ; as long as we can account for process noise and measurement noise both, why should we

limit ourselves to the second possibility only?

Let us begin by seeing what this process really entails .

If we assume that there is no process noise at all, then we can

start out from our initial estimates (or data) for our variables,

and solve our equations exactlz to yield a stream of predictions for later data, right up to the end of our data set , these

predictions account onlx for the data in the initial time-period. Notice that this is exactly what we were thinking of doing,

early in section {ii), before we introduced the more "sophisticated "

concept of ordinary regression , Also, note that this is what

Jay Forrester ' s techniques(26) for "dynamic systems analysis "

Page II-52

tend to involve, even though he prefers to judge the final fit of

his models by eye rather than by computer. Above all, note that

the practical value of social and ecological models usually lies in their ability to predict the situation at distant times, without

requiring knowledge of intervening times in the future . (This includes, of course, the ability to predict the results of different policies, )

U sing our new procedure, we evaluate models by their ability to

yield good predictions across long periods of time, not by their

ability to predict across the smallest possible period of times

therefore, we will generate models and coefficients better suited to

the practical demands which will be placed on them. In order to estimate such models, we will have to resort to the dynamic feedback

techniques of section (iv) above , The prior unavailability of such

techniques offers at least some justification for the apparent

disregard of empirical data in some of Forrester's more interesting

work{27 ) ; however, if these estimation routines should become

available soon at the Cambridge Project Consistent System, a more empirical analysis of these issues will become possi ble.

In short, the practical reasons for disregarding process noise

can be very strong, at times. From a theoretical point of view,

however, the reasons are not so obviou s. If we start with a given statistical process, governed by a given set of equations, then

those equations themselves are the best possible basis for prediction,



whether across one period of time or many . Thus "truth " implies

predictive power. When we have a given set of data , the maximum likelihood method allows us to make use of all the information

available - not just one measure , like long-term predictive power over the given data - to find the parameters and model closest to being "true . "

The difficulty in this argument i s that "closeness to truth , "

unli ke "truth " itself, can be measured in many different ways.

The Bayesian school of thought has begun to argue that this point ,

too , should be accounted for in practical statistical routines (28 ) r

however , their concepts of "loss fun ction " do not fully encompass the con cept of "long-term predictive power " here . As a practical

matter , most models in the social sciences and in ecology are

simplified , approximate models , which we do not expect to be "true " in any absolute sense ; we only expect tharn to approach truth , or,

more reali stically , we expect them to give us predictions similar

in a broad way to what we would predict if we knew the full truth ,

Even when these models contain a hundred variables or so , they will still be hundreds of times simpler than the complete systems which

they represent . If there were an infinite quantity of representative

data available , and if we had to choose between a limited set of

models, � of which � "true" in !!l absolute sen se , then the model

which performs best , on , say , predicting across ten years of time ,

Page II-_54

over this data, can also be expected to give us the best predictions

of the future ten years hence. In order to estimate such a model ,

we should indeed try to minimize the errors across ten years , in stead of following the conventional likelihood approach.

In other words , if we wish to carry out an estimation which is

"robust " - an estimation which will give us good predictions despite

the oversimplifications of one ' s model - then a direct maximization

of predictive power is appropriate ; that is exactly what our "measurement noise only " approach entails .

In reality , we will have to accept limits both upon our choice of

models and upon the size of our data . We will have two sorts of

information to use in evaluating the predi ctive powers of our models , ( i ) the long-term predi ctive power !.§. measured directly over the

available data J (ii ) general information about the "truth " of our

model, as given by the maximum likelihood formulas . The first

information is a direct measure of what we want to know . On the other

hand , we only have a certain limited amount of this kind of

information , in our data . The second information does tell us

something about how close our model is to "truth " , which in turn

tells us something about predictive power. When our total information

is limited , statistical theory recommends that we make use of all the

information at our disposal , including both the "hard " and the "soft . " Our problem, then , is one of a more practical nature s



which of the two sources of information should we emphasize , when we want to build a model suitable for medium-term and long-term prediction?

General guidelines for dealing with this problem will have to

come from experience , experience with both ordinary procedures

and with the new procedure suggested here. It should be clear, however , that the relative importance of process noi se and

measurement noise will vary from case to case . A direct comparison of the methods , say, in predicting the second half of one ' s data

from the first half, would probably be desirable , in most cases.

When the major flaw in one ' s exi sting model lies in its inability to describe measurement noi se accurately , then one would suspect the possibility that the unexplained portion of that measurement

noise would be organized enough to be partly "predi ctable " from

one ' s process variables and noi se , this would lead to di stortion of the parameters of the proce ss proper . For models which have

thi s problem , the best way to improve predictive power may be to

avoid this di stortion , by making sure that measurement noise i s not

falsely attributed to the process equation s (i. e. to process noise ) ; by falling back on a "measurement noise only " model , in which process noise does not exist at all , one can eliminate this

distortion entirely. Once again , as noted o.n the previous page •

the "measurement noise only " approach involves no distortion at all ,

Page I I-56

insofar as it maximizes long-term predictive power , directly ;

its weakness lies in the lack of formal statistical efficiency .

When process noise is very large , and the neglect of process noise

would appear to seriously weaken one ' s ability to make full use of on e ' s data , then it would still be possible to compromise ,

by the relaxation methods to be discussed in section ( xi ) ;

these methods , by allowing process noise , but by making it

much more "expensive " to attribute randomness to process noise than to measurement noise , may reduce the false attribution

of the latter to the former , while preserving an adequate level of statistical efficiency. It is conceivable that in social science,

as in hard scien ce , there will someday be a viable distinction

between "practical " statistical work , where prediction is most

important , and "theoretical " statistical work, where "truth " as such

turn s out to be a more effective guide to finding new variables and

terms to use in one ' s models. However, once again , the practical

values of these techniques will have to emerge from empirical work. The empirical work of this thesis, in Chapters ( IV) and ( VI ) , does

provide a strong indication that the "measurement noise only "

approach i s superior to the pure maximum likelihood approach, in the social sciences ; this indication has been strong enough to totally

reverse our own initial bias in favor of the classical approach .

Still , the empirical studies here are only the beginning of a long


Page I I-.5 7

Finally , let us con sider one other situation where the

conventional approach to noise is inadequate 1 the case of

"ideal types . " Very often , in social science , we run across

variables like "Republican President " and "Democratic President "

which do not tend to va:ry across a continuous spectrum , they

tend to be simply "true " or "false . " The error in predicting

such variables would not follow a normal distribution , but the

problem n eed not be overwhelmingly difficult . On the other hand ,

we often find societies falling into certain distinct

"ideal types " (29 ) , such as "traditional " , "developed " and

"tran sitional " J we may find that a whole collection of other

variables - political stability ( JO ) , aggressiveness , economic

growth , etc . - depend heavily on which ideal type a society falls into . As an extreme example , let us imagine that there are three "ideal types " a nation might fall into , and that we have been

studying four so cial variable s which are all really determined

by the current "ideal type " 1 Variable 1

Variable 2

Variable 3

Variable 4

Type 1

Type 2

Type 3








1 0




Table II-3 : Hypothetical Example of "Ideal Types " .

Page I I-58

If we predict that variable one will equal one , and discover that

it actually equals zero , then thi s last piece of information tells us

exactly what to expect for variables two , three and four. If we

already made predictions for variables two , three and four , then

we would also know now exactly what errors to expect in these

predictions. Thus there is a connection , though complex , between

the "errors " in predicting different variables . If our example

had been somewhat more complex , with a lot of nonlinearity , and

a certain amount of freedom to deviate from one's ideal type ,

it is clear that the correlations between the prediction errors

of different variables could become hopelessly complex.

According to maximum likelihood theory , as sketched briefly in

section (v ) , it is important to minimi z e the "right " measure of

error, even when we estimate the coefficients we intend to use in

making predictions of this process. The "right " measure is suppo sed to correspond to the actual noise process going on . If it does become

important, in practice, that we do have such an accurate measure of

error, and if the actual noise process is as complex as above ,

then we face serious difficulties in estimating any parameters at all . In our simple example , we could escape from these difficulties ,

by carrying out a simple factor analysis to detect the ideal types ; then we could go on to study the ideal types only , and disregard

Page II-.59

the original four variables. However , in the general cas e , a linear

technique like factor analysis may not be enough s also , we may still

want to consider the original variables, to account for whatever

independent variation they have . In any case , it i s clear that the conventional model of independent errors , following a normal

distribution , cannot deal effectively with this kind of situation . The "measurement noise only" technique could con ceivably reduce

the difficulties here , but one would still expect a better model to emerge , if one could account for the complex interrelations

of the process variables more explicitly .

In summary, i n order t o produce a "true " model o f a social

process , which is also capable of yielding good prediction s ,

one must have an accurate model both of the "predictive part "

( like equations ( 2 . J ) ) , and of the "noise part " ( like equations (2 . 4 ) ) 1

otherwise , the standard techniques of statistical estimation may

yield unrealistic estimates of both . If one pursues an unbalanced approach , giving more weight to the "predictive " part than to the "noise " part , one may soon find oneself 1n a situation where the

inaccuracies 1n one ' s noise model are so large that any improvements in the "predictive " part are reflected by little improvement ,

i f any, i n the statistical likelihood o f one ' s model. The "models

without process noise " di scussed above can , at the very least , serve as detectors for this kind of difficulty .

Page II-60



Suppose that we had decided to make the Deutsch-Solow model

for assimilation more sophisticated, not by working on the

"predictive " part , but by working on the "noise " part; suppose that

we decided to account for the possibility of "flukes, " as discussed

above , Then we might write the model 1

A(t+l ) • k A(t ) + k U(t ) + b(t+l ) 2 1 1


-½t E.B >


+ p * 1


fzn Bl

(2 . 1 7 )

Our problem is to try to maximize "P", which, as in the case of

ordinary regression , will equal the product :

P • p(b(2 ) ) p ( b(J ))p(b(4 ) ) • • • p ( b (T ) ),

where T is the last time period for which we have data , An easier way

to approach this is by trying to maximize the logarithm of P 1

L = log P = log p( b (2 ) ) + log p(b(J ) ) + • • • + log p ( b(T ) ) .

We are trying to pick out the best possible values for the parameters k1 ,


in f\ will lead to enormous derlvatives from Q21 again , destabilizing 2 the system again before there is a chance for "w " to build up enough to allow a large increase ln 912 • 11hus , at least when the scaling

problem i.s severe, the hope of imbalanced growth is not an answer to the danger of slow convergence.

The solution of this problem was rather straightforward for

ARMA estimation ; we simply scaled the variables of the problem

according to a common scale . More precisely, we achieved the same effect by setting :


Page III-35

where " d" " refers to the standal'.'d deviation . On a more sophisticated level, what we are doing here is trying to maximize the expected

progress per iteration, :i.n light of our prior probabilistic

knowledge about L. We do not expect the units of measurement

of the variables to tell u s anything about their relative influence

on each other ; therefore, we demand a choice of "g " which insures k that a change of units will have no effect on our algorithm.

More generally, these variances give us an idea of the expected

order of size of a coefficient, and we set "gk " to keep the changes in line with the expected ratios between sizes and derivatives.

To handle the case of Bk


but reasonable expression ,

a l , r ' therefore, we write a rough

g = max( T ( 1-P )A , A ) rr rr rr r

in By changing a 1,r by a certain amount, we are chang:i:ng at, r units proportional to 1 � thus our formula for g i s like our formula r for the g-· k· with 9rs·• except that ..,i__s .. i s replaced by " 1 ". If Prr equals one, then this effect will tak e place on all the at,r '

and the analogy i s exact , Otherwise, if P ls smalle:::�, the rr with respect to a1 , r of L will be much smaller, even when the optimal is still just as large ; thus we propose a large g ' 1,r r (O) ., is recalculated in every major in that case. Note, as "A

size o f a

iteration, the formulas above encourage us to recalculate the "g .. k

at the same time. In Chapters ( IV ) a.'1 d ( VI ) we have included

a brief discussion of the success of this general procedure with

Page III-36

the estimat.ions we have carried out ; in the Appendix to Chapter ( II )

w e have suggested ways of generalizing the procedure for use with general , nonlinear models.

The choi ce of "w" requires a similar exercise in prior

estimation . At each step, our program looks at three essential

pi eces of data - L(A (




) , L(A' ,1, ) and � (B k-B �O ) )t I(


Assuming that L is e ssentially quadratic , and that the current choice of



"right" ( i . e . that L(A ' ,'7!!) is the maximum of the quadratic

d:lstribution ) , the program " expects " that L ( A • ;rt') wi ll be better than L(A ( O )

;f. O ) ) by exactly half what the gradient would appear to

ind i cate . If thi s expectation is co:rrect. then the program concludes that B ' is not only acceptable , but also that there probably i s

little point i n exploring further i n the same direction ; i t sets

t( o )c�9



to A ' , and begins a new major iteration. If L ( A ' ,]"t )

i s worse , then, by the quadratic assumption , we have overshot by

at least a factor of two ; w should be cut in half, ari.d a new B...,- tried

accordingly . In o:cd er t,o be a bit more conservative have specified in our program that w will be reduced by 40%, if the actual gain

i s less than 2516 of what i s indicated by the gradient. If L ( A ' , �) i s

better than 75'f'/, o f what is indicated by the grad.lent , our quadxatic

assumption tells us to double w . In a.n earlier version , we were more conservative here , and required 100%; however , convergence was slow

in some cases, and w e reduced the requirement to 8756, which has proven e 9 In the intermediate range , when B ' i s d eemed acceptable for

Page I II-J7

the start of a new major iteration, w is still changed somewhat,

for the sake of the next iteration ; w is multl.plied ·by twice the

actual gain, L(A' ,1• )


L(A ( O ) ;i ( O ) ), divided by the gain predicted

by the gradi ent. At the other extr�,me, if w appears to be far off, the prog-ram will multiply w by

or by . 3 in each minor iteration ; ( ' _. more precisely, if w appears to be too small to let us set ➔B O 1 �B ' , Lf

even after w has just been doubled within the same major iteration,

or if w appears to be too large after having just been cut to 60%, then a larger change will be tried in the m,1xt minor :i teration.

Flags are set in the program, to force it to stop changing w,

as soon as it starts changing w in opposite directions within the


same major i teration ; in such cases, our procedures above insure that


either the last B ' or the one before it gave an L ( A ' , B ' ) much better than L ( A ( O ) ,B ( O ) ), and our program will set B ( O ) to this new


the next major i teration. For reasons similar to those mentioned


in the previous paragraph, w i s in1 tialized at 1/'1',

At each step, the program prints out L, as defined in equation

(3 . 38 ) , and the direction of change of w . After five major iterations,

or after L appears to have stabilized to within , 01, whichever

comes first, the program stops, and as k s the user :i f he wishes to

continue ; if not, it prints out the analysis so far, and transfers to another program to ca:rry out s:lmulation studies of his model.

In the current version described above, five major iterations have

usually been enough for a close approximation, for analyses of actual

Page III-38

social science data ; for safety, however, we have generally used ten in our own analyses . ( Once L is about 0, 1 away .from its

:maximum, then the current set of coefficient estimates has almost as high a. probabi li ty of exact truth - 90% as high - as the

maximum li kelihood set itself ; thus 0 . 1 is a conservative upper limit to how much accuracy it makes sense to ask for. ) Unfortunately, the changes above were made piecemeal over a number of runs on·

different data, with the final improvements ex isting only in the

basic subroutine incorporated into the MIT version. Thi s routine has

a more effective procedure for generating initial estimates than

we u sed with our earliest test data ; thus the direct comparison , before and after, would overstate somewhat the relative merits

of the current system, In the Appendix to this Chapter , we have

provided a numerical example of convergence results before and after these :procedures for convergence were introdu ced .

'U1e fundamental purpose in using ARMA estimation, as we have

described it, i s to improve upon classical multiple regression. Thus we have decided to use multiple regression i tself, to provide the

initial estimates, i ( o ) . Not only are these li k ely to be reasonable

estimates, in terms of their general order of size and in terms of

the size of the biggest terms ; they also provide us with an assurance that our ARMA model will either represent an improvement upon

multiple regression, or, in some cases, will confirm multiple regression.

Page III-39

Our subroutine has been written to allow other initial estimates,

but the main program now available at TSP does not make use of

this option. Originally, we used the regression coefficients that come from a standard model including regression constants ;

our results on Norway, i n section ( v ) of Chapter ( VI ) , were based on that system. With the constants , one introduces a greater degree of

freedom into the regression models, to offer a more interesting ( though perhaps axtificial ) comparison against the ARMA models,

which do not include that degree of freedom. However, in order to

insure convergence u�der all circumstances, we have eliminated the

regression constants in the MIT version ; users of that system still

have the freedom , in any case, to obtain regression constants based

on constant terms by using other modules in the same system or even by another run of the ARMA command, Both the MIT version, and our private version used on the Norway data, print out all the ARMA estimates and regression coefficients, along w:.tth the standard

deviations of the variables ( to assist in interpretation ) and

the likelihood values for both models. 1he significance of the

various coefficients can be estimated by looking at the likelihood of the models which result when the coef':ficients are removed from the model.

Several other options have been added, to extend this algorithm

somewhat, First of all, there is now provision for "exogenous

Page III --40

variables . " In the di.scu ssion above , the expression "fl�\- i "

could have been replaced systematicall.,v by "By - i " , where G is t

now a rectangular matrix , and where y t- i includes both z - i t

and a few other components ; none of the equations above would have

had to be changed in form. Our program, in its current form , allows both endogenous and exogenous variables ,

Second , there i s provision to allow the user to dictate

apriori that certain components of 9 will be constrained to equal z ero . 'I'.his is done simply enough , by setting thei:r initial values

to zero , and constraining ( 3 . 4 1 ) to apply only to the o ther

components of


Thus L is rrJ.aJdmized as a function of the other

components , subject to tM.s constraint. The basic subroutine allows

this for any coeffi cient, but , in the MIT version , we have limited this to those

. for which y . i s an exogenous variable. J J Th.ird , there i s provision to give the user some ability , Q


at least, to handle nonstationary processes . Box and Jenkins ( 19)

discuss at great length the prominence of nonstationary processes in practical statistics ; they point out the value of introducing some kind of careful procedure for dealing with nonstationazy

processes, even if the procedure must have a less rigorous

foundation than the u sual statistical processes , in order to give

the social scientist confronted with such :processes an alternative

other than either giv:lng up or using an inappropriate tool, However ,

the procedures they i ntroduce (20 ) involve processes which tend to grow

Page III--41

k as t , for some constant k. By contrast, the commonest processes of

growth in the social sciences would appear to be those of exponential growth, processes which may come out of a dynamic relation like that of equation ( 3 . 5 ), but with a choice of "G" large enough to allow

growth. The concept of maximum likelihood, as discussed in

Chapter ( II ) , does not require a "fl" that generates a stationary

process; thus at first glance, the special procedures suggested by Box and ,Tenkins might appear irrelevant , However, our estimation

procedure has depended on equation ( 3 . 6 ) , not ju st on ( 3 • .5 ) and the

likelihood concept. Equation ( 3. 6) implies that the average size of the random component of our process remains the same across time ,

If w e w ere analy z ing a two...hundred-yeax series of data on the US GNP,

for example, thi s would imply that a $10 billion error in our

predictions for 1 790 from 1789 should be treated as a smaller mat·ter than an $H billion error in our predictions of 1973 from 1972 ;

a $ 1 0 billion error would always be regarded as less significant

than an $ 1 1 billion error, regardless of the year in which the error

occurred. In practice, the measurement errors and random fluctuations both are likelier to be a fixed percentage of the variable itself GNP - than to be a fixed independent process. To handle this kind

of situation, we have introduced an option, ''AfillAWT", to deal with

a model o:f error slightly different from ( 3 . 6 ) :

1(21l)fi det A



Page III-42

which is simP,lY the normal distribution for the n-dimensional t vector e' , � , and which requires u s to calculate A 4 as the 1� lzt , 1 covariance of this vector. In practice, however, the simulation studies of Chapter ( IV) suggest that the ordinary ARMA co1111T1and

generally performs at least as well as ARJi'1A\iT, even for most of the nonstationary processes studied.

F"lnally , provi sion has been made for the possib:llity -

mentioned in Chapter ( II ) - that the available data would consist, not of one string of observations across time


our variable s ,

but of a. whole set of such strings ; a general model of the process of population growth , for example , might encourage us to develop

a :mod el for application to data,-series involving the same variables

across many dlfferent countries , In order to handle thi s possibility,


we can use ( 3 . l} l ) as before , but must note : (. i ) aG .lk.. and ap _iL • across rs rs all the data strings, will simply equal their sum across all the

individual data-strings ; ( ii ) each data-string, s, will require its own 'tl s ) for initialization ; ( iii ) �(s) will , of course , equz.l the aa�� �L value of ..i, calculated in string s. ( Note that this example 1 ,r might be a good candidate for the use of "AHMAWT" , to prevent the analysis f'rom being dominated by nations of large population • • •

unless such weighting is ac tually d.esin.x1 . )






All of the ARMA es t imations reported in Chapters ( I V) and ( VI )

were based upon the final form of our algo:ri thm, use of the special convergence procedures described in section ( iv ) . However,

before we installed these procedures , we carri ed out a number of tests on the preliminary version of the ARMA routine, on simple

made-up data se ts , in order to check out the accuracy of the routine . One of these sbnple test series was u sed before we introduced the

possibili t y of different '' gk " for different "B k " , as in equation

( 3 . 4 1 ) ; thus we can see the effect of adding our new convergence

procedures by comparing the old test results against a new analys:i.s

o:f the same series. 'I'h e data series in question is a simple

univariate series of length seven - 1 . 0 , 1 , 2 , 1 . 2 ,

1 . 3 , 1 . 5, 1 . 4 , 1 , 0 -

fit to the model z +l = Gz + a + Pa -l " This series does not fit t t t t

w ell to a simple ari thmet.ic progression ; thus the "distance " from

the regression model to the ARMA mod el turns out to be fairly great ;

the series i s a relatively severe test of convergence possibilities.

( The cost per iteration i s low , because the series i s short, but the

progress per iteration in log probability , as a percentage of the gap in log probabili ty , i s extremely slow . ) Our initial test output

Page III-44

was a string of numbers, which we may arrange in a table : Major Iteration Number 1




,, u




5 "" r t "'


uu uu

Theta i* o) B



( and 1 0 ) : P ( t ) = . OSA ; M( t ) = . l S R

P r oc e s s 5 ( a n rl 1 1 ) : P ( t ) = . 0 5 A ; M ( t )

P ro c e s s


( a n rl

12) :

P( t )


. OSR;

,� ( t )



. osc . o s r,

F i v e a n d f i f t e e n p e r c e n t e r r o r s w e re c h o s e n o n g r o u n d s

P a p.; e I V - 9

t h a t t h e y s e er, " t y p i c a l " ;

compu t e r t i me w a s no t

a v a i l a b l e t o r e p l t c a t e t h i s s t u d y fo r d i f f e r e n t v a l u e s

o f t h e s e pa rame t e r s . a ppea r a nce of


I t s h ou l rl b e n o t e d t h a t t h e

!\ 1 1 tw i �e w i t h P roce s s 2 , a b o ve , d o e s no t

me a n t h a t t he s ame r a n dor, n u�h e r , A ( t ) , wa s u s e d i n bo t h processe s ;

i n ge ne r a l , e ve r y t i me t h a t we n e e d e d

a r a n dom n umbe r f o r a n e w a p p l i c a t i o n , w e i n vo k e d a new c a l l t o " r a n dom " , t h e r a n dom n umh e r �e n e r a t o r o f t h e P roj e c t C amb r i d g e T S P - C S P s y s t e m .

I n e q ua t i on ( 4 . 2 ) ,

one s ho u l d a l s o no te t h a t a n u n r e a l i s t i c c h a n g e o f s i g n cou l d o c c u r i f P ( t ) s h ou l rl e v e r e q u a l - 1 o r l e s s ; t h i s

wo u l d b e a ve r y r a re e ve n t , w i t h t h e s ys t e ms we h a v e s pe c i f i e d, b u t eve n o n e s u c h i n c i d e n t wo u l rl pe r s i s t

t h ro u g ho u t a n e n t i r e s i mu l a t e d t i me - s e r i e s, m a k i n g i t to t a l l y u n r e a l i s t i c a s a r e p r e s e n t a t i v e of

so c i a l - s c i e n c e t i rne - se r i e s .

S i m i l a r l y , wh t l e

me a s u r eme n t e r ro r s a r e o c c a s i o n a l l y q u i t e g r o s s f o r

so c i a l s c i e n c e va r i a b l e s , i t i s u n r e a l i s t i c to i ma g i n e

s orieo ne g e t t i n � t h e s i g n w ro n g fo r s u c h v a r i a b l e s a s S N P o r pop u l a t i o n .

T h u s fo r P ( t ) a n rl M ( t ) bo t h we

i n s t i t u te d a c u t o f f of - . 7 5 ; va l u e s l e s s t h a n t h i s we r e

s e t e q ua l t o t he c u to f f ,

( No s i mu l a t i o n s we r e r u n

w i t ho u t a c u to f f ; t h u s i t i s pos s i b l e t h a t t h e c u to f f

P a r; e I 1 1- 1 0

wa s n e v e r a c t ua l l y i n vo k e rl . ) � mo r e e l e R a n t p ro c e d u r e , ma t hema t i c a l l y , ri i r; h t h n v e b e e n t o u s e � a n d


( l+ P )


( l+M)

I n e c:i u a t i o n

(4.2) .


1 fov,e v e r ,

i n s tead

ma j o r

a n d e n d u r i n g c r a s h e s do som e t i me s o c c u r i n t h o s e so c i a l s c i e n c e va r i a b l e s s ub j e c t t o e r r � t i c b e h a v i o r ; f o r

e x a mp l e , p h e n ome n a s u c h a s z � r o o r n e � a t i v e pop u l a t i o n s eem t o b e a vo i d e d a s a r e s u l t o f e x t r ao r rl i n a r y

p ro c e s s e s d i f f e r e n t f rom t h o s e o ne r a t i n g o n n o r ma l

pop u l a t i o n s . T h u s , o n g r o u n d s o f r e a l i s m, we d e c i d e d to use c u t of f s i n s t e a d o f a mo r A e l e R a n t a p p ro a c h . A l so , a

c u to f f o f - 1 5 . wa s u s e d f o r t h e v a l ue o f B ( t ) l n s e r te rl

i n to e q ua t i o n ( 4 . 5 ) , o n g ro u n d s t h a t t h i s wo u l d p r e ven t

t h e po s s i b i l i t y o f i n vo k i n g a c u t o � f s e v e r a l t i me s i n a row o n M ( t )


T h e c h o i c e s a b o v e , i n b r i e f , g i v e u s a c h a n c e to

l oo k at t h e fo u r pos s i b i l i t l e s o f n o me a s u r eme n t n o i s e ,

of s i m p l e me a s u r eme n t n o i se , o f me d i um - c omp l e x

me a s u r e me n t no i s e , a n rl o f c omp l e x mea s u r e men t no i s e , a l l i n t h e p r e s e nc e o f s i mr, l e p r oc e s s n o i se ;

t h e y a l so

l e t u s l oo k a t s i mp l e m ea s u r eme n t n o i se a n d com p l e x

me a s u r eme n t no i s e , i n t h e p r e s e n c e o f me d i um- comp l e x

p ro c e s s n o i s e . P ro c e s s e s 1 a n rl 2 c l os e l y re s emb l e t h e

p ro c e s s e s fo r wh i c h s i mp l e r e g re s s i o n a n d A RM A morl e l s ,

P a ge I V - 1 1

re s pe c t i ve l y , s h o u l d h e i d e a l , i n t h e o r y .

T h e e x t r eme

c a s e o f z e ro p r o c e s s n o i s e , t h P. c a s e wh i c h s h ou l d b e

mo s t f a vo r a b l e t o t h e " ro h us t a r p ro a c h " , wa s n o t i n c l ud e d , o n g r ou n ds t h a t we a r e i n t e r e s t e d i n

e va l u a t i n g t h a t a p p r o a c h u n d e r mo r e no r ma l , m i x e d con d i t i o n s .

I n o r rl e r t o a c c o u n t f o r t h e po s s i h i l i t y o f

mo r e c om p l ex p ro ce s s e s , w i t h o u t v i o l a t i n g h omoge n e i t y ,

v,1e i n t ro d u c e d P roc e s s e s 7 t h r o u g h 1 2 h a s e d o n t h e

fo l l ow i n g e q ua t i o n s , w h i c h y i e l d a g r ow t h r a t e o f 1 . 6 % ( f r om t h e 1 i n e a r i z e d d i f f e r e n ce e q ua t i o n )

X ( t + l ) = ( . 3 8 X ( t ) + . 3 5 X ( t - 1 ) + . 3 X ( t - 2 ) ) ( 1+ P ( t ) ) Z( t ) = X ( t ) ( l + M( t ) )


T he c h o i c e s o f P ( t ) a n d M ( t ) he re we re i d e n t i c a l to t h o s e w i t h P ro ce s s e s 1 t h ro u � h 6, a s i n d i c a t e d i n

e q u a t i o n s ( 4 . 6 ) , a n d t h e s a me c u t of f s we r e u s e d .

F o r e a c h o f t h e twe l ve p ro c e s se s d e f i n e d a bo v e ,

t e n s a m[) l e t i me - s e r i e s , " Z l " t h r o u g h " Z l O " , we r e

g en e r a t e d . T h e n , i n o r d e r t o e s t i ma t e " c " i n e q u a t i o n ( 4 . 1 ) , f o r e a c h of t h o s e s a m p l e t i me - s e r i e s , we u s e d

t h e t h re e gene r a l te ch n i q u e s d i s c u s s e d t h r o u � ho u t t h i s

t h es i s : ( i ) c l a s s i c a l r e R re s s i o n ;

( i i ) t he II A R M .i\ 1 1

Pap.:e I V - 1 2

;::i p p r o a c h ; ( i i i ) t h e

conven t i o na l

W i'l. Y


roh u s t a p p ro a c h " . T h e mo s t

of e s t i ria t i n g " c " , f o r t h e mo d e l

( 4 . 1 ) , i s to u s e a s t a n d a r d r e g r e s s i o n p r o g r am to do a

ma x i mum l i k e l i hood e s t i ma t i o n o f t h e r e l a t e d mo d e l : Z ( t+ l ) = cZ ( t ) + k + a ( t ) ,

(4. 8)

\'I/ h e r e " k " i s a co n s t an t to h e e s t i ma t e d , a n d "a ( t ) " i s a no rma l r a n dom no i s e p r o c e s s .

I n p r a c t i ce , t h i s

a mo u n t s to do i n g a l e as t - s q u ;::i r e s e s t i r1a t i o n , a s i n s e c t i o n ( i i ) o f r. h ape r ( I I ) ; o n e hope s t h a t


k" , wh i c h

i s e x p e c t e d to be ze ro , w i l l he e s t i ma t e d a s sor1e t h i n g

c l o s e to ze ro .

con s t a n t t e rm,

T h l s te c h n i q ue , r e g r e s s i o n w i t h a

i s a b b r e v i a t e d a s " r e g + k " i n o u r ta b l e s .

A h e t t e r wa y of u s i n g c l a s s i c a l r e g r e s s i on , t o

e s t i r, a t e


c I I i n ( 4 • 1 ) , i s t o u s e a s i mp 1 e r mo d e l ,

w i t ho u t t h e me a n i n g l e s s c o n s t a n t te rm : Z( t+l) = cZ( t ) + a ( t)

( 4. 9)

T h i s k i n d o f r e g r e s s i o n c a n he pe r fo r me d a u toma t i c a l l y i n t h e T i me - Se r i e s - P ro c e s so r s y s t e m . T h i s t e c h n i q u e ,

r e g r e s s i o n w i t h ou t a co n s t a n t te rm, i s a b b r e v i a t e d a s

" r e g " i n o u r t ab l e s .

r o r r e s po n d i n g to t h e s e two s i mp l e r e g r e s s i o n

P a p; e I V - 1 3

mod e l s a r e two si m p l e A RMA mo d e l s .

Acco r d i n g to o u r

r e s u l t i n t h e b e gi n n i n g of C h a p t � r ( I 1 1 ) , t h e

c o r r e s po n de n ce i s mo r e t h a n j u s t o n e o f s i mi l a r i t y ; t h e ARMA mod e l s b e l ow a r e t h e gen e r a l i z a t i o n s o f ( 4 . 8 ) a n d ( 4. 9 )

t o ac cou n t fo r t h e po s s i b i l i t y o f "whi t e n o i s e "

i n t he p r o ce s s o f d a t a me a s u r e me n t . Mo d e l ( 4 . 8 )

co r r e s po n d s to :

Z ( t+ l ) = c Z ( t ) +a ( t ) + Pa ( t - l ) + k ,

( Li .

I O)

w h i c h c a n b e e s ti m a t e d d i r e c t l y i n t h e P ro j e c t

C amb r i d g e Ti me - S e ri e s P roc e s s o r , h y u s e o f t h e comm a n d "ARMA" , d e s c ri b e d t n t h e l a t e r p a r t o f C h a p te r ( I 1 1 ) .

T hi s t e c h n i q ue fo r e s t i m;i t i n � " c " we a b b r e vi a t e a s 11

a rm a + k 1 1 i n t he t a b l e s a t th e en rl o f t h i s c h a p te r .

Mod e l

( 4. 9 )

co r r e s po n d s to : Z ( t+ l )


(4. 11)

cZ( t ) +a ( t ) + Pa ( t- 1 ) ,

w h i c h c a n a l so be es t i ma te d b y t h e comm a n d


/\ R M l\ " ; t h i s

te c h n i q ue fo r e s t i ma ti n g "c " we w i l l a b b r e vi a t e a s

" a rm a " i n t h e t a b 1 e s a t t h e e n d o f t h i s c h a p t e r .

F i n a l l y , t he e s t i ma ti o n a l go r i t h m d e s c r i b e d i n C h a p t e r ( I I I ) a l l owed u s to w r i t e a no t h e r comma n d , ARM AWT , to

e s t i ma t e t h e mod e l :

Pa ge I v - 1 1�

Z( t+ l )


cZ( t ) { l+a ( t ) ) +Pa ( t- l ) z ( t- 1 )

( 4 . 12 )

T h i s comma n d , me n t i o n e d b r i e f l y i n C h a p t e r ( I l l ) , i s

e s s e n t i a l l y eq u i v a l e n t t o e s t i ma t i n � ( 4 . 1 1 ) , w i t h t h e

a s s um p t i o n t h a t t h e no i s e p r o ce s s , " a ( t ) " i n ( 4 . 1 1 ) , i s de t e r m i n e d a s a ne r c e n tage of t h e a c t u a l v a r i a b l e , a s i n ( 4 . 2 ) , r a t h e r th a n a p ro c e s s o f co n s t a n t Me a n a n d

va r i a nc e .

T h i s t e c h n i ci ue f o r e s t i M.-3 t i n g " c " , we

a b b r e v i a t e as " a rrv'lw t " i n th e t a h l e s a t t h e en d o f t h i s

c h a p te r .

F i n a l l y , we h a d t o f i n d a " r o h u s t p r o ce d u r e " fo r

e s t i ma t i n g c , d r a wn f rom t h e d i s c u s s i o n o f s e c t i o n ( v i i ) o f C ha p te r ( I I ) .

I n t h e pu r e c a s e of z e ro p r o ce s s

no i s e , t h e s e p r o c e d u r e s r e q u i r e u s t o e s t i ma te t h e

i n i t i a l " u n d e r l y i n g " v a l ue s o f t h e v a r i a b l e s me a s u r e d ,

a n d t h e coe f f i c i e n t s o f t h e mo d e l , b y d i r e c t l y

m i n i m i z i n g t he a ve r a ge e r ro r s o f l on g - t e rm p r e d i c t i o n s

ma d e \'-J i t h t h i s mo d e l .

I n o t h e r wo rd s , f o r t h e e s t i ma t e d

i n i t i a l v a l ue s , a n d a g i v e n s e t o f coe f f i c i e n t s , o n e

ma k e s a f u l l s e t o f p r e d i c t i o n s f o r t h e v a r i a b l e s o f i n t e r e s t , w i t hout e v e r m a k i n � u s e o f t h e me a s u r e d

v a l u e s o f t h e v a r i a b l e s a t i n t e r me d i a t e t i me s ; u s e s t h e a ve r a g e e r r o r i n t he s e p r e rl i c t l o n s ,

on e

p r e d i c t i o n s w h i c h a r e ge n e r a l l y l on � - t e rm p r e d i c t i o n s ,

Pa ge I V - 1 5

a s o n e ' s c r i t e r i o n o f f i t ; o n e u s e s t h e me t h o rl o f

s t e e pe s t d e s c en t , o r a r e l a te d p ro c e d u r e , i n o r d e r t o

p i c k c oe f f i c i e n t s a n d i n i t i a l e s t i ma t e s t o m i n i m i z e t h e to t a l e r ro r i n t h e s e p r e d i c t i o n s . ( A " r e l a xe d " ve r s i o n

of t he se p r o c e d u r e s wou l d a l l ow a l i t t l e h i t o f

a l l ow a n c e t o b e m a d e fo r i n t e r med i a t e mea s u r e d v a l u e s , i n p r e d i c t i n g mo r e d i s t an t t i me pe r i od s . )

T h e f u l l mu l t i v a r i a t e , no n l i n e a r ve r s i o n o f t h i s

p ro c e d u r e , b a s e d o n t h e d y n am i c f e e d b a c k a l go r i t h m o f

C ha p te r ( I I ) , w a s no t a v a i l a b l e a t t h e t i me t h e s e s i mu l a t i o n s we r e ca r r i e d o u t .

Howe v e r , fo r e q u a t i on

{ 4 . 1 ) , t h e r e i s a me a s u r e me n t - n o i s e - on l y mo d e l wh i c h i s

muc h e a s i e r to e s t i ma te t h a n i s u s u a l l y t h e c a s e : X { t+ l ) = cX { t ) Z { t ) = X { t ) ea ) � (t

{ 4 . 13 )

X{ t) { l


a{ t) )

I f t h e me a s u r emen t n o i s e , a { t ) , i s o n t h e o r d e r o f 1 0 %

o r l e s s , t h e a pp r o x i m a t e e q u a l l t y h e re w i l l b e ve r y goo d , a c co r d i n g t o t he T a y l o r e x p a n s i o n o f ea_


. In

o r d e r t o e s t i ma te X { O ) a n d c i n t h i s mod e l , o n e c a n t r a n s fo r m { 4 . 1 3 ) a l ge h r a i c a l l y t o d e d uce :

P a ge I V - 1 6

l o g Z ( t)


t l o� c


l o g X ( O)

I n o r d e r to p i c k t h e con s t a n t s , l og c

( 4 . 1 4) +

a ( t)

a n d l og X ( O) , to

ma x i m i z e t h e l i k e l i h ood of t h i s mo d P. l , a c c o r d i n g to s ta n d a r d ma x i mu m l i ke l i h ood t h e o r y , o ne n e e d o n l y pe r f o rm a s i mp l e r e g r e s s i o n o f

l og Z ( t) a ga i n s t t h e

i n d e pe n de n t va r i a h l e " t " a n d a con s t a n t t e rm . A s pe c i a l

ro u t i n e to pe r f o r m t h � s o r,e n� t i n n , ca l l e d " G R R " ( G Row t h

Ra t e) , wa s a d d e d t o t h e P ro j e c t C a mb r i dge T i me - S e r i e s P ro c e s so r i n J a n u a r y 1 9 7 4 . N o t e t h a t t h i s r o u t i n e

e s t i ma t e s " c " a n d X ( O ) i n e x a c t l y t h e s a me w a y a s t h e o l d ro u t i n e

C ha p t e r


1: X T R.t\ P" r:l i d , i n t h e wo r k r e po r t e d i n

( 1/ 1 ) .

F i n a l l y , a f t e r t h e s i mu l a t i o n p r oce s s e s a n d t h e

e s t i ma t i o n t e c h n i q ue s a r e d e f i n e d , we f a ce t h e p ro b l em of me a s u r i n g how we l l t h e e s t i ma t i o n t e c h n i q u e s

a c t u a l l y pe r f o rm . W e h a ve u s e d two d i f f e r e n t c r i t e r i a . F i r s t , th e r e i s t h e c r i t e r i o n o f p r e d i c t i v e powe r ,

me a s u r e d ex p l i c i t l y .

Fo r e a c h c om b i n a t i o n o f

t i me - s e r i e s ( o u t o f 1 2 X 1 0 = 1 2 0 s amp l e t i me - s e r i e s)

a n d o f e s t i m a t i o n t e c h n i q w� , we u s e d t h e v a l ue of " c " ,

a s e s t i ma te d fo r t h e f i r s t 1 0 0 t i me - pe r i o d s , t o t r y to

P ap.; e I V - 1 7

p r e d i c t t he v a l ue s o f t h e v a r i a h l e Z ov e r t h e r e ma i n i n g 1 0 0 pe r i od s . F o r e a c h s uc h se t o f p r e d i c t i o n s we

c a 1 cu 1 a t e d f o u r me a s u r e s of e r r o r :

( i ) r . m . s . ( r oo t

me a n s q ua r e ) a ve r a ge pe r c e n t a ge e r ro r I n p r e d i c t i n g

pe r i od s 1 0 1 t h ro u g h 1 1 0 ; ( i i ) r . m . s . a v e r age pe r c e n t a g e

e r r o r i n p r e d i c t i n g pe r i o d s 1 0 1 t h r o u g h 1 2 5 ; ( i i i )

r . m . s . a v e r ag e pe r c e n t a ge e r ro r i n p r e rl i c t i n g p e r i o d s

1 0 1 t h ro u g h 1 5 0 ; ( i v ) r . r, . s . a v e r a ge pe r c e n t a ge e r ro r

i n p r e d i c t i n g p e r i od s 1 0 1 t h rour;h 2 0 0 . ( l\ l s o , a s e t of

p r e d i c t i o n s w a s ma d e , f rom pe r i o d 1, to pe r i o d s 1

t h r o u g h 1 0 0 . ) T he ex a c t r e s u l t s o f t h e s e te s t s a r e

s hown i n T ab l e I V - 4, fo r e v e r y s amp l e t i me - s e r i e s . L e t u s d e f i n e i n a h i t mo r e d e t a i l h ow t h e s e

p r e d i c t i o n s we r e a r r i ve d a t . F o r t h e r e g r e s s i on mo d e l s ,

( 4 . 8 ) a n d ( 4 . 9 ) , we i n se r t e d t h e k n own v a l u e o f z ( l O O ) ,

a n d t h e e s t i m a t e s o f c a n d k , a n rl t h e mo s t p r o b a b l e

v a l ue fo r a ( l O O ) ( i . e . z e r o ) , i n o r d e r to p r e d i c t

z ( l O l ) ; t h i s p r e d i c t i o n fo r Z ( l O l ) w a s r e i n s e r t e d ,

a l ong w i th a ( l O l ) = O, t o g i ve u s a p r ed i c t i on of Z ( l 0 2 ) ,

wh i c h i n t u r n wa s r e i n s e r t e d , e t c . F o r e q u a t i on ( 4 . 9 ) , t h i s h a s t h e s ame e f f e c t a s i n s e r t i n g t h e e s t i ma t e o f


c 1 1 i n to ( IL l ) , an d u s i n g ( 4 . U t o ma ke t h e f o r e c a s t s .

l ! i t h t h e AHMA mode l s , e11 u a t i o n s ( 4 . 1 0 ) , ( 4 . 1 1 ) a n d 1

Pa r;e I V - 1 8

( 4 . 1 2 ) , a l mo s t t h e s ame p r o c e rl u r e wa s u s e d . T he

me a s u r e d v a l ue o f Z ( l O O ) wa s i n s e r te d to g i v e t h e f i r s t p r ed i c t i o n, the p r e d i c t i on o f 2 ( 1 0 1 ) ; a ( l O O ) t h rough a ( l 9 9 ) we n? s e t t o z e r o .

HO\\le V e r , t h e A R H /\ e , it was decided that priority should be given to the simpler formul ation at this stage.

At any rate, the model written out above -

(G. 2 9 ) - did not perform especially wel l , with our

initial Norway data, according to the usual statistical

tests based on short-term prediction .

In terms of

statistical 1 ikelihood, the regression model in this

run did no better than the regression model of our

earlier run, without communications terms . (Gaps in likelihood l ess than one point, as discussed at the

base of Table Vl-11, were not recorded, due to the

impl ication of no significance in such differences in

apparent performance . ) In other words, the extra terms did not appear to add anything. The estimate of


c3 1 1

here, as in all the other runs carried out on this

model and its analogues, was too smal l for the computer output formats to cope with .


c 1 1 1 , however, was on the

order of 1 %. Again, the regression model was superior

to the ARMA model , with a gap of likel ihood scores of

Page Vl- 1 19

5 . 6 5 , ind icating odds of 2 8 0 to 1 favoring the

regression mode l . The ARMA communications model had a slight l y higher like l ihood - b y 1. 5 - than the simpl e

mode l (6. 2 8 ), indicating odds of 4. 5 to 1 in its favor;

however, these are not e xact l y overwhe l ming od ds. The R

of the ARMA mode l was . 994 8 , versus . 9949 for the

regression mode l , just as before.

In l ong-term prediction, however, from 1939 to

1967, the ARMA mode l d id increase its margin of

superiority; its R. M. S . average percentage errors were 1 3. 4 %, versus 14. 7 % for regression, whil e its absolute

errors were 27% versus 39%.

™ li.

The communications term,

poor l y estimated, c l ear l y added something ..tQ

l onger- te rm pred iction.

With a different estimation

approach, oriented towards predictive power instead of maximum like l ihood, the gain provided b y the

communication terms might have been considerab l y l arger.

Al so, with the original model, (6. 2 1 ), the

intermed iate provinces of Norway, instead of the

minimum- Nynorsk provinces, might have been sing l ed out more effective l y as likely areas of l arge - sca l e

assimilation; again, the pred ictive power of the mode l might have b een enhanced.

In the third run, as an a l ternative hypothesis, we

considered the possibi l ity of using real income as a

Page Vl- 120

variable to predict l anguage: A(t + l )


c1 A(t) + c�Y (t),

(6. 3 0 )

where Y (t) represents the real income in a region, and

where we have not written out the noise terms.


terms of likelihood theory, the performance of this model was exactly the same as that of (6. 29), as

described above. In l ong-term prediction, however, it did not do quite as wel l . The ARMA R. M. S.


percentage errors were 1 4. 1 %, and absol ute errors 2 8 %;

the regression percentage errors were 1 4 . 9%, absol ute errors 39%. These resul ts are cl oser to those of the

univariate models, (6. 27) and (6. 2 8 ), in qual i ty , t han

to the resul ts with (6. 29). In principle, however, the rel ative potential of the two model s in l ong-term

prediction wil l not be cl ear until a new type of

estimation system is avail abl e.

In the remaining runs on our original data, we

decided to expl ore communications indices other than that of simpl e migration. Two other measures of

communication - tel ephone cal l s and vol ume of mail -

were avail able; however, these were onl y avail abl e on a province-by-province basis.

We faced the problem of

how to reconstruct the matrix of communications from

each province ..tQ each other province. This probl em has

Page Vl- 12 1

often b een faced el sewhere in regional science (3 5 ) and

in sociol ogy, and resol ved b y way of a "gravity" model . The original grav i ty model, proposed by Stewart, woul d

estimate province-to-province tel ephone communications,

for exampl e, as follows : C•• C..,)

= C

T, T • £ J


o ·r••

(6. 3 1 )


where Tt and � are the total volumes of telephone

communication (or o ther communications variables, such

as migration) in each province, and where r4j is the distance b etween provinces.


modified version, studied

by Gal l e and Taueber (36), and discovered to have a

multipl e correlation of between 89% and 93% b etween prediction and real ity, is as fol l ows: C• .


(6. 32)

where k is an unknown exponent to be estimated. (Note that the correl ation here, with a cross-sec tional

model , is stronger in its impl ications than a 90% wou l d be in a predictive time-series model , insofar as a

cross- sectional study makes sense here . )


enough, whil e equation (6 . 32) has been successful in

empirical tests, the parameter "k" has varied a great

Pa ge V1-122

deal i n i ts est i mated valu e; Galle and Ta ueber report

that k was equal to . 62 for i nterurban m i grat i on i n the US i n 1935-1940, but only . 42 for the same data as

measured i n 19 5 5 -1960.

I n order to est i mate the l i kel i est value of "k"

for i nterreg i onal commun i cat i ons i n the case of Norway, i t was necessary to f i t the model (6. 32) to the only

reg i on-to-regi on commun i cat i ons data a va i lable - a ga i n,

the m i grat i on data . A d i rect f i t of ( 6 . 32) would have requ i red the use of nonl i near regress i on; however, equat i on (6. 32) can be transformed as fol lows : 1 og C · · - 1 og T • - 1 oab TJ• C.,) I,


a - k 1 og rC.J ·• ' (6. 3 3 )

where the ent i re left s i de of the equat i on forms the

dependent var i abl e, and where "a" and "k" can be est i mated by mul t i ple regress i on.

As long as we were carry i ng out such a regress i on,

however, i t seemed appropr i ate to test out a new

expl anat i on for the reduct i on i n "k" from . 62 to . 42 as

measured i n the Uni ted States by Gall e and Ta ueber. I t i s fundamental to the commun i ca t i ons theory of

nat i onal i sm, a s descr i bed i n sect i on ( i v), that there

has been a h i stor i c r i se i n the strength of

long-d i stance commun i ca t i ons, rela t i ve to

Pa ge Vl- 1 2 3

shorter-distance communications, a t least for the a vera ge man . Using equation (6. 32), we may compute the

ra tio of communication across a l ong distance, R 1 , to the communications across a shorter distance, R 0 ,

between regions of equal siz e (� the same for al l



(R1) k 1

For given distances, R 0 and R , the terms invol ving c0 1 a nd Ti , etc. , cancel out; thus � .Q.Il]_y wa y � ratio

� .&!1.t l a rger, for a given comparison of distances,


k g� smal l er.


(e. g. A smal l varia bl e to the

zeroth power wil l equal 1, which is the maximum this

ratio can a pproach under the stated conditions . ) Thus

it is critical to our communications theory that k

shoul d tend to decrease in time, as the resul t of some aspect of "moderniz ation"; the most obvious aspect of

"moderniza tion" to consider is the economic factor, the

increasing income of peo pl e rel ative to the cost of

communication .

Thus we decided to test the model :

k( t)

= c



c Y ( t) , l.

(6. 3 5 )

Page V1-124

where Y represents the real income of a region . Al so, we decided to consider the possibil ity that a term

representing social proximity (urban vs . rural

simil arity of regions) shoul d be accounted for .


in the final regression equation, we decided to test: l og M i,i


l og M .;; - l og

s� -

l og Sj

r, . + 4.)


u '. l•

is defined to equal one if both regions are

rural or both urban, but zero if they differ ; a measure of distance was obtained from the Worl d A tl as (37);

s '.

is defined as with (6 . 29) . Note that Y (t) - real income

in the subregion from which migration occurs - was .D.Q.t

a surrogate for time in this regression, since the

regression was based on a combination of two 36 X 3 6 matrices of total migration

and 196 8 ;

in the cl ose-by years 1967

the primary variation in real income was

between subregions . The covariance matrix produced is

Page V1-12 5

shown in Table Vl-2 5:

u C,J. .



.25 1


y l og r ' • &.l • l og r '

. 01 7

-. 0 12


. 072

l og M '. • 'J

y log r • .

. 017

1. 2 9 9 . 286

. 13 5

.• log r1-J

1 og M ,j

-. 0 12

. 072

.87 5

-. 54 9

. 286

-. 5 4 9

. 135

1. 062

Table Vl-2 5: Gravity Model Correl ations Inverting the three-by-three matrix in the upper l eft of this table, and multipl ying the inverse by the

vector formed by the three upper numbers of the

rightmost col umn, we can compute the standard

regression coefficients for (6. 3 6 ): C3 C 1,

= =

. 24


c , = . 7 0,

al l with the expected signs, and all clearl y very

significant for the large N we have considered and for the variances d isplayed in Tabl e Vl-2 5 .

Thus our

income hypothesis appears to have been val idated rather

strongl y. This same regression anal ysis was al so used

to construct an approximate measure of communications

for one of the other communications variabl es availabl e in Norway, te 1 ephone communications ; however, c 2. was

Page Vl-126

del eted from the regression equation, and c \ thereby

reduced to . 67, due to the computational difficul ty of

cal cul ating the ful l index, as based on different val ues of the income variabl e across time .

In the fourth run on Norwegian l anguage data,

equation (6 . 31) was used to reconstruct the matrix of

tel ephone communications, C i j , to repl ace M .: ; in equation (6 . 2 9 ) .

The years 19 3 9 to 19 57 were used as a

d ata-base . Once again, the regression model did better than the ARMA model in terms of l og l ikel ihood, with a gap of 3 points, impl ying odds of 2 0 to 1 in favor of

the regression model . The R"l. of the regression model

was . 9 941, versus . 9 9 4 0 for the ARMA model ; this was

substantial l y worse than our earl ier runs . On the other hand, this was substantial l y worse than our earl ier

univaria te runs, encompassing a subset of the

independent variabl es here; this signal s us that the

d ata in the period 19 3 9 to 19 5 7 average out to be more

d ifficul t to pred ict than the previous data-base, 1 9 3 9

to 1967 . Indeed, these years contain al l of the wartime "outl iers" mentioned above, in Finnmark . In l ight of

these d ifficul ties, the model did rel ativel y wel l in l ong-term prediction .

The R . M . S . average percentage

errors were 1 3 . 7 % and 14 . 1 % for the ARMA and regression

model s, respectivel y; the R . M . S . average absol ute

Page V l -127

errors were 3 5 % and 40 % for the two model s,

respectivel y.

Perhaps a l ater run without outl iers

woul d h ave given a much b etter picture.

In the fifth run on Norwegian l anguage data, we

studied the same model as in the fourt h run; however,

this time we used equation (6. 3 6 ), adapted to predict

tel ephone communications, to reconstruct a matrix of tel ephone communications. The R� and t he l ikel ihood

scores turned out to be the same as in the fourth run, except that the ARMA model gained very sl ightl y in

likel ihood - b y one point; the odds against this being

a coincidence are onl y 3 to 1, according to l ikel ihood theory - not a substantial confirmation .

of this run seemed sufficientl y bad, with

that simul ations were not carried out.

The resul ts


stil l l ow,

In the next run on Norwegian l anguage data, postal

data were used, with equation (6. 3 6 ), to construct an index of communications, to repl ace M ,i in equation

(6. 2 9 ). The regression model performed better than the

ARMA model , with a gap in l ikel ihood of 9, impl ying

odds of eq = 810 0 to 1 in favor of regression. The R� of the ARMA model was . 9958, versus . 995 9 for the

regresson model .

At first, these h igh val ues of R�

seemed rather encouraging.

However, the data period

used for this anal ysis was 1 9 4 5 to 1 9 67, due to the

Page V 1 -128

absence of postal data in three of the war years; this impl ied that the outl iers in Finnmark were avoided . In our seventh run, as a corrective, we re-eval uated the

simpl e univariate model , equations (6 . 27) and (6 . 2 8 ),

over the same time-period ; the results were the same as

with the postal model - implying that nothing was

gained by adding these communications terms - except

for an insignificant one-point decrease in the likelihood of the ARMA model.

After these seven runs were compl eted, a careful

review was carried out, first of the ARMA models, and

then of the communications models. Another run was

carried out on a diffe r ent set of data - on births,

deaths and marriages as a single set of variables . In

that case, the ARMA model outperformed the regression

model by 3 3 9 points, by the usual l ikelihood measures, impl ying an astronomical ly high probability of its

superiority. In conventional l anguage, this gap of 3 3 9

points impl ies that, 1 1 the ARMA model was confirmed with - 100

a p less than 10 �

an R

. " Concretely, the ARMA model had

of . 975 in predicting the marriage rate, versus

. 8 5 for regression; also, the variance of the errors in

predicting the death rate was reduced by 1 0 %.

Unfortunately, the computer refused to calculate a full

table of predictions for this case, because the table

Page Vl-12 9

was too long; however, the ARMA mode l did not seem

notab l y superior to the regression mode l in that

portion of the predictions which the computer did print O Ut .

In order to exp l ain the mediocre performance of

the communications models, we looked very closely,

province by province, at the direction and s hape of the

errors made by the ARMA communication models (from run

number 2 ) in long-term prediction.

There did not seem

to be a notable tendency for errors to be biased in one direction in any special group of provinces, except for

the "intermediate province" group which the

communications terms should have been able to

distinguish; however, there did seem to be a very strong tendency for the predictions to be

systematically low, everywhere. With constant terms, of

course, this wou l d not have been expected. Stil l , the

independent variables in equation (6. 2 9 ) were close enough to being able to represent constant trends,

upwards, that it seemed very strange that such a bias

would develop. A series of intuitive arguments

convinced us that the outliers in Finnmark, extending for a handful of years in both urban and rural

Finnmark, could add a degree of apparent randomness,

enough to bias the coefficients substantially . Given

Page Vl-1 30

that Nynorsk had never been used at a 1 1 in schoo 1 s in Finnmark, we fel t it woul d be best to change the data,

so that in tl1 years the varia bl e "A" (percentage assimilated) woul d equal 100% in Finnmark.

A fter these changes, two runs were carried out,

both highly successful.

The first run dupl icated our

original first run, based on equations (6. 2 7) a nd (6. 28).

This time, the R4 for the A RMA model was

. 9988, versus . 998 7 for the regression model.


model had a l ikel ihood score of 3 5 . 46 points higher

than the regression model , implying odds of 2. 5 mill ion

bil l ion to 1 aga inst its superiority being a coincidence.

Both model s, of course, were doing

astronomical l y better than any of the model s discussed above. In simul ation, however, the picture was a bit

mixed, though stil l improved on the whole. The R. M. S.

a verage percentage errors were 9. 9 % for the A RMA model

a nd 9. 4% for the regression model ; the absol ute errors

a veraged to 20 % a nd 2 7 % , respectivel y.

The second new run dupl icated the second old run,

in using equa tion (6.29) a s a model .

The superiority

of the A RMA model grew larger, when a more compl ete

substantive model was used , just as we ha d hoped

earl ier ; the gap in l ikel ihood grew to 4 2 . 5 1, impl ying 1

odds of 3 X 10 ' to 1 in f avor of the ARMA model . With

Page Vl-1 31

the regression models, the addition of communications

terms produced only a slight gain in likelihood - 2

points - but with the ARMA model, the gain in

likelihood was 9. 0 3 points, implying a very significant improvement •

(S ign if icant at the 1 eve 1 of

. 0 0011, " in conventional terminology . )




One may note

that the coef ficient of the main communications term was . 0 067 for the ARMA model, versus . 00 54 for the

regression model, both about right for a recurrent

feedback term. With a better substantive model, the

ARMA model improved much more in its long-term

predictive power, too, than the regression model did. The average percentage errors were 8 . 6% for the ARMA

model, versus 9 . 0 % for regression;

the absolute errors

averaged to 17% for the ARMA model, versus 25 % for

regression . Between the two measures, it is reasonable

to say that the ARMA model here, as elsewhere, displays on the order of 10-1 5 % less error in long-term

prediction than the regression model does . Also, as we pointed out at the beginning of this section, when

ou tliers are removed, the ARMA models are very much

superior to the regression models in terms of formal statistical likelikood. These statements remain true

despite the "ratchet" effects - similar to outliers, but persistent - which we mentioned earlier in this

Pa g e V 1 - 1 3 2

s ec t i on .

Page Vl-1 3 3


(1) Deutsch, Karl W. , Na t i onal i sm .9.ll.Q. Soc i al Commun i ca t i ons, MIT Press, Cambr i dge, ·Mass . , 19 66, rev i sed second ed i t i on . Chapter 6 conta i ns the ma i n argument l ead i ng up to the mathemat i cal model ; Append i x V conta i ns the mathemat i cal model , and a verbal descr i pt i on of i t. (2 )


(3) Hopk i ns, Raymond, "Project i ons of Popul at i on Change by Mob i l i zat i on and Ass i m i l at i on, " Behav i oral S c i ence .. 1972 , p. 2 5 4 . The programs were made ava i l abl e to us by Prof. Deutsch at the Harvard Department of Government .

(4) Deutsch, Karl W. , op . c i t . , Append i x V . Note that several vers i ons of th i s model have appeared i n pr i nt. The vers i on here, i n al l fa i rness, was actual l y taken d i rectl y from Hopk i ns, Raymond and Carol , "A D i fference E quat i on Model for Mob i l i zat i on and Ass i m i l at i on Processes", 1 9 6 9, unpubl i shed ; a copy of th i s paper was prov i ded to us by Prof. Deutsch, and descr i bed by h i m as conta i n i ng the f i nal rev i s i on of the model . Th i s rev i s i on appears, i n d i fference equat i on form, i n Hopk i ns, Raymond, 1 1 Project i ons of Popu 1 at i on Change by Mob i l i zat i on and Ass i m i l at i on", Behav i oral Sc i ence, 1 972, p. 2 5 4 . The reasons for the rev i s i ons to earl i er vers i ons are descr i bed i n Hopk i ns, Raymond, "Mathemat i cal Model l i ng of Mob i l i zat i on and Ass i m i l at i on Processes", i n Ma thema t i ca l Approaches .t.Q. Pol i t i cs, ed i ted by Hayward Al ker, Karl Deutsch and Ant i one Stoetzel , El sev i er Publ i sh i ng Co. , New York, 1973, p. 381. (5) S ee note 1.

(6) Hopk i ns, Raymond, "Mathemat i cal Model l i ng of Mob i l i zat i on and Ass i m i l at i on Processes", i n Ma thema t i cal Approaches .tQ Pol i t i cs .. ed i ted by Hayward Al ker, Karl Deutsch and Ant i one Stoetzel , El sev i er Publ i sh i ng Co. , New York, 1 973, espec i al l y p . 381.

(7) See note 3 . More prec i sel y, we used the Hopk i ns rout i nes d i rectl y, on sampl e cases suggested to us by Prof. Hopk i ns and on a few others.

Page Vl - 1 34

(8) I n Chapter (V), we noted that the correl at i on across n i nterval s of t i me, when both process no i se and measurement no i se m i ght be present, woul d equal � 2r" , where � i s the correl at i on i nvol ved w i th measurement no i se, and r i s the true correl at i on across t i me of the process underneath. When th i s f i gure changes very l i ttl e w i th i ncreases i n "n", but i s substant i al l y d i fferent from 1 for var i ous val ues of n, then i t woul d seem that r i s very cl ose to 1 and that � i s not.

(9) The approx i mat i on here i s that exp (l +a) i s approx i matel y equal to l +a , w i th onl y about . 5 % error when "a" i s about 10 % . More prec i sel y, i f we take the exponent i al funct i on of both s i des of the equat i ons (6 . 10), and subst i tute i n from (6. 12) , the approx i mat i on rul e c i ted here bri ngs us back to (6 . 9 ) •

(10) Str i ctl y speak i ng , there i s one maj or qual i f i cat i on one m i ght make to th i s statement . When mak i ng a pred i ct i on, one usual l y starts from a g i ven base year, and appl i es the d i fferent i al equat i ons to that year as an i n i t i al cond i t i on . Th i s woul d correspond to ad i ust i ng k 1 and ks here, to f i t a g i ven year exactl y. One can expect to do better , i f one somehow averages d i fferent base years to get an est i mate of the underl y i ng real i ty, and uses that est i mate to make pred i ct i ons from . Adm i ttedl y, part of the advantage i n our "ext l " (E XT RA P) extrapol at i on probabl y l i es i n do i ng just that . I f there were a cons i stent change i n the rate of growth of these var i abl es , through t i me, and i f one were us i ng extrapol at i on models to pred i ct the same per i od of t i me as the one they were f i tted to, th i s woul d l ead to an unfa i r advantage for the extrapol at i on model s; the extrapol at i on model s woul d be centered at the m i ddl e of the process , but the ord i nary model s woul d be centered at the i n i t i al extreme. However , every one of the extrapol at i on runs here was accompan i ed by a run test i ng the pred i ct i ve power of the hypothes i s of a t-squared term i n (6 . 1 0); these runs gave no support to the i dea that factors i nvol v i ng a s i mpl e second der i vat i ve coul d be respons i bl e for the advantages of extrapol at i on . The ab i l i ty of extrapol at i on to average out extreme val ues measured i n the same, earl y per i ods of t i me i s not "unfa i r, " i nsofar as i t refl ects an advantage ava i l abl e to those try i ng

Page V l - 1 3 5

to pred i ct the future from an extens i ve data-bank from the present and past.

(1 1 ) The computer pr i ntout here was l eft i n the custody of Prof. Karl Deutsch, Harvard Dept . of Government . The computer pr i ntouts from sect i on ( i i ) were al so l eft i n h i s custody, i n 1971 . These two groups of output are approx i matel y one foot th i ck, put together . For every run reported here, they i ncl ude pred i ct i ons and real i ty for every year, past and future, for wh i ch pred i ct i ons were made .

0 2 ) Deutsch, Karl W. , op. c i t . , Chapter 6.

{ 1 3 ) L i eberson, Stanl ey, Language .a..rui Ethn i c Rel at i ons in Canada, W i l ey, New York, 1970. See p. 47, 4 8 , 1 8 3- 1 87 for rel at i vel y smooth graphs emerg i ng from seatter-pl ots.

(14 ) i b i d . L i eberson focuses strongl y on the i ssues of l anguage "retent i on, " by those brought up i n one l anguage, as opposed to "demograph i c factors . " On p . 3 5 , he states that, 1 1 1 t i s far more correct to descr i be the Canad i an scene as an eq u i l i br i um based on counterbal anc i ng forces . " On p . 5 0 and p . 5 1 he emphas i zes, f i rst, that Engl i sh has been dom i nant i n terms of "retent i on" or ass i m i l at i on, but then, that the "revenge of the cradl e" has been central to French l anguage ma i ntenance. On p. 22 5 , he def i nes a var i abl e, "commun i cat i ons advantage, " qu i te s i m i l ar i n sp i r i t to the "l anguage pressu re i n commun i cat i ons" d i scussed here ; i n the subsequent verbal d i scuss i on, he i mpl i es that th i s var i abl e i s central to "retent i on" phenomena .

0 5 ) Deutsch, Karl W . , "Mathemat i cs of the Tower of Babel ", i n "Nat i on and Worl d", i n Contemporary Pol i t i cal Sc i ence: Toward Emp i r i cal Theory � McGraw- H i l l , 1967 .

(16 ) Str i ctl y speak i n g, one must al so try to expl a i n the or i g i ns and convergences of such d i al ects, i nstead of merel y the dec i s i on by i nd i v i dual s to jump from one d i al ect to another . E quat i on { 6 . 26 ) , wh i ch can deal w i th the i dea of d i al ects gett i ng cl oser or further away as a resul t of commun i cat i on, i s conceptual l y q u i te cl ose to the model here . Yet one i s st i l l faced w i th the

Page Vl-1 3 6

problem of explaining how divergence in dialect can come about. If the speech in each region were subject to random drift, or to systematic pressures based on interregional dif ferences in speech equipment, then a low level of interregional communications, in our model, would imply little damping of such drift. An increase in communications would imply a greater pressure for convergence, and a damping out of future drift. Note that such phenomena would also apply if there were a dramatic increase in communications between two regions with enough internal communications to resist assimilation as such; the growth of "franglais" is an interesting example.

(17) Schelling, Thomas C. , The S t ra tegy of Conflic t � Oxford U . Press, New York, 19 6 3. (Copyright 19 6 0. ) p. 1 04: "But where do the patterns (of potential compromise) come from? They are not very visibly provided by the mathematical structure of the game, particularly since we have purposely made each player ' s value system too uncertain to the other to make considerations of symmetry, equality, and so forth, of any great help . (i. e. of help in analyzing Schelling ' s parad igms for games of mixed conflict and common interest . ) Presumably, they find their patterns in such things as natural boundaries, familiar political groupings, the characteristics of states that might enter their value systems, gestalt psychology, and any cliches or traditions that they can work out for themselves in the process of play • • • " p. 1 51: 1 1 • • • the introduction of uninhibited speech ma y not greatly alter the character of the game, even though the particular outcome is different • • • " In short, tacit norms, before the introduction of explicit bargaining, are crucial to the existence of possible "patterns of convergence. " On p. 11 3-114, Schelling hammers home the point that mathematical "solutions" to nonzerosum games do not provide a realistic alternative to his own theory of tacit norms, discussed on p. 9 9 -1 11. In a sense, one might argue that the idea of "solving" for a unique or optimal static equilibrium may apply only to games similar to those originally discussed in such terms by Von Neumann and Morgenstern (note 3 9 of Chapter (V)); as in economics, there may be situations where d ynamic factors cannot be easily encompassed within such a static description. At any rate,

Page V1-1 3 7

Schelling ' s discussion, applied elsewhere by him to real political analysis, strikes us as fairly convincing.

(18) Deutsch, Karl W . , National ism and S ocial Communications, M IT Press , Cambridge , Mass. , 1966, r evised second edition, especially p. 96-9 7. Also, Deutsch et al , " Political Community and the North Atlantic A rea" , in In ternational Poli tical Communities, Anchor Books , New York , 1966 , p . 1 7 . In the latter reference in particular , the conce pt of "communit y " is defined in terms of an ongoing ability to communicate and res pond in decision-making processes of a continuous sort.

(19) S ee note 1 7 .

(20 ) See note 18 , particularly the second re ference.

(21) See note 16.

(22) Deutsch, Karl W. , Nationalism and Social Communications, M IT Press , Cambrid ge , Mass. , 1 966, revised second edition, p. 26.

(2 3 ) Feierabend , lvo K. and Rosalind L. , and Gurr, Ted R. , eds. , Anger, Violence and Politics: Theories fillQ Research, Prentice-Hall, Englewood Cliffs, N. J . , 19 7 2 .

(24 ) K ravitz, Sheldon, A Theore tical Model .E.Q.r. .t.h.e. Analysis and Comparison of Ideologies, Ph. D. dissertation, May 19 72. Available c/o Widener Library , Harvard U. , Cambrid ge , Mass.

(2 5 ) Minsky , Marvin and Selfrid ge , Oliver G. , " Learning in Random Nets " , in Information Theory , Fourth London S ymposium published by Butterworths, 88 Kingsway , London W. C. 2. , U. K. , p. 3 39. In discussing this formula, Minsky and Selfrid ge consider only the cases E = 1 or E = O per e pisode, but the generalization d oes not ap pear very diff icult . These authors, in turn , refer to Bush, R . R . and Mosteller, F. , S tochas tic Models Learning ., Wile y , New York, 19 5 5 , as a basic source.

(26) " Groupthink" as a small-group phenomenon has been widely d iscussed as a result of J anis, Irving , Victims of Groupthink, Houghton-Mifflin, New York, 19 7 3 .

Page V l - 1 3 8

( 2 7 ) H a u ge n , E i n a r , L a nguage C o n f l i c t a n d L a n gu age P l a n n i ng : � Ca se of Mode rn N o rweg i a n , Ha r v a r d P re s s , C a m b r i d ge , Ma s s . , 1 9 6 6 . Map on p . 2 2 9 .


( 2 8 ) T he p r i ma r y s o u rce f o r t h i s d a t a , a s w i t h t h e da t a r e po r t e d be l ow , wa s t h e N o rweg i a n O ff i c i a l S t a t i s t i c s s e r i e s , common l y a v a i I a b l e i n U . S . 1 i b r a r i e s . T he N o rwe g i an n ame i s "N o rge s O ff i s i e l l e S t a t i s t i k k 1 1 ; w he n au t h o r d e s i gn a t i o n s a r e r e q u i r e d , t h e " C e n t ra l B u r e a u o f S t a t i s t i c s " o r " Se n t r a l . • • 1 1 i s u s u a l l y a pp r o p r i a t e . S c hoo l d a t a f o r J a n . 19 7 0 m ay be f o u n d i n t h e 1 9 7 2 11 11A r bok ( Y e a rboo k ) , wh i c h a l s o a p pe a r s a s Re k ke - X I I , n um b e r 2 7 4 . Da ta o n t h e u s e o f l a n g u a ge s i n e l eme n t a r y e d u c a t i on w e r e u s e d , a s on p . 3 3 5 of t h a t copy of t he A r b o k . T he l a n g u age u s e d a t a fo r e a r l i e r ye a r s we r e t a ke n f rom t h e e a r l i e r A r b o ks , b a c k t o 1 9 3 9 . I n s om e y e a r s , w h e n t h e u r b an / r u r a l b r e a k d own wa s n o t a v a i I a b l e , we u s e d t h e S ko l e s t a t i s t i k i s s u e s o f t h e N . O . S . E ve r y n umbe r ( i s s ue ) i n t h e N . O . S . s e r i e s i n c l u de s a l i s t o f t h e n umb e r s a n d t o p i c s o f o t h e r r e c e n t i ss u e s ; a l s o , o n t he f ro n t o r b a c k cove r a r e l i s t ed t h e n u mbe r s of p r e v i o u s i s s u e s on t he s ame t o p i c . ( T h u s , i n t h e 1 9 7 2 A r b o k a re I i s t ed t h e Re k k e a n d n um be r o f a l l p r ev i o u s A r b o ks . ) L a n g u ag e u s e i n e l e me n t a r y s c hoo l s i s e s s e n t i a l l y a ma t t e r of l oc a l c ho i c e ; H a u ge n , op . c i t . , g i v e s a few d e t a i I s o f t h e p r oc e s s o f l a n g u a g e c h o i ce . We d e c i de d , i n ou r c omp u t e r r un s , t o r e c a l i b r a te t he t i me pe r i od s o f t he da t a ; t h u s , l a n g u age u s e i n fo r c e i n s c hoo l s i n J a n u a r y 1 9 5 0 wa s t a ke n to b e a n i n d e x o f ac t u a l l a n gu age u s e i n 1 9 4 9 , g i v e n t h e l a gs i n v o l v e d i n c h a n g i n g po l i c y i n t h e s c h oo l s . ( 29 ) Sou rce s : N . O . S . , op . c i t . , Re k ke X I I , No . 2 3 3 ; R e k ke A , N o . 2 4 4 ; Re k ke A , N o . 2 9 2 . O r i g i n a l s t a t i s t i cs we r e f u r t h e r b r o k e n down b y sex , b u t ag g r �ga t e d fo r t h i s s t u d y .

( 3 0 ) N . O . S . , o p . c i t . , Re k ke X I , N o . 2 9 8 f o r t h e mo s t r e c e n t d a t a , a n d p r ev i ou s i t ems i n t he s ame t o p i c s e r i e s . N o te t h a t R e k ke X I I , N o . 2 3 2 , wh i l e n o t con t a i n i n g t h e a p p ro p r i a t e b r e a k down s b y u r b a n a n d r u ra l , does p ro v i de d e f i n i t i o n s i n E n g l i s h . ( 3 1 ) IJ . O . S . , b a c k wa r d s f rom R e k ke X I I , N o . 1 9 8 . A g g r ega t e d a c c o r d i n g t o t h e u r b a n / r u r a l de f i n i t i on s o f t own s h i ps s pe l l e d o u t i n t h e S ko l es t a t i s t i k k s e r i es .

Page V l - 1 3 9

( 3 2 ) !J . O . S . , op . c i t . , b a c kwa r ds f rom R e k ke X I I , N o . 2 2 0 . Some a gg r e g a t i on r e q u i r ed i n e a r l i e r ye a r s ov e r s e x , e t c . I n 1 9 G 7 , R e k ke X I I , N o . 2 4 4 was used .

( 3 3 ) N . O . S . , op . c i t . F o r i n co me , R e k k e A , N o . 3 6 3 , u s e d f o r 1 9 6 8 o n b a c k . ( 1 1 Mun i c i pa I Tot a 1 I n co me , 1 1 f rom T a b l e I A . I n e a r l y ye a r s , some a g g r e g a t i o n req u i r e d ; howe v e r , t h e co l umn n a mes we r e u n i f o rm e n o u g h t h a t i t wa s n o t too d i f f i c u l t to rec on s t r u c t t h e s a me a gg r e g a t i o n s a s u s e d b y t h e N . O . S . au t ho r s i n l a t e r yea r s . ) I n Re k ke X I I , N o s . 2 4 5 a n d 2 5 2 , a n i n d e x o f con s ume r p r i c e s w a s f o u n d , fo r t h e e n t i r e pe r i o d . ( i . e . A n h i s t o r i c a l t a b l e w a s a v a i I a b l e . ) T he a ve r a g e s i z e o f t h e r a t i o , no rma l i z e d to u n i t s a p p ro p r i a t e fo r s t a t i s t i c s , wa s on t he o r d e r o f u n i t y .

( 3 4 ) T h e s e d a t a i n c l u d e d a t a on c r i m i n a l con v i c t i o n s e a s i l y a va i l a b l e I n t h i s pe r i o d , s t a r t i n g b a c k f r om t he A r b o k ; d a ta on he a d s of h o u s e h o l d s o n we l f a re , c on t i n u e d b a c k i n t h e S t a t i s t i c a l Mon t h l y ove r t h e e n t i r e pe r i o d ; d a ta on p o p u l a t i o n , n o t r,e ne r a l l y a v a i l a b l e i n r e c e n t ye a r s w i t h t h e d e s i r e d b re a k down , b u t i n t h e A rb o k w h e n av a i l a b l e . ( 3 5 ) l s a r d , Wa l te r , M e t h o d s o f Reg i on a l A n a l ys i s : A n I n t ro d uct i on t o Reg i on a l Sc i ence1 M I T P re s s , C a m b r i d g e , M a s s . , 1 9 6 6 , C h a p te r 1 1 . O n p . 5 0 0 , re fe r en ce i s m a d e to S t ewa r t a n d Z i p f , t h e t wo f a t h e r s o f t h e i d e a ; on p . 5 0 6 , a co n c e p t o f s oc i a l d i s t a n c e i s men t i o n e d , s l m i l a r i n s p i r i t to " U .,j " ; o n p . 5 0 7 - 5 1 0 , emp i r i c a l r e s u l t s a r e d i s c u s se d . See a l s o De u t s c h , K a r l W . a n d l s a r d , W a l te r , "Towa rd a Ge n e r a I i z e d C on ce p t o f D i s t a n c e " , B e h a v i o ra l S c i ence , N o v . 1 9 6 1 .

( 3 6 ) G a l l e , Owe n R . a n d T a u e be r , Ka r l E . , "Me t r opo l i t a n M i g ra t i on a n d I n t e r ven i n g Oppo r t u n i t i e s " , Ame r i can S oc i o l og i c a l Rev i ew, N o . 3 1 , F e b . 1 9 6 6 , t a b l e on p . 8 . N ote t h a t t h e s e a u t ho r s a r e e s s e n t i a l l y c r i t i c s o f t h e g r a v i t y mode l ; t h u s t h e i r r es u l t s a re p a r t i c u l a r l y i n t e r e s t i n g . F o r o t h e r wor k i n t h i s a re a , s e e n o te 3 5 .

( 3 7 ) Wo r l d A t l a s , Mo scow, 1 9 6 5 , p . 5 7 . A l a r ge m a p o f N o rwa y , w i t h ma j o r r o a d s i n d i ca t e d , w a s u s e d . D i s t a n c e was me a s u re d w i t h a c e n t i m e t e r r u l e r , fo r t h e mo s t d i r e c t maj o r ro u t e b y r o a d ; howeve r , i f

Page Vl- 140

this s hou l d exceed the abs o l ute distance by 40% or more, then the direct dis tance p l us 40% was used. For distances from an urban area, either there wa s one maj or city, or severa l which 'cou l d be averaged. For rura l di stances, it was as sumed that popu l ation den sity was even throughout each region ; averages were es timated on that bas is. A l l of the data here was punched on cards, and read Into the M IT Mul tics machine ; the punched cards and code s heets may be made avai t ab l e to future us ers through the office of Prof. Deut sch, if there Is interes t J n s o doing.

Page 1/- 1

CV) G E N ERA L A P PLICATIONS OF TH ES E IDEAS: PRACTIC A L HAZA RDS AND N E W P OS SI BI LITI E S ( i ) I NTROO IJCTI ON ANO S UMMA RY Fierce debates continue to rage between those w ho

would stud y beh avior "with mathematics" and those w ho would stu dy it "by traditional means. " T hese debates,

by drawing attention to the extremes, h ave o bscured many of the serious h azards and many of the most

important application s of mathematical approaches in government and in psycholog y. In ex tending the

mat hematical approaches further, we h ave a special

responsibility to discuss the new application s and the

continuing h azards w hich may result.

We will begin, in section ( i i ) , by presenting the

viewpoint of the practical decision- Maker, w ho h as not

used mathematical met hods so far, for good reasons. All of this chapter will be organized around the

difficulties w hich h e faces; oth er possible u sers of

our ideas - th e social scientist, the psychologist and

the ecologist - will be mentioned within more limited

contexts. In section Ci i i ) , we wi 1 1 su g gest a common

framework for evaluating verbal and mathematical tools,

both, based upon the co�Mon goal of prediction; within

Page V- 2

this framework, the mathematical methods h ave a role to

pla y, at least in principle, both for what they tell us directly and for what the y tell us a bout common a b uses

of verb al methods .

For those w ho pre fer to d e al in

concrete e x amples, rather than a bstract generaliz a tions a bout methodo 1ogy, 1t1e have I nc 1 u de d a number of

relevant e x amples, mostly in the footnotes.

Also, at

the end of this section, we will d e scribe in d etail how this framework has motivated the development of new

mathematical proce dures d e s cribed in the other chapters

of this thesis .

In section ( iv ) , we will go f rom principle to

practice ; we wi 1 1 d iscus s specific w a y s in which

statistical methods may be used, in close relation with

ver b al me thod s, and be of signif icant value in re al

prediction e f forts . This discussion will not be b ased upon the well- known philoso phy of logical positivism, but on the more recent philosophy of Ba yesian

utilitarianisri ( see section ( v ) of Chapter

( I I )),

philosophy which underlies the actual mathematical

developments we have d iscussed ;


at any rate, the

utilitari an approach helps keep us focuse d on the value

of our methods to seriou s policy -ma kers .

In this

section, we h ave also trie d to crystallize out our own

experience with the numerous w a y s in which statistical

Page V- 3

research can turn ou t to be usel ess and misl eading for the pol Icy-maker, if i t is done in a caval ier manner ;

our suggestions are no t a definitive answer to al l of t hese d ifficul ties, bu t at l east they may hel p.

Final l y, in section (v ) , we point ou t th at the

central rol e of h uman psychol ogy in pol itics seriousl y l imits the possibil ities of naive empiricism , bo th

verbal and math ema tical .

Statistical studies, l ike

verbal research of th e purel y empirical variety, may be unabl e to transcend these l inits. However, the

mat hem at i ca 1 id eas

d i s c u s se d

in C h a p ter

(I I ),

a 1 on g

with o th er offsh oo ts of the Bayesian a p proach , can be applied in a d ifferent way, to h el p overcome these

l imitation s in a way wh ich words al one canno t ;

il l ustrate this point, we wil l mention specific


possibil ities fo r using these ideas in the fu ture to

cope wi th and ex pl ain the p henomenon of intel l igence, whether in h uman socie ties o r in h uman brains .


Let u s star t ou t b y reconsidering o u r tacit

assump tion th at rnethematical methods do h ave some use, af te r al l , in pol itical science and in pol itical

Page V- 4

decis ion-making.

Many peop l e have ques tioned this idea

in the past few decades, and many more may have fe l t

strong private reservations about the idea. Whi l e we

c l earl y can be ex pected to reaf f irm the va l ue of

mathematica l methods, on the who l e, we a l so bel ieve

that the traditiona l com pl aints against mathematical

methods do contain rea l information which the user of such methods s hou l d not i gnore.

I n particul ar, l et u s try to ex pres s the

reservations about mathematical methods which might be

hel d by the active po l it ical actor. Practicing

dipl omats and pol iticians have often found that their

margin of succes s depends on their abi l ity to seize

upon uniq ue twists in the po l itica l or p sycho l ogical

environment, twists which a l l ow the individua l to

escape the seemingl y uncontro l l ab l e tide of events that

one woul d expect a mathematica l model to extrapol ate. Sometimes this invol ves the abil ity to establ ish

channel s of serious communications between dif f erent

pol itical group s, channe l s which can grow in importance

once they have been es tab l is hed.

Sometimes this

invol ves the abi l i t y to seiz e u pon an economic or

mil itary advantage .

Caes ar ' s G al l ic Wars are a c l a s s ic

exampl e of the l atter sort of imagination, evading

Lanchester 1 s Laws at every turn (l ) ; Lid de l l Hart, in

Page V- 5

his clas sic text on military strate gy (2 ), h as

emph asized that such imaginative approac hes h ave bee n

d e cisive i n wars thro u g hout h is tory . I n both cases, one

achieve s a greater "be nefit" within a given "cos t

con straint", not by being tig ht an d pre cis e about

bud getin g one ' s re sources, but rather by pre serving the detachment an d the free dom one will need i n order to

seiz e upon w hole new option s, whic h may ope n up a whole new fron tier of pos sibilitie s .

Political creativity in

this form is d ifficult e nough to e n compas s within an y s c holastic conte x t, let alone the context of

mathematical mod els ; therefore, politic al s cie nti s ts

who h ave a stron g attachment to this proce s s would

n aturally ten d to be s ke ptical of mathematical models .

More ge nerally, succes sful political actors, like mos t succes sful profe s sion als, would te nd to b elieve that

they stretch their min d s to the limit, f n order to

arrive at their pol icy d e cis io n s ;

they may con clude

t hat the s heer complexity of their own de cisio n - makin g

militate s again st the prediction of its outcome by

math ematic al s ys tems which acco u nt for f ar le s s information conte nt.

Furthermore, f t is al so likel y

that a l arge part of this information, even when

ac ce s sible to the pol itical s cie ntist, may be e n cod ed

in a verbal form whic h militate s against its bein g

Page V- 6

accounted for by mathematical mod els. Ciii) P R E D I C T I O N : A C OMMON G OA L F O R V E R R A L A rm M A T H EMAT I C A L S O C I A L S C I E N C E

T h e s e dif ficultie s can b e dealt with o n several

dif f erent levels . Le t u s be gin on the simples t lev el .

T h e dif f iculties above point to th e impos sibility

of constructing math ematical models w h ich will predic t

exactly what will h anpen in politics, in detail, in the s hor t- term and in th e me dium- term .

However, the s e

dif ficu l ties h ave also been enou gh to make i t

impos s ible for any h uman being, political actor or

oth erwis e, to pre dict ex actly what will h appen in all

of politics, in th e s hor t- term or medium- term. A

traditional political s cientis t or math ematician mig h t

deduce at this point that " true pre diction", in the

sens e of ex act pre diction, is impos s ible in political

science. T h er efore, in ord er to as s ure h im self that h e

is involved in s eriou s work, h e may res trict his

attention to pro posi tions w h ich meet an Aris totelian tes t of " tru th ", s uch as s tatements abou t h is torical

documents (3 ) or abs tract theorems w h ich he can prove by

rigorou s deduc tion.

Very few political actors,

h owever, feel that they WO LJ ld want to turn away from

Page V-7

the d i ff i cult but pr i nary q uest i on of pre d i ct i ng the

d i fferences i n outcome b etw e en the d i fferent act i ons

the y could take. These pre d i ct i ons may always i nclude

factors of uncerta i nty , but the pol i t i cal actors would

f i nd i t i nterest i ng enou �h to reduce th i s uncerta i nty

as much as poss i bl e , i n an y poss i bl e way. Thus, i nsofar as pol i t i cal sc i ent i sts are concerned w i th develop i ng

object i ve i ns i ghts of the max i mum poss i bl e value to

the i r consumers, the pol i t i cal dec i s i on-makers at all

levels, the i r

ul t i mate concern would be w i th the

de vel opment of effect i ve probab i l i st i c theor i es to

pred i ct pol i t i cal and soc i al systems.

There are f i ve po i nts worth not i c i ng about our

emphas i s on pred i ct i on here. F i rst of all, th i s

emphas i s does not restr i ct i tsel f to the overtly

mathemat i cal phases of pol i t i cal sc i ence.


dev el opment of "pre d i ct i ve model s" - math er,at i cal QL

verbal , or even anal ogue for that matter - i s a general

concept, wh i ch can be used to gu i de h i stor i cal research

as eas i ly as f t gu i des stat i st i cs . Many of the "grand t heor i es" of pol i t i cal h i story, i nclud i ng especi ally

the theor i es of Spengl er (4 ), Toynbee (S ) , Turner (6), H e gel (7 ) , and Marx , were desi gned to hel p people

"understand" h i story i n terr,s of a verbal dynam i c model

wh i ch coul d al so be used to pred i ct the future.

Page V- 8

T raditio n al ap p roaches to r esearch in political

science, h0\-1eve r , mi ,r,ht te n d to


de ve 1o p" such theo r ies

by addin g com ple x strings of qualificat f on s, an d b y

fo r ci n g an elabo r ate , pe r f ect A ristotelian fit o f the

we ake r , more specialize d p ro oositio n s which eme r ge . Ou r own app roach would ask that political scie ntists

contin ually retur n to the main questio n , to the abi l ity of their theo ries, with the dis claime rs remove d, to

p re dict the b ro ad fi r s t-o rde r tre n ds r n the maj o r , most

o b vious variables of political h isto r y.

Seco n d, our emphasis o n p redictio n can be

j ustified on dee pe r ,r, ro tJ nds than those of satisfyin g

those who p ay fo r the bulk o f the po l itical research .

Followi n g the philoso phy of utilitarianism , o ne might

simpl y regard political science itself as o ne

particular phase of po litica l activity; one might even

suggest that its maj o r justif icatio n fo r existe nce , in the lo n g te rm, is its ability to co n tribute to

co nstr uctive political


to the p rimar y nee d of the

This takes us b ack

de cisio n-make r to p r e dict

the results of his actio n s , at least o n a p ro b abilistic

b asis.

On the othe r han d , even if o ne we re willin g to

accept the ethical p ri n ciple that truth should be

pursued fo r its own sake, as an ultimate goal equal to

o r highe r than the go al that o f human welfare , o ne

Page V-9

still f aces the p robl em of d e fining what this "ultimate

truth" would consist of .

One might s ug � e s t that

ultimate tr uth, if it does exist in political science, l ie s not in the changeable f acts of cu r r ent

happens tance, b ut r athe r i n the l e s s changeable d ynamic law s which l ead f rom one s et of circumstance s to

anothe r ; in phy sics, f o r exampl e, the d ynamic f ie ld

e quations are consid e r ed the highe s t scientific t r u th,

whil e the codif ication of the wave-f u n ction of the unive r s e is not an o b j ect of s e riou s s tud y.

Admittedly, our k nowledge of the d ynamic law s, unlik e

the laws them s el ve s, is l ikel y to b e changeab l e for a long time, in pol itical science as in phy sics ;

howeve r , it woul d b e meanin�l e s s to speak about the advancement of knowl edge as a wo rthwhile goal, we r e

the r e not such a pos sibility f o r change and expansion in the state of knowledge .

The r e ar e thos e who woul d

que s tion, in var ying d e g r e e s , the p rimacy of the mo s t

ab stract d ynamical e q u ations e v en i n a field l ik e

phy sics ; howeve r, even the "phenome nological approach", in that field, involve s the construction of powe r ful, gene r aliz ed p r edictiv e s tatements, s tatements about

what to expect af te r s etting up expe riMents of

dif f e r ent type s (8 ) .

Many times in political s cience, the concepts of

P a ge V- 1 0

"c ausa tion" and "ex planation" h ave been cited a s fo rms

o f truth worth pursuing (9 ) . The sta tement th at

"A caused R" r1a y be translated roughly into the

sta tement th at, "A occurred before B, and th e dynamics

of the system were such th at B would not h ave occurred if A h ad not occu r r ed w h en it did, ceteris paribus. "

Once a gain, the critical q uestion to answer is th at o f

th e dynamic laws w h ich govern political systems.

Beyond the goals o f so cial utility and "ultimate

truth", the pol itical scientist might also pursue the

go al s of cultura l enrichment and entertainment. T hese

goal s are o f ten cited as a jus tification for extreme

traditional ism in pol itical science. Whether th ese

goal s are now being pursued effectively by all o f those

who cite them is a dif f icul t m atter to judge, wel l

beyond the ran�e o f the present discussion . However, a

l arge part of the "cultural enrichment" involved woul d appear to involve the l earning o f l essons abo u t h uman

psychol o gy, about w h at patterns o f thoug h t and beh avior one migh t predict on the part o f human beings or h uman groups in unusu al circumstances, in other cultures.

T h ird o f al l, one should note th at our emph asis on

prediction as the ul timate goal o f political science

does no t imply th a t wor k o f a more descriptive na ture

Page V-1 1

should simply be abandone d .

Using s tatis tical method s ,

for example, one must f ir s t collect a s et o f d ata,

b e fo re one can f it or te st a mod el . After one has fit a verbal or statistical mo d el to a given s et o f

fir s t-order d ata, one can then g o b ack to the original source s of inf o rmation to s pecif y the s trengths and

weakne s s e s of one ' s mo del J n more d etail, with greater

accuracy ; even if one cannot modif y one ' s mod el e asily

to handle the exce ptions, one can try to expre s s the information embod ie d in the exception s in a more

compact, more ab stract f orm, to mak e lif e easier for

tho s e who wis h to make pre dictions o r to modif y the

current mo d els in the f u tu r e .

In brie f , w e are s u � g e s ting that d escriptiv e

re s earch b e viewe d as a means to an end, with the end being pre diction . A direct and total as s ault on the

objective of pre diction may ind e e d b e a poor s trate g y

for achie ving this end . Howe v e r , o u r s ucces s i s likely

to be even le s s, if we do not keep the basic objectiv e fixed firmly in our mind s .

Every once in a while, it

is im portant to bring to�ether the vario u s

pro positions, mathematical and verb al, which one b elieve s to be u s e f u l in pre diction, and s e e how

e f f ective (and consis tent ) the y really are in co ping

with the overall picture. When there are major new

Pag e V - 1 2

defects o r pos sibilit ies appa r ent at the gene r al level, it is import ant to ta ke note of then, at that level, so

that they can be u sed a s a g uide fo r mo re specializ ed resea rch in different b r anches of po l itical s cience. Descriptive wo r k may not only help provide the

ba sis fo r evaluating d ynamic models of politics ; it ma y also help those who wish to p redict polit ics, b y

telling them w h at the cu r r ent s tates of the s y stems a re to which they would lik e to a p ply the d ynamic law s.

The longer the policy horizon, howeve r, the mo re

impo rtant it is to u se mo r e gene ral d ynamic models,

instead of ass uming some sort of simple ex tension of

p resent trend s and condi t ions a s g a u ged b y descriptive stu dies.

Finally, while p rediction may be advocated a s

the p r im a r y goal of o b j ective political science,

normative political science r emains another m atte r .

One may note, in this connection, that the attempt

to max imiz e accu r acy in description, b y itself, lea d s nat u r all y to a numbe r o f uncoo rdinated, special i zed

effo r ts, focu sed in depth on d ifferent p r im a r y so u rces of info rm a tion ( 1 0 ). In p red ic tin g comp l ex d y n amic

s ystems, in contr a st, one finds oneself le d to focu s

fir s t of all on the inter actions between the p rima r y s ub s y s tems, at a n agg regate level. Thu s In or der to

ma ke "inter dis ciplina r y r esea rch" a r eality in the

Pa�e V- 1 3

social sciences, it is es sential that the ideal o f

predictive power gain a t lea s t a s much credence a s the

ideal o f descriptive f ines se, In the detailed conduct

of actu al s tu dies .

Fo urth o f al l, one should n ote that o ur emphasis

on prediction does not req uire a reduction in the rigor of tho u ght, even if it do es require that we go beyond

the Ari s totel ian conce pts o f truth vers us f a lseho o d a s

a scertained b y traditional u ses o f dedu ctive lo gic . The

mathematical theory o f pro b a bil ity , and the B ayesia n theory o f

inductive lo gic, have long provided a

rigoro u s ba sis for handling model s which do predict the f u ture bu t which a vo i d the determinis t ' s pretense o f absolute certainty .

Aris totelian s tatements , which

tell u s that a pro po sition is s im ply true (probability

one ) or f alse (pro b a bility z ero ) are simpl y a s ubset o f the statements which can be ex pres sed in rigoro u s

probabilis tic f a shion. In either case, the s tatements

that we ma ke may well be Inaccurate, If they are fo unded on f a ulty information; the l ang u a ge o f

probability, however, a t l eas t l ets u s expres s

precisel y how

much confidence we do have in a

pro po sition, ins tea d o f forcing u s to s ay nothing or to

e xclude totally a real b ut les s pro bable contingency.

P a ge V-14

When socia l trends wil l d epend on long-l asting but uncertain and novel phenomena, the language of

prob ab il ity encourages u s to escape the fal l acy that

definite optimistic or pessimis tic predictions are

s omehow more infornative than the truth . (See, for

e x am pl e, footnote (6 ) . )

To the statis tician, all these concepts have l ong

b een ob viou s .

In verb al rese arch, however, the

cl assical Aristotel ian procedures have remained dor.1inant .

Onl y in recent years has the "B a yesian

school " begun to e d ucate ver b al de cision-ma kers in the

use of pro b a bil ity theory as a general ize d l anguage of

thought ( l l ) . In section (v ) of Chapter (11 ) , we have

emphasiz ed the point that the B a yes i an approach to inductive re asoning can be a ppl ie d to ind uctive

rea soning as a whol e, not merel y to re asoning a bout quantitative variabl es.

When, in verb al research, one

finds onesel f deal ing with the behavior of quantitative

variabl es, such as the degree of po p ular d iscontent,

etc . , one m a y even go so far as to d iscuss the "de gree of fit 1 1 , the "de grees of free dor, 11 and the "exogenous

variabl e s" of one ' s verb al model , on the understanding

that one is e xpressing one ' s model in verb al terms onl y beca u se of the l ac k of hard d ata; even then, one m a y

want to draw together the el ements of one ' s model , and

Page V- 1 5

e xpre s s them in incre a s ingly mathematical term s , even if the parame ters canno t be e as ily meas ure d, in order to make i t s meaning mor e and more e xplicit , and in

order to impro ve i t s "coh erence", i. e. I t s comp 1 e tene s s and i t s consis t ency.

Finally, and mo s t impor t an tly,


empha s is .Q!l

Qredic tion � been the driving force behind bo th


empirical rese arch, and Q!!L conclus ion tha t

conventional ro u tines f or time-series analysis � inadequate.

The empirical wor k on poli tical s cience

in this the sis was mo t ivat e d almo s t entirely b y the

at temp t to conver t the Deu t s ch-Solow equations ,

mentione d in Chap ter


I), int o a u s e f u l tool for the

pre dic tion of national as s imilat ion and polit ical

mo biliz a tion. We s tar t e d o u t years a go b y t e s t ing o u t the Hop kins ro u tine s ( 1 2 ), which try to e s timate the

coe ff icien t s o f the Deu t s ch-Solow mo del f rom only three da t a poin t s , on the as s um p tion that t h e Deut s ch

e q uations are to tally "true" in the Aris t o telian s ens e ; it came as no s ur pris e to u s that the re s ul t ing

predic tions we re rather poor.

Our ne x t s te p was to try o u t time-s erie s mul tiple

regre s sion, the mains t ay o f


econorie trics 1 1 ( 1 3 ), of

"path analysis 1 1 ( 1 4 ), and o f "cau s al analysis 1 1 ( 1 5 ) and

Page V-1 6

so on .

It was eas y enough to meas ure as similation and

mobiliz ation coefficients w h ich were significantly different from z ero, and multiple correl ation

co efficients larger th an ninety percent; these res ults were wel 1 with in the ran�e of w h at are regarded as

"succes sful conclusion s " in mo s t quantitative pol itical

research tod ay ( l 6 ) .

However, our emp h asis on

prediction led us to loo k a bit more clo sely at these

resul ts;

we wrote a new pro gran, S E RI E S , to estimate

the regres sion nodels, and then to test their ahill ty

to predict d ata acro s s long intervals of tine,

interval s comparable to tho se tested w i th the Ho pkins pro gram . T he error s , w h ile l es s than tho s e of the

Ho pkins programs, we re s tl l 1 unacceptably lar �e . A

sim ple curve-fitting procedure, b y contras t, was able

to make predictions with les s th an h al f as much erro r,

averaging to about 4 % error over perio d s of time on the

ord er of a century ;

this average encompas ses a number

of cases w herein the mo del was fitted to d ata in one perio d of time, and used to predict d ata in later period s .

\falter lsard, in h is cl as s ical s tud y of

methodol o g y I n re�ional s cience ( 1 7 ) , h as made s trong

statements against the ability of regres sion mo dels to

predict th e futur e ; while h is argument is p hrased in

Pag e V-17

t h eore tical t erm s , t h e wide cov erage of his s tudies

v✓ould impl y an empirical b asis for h is con clusion s . T h e Broo kin g s I n s ti t u t e h as al so repor t e d major

difficul tie s in t h e u s e of regres sio n i n forecas tin g ;

t h e y found t h at t h e s e difficul tie s cou l d b e reduce d b y t h e u s e of an "ad jus t111 e n t" factor not too diff ere n t in

spiri t f rom our simple curv e-fi t tin g procedure (l R ) .

Thus t h e empirical basis of t h e s e con clusio n s goe s well beyo n d our ow n e x am ple s.

In our recen t p h as e of empirical poli tical

re s e arch, we b egan wi t h t h e hope t h at t his weakn e s s of

re g r e s sion, in e s t f matin � pre dictive models, could b e un ders tood wi t h in t h e clas sical an d elegan t framework

of nax imum likelihood t h eor y, as d e s crib e d in s e c t ion (v) of C h ap t er

( 1 1 ).

In s tead of que s tio n in g t h e

clas sical procedures of s ta t is tic s, w e hope d to apply t h e s e proce dure s to more sophis t icat e d models.


Chap t e r (III), we h av e not ed t h at "wh i t e nois e" in t h e

proce s s of measuring d at a can tu r n an ordinary

"au to r egre s siv e proce s s " in to a "mix e d au t o r e gre s sive

moving-average proce s s. "

Accor ding to s t atis tical

t h eory, mul t iple regre s s ion Is a good way to e s t imat e

t h e former proce s s, b u t a b ad way to s tud y t h e lat t er . A simple diagram can s how how bad t his problem

migh t b e come, in practice . Giv e n a sin gle variab le, z,

Page V- 1 8

which has a true correlation of ¢ with itself


time (i . e. z (t) with z (t+ I )), a n d a correlation of r

with the me asu r emen ts, x, that are m a de for z, we find

that the correlatio n be twee n x (t) a n d x (t+ l ) l s due to a n in dire ct path of correlation s:

F igure V- 1: Pathways of Correlation Wit h Noisy Data If we m a ke the simplif yin g a s sumntion (a big one) that

the process is not a n y worse, that there is n o

cor relation between x (t) a n d x (t+ l) in depen dent of this pathway, the n classic al theo r y tells us that the

correlation between x { t) a n d x (t+ l ) will equal r times

w t i nes r, A.


nt , If regressio n were use d to predict . e . rl.'f-J

x (t+ l) from x (t) (the obseryed d ata), the regression coe f ficie n t wo uld e q u a l the simp l e correlation

coe f f icie n t, r�¢, inste a d of the n umber ¢ ; yet whe n predictions are made over lon ger i n terv als of time,

the n ¢, the coe f f icie nt of the u n derlying process I n

the real world, is the proper basis for pred iction (1 9 ) . This e x ample wou l d also a ppear to point to the idea

that sim ple "path coe f ficients" which are not e f f ective

in prediction are not like l y to re present the true

Page \/-1 9

underlying relations, either.

If r - the correla tion between the true v ariable

and the measurenents o f the varia ble - were a bout 9 5 % ,

then the o b serve d regre s sion coe f f icient (r"l. ¢ =

( . 9 5 ) ( . 9 5 )-) would be a bou t 1 0 � snaller in size than

the right re gression coe f f icient (¢) for use in

long-range pre dictio n . Furthernore , since this 1 0 %

error wo uld re present a general shif t In the value o f a coeff icient, one wo uld ex pect that the use o f the

regression model wo tJld le a d to errors which accumulate at the rate o f ten percent pe r time perio d;

it is easy

to see how this phenomenon alone co uld v itiate the predictive power o f regression.

In the case o f a

single v aria b le , this 1 0 % error a p plies to a single

large correlation coe f ficient; therefore , one can ho pe

that re gression will at least preserve the sign o f this co e ff ic Ien t intact . In the case o f many v ari ab les,

howe ver, the 1 0 % error wo uld a p ply to a cor relation

matrix ; small b u t critical cross-terms, on the order o f

�5 % , might conceiv a bly have their signs reverse d , d ue

to the spurio us e f fects relate d to other, larger terms in the s ame matrix .

A fter all , the importance (and

much o f the detectability ) o f such "fee d b ack terms"

lies precisely in their a bility to accumulate and determine the long-term behavior of the system;

it is

Page V- 2 0

precisely the lon g-term behavior which we find poorly

accounted for h y regre s sion .

In order to accou nt for s uch e f fects , within a

clas sical statis tical f ramework , we have devised a new

algorithm to e s tiMate mixed autoregre s sive

movin g - average proce s se s ("A RMA" proce s se s ) at a

manage able cost. We have applie d this new algorithm to the old Deutsch - Kravitz data on as siMilation and

mobiliz ation in a dozen or so nations , and we have also applied it to new d ata on ling uis tic as similation in IJorway . In both case s , s tatis tical theo r y indicated

that the A RMA model was better than the old regres sion

model ; it indicated onl y a sMall prob abilit y , f ar le s s than 1 % in almo st every run, that the improvement was due to coincidence .

However, when lcl.e. went


to a pply

� .t,g_il of predictio n , we were q uite dis appointe d. The A R MA model d id indee d re d uce pre dict ion errors , in

comparison with regre s sion, by about 1 0 % of the

original root- mean- s q u are average of the errors , in the

case of o ur large st data s ample ; yet this is still far le s s than the 5 0 % re d uction achieve d earlier with


The succe s s of extrapolation would appear to

indicate that the underlyin� proce s ses are still more deterninistic than either the re .r;re s sion or the A RMA

Page V - 2 1

mo d e l s w e r e a b l e to d i s co v e r . A p p a r e n t l y, t h e

me a s u r eme n t no i s e a n d t r a n s i e n t f l u c t u a t i o n s w e r e too comp l i ca t e d f o r a "wh i t e n o i s e " mo d e l t o cope w i t h .


r e t ro s pec t, i t s ee ns c l e a r t h a t on e s h o u l d h ave

expected preci s e l y s uch a s i tuat i o n , i n the case of

b o t h t h i s a n d mo s t o t h e r d a t a i n po l i t i c a l s c i e n c e . T h e

so l u t i o n to s u c h d i f f i c u l t i e s, i n t h e c l a s s i c a l

p h i l o so p h y o f ma x i mum l i k e l i h oo d, i s f o r u s to po s e

eve r mo r e conp l i c a t e d h i g h e r - o r rl e r mo d e l s o f p r o ce s s no i s e a n d me a s u r eme n t no i s e . (W i t h some o t h e r

d a t a - s e r i e s, h owe v e r, t h e A R MA mo d e l , o r eve n t h e u s u a l r e g r e s s i o n mod e l , m i gh t b e a d e q u a t e . )

How ev e r, t h e

mu l t i v a r i a t e A R MA mo d e l a l r e a d y con t a i n s a l a r ge e n o u g h n umb e r o f d e g r e e s o f f r e e dom ; to d o u b l e o r t r i p l e t h e n um b e r o f coe ff i c i e n t s t o e s t i ma t e wo u l d p u t a h e avy b u r d e n o n a l l b u t a f ew ve r y l a r g e d a t a s e t s, wh i l e

s t i l l compe n s a t i n g fo r o n l )t mo de r a t e con p l i c a t i o n i n t h e no i s e p r o ce s s .

T h e s ucc e s s o f s i mp l e ex t r a po l a t i o n po i n t s to



pr a c t i ca l app ro ach .tQ. p r ed i c t i o n . I t po i n t s to t h e

p o s s i b i l i t y o f " ro b u s t " es t i ma t i o n, o f e s t i na t l o n t e c h n i q u e s w h i c h ca n p e r f o r m w e l l d e s p i t e a n y

ove r s i n p l i f i c a t i o n s i n o n e ' s o r i R i n a l mo rl e 1 (2 0) ;

good pe r f o r m a n c e, i n t h i s co n t e x t, me an s t h a t t h e


co e f f i c i e n t s o f t h e mo d e l a r e e s t i ma te d i n s u c h a w a y

Pa � e V-22

that the mo del wi 1 1 have maximum pre dictive power .


s ection (vii ) of Chapter ( I I ) , we have suggested that

simple e x trapolation Is "ro bu st" - more ro bust than

A R MA estimation, even a priori - beca use it is base d on a sinple "ne asurement noi s e only mo del, " a s tatistica l

model huilt to e xtra ct the deterministic underlying

trends (if they e x is t) from a process a ff licte d by a

comple x pattern o f trans i ent noise and me asurement

error ; we have pointe d out that the general dynamic f eedback pro ce dure o f Cha p t er ( I I) can be u sed to

estimate mor e general mo d els o f this t ype ,

econo mically . ( It is also possible to m a k e some

allowance for process nois e in an a d hoc w a y (2 1 ) , but

the best way to ma ke such allow ance while preserving

ro bustne ss is uncle a r ; there Mi�ht be no general

theoretical answer to this q u e s tion . ) We have also

discussed another new techniq u e in Cha pter ( ! I),

"pattern ana lysis", to draw o ut more direct

me asurements of the unde r lying dynamic variables.

In brief : Ql1L emphasis on prediction has led .Y.1i to

� conclusion .that. sta tistical metho ds based



concent of maximum lik elihood alone are inadequ a te 1n. pract i cal emnirical research . .Lt. � led � .t.Q. the

theore tica l conclusion .that. predictiv e power its elf

needs to be maximize d more e xplicitly 1n. the model

Page V- 2 3

P.Stima tion techniques avail a bl e to beha vior al

scientis ts. In Chapter (II ) , we have su ggested wa ys to

build new s ys tems on this principle. Thes e s ys tems are

schedul ed to be av ail a b le as part of the Time-Series

Processor pac kage, designed for social s cientis ts, in

1 9 74 at Proj ect C ar,bri rl .�e, M. I. T.



Now l et us come back a n d l oo k more closely at the

questions we started from in sec tion (I i );

let us

reconsider the worries of the political actor about the

use of mathematical approaches.

We have deal t with

these worries so f a r o n a very basic l evel, o n the

level of defending the concept of predictive theories in the social sciences .

We have emphasized the point

that the explicit s tatistical techniq ues we have

proposed are merely one tool among many in cons tructing

such theories. We have implied that the choice between

these techniq ues for con structin � theories, a n d the

verbal a n d B a yesian techniq ues, s hould be decided o n a

case- b y-case a n d even s tudy-b y - s tudy b asis, based o n

Page V-24

the ideal of pre dictive power, rather than d e cided by

any apriori fiat in f avor of one approach or the o t her;

we have implie d that statistic al analyses should merge into other analyse s, mathematical and nonmathematical, in s u ch a way that political science is divided up

ac cordinR to interf aces b etween substantiv e issu es

rather than interfaces hetween methodolo gical schools

of thou�h t.

All of these comments, while controversial within

the domain of political science, would se em r ather

bland and basic to many r e al political actors. Most

political actors would b e q uite willing to try any methodology that


\-,mr ks", at any time, withou t getting

too committed to one methodolo gy or another. What

worries them is a question on another lev el: can we

expec t statistical methods to "work" very of ten, In

practice ?

In principle, this question can only b e answered

af ter the f act, in e ach c ase . Howev er, there are a

num b er of reasonable guesses one might mak e, b ase d on past experience, as to the most lik ely areas of

fruitf ul statistical r esear ch in the f uture, in

political science .

First of all, w e may e x p e ct to b e surprised in the

f u tur e, by statistical methods having a large r range of

Pap;e V-2 5

app l ication than one might ex pect apriori.

Given that

our pre sent emotiona l ex pec tations are b ti i l t u pon

ver b a l techniq ues which have been thorough l y u sed and

thorou gh l y dev e l oped, whi l e our statis tica l techniq ues

are on l y now b eing perf ected and have bare l y begun to

b e u sed to cons truct predictive mode l s , we may expect

that the deve l opment of s tatis tics wi l l demonstrate a

much greater va l ue than one ' s intuition wou l d indicate today.

Second, we may expect statis tics to provide the

basic "real ity testing" for o perationa l po l itical

theorie s of the q uantitative type or the verb al

anal ytic type. A sim p l e re�res sion ana l ysis may make a poor te st of a verbal "hypothes is ", if interpreted

naive l y. However, a f u l l s tatis tical ana l y sis of a

given set of v ariabl es wi l l give a m uch l ar ger q uantity

of inforMation, information which the ana l y st shou l d

not g l o s s over, eithe r in his wo r k or in his written repor ts ;

if , in f act, it is dif f icu l t to make a

connection b etween one ' s ver h a l theorie s and the

aggregate, s tatistica l behav io r of the variab l e s these

theories pretend to exp l ain, then one has much to l earn in trying to exp l ain the d if ficu l ty .

Oftentimes , an

"obviou s " verbal theory wi 1 1 turn out, thou �h true in the ab stract, to req uire maj or q u a l ification in terms

Page V-26

of what it tel l s us to ex pect to find in concrete d ata.

When governme nt pol icy is concerned with the concrete

resul ts themsel ves, these qual ifications m a y turn out

to be of cen tr al inporta nce.

The cl assical theor y of f ul l -empl oymen t

equil ibrium, f or e x anpl e, depended o n the ide a that the su ppl y of savings woul d increase with hig her interest

rates, just as the sup pl y of a n y other commodity

incre ases whe n a higher price is of fered ( 2 2 ) . Empirical

studies h a ve not ref uted this Ide a;

however, the y have

shown that the predictive power of interest r ates in

predictin g s avin gs is extrenel y smal l , whil e the ef fect

of income variabl es, cited b y Key nes, has turned out to be very l ar ge ( 2 3 ).

Ke y nes himsel f w a s a bl e to observe

these e f fects b y a n al ytic method s al o ne, but major

governme n ts were ver y sl ow to chan ge their establ ished

viewpoints de spite his argume nts ( 24 ); the statistical

studies, b y con frontin g peopl e d irectl y with the trends

which had persisted up to the curre nt d a y, may have

been a crucial form o f "real ity testing" on this issue. Third, one ma y hope that statistics wil l hel p

il lumin ate the sl ow and stub born trends which underly

so cial phe nomen a. It h as of ten bee n suggested that the

most visibl e varia bl es i n pol i t ics - the turmoil , the

yearl y ups a n d down s in economies, al l ia nces and w ars -

Page V-27

a re a ll s u r> e r f i c i al r i p p l es r i d i n g on a dee pe r cu r re nt • Econom i s ts have often d i scuss e d a "technolog i cal"

i ncreas e i n product i on ca pac i ty per cap i ta, an i ncrease

wh i ch cont i nues dur i ng rece s s i on and boom at v i rtu ally

th e same rate, an Incre ase wh i ch i s not f i x e d but wh i ch

var i e s slowl y accord i ng to uncer t a i n cau ses over long per i ods of t i me . Almost any d i scuss i on of the

"product i on f unct i on" i nclu des r e f erence to th i s

"au tonomous" or "technolog i cal" term (2 5 ).

Soc i olog i sts

l i k e Max Weber (2 G ), ph i loso r,h ers l i k e H e gel (2 7 ) or

Marx, and h i stor i ans l i k e S p engler (2 8 ), Toynbee (2 9) and

even MctJ e i l ( 3 0 ) and E i sensta rlt (3 1 ) hav e all d i scu ssed s u ch trenrls.

E v en i n our s i mulat i on stu d i es i n Chapter

( IV) , we found that errors i n short-tern pred i ct i ons

were r e duced f ar less by soph i st i cate d anal y s i s than

were errors i n med i um-term and long-term pre d i ct i on .

Pol i t i cal actors, i n the short-term, are com pelled

to i mmer s e th emselv es i n deta i ls too small and too unpred i cta ble to be d ealt w i th e f f ect i v ely b y

stat i st i c i ans . f3 u t th e e f f ect i v eness o f pol i t i cal

actors also dep ends on the i r "h i s tor i c al v i s i on", on

the i r ab i l i ty to ju dge the res u l ts of the i r l i f e ' s wor k on the su bsequ ent t i d e of e v ents ; lon�-term trends may be more determ i n i st i c, and mor e s usce pt i ble to

mathema t i cal analys i s . In ord er to sort out these

Page V-2 8

underl ying trend s , one mu s t soMehow ad j us t f or the

exis tence o f a great deal of shor t-term fl uctu a tion .

This s hort- term f l uctu a tion woul d ty pica l l y have a very

com pl ex and changeab l e p a ttern of a u tocorrel a tion; thus a compl ete model o f the "measurement noise" proce s s is

doubl y infeasibl e, and one mu s t f ace up to the need for

"robu s t" proced u res, a s des cribed in s ection (iii) above and in s ection (vii) o f Cha p ter ( I I ) .

Af ter a "robu s t" anal y sis, one m a y find tha t some

v ariabl es tend to fo l l ow de terminis tic l aws , over time,

bu t tha t o thers s til l invol v e a grea t deal o f a p p arent randomnes s. I n this ca se, one migh t ex pect tha t the

pol itical actor woul d have his grea tes t personal ef f ect

on his tory b y trying to change the l a t ter v a riabl e s -

1,-\/hich � be changed - and by air1ing onl y indirectl y a t the former v ariabl es. Al s o, by knowing where events

woul d be hea ded if he acted 1 ike the average pol i tical actor, an informed po l itical actor may jud ge the

impor tance of ta king unu s u al l y intens e actions to break

out o f the exis ting trend s.

Fur t h ermore, if there

shoul d turn out to be a "cro s sro a d s " o f po s sibil itie s

ahead ("bif urca tion", in m a thematical l anguage), such tha t the choice o f p o s sibil ities ahea d wo ul d produce

very l ong-l ived ef f ects , whil e o ther impl ica tions o f

Pa g e V- 2 9

present policy wo ul d b e w ash e d o u t in time b y random

noise, one might well choose to or�anize one ' s entire policy aro und the go al of moving down the right ro ad,

even if this mean s f o cusing one ' s attention on v ariables which are har d er to aff ect .

In practice, there are serio us dif f icul ties in

using statistics b y themselv es in d e al in� with this

third obj ectiv e . Statistics mi�h t d o well in predicting

the stress which will pull at the f abric of various societies; it wi 1 1 not do as well in predicting the ab i 1it y of 1o ca 1


po 1 itica 1

1 ea de rs to cope w ith the

Still, to know the causes and the m a gnitude of

the stress would be l nterestinB in any case .

But there

d e e per his torical trends can h.e. anal yzed best


is a bigger dif f iculty with using statistics he n�.

ma ke � of




l onges� possible relevant d a ta series ;

yet much of our historical d ata base, a s d escribed b y

Toynbee and McNeil, involves informa tion about

civiliz ations which have not left us a large suppl y of statistical d ata series.

In sone cases, a l arge su pply o f recent d ata m a y

b e enough.

It m a y even seem superior to historical

d ata, on gro unds th at it re flects exclu sively mo dern

phenomena which one wo uld ex p ect to continue in th e future ;

for ex ampl e, certain asp ects o f po pulation

Pag e V-3 0

d ynamics m a y b e d ealt with reasonably enough from

recent d ata, e specially insof ar as the f u ture wi 1 1 d epend heavily on the impact of recen t phenomena such as large-scale f emale literacy. On the other hand, if

there is any tru th at all in the finding s of Toynbee

and Spengler, then the central trend s of his tory may includ e re gular ris e s and d ecline s (3 2 ) , r egular

curvatur e s, w h ich would b e much harder to observe in short d ata-series - even s hort representative data-series - than in very lon� d ata-s eries .

Thus the

long data-series do much more than increas e the number of observation s and improve the accuracy of our

estimate s of parameters ; the y give us the power to d e al

with important qualitative e ff ects, with whole new

terms in the mod el, which might otherwise be mis s e d . F urthermore , onP. might e x p ect that the f u ture

would represent a d ynariic domain j ust as dif f erent from the pre s ent, as the pre s ent is from the past ; in ord er

to pre dict this domain, it may be more rational to loo k for regularitie s which hav e e x ten d e d from the distant

past to the presen t , and extrapolate them, rather than extrapolate mod els specific to the d ynamic domains of

the pre s ent.

If, in some cas P. s , his tor y were dominated

by lar ge, infre quent and apparently irreversible

changes, as in technolog y , then a longer d ata-base

Page V-3 1

would h e eve n more e s s e ntial, to give us an adequate

s ampl e to r e pre s e nt the varietie s o f such chan g e s and

co n f iguratio n s in the pas t;

o nly in this way can we

ho pe to d eal w ith the f urth er chan g e s which, u n d er this as sum ptio n, would rloninate the f uture .

(This do e s .!lQ..t.,

o f cour s e, e ntail buildin g mo d els tailored to s ho r ter,

wel l d el imited perio d s in an cie n t his tory, as

unre pre s en tativ e an d r e s tric t e d as recent his tory. )

Also, to d eal wi th the pos s ibility that the human race might be e nteri n g a n ew d omain of experie n ce, totally

dif f erent from an y o f its pas t his tory, o n e might

simply exte n d the bas ic context o f o n e ' s an aly sis s till

further, to in clud e the more g e n era l his tory of s pe cie s

on this plan et an d the patter n s o f evolutio n revealed therein. The biolo gical example of the trilo bite s ,

which b e came totally extin ct after overs p e cializ atio n,

do e s not have a f ull- s c ale parallel I n the human pas t ;

this particular example has be e n mentio n e d o f ten i n the popular pre s s, but there may be other as pects o f

biolo gical his tory more g e n eral an d eve n more r elevan t

to our own f uture (3 3 ) . In g e n eral, it wo uld appear

futile to tr y to predic t the f in e d etails o f a compl ex, n atural s y s tem such as human s o ciety be fore we c an

co n s truct f irs t-ord er mo d els which c an co p e with a

ge n eral review o f th e aggre gate behavior of this s y s tem

Pa�e V-32

across the whole of the avai lable data-b ase .

All of these v i s i ons of h i stor y emphas i ze that

rap i d per i ods of expan s i on, last i n g for deca des or

m i lle n i a, have e x i sted often e nou gh i n the past, and

have b ee n terM l n ated ofte n e nou �h by the growth of

coun ter tre n ds; the y empha s i ze that an anal ysi s of

soc i al dynam i cs, base d o nly o n the data ava i lable i n

rece nt decades, m a y l ea d to a totally f alse p i cture of

the poss i b i l i t i es w h i ch l i e ahe a rl.

Th i s d i ff i culty

certa i n l y appl i es to verbal theor i z i n g, just as much as to sta t i st i cs;

w i th verbal theor i z i n g, howeve r, the

h i stor i cal data are far more e x ten s i ve for those who

are w i ll i n g to e xam i ne them. Wh i le we wo11ld not agree

w i th the exact de ta i ls of the theor i es o f S pe n gler a n d

Toy nbee, we wott ld con s i der i t all the more i m portant to

descr i be and expla i n the phe nome n a the y have d i scussed . F i n all y, stat i st i cal methods may hel p o n another

level - as a parad i gm to gu i de verbal rese arch, both i n gen eral a nd i n spec i f i c cases. We h ave emphas i z e d

throughout th i s chapter the value of verbal research, conce i ved as an attempt to do w i th verhal data what

stat i st i cal research doe s w i th mathemat i cal data. Yet

eve n i n stat i st i cs, where the methods use d are spel l ed out expl i c i tl y i n adva nce , we h ave seen that the

Pa P, e V - 3 3

po pu l a r m e t ho d s o f a n a l y s i s c a n b e q u i t e u n r e a l i s t i c

and decep t i ve .

I t wo u l d s e em u n r e a so n a b l e to e x p e c t a n y

rio r e , a p r i o r i , f r or, v e r b a l r e s e a r c h .

I n d e e d , v✓ e h a v e

b e e n a b l e to com p a r e t h e two f o rr,s of r e s e a r c h i n

r e s pe c t to t h r e e i s s u e s : t h e emph a s i s o n p r e d i c t i o n ,

t h e w i l l i n g n e s s to d e a l w i t h t h e l o n g e s t po s s i b l e

t i me - s e r i e s , a n d t h e w i l l i n g n e s s to a c c e p t no i s e a s

pa r t o f p r ed i c t i ve t h eor i es ; i n a l l t h ree cas es ,

e s pe c i a l l y t h e f i r s t a n d mo s t b as i c , c l a s s i c v e r b a l po l i t i ca l s c i e n ce - w i t h a h a n d f u l o f no t a b l e

e x c e p t i o n s - d o e s no t a p pe a r t o h a v e g ro u n d s to c l a i m

s u p e r i o r i t y i n i t s a t t i t u d e s , a t l e a s t no t i n t h e p a r t s

of t h e l i t e r a t u r e w i t h w h i c h w e a r e f am i l i a r (3 �) .

I n t h i s s i t u a t i o n , t h e me t h o do l o � i c a l a d v a n ce s i n

s t a t i s t i c s - wh i c h a r e we l l - d e f i n e d , a n d w h i c h c a n b e

co n so l i d a t e d - c a n b e o f r, a j o r v a l u e i n e d u c a t i n g t h e v e r b a l l y - o r i e n t e d po l i t i c a l s c i e n t i s t .

T h e co n c e p t o f

s to c h a s t i c p r e d i c t i v e t h eo r i e s m a y s e em r e a s o n a b l e i n t h e a b s t r a c t to t h e v e r b a l po l i t i c a l s c i e n t i s t ;

howe v e r , w h e n h e t r i e s to t r a n s l a t e t h i s i d e a i n to a s t r a t e gy f o r h i s own r e s ea r c h , i t wo u l d no t b e

s u r p r i s i n g i f t h e d i f f i c u l t y o f do i n e so b ro u g h t h i m to

w i t h d r aw b a c k to A r i s to t e l i a n p ro c e d u r e s .

I n dee d , t h e

c l a s s i c a t t emp t to t r c'! c k down t h e l o n g- t e rm " ca u s e s " of h i s to r i c a l e v e n t s i s , a s we h av e rien t i o n e d a bo v e , v e r y

Pa ge V- 3 4

c l o sel y l in ked to the search for dy n amic mo dels. However, the concept o f

11cau se 11

is a \'leak enou gh

paradigm that historians have o f ten fo u n d themselves

forced to admit


riulticau sality 11 (3 5 ), and then to

withdraw into more descri ptive, more "o b j ective"

q uestion s. Furthermore, historians have o f ten adriitted

themselves to be intrigued by the "great ' what I f ' q uestio n s o f history", such as, "What \1ould have

hap pen ed if the S panish Arriada had wo n in

15 8 8 ?" ;


such q uestio n s - which clear l y call for the u se of some kind o f predictive mode l - h av e been dismissed as

speculative (3 6 ).

I n short, there would ap pear to be a n eed for a

more durab l e paradi�m in analytic verbal research.

verbal social scientists can becone more and more


f amiliar, o n an I ntuitive level, with the co ncrete

methods o f statistics, in cor1in g to grips with co ncrete data, then they may be able to develo p a clearer and

clearer picture o f what it mean s to search for ro bust predictive models, mather,atica l or verb a l . A l so , they

may be ex pected to learn to appreciate the va l ue o f

treatin g q uantitative variables as such , even i n verbal discussio n , rather than re d u cing them to such possibilities as "high" and

1 1 low 1 1 ( 3 7 ).

The value o f statistica l rietho rl s as a paradigri f or

Pa ge V- 3 5

v erbal re s earch m a y b e e s pecially gre at in those ca s e s

where s tatistics can d eal ex pl icitl y with so�e, but not all , of the his torical d ata of int ere s t . One can

imagine two w a y s in which this v al ue woul d b e felt, a s

s tatistics are bro u ght to b e ar on thos e d ata which are

a vail a bl e.

F irst of all , tho s e people doing th e s tati s tical

research woul d h a v e to choos e a s et of varia bl e s to

use, in d e vel oping pre dictive models .

When predicting

a s y stem l ik e a mis s ! l e, made u p of fiv e ma j or

sub s y stems or so, one ' s pr i mary concern in ma king

medium-term pre dictions is with the "overall s y stem, "

with the s y stem made u p of the In teractions b etween the

five major sub s y s tems. Similarl y, when a s king for

l ong-term prediction of a s tatistical system, m a d e u p

o f fiv e cl usters of heavil y intercorrel a t ed s y stems of

v ariabl e s, one would normal ly s tart out b y a ggregating e ach of the cl usters, b y u s e of f actor anal y sis,

pattern anal y sis or o ther proce dure s, a nd then s tud ying the rel ations b etween the a g gregate varia ble s .

To try

to pre dict wh ere a mis sil e will go, b y pre dicting what

each of the s ubsystems woul d d o In total isol a tion from e ach other, is to ignore the mo st Important functional

rel ations.

Statistical analy s is, b y drawing us aw ay in

concrete ca s e s from a fix ation on the internal d ynamics

Page V- 36

o f special ize d subsystems, may hel o bring us b ack to

stud ying the bro ad structure o f interf aces and mul tipl e

sub systems which is crucial to even a f irs t -order prediction o f h uman societies .

Then, when researchers

go b ack to l oo k inside the subsystems, we m ay hope th at

they wil l f o cus mo r e attention on those questions about t he interf aces w h ich appeare d important at the gl o b al

l evel .

The "unity of science" may b e a deb atabl e

pro position when appl ie d to predictive model s o f, say, biol o gical s ystems and astronomical systems. However,

when one is trying to pre dict a singl e , h ighl y

integrate d system, th e need for interd iscipl inar y unity becomes overw hel ming . When smal l f ee d b ack terms from

one sub s ystem to ano t her can h ave o verw hel ming e ffects

in de termining system b e h av io r, it is essential to try to measure the aggregate be h av ior directl y.

In the next p h ase, af ter the statistics h ave

pointe d to concrete interdiscipl inary e f fects, human

verbal knowl e d ge can go o n to expl ain and to qual if y t hese concl usions.

One migh t, on verbal grounds,

regard the concl usions as misl eading, as

oversimpl if ie d, or as o ne - side d

in their emphasis; in

any case, h owever, even to d iscuss these co ncl usions

intel l igentl y , one must try to discuss, on the b asis of

ve r bal knowl e d ge, w h y one woul d expect certain

Pa g e V- 3 7

co r r e l a t i o n s a n d ce r t a i n d y n am i c pa t t e r n s to vJO r k o u t a s t h e y do . O n e wo u l d h ave t o d i s c u s s t h e " t r u e " d y n am i c r e l a t i o n s b e tvve e n d i f f e r e n t s u b s y s t er,s ,


o r d e r t o e x p r e s s w h a t o n e f e e l s a r e w ro n � w i t h t h e s t a t i s t i c s . One wo u l d b e t emp t e d t o


h y po t h e t i c a l




e n ga g e i n

an a l y t i c" mo d e l l i n � - to s u g g e s t

wh a t wo u l d h a v e h a p p e n e d to t h e s t a t i s t i c s i f o n e h a d i n c l u d e d , a s h a r d d a t a , c e r t a i n v a r i a b l e s fo r w h i c h

h a r d ma t h ema t i c a l d a t a d o n o t h a p p e n t o e x i s t .

\"IO u


l d f o c u s o n e ' s a t t en t i o n o n t h e v a r i a b l e s of c en t r a l

i n t e r e s t , r a t h e r t h a n l o s e o n e s e l f i n a mo r a s s of

u n r e l a t ed h i g he r -o r d e r v i c i s s i t u d e s .

I n b r i ef , o ne

m i g h t a cq u i r e t h e rnone n t ur, n e ce s s a r y to l a u n c h i n t o a f u l l - s ca l e ve r b a l d y n am i c a n a l y s i s , w i t ho u t c r a s h i n g

back unde r t h e we i gh t of p u r e c l a s s i ca l t r ad i t i on s .

Befo r e we c l o s e t h i s d i s c u s s i o n o f t h e va l u e o f

s t a t i s t i c s t o po l i t i c a l a n a l y s i s , i t m a y b e wo r t h

n o t i n g t h a t i m po r t a n t a pp l i c a t i o n s m a y a l so ex i s t i n

po l i t i c s p r o ne r .

E a r l y i n C h a p t e r ( I I ) , w e me n t i o n ed

t h e po s s i b i l i t y o f a g row t h i n eco l o g i c a l a n d

s o c i o l og i c a l mod e l s , b a s e d o n t h e v a s t acc ur,u l a t i o n o f

d a t a b y e a r t h s a t e l l i t e s . Ma n y o f t h e d i f f i c u l t i e s

c i t e d a bove - pa r t i c u l a r l y t h e dom i n a n c e of ve r b a l d a t a

ove r q u a n t i t a t i ve d a t a - wou l d n o t a pp l y i n t h i s c a s e ; a l so , o u r emp h a s i s o n f ee d b a c k ef f e c t s a n d

Pa.P; e V-38

interdis ciplinary r e s earch wou l d apply more than ever.

Given the im portance o f world b alance s o f agricultural pro duction and po pulation, given the dangers o f

ecolo gical catas tro r>he thro 11gh high s tre s s and

imb al ance in s u ch s y stems, and given the pos sibil itie s

here f or treatie s to set up a coordinate d glo bal s y stem to monitor and help control the s e s y s t em s from s pace,

the practical political scientis t wo u l d have goo d

reason to think abo u t thes e ap pl ications . This is dou bly true, Inso f ar as the develo pne nt o f the s e

ap plications may b e f ar f ror, automatic. ( v)


Now let us return once more to o ur s tarting point,

to the worries o f the political actor about u s ing mathematics .

(S ee s ection

( i i ).)

the s e worrie s on two l eve l s :

We have dealt with

Ci ) the leve l o f

def ending the no tion o f prediction ;

( i i )

the l evel o f

d e s cribing the practical applications o f s tat is tical

model ling to politic a l pre diction . We hav e emphasiz ed

the empirical ap pro ach, in bo t h verb al and s tatistical

Page V-3 9


On another level, however, the political actor

might q u e s tion the idea that e mpir ical approaches are eno u gh, when the s u b j ects o f one ' s inve s tigat i on are

intellig ent human b eings. In particular, the political actor wo uld h a v e to reconcile the u s e of o b j ective

metho ds to pre dict other actor s, while pres erving the s en s e of free wi 1 1 in making his own d ecisions. On a

primitive level, this paradox pos e s no difficu l tie s at all to the political actor ; it is eas y to conj ure up

the image o f a f as t- d ealing political hack, working for a city machine, gle e f ully pushing people aro und as if

they were b uttons on a pin-ball machine. A t a more

advanced level, politicians f ind that they can pre dict

people better and influ ence their actions more

constructiv ely by exploiting emr,athy, by u sing their

own reaction patterns as a kind o f analo g u e mod el to

pre dict the reactions o f others . Thu s there are the

old, persis tent ad ages : "lf yo u want to predict what a man \>Jill do, try to put yo urs elf in his s ho es, " and,

"If you w ere a • • • what vmuld


do ?"

This proce d ure

is particularly ef f ective when the political actors

come froP1 the s ame b ackgro und as the people they are

pre dicting, when their b �c k gro u nd is cosmopolitan, or

when their reaction patterns are d ef ined at a general

Pap;e V-4 0

enou g h level to make it easier to imagine how another person would re ally re s pond to a s itu ation very

dif f e r ent from th at of the political actor h im self. (Empath y may al so be u se d , o f cours e , as a tool in

t h inking of approac he s one might borrow from others in

coping with one ' s own prohlems . ) W hen a political actor

os cillat e s between thinking about o t h ers

" s ubj e ctivel y", in terms of empath y , and th inking about th er, "ob j ectively", in terrr,s of pre dictive empiricis m, a conflict emerge s , long be fore th e u se of s tatistics

as s u c h aris e s .

Rationality, and th e ac knowle d gement

of oth er s ' capacity for rationality , wou ld ap pear to

allow no e s cape f ron this conflict; as long as we h ave two distinct source s of information from w h ic h to

predict the b e h avior of people , we mu s t live as be st we

can with the conflictin� pre dictions , wh ile tr:x:ing to

reconcile ..t.bJ1m 1Dl concre te improvements .lo. ..t.b.!:t concepts

Yl..ft use on both s ides .

Pre dictive s tatis tical morl e l s , like empirical

v erbal models , cannot dire ctl y expre s s the ins ig h ts

derived fror,


emf)ath y 1 1 ; in particular, the y cannot

e x pre s s the ins ights derived from ac knowle dging th e

intelligence of oth er h uman being s (3 8 ) . This limitation may be of enorrm u s importance to the practical

nolitical acto r . On the othe r h and, the re late d

PaP;e V - li l

mathematica l con ce pt of max imiz in g a cardin al utility

f u nctio n expre s s e s the idea of human Intel l igence, more

vividly an d more precis ely than the u s u al verbal

formulation s . The argw'lents of Von N e uri a nn C3 9) an d of

Raif f a (4 0) in f avor of this con cept req uire little more than lor.;ical con sis ten cy on the whole, I n the ul timate

value s that the in divid u al purs u e s ; while the s eriou s

political actor would normally admi t that he sonetime s

acts s tu pid, an d some times acts at cro s s - purpo s e s

again s t him s elf, e s p e cially when limitation s on time

an d on kn owledge con strain his detailed

d ecis ion-makin g, he would rarely con sider s uch mis tak e s

a s a matter of f ixed


d eliberate policy.

Often, when

the political an aly s t would accu s e him of in d ul r.;in g in

irration ality, he wo uld hav e a counterargument of his

own, b as e d on the knowle d g e an d concepts available to him at the time of his d ecision.

If we agree with Raif f a, then, that the

max iriiz ation of cardin al u tility is "va l id" as a

fou n d ation for mos t political d ecision-making, w e find

our s elve s le d to important conclu sions about political an al y sis too. First of all, we fin d ourselve s

re-emphasizing the po i n t that verb al r e s e arch may h e regard e d as an attempt to perform valid statistical

in f ere n c e, accou nting for d ata which is le s s s tructure d

Page V-4 2

and l ess managea hl e than the u su al statisti c al

time-seri es.

Within Raiffa ' s framewor k, the has l c

q uestions one asks are q u antitative in nat ure ; e . g. -

" If 1;\l e carry out action A , how � wil l it cost, .b.rui


do we gain, and \l'J hat are the proba b 1 1 i ti es that we

wil l su cceed ? " ask, "lf


More generall y, Raiffa woul d have u s

carry o u t action A , startinr, from situation

B, wha t is the distrib ution of probabil i ties attached to the different possibl e level s of cost and to

different possibl e outcomes?

How much do we e xpect to

gain from each of the possible outcomes, If our

subseq u ent strate gy is optirial ? "

In each case, we d o the best we can to estimate

these q u a nti ties on the basis of the

avail able verb al

Information ; thus the rese arch carri e d out on that

information, is carri e d out for the pur pose of

extracting the most acc urate possible statisti cal


We also ac count for the intell l gence of

other actors . We al so ac count for more direct

q uantitative evid ence, whenever we c an find it .

most so urces of information, w e e x pect to get probabilisti c indi cations of various kinds, certainties.



Through practice, we may hope to learn

more and more the art of formulating accurately the

Inter r elated patterns of statistical Im pli cations of


v- � 3

our verb al knowled ge, and to reduce the lo s s e s in

transl ation which al ways Intervene in going from r aw

obs erv ation to decis ion.

Second - and more important - the concept of

util ity maximiz ation of f ers us an ideal iz e d model of

intel l igent decision-ma king, for usn In the prediction

of other pol itical actors .

At fir s t gl ance, this

concept may sound r ather culture- bound.

However, the

concept of utility function i s very g eneraliz e d in the range of concrete behavior it can includ e . One can

ima gine all sorts of different utility functions. One can imagine many different l ev els of knowle dge and

aptitude brought to b e ar in maximiz ing utility

functions . One can ev en imagine differ ent levels of

b a sic cognitive s tructure, a s s u gg e sted b y Piaget { � l )

and b y ego p s ychiatri s t s { � 2 ), levels which one may hope

either to rememb er or to a dvance to . One can ima gine states of short- term p sychological diseq uilibrium,

where a pol itical actor doe s not yet take the actions b e s t suited to m aximiz inr, his utility function,

because, on some l evel , he has not y et b ecome aware of

the po s sibi 1 i ties. { Th e detection of s uch dis equi 1 ibria

is particularly important to pol itical actors whose job is to persuade othe r s to change thei r course of a ction. )

T hus, s tarting from th e concept of utility

Pa g e V- 4 4

ma ximiz ation, one can approa ch empirical re ality bit by bit, by u sing both empathy and empirical d ata to a d d

qualifications to one ' s view of o ther actors a s i d e al

d eci sion-m a k ers . Even a s one a d d s qualif ications ,

however, one can continue to insis t th a t all

interpersonal differences in pers onality be analyz e d in terms of the current s tate parameters of a s y s tem which

obe y s the s ame gen eral d ynamic laws a s one ' s own mind , and which Is capable of chang i ng its s tate parameters

a s a result of the general le arning capability s hared

by all humans ( 4 3 ) .

From a theoretical point of view, the choice of

s tarting point is not a matter of m ere boo k k eepping ; it

defines one ' s implicit


prior probability"

distribution, a s d e s cribed in s ection ( v ) of Chapter Cl I). From a practical point of view, this proced ure

can help us avoid rigid s tereotypes of other political actors ; it can help us rememb er that the y , too, have a

capacity for change , and that the likely directions of

change are not entirely random. In any ca s e , this

proce dure allow s us to m a ke u s e of both major source s of information , infornation d eriv e d from empathy and information d erived from more obje ctive d ata. This

proced ure also sugge sts that the procedur e s mentioned

in section ( x ) of Cha pter (II) for utility ma ximiz ation

Page V- 4 5

m i ght b e u s e d, not j u st as a techniq u e fo r analyz i ng decis i on-mak i ng s y stems, b ut as the b as is fo r s u b stantive mod e l s of s u ch s y stems .

The u s e of u t i lity max i m i z ation as a model of

dec i s i on-mak i ng has al r e ady le d to a numb e r of

p ract i cal applicat i ons, notably J n game theo r y and i n

microeconom i cs.

The conce pt of i deal u t i l i ty

max i m i zat i on p r ed i cts the b ehav i o r of an actor

cond i t i onal u pon the i nfo rmat i on that he has ava i lable

to !l.J..m.; thu s i t may lead to an l mplicl t morl el, a model d ef i ned i n te rms of variable s which a r e not d i r ect l y

o b s ervable to othe r acto r s.

In te rms of b ehavio r i s t

att i tudes, th i s i s a maj o r l i ab i lity, i n sofar a s i t

make s i t much mo r e d i f f i ct1lt to p r e d i ct b ehav i o r

conc r etely ; on the oth e r hand, s u ch i mpl i cit mod e l s may

a l low u s to i nf e r someth i n � about the h i d den variables

from the ove r t b ehav i or.

In the cas e of m i croeconom i cs, one doe s not

attempt to p r e d i ct the actu al levels of steel

p rodu ct i on, etc . , at least not i n the earl y s tag e s of

r e s earch ; i nstead, one de f ends th e p ropos i t i o n that the

levels of steel p rod uct i on w i ll b e equal to whate ver level i s nece s sa r y in o r d e r to max i m i z e soMe k i n d of

utility funct i on, i f the decis i on I s mad e b y an economy

wh i ch enj oys p e rfect compet i t i o n (4 4 ) .

(Str i ctly

Pa.r.;e V-li 6

spe aki n g , howeve r , e conomis ts now te n d to avoirl the

con ce pt of "so cial utili t y fu n c tio n ", o n g ro u n ds th at

th ey can dedu c e sinila r co n clu sio n s f r om we ake r

ve r sio n s of the s ame c r ite r io n. )

O n e might well h ave

atta c k e d th is th eo r y , in its e a rly s tage s , a s an

un s cie ntif ic - tho u g h mathema tical - exe r cis e in

p ro pa g an da. Howe ve r , a s th e t h eo r y was develo p e d, it

tu r n ed o ut to be a powe r f ul f r amewo r k f o r evaluating

the .inef ficie n cie s p r o d u ce d by situ atio n s o f r f e ct

competitio n in t h e r e al wo rld (4 5 ) ; it h a s b e e n u s e d to a n alyz e the eff e cts of taxes a n d labo r laws (4 G ) o n

econonic eff icie n cy ; it h a s le d to t h e develo p me n t o f Liebe rman ' s p rin ciple s o f e co n omic o r g aniz atio n (4 7 ),

n ow a main stay of th e Soviet e c o n omy (4 8 ).

Mic r o e co n omic s, initially a n isolate d a n d e s s entially u nempiric al th eo r y , has tu r n e d into a powe r f u l

mathematical f r a mewo r k f o r a n aly sis , a f r amewo r k allowin g th e u s e f u l b r i n �in R toge t h e r o f vast

q u antitie s of empirical data, a f r a mewo r k impo rtant to bo th th e p r e di c t io n a n d th e comp r ehen sio n o f e co nomic p h e nomena.

Yet all o f this s uc ce s s was b a s e d o n a s tatic

co n c e pt of u tility maximiz a tio n , a con c e pt of o ptimal

equilib rium, r elated to th e cla s s ic co n ce pts o f

l a g r a n ge (4 n ). 3in ce th e n, No rbe rt t\lie n e r (5 0 ) h a s

Page V- 47

dis cus s e d t h e more mod ern, more powerful d ynamic

t h eorie s of m ax imiz ation, w h ich h e would consider a

s ubs tud y of "cybernetics . " H e h as suggested that this

bod y of th eory be ap plied to t h e h uman brain (51). Karl

Deutsch (5 2 ) h as gone on to s 11 g g e s t that cybernetics may

also b e applie d to po l itical science . Cons idering t h e

power that a primitiv e, s tatic conce pt o f maximization h as h ad, over d ecades, in economics, th ese s ugges tions

would appear to make a great deal of s ens e .

Unfortunately, th es e id eas are caug h t between th e

"mighty op posite s " of mod ern methodology - t h e

be havioris ts, w ho would d e mand q 11antitativ e empirical

proof that th e initial mod el predicts all t h e variables

in d etail, and the traditionalists, who would not have patience with t h e math e�atlcs. Also, in th e last f ew

years, t h e relevant p h a s e of "cybernetics " h as been

renam e d "control theory." He h ave d is cus s e d th e value

of "control th eor y" (i . e . of o ptimization techniques)

in C h apter (11) as a tool in analyzing social s ys tems ; however, control theor y may also be us e d its elf as a

normativ e mod el, of th e proce s s e s w h ich allow h uman societies - or even t h e h uman brain its elf - to

function .

(See note (5 3 ) for mor e concrete

pos sibilities . ) As with microeconomics, one will e xpect to f ind that th e real s y s t ems inv olv e imperf ections and

Page V-4 8

approximations to the optimum. (5 4 ).

However, one may

also f ind It intere sting to be abl e to see where these imperf ections are , and to appreciate the capacities

that human beings and political societie s - like

e conomies - do have , to co pe with d ata on a scale far

beyond the capacity of pre s ent- d ay computers .

I nso far

as one agree s with the traditional ist that the human

being is stil l the mos t releva n t uni t of analysis in politics , one can try, in the future, to e xpand the

interf ace b etwe en cybernet i cs in p s ychology and

c ybernetics in political science.

I n summary , we have conclud e d that the

mathematical method s o utlined in Chapter (I I ) can

ind e e d be applied to political science , but that the y s hould always b e consid ere d as only o n e branch of a

more comple x , inte grate d system of analysis , oriented toward s the goal o f prediction;

the Raye sian

phil o s o ph y of util ity max imiz ation and condit ional

probabil itie s could play a central role in organizing

this s y stem of analy sis , but the behaviorist philoso phy

of total empiricism doe s not hav e the pow er to account for major parts o f this sy stem, parts which appear

e ssential in th e l as t part o f this chapter.

Page V - 4 9


C l) One might take up considerabl e space discussing this particul ar point. Traditional Lanchester ' s Laws for ancient warfare indicate an equal number of deaths on both sides, regardl ess of concentration ; thus, they tend to impl y no possibil ity of strategy in such warfare. On the other hand, the "Laws" usual ly have the discl aimer "ceteris pad b us" attached, impl ying equal l evel s of material and social "technol ogy. " I f "social technol ogy" includ es superior strategic abil ity on the part of a commander, l ike Caesar, then the l aws become a poor guide for the would-be superior strateg l st. We have l imited our statement here to the cl aim that Caesar won his victories by evading Lanchester 1 s Laws, by avoiding the necessity for attrition or perhaps by expl oiting the l oopholes in the laws, rather than inval idating them; this much, at l east, seems fairl y clear to us from . Caesar ' s account itsel f. Liddel l -Hart, in his classic mil itary textbook, S tra tegy ( Praeger, NY, Second Edition, 1967), cites (p. 3 3 8 ) Caesar ' s llerda campaign, Cromwel l ' s Preston campaign, and a few others, as the cl assic bl oodless victories; he goes on to write, on p. 3 3 9: "Whil e such b loodl ess victories have been exceptions, their rarity enhances rather than detracts from their value - as an indicator of latent possibil ities, in strategy, and grand strategy. Despite many centuries ' experience of w ar, we have hardl y begun to expl ore the fiel d of psychol ogical warfare. From a deep study of war, Cl ausewitz was l ed to the concl usion that - ' Al l mil itary action is perme ated by intel l igent forces and their effects. ' 1 1 ( 2 ) Liddell-Hart general l y prefers to talk about the "indirect approach" and the "unexpected" more than the use of imagination, b ut clearly the former require the l atter. Hart writes (ibid) on p. 3 42 : "A more profound appreciation of how the psychol ogical permeates and dominates the physical sphere has an indirect val ue. For it warns illi Q..f � f al l acy .aru!. shal l owness tl a tt emp t ing .t.Q. ana l yze a.!151 theoriz e a bou t s t ra t egy 1n terms of Ma them a t ics. To treat it quantitatively, as f f the Issue turned onl y on a superior concentration of forces at a selected pl ace, is as f aul ty as to treat it geometrical l y • • . " Also, on p. 162, In the

Pap;e V- 5 0

sec tio n " Co n c l u sio n s f rori Twe nty - F ive Centu rie s " , Lid d e l 1- H a rt discu s s e s the c h a r acte ri s tie s o f the rr10 re u s u a l , b l oo d y victo ri e s : " • • • sca n ning, in tu r n , the decisive batt l e s of h is to r y , we f i n d th at in a l mo s t a l l the vic to r h a d h is o ppone nt a t a p s ycho l o�ica l d is a rlva ntap;e be f o re t h e c l a s h took p l ace • • • mo st o f the examp l e s f a l l into o ne of two catego rie s • • • de scribe d in the wo r d s ' l u re ' a n d ' t r a p ' . " '.';ee a l so n ote 1. T he f u l l we l �ht o f these o bj ectio n s w i l l n ot be dea l t with u n ti l sectio n (v) o f o u r tex t.

( 3 ) In wr iting thi s , a n d remembe rin g some of the inte re stin g gene r a l izatio n s we h a d h e a r d f rom p u r e ve rba l po l itica l science , it w a s dif f icu l t to overcome the se l ective mem� r y a n d f a ce u p to the o ve r a l l rnethodo l o �ica l views sti l l p romine n t in the f ie l d . But a quick r eview soon set o u r riemo r y s tr aig h t. F o r e x am p l e : "It s ho u l d be noted that the err1 D h a s i s he re I s on deductio n , not on i n d u ctio n. In the wo r d s o f a nothe r p a r tici p a nt in the semi n a r , P rofe s s o r S. E. Fine r , we a re makin g a n attempt a t ' de sc ribin g the po l itica l po s sibi l itie s. 1 C o n side r a h l e emp h asis s h ou l d be put o n the wo r d ' desc ribing ' : we remain in the h um h l e s ph e r e o f de s c riptio n a n d do not attempt to r ise to the m o n� l of ty o ne of s pecu l atio n. " (p . 4 0 of " Ge ne r a l Metho do l op;ica l P r o b l en s " , b y G u n n a r Hecksche r , in C omQa rative Po l itic s , Eckstein , H a r r y a n d A pte r , navid E. , e d s. , F ree P re s s o f G l e ncoe , 1 9 6 3 . ) I n his to rica l re sea rch , the p ro b l em is no re s e r io u s , a s : " My p rincip l e s a n d metho d s of resea rch a n d w r i tin g we re l a r ge l y 11o r ke d o u t u nco n s cio u s l y , th r o u g h l is tening to exce l l e n t te ache r� a n d f o l l owin g the be s t mode l s • • • T h e h is torian h a s both t he right a nd the d uty to make mo r a l j ud �enents. He s ho u l d not attempt to p ro p he s y , but he may o f fe r ca ut f o n s a nd is s ue wa r n i n e s. " ( p. 1� 1� - h S , Vista s of Hi sto ry, S amue l E l iot Mo ris o n , Kno p f , NY, 1 9 6 4 . ) O ne m a y a sk w h at the wa r ni n g s a r e s u p posed to b e b as e d o n , if not o n p ro b abi l it i e s o f u n de s ir abl e even ts co n ditio n a l upon ce rtain po l icy decisio n s ; a l s o o ne may q ue stio n w hethe r methodo l o gica l decisio n s o u gh t to be b a sed o n u nco n s ciou s f acto r s. See a l s o n ote s 1 0 a nd 3 6. ( 4 ) S pen g l e r , Os\f1a l d , The Dee l ine o f the \Jest, K n o p f , N Y , 1 9 2 6 , t r a n s l ate d f rom the 1 9 1 8 o rigin a l by C h a r l e s Atki n so n , p. 1 0 6 - 1 0 7 : " T he aim o nce

Page V - 51

attained - the idea, the en t ire content of inner possibilities, f u lfilled and made ex ternal ly actual - the C u lture suddenly hardens, it mortifies, its blood congeals • • • This - the inward and outward f u l fillrient, the f inality, that awaits every living Cu lture - is the pur port of ;:il l the historic "decl ines", amongst ther, that d ecline of the Classical w h ich we know so well and f u ll y, and ano ther d ecline, ent i rely comparable to it in course and d irection, which will occu p y the f irst centuries of the coming millenium b ut is heralded alread y and sensible in and around us tod ay - the decline of the Wes t . " p. 10 9 -110 : '' Every Cu lture, every adolescence and maturing and decay of a Culture, ever y one of its intrinsical ly necessary stages and perio ds, has a d efinite d uration, always the same, always recurring with the emphasis of a sym bol."

(5 ) Toynbee, Arnold ,J . , A S tudy of History. abrid gement of Voluries I- VI, O x ford U . Press, N Y, First American Edition, Four th Printing, 1 9 4 7, p . 2 L� 4: 1 1 The prob lem of the breakdowns of civiliz ations is more o bviou s than the prob lem of their growths. Indeed it is almost as obviou s as the prob lem of their geneses . The geneses of civiliz ations call for ex planation in view of the mere f act that this species has come I nto existence and that we are able to enumerate twent y - six representatives of it - including in that number the f ive arrested civiliz ations and ignoring the abortive civil iz ations. We may go on to observe that, o f these twent y - s ix, no less than sixteen are now d ead and b uried. " p. 2 4 5 : "lf we accept this phenorienon as a universal to ken of d ecline, we shal l conclu de that all the six nonWestern civiliz a tions alive today had broken down internall y before they were bro ken in u pon b y the im pact of Western civiliz ation f rom o u t side • • • For our present p ur poses it is enough to observe that of the l ivin� civiliz at ions every one has al read y bro ken dmm and is in process of disintegration except our own. " p. 2 5 3 - 2 5 4: " The metaphor of the wheel in itself o f fers an il lu s tration of recurrence being concurrent with pro gress • • • Thus the d etection o f periodic r epetitive �overnent s does not impl y that the process itsel f is of the same cyclic order as they are. On the contrary, if any inference can historicall y b e drawn from the period icit y of

Page V- 5 2

these minor movements (such as the rise and decl ine of Graeco-Roman civil iz ation), we may rather infer that the major movement which they bear in mind is not recurrent but progressiv e . " (Comments in parentheses inserted b y us . ) Al so, in variou s pl aces, Toynbee emphasizes both schol astic rigidity and corruption as s ymptoms of decaying civil iz ations; he hints at a dif f erent, l ess charitabl e expl anation of the common methodol ogical dif ficul ties we have mentioned In the text. However, even if Toynbee ' s expl anation has sor,e tr uth in it, a redu ction in the cost of adhering to good methodol ogy shoul d stil l f acil itate nore worthwhil e research .

(G ) Turner ' s theory is wel l -known as an attempt to articul ate the f actor s w h ich caused American progress in the l ast f e\J centuries; hoHever, it is not onl y interesting in its own right, but l s an exampl e how such ideas can be usef ul sometimes to those trying to decipher more general l aws of histor y, which, in turn, may be useful to present pol icy-makers . Wal ter Prescott Web b, in "The Frontier and the 4 0 0 - Year Boom", p . 1 3 6, t-Jri tes in cor,ment on Turner ' s ideas : "Assuming that the f rontier cl osed a bout 1 8 9 0, it may be said that the boom (in i.!lJ.. of Western civil iz ation) l asted approx imatel y four hundred years . It l asted so l ong that it came to be considered the normal state, a f al l acious ass um ption for any boom. It is conceivabl e that this boom has given the pecul iar character to modern history, to what we cal l Western civil iz atio n. 11 (Art i cl e b y Web b l ocated in Tayl or, George R. , ed . , The Turner Thesis : Concerning the EQ.u:. of � Frontier 1D. American History 1 Heath Co . , Lex ington, Mass., Third Edition, 1 9 7 2. Web b goes on to suggest that the search for "ne,v frontiers" is essential l y an irrational , desperate attempt to preserve a dying enterprise ; however , his assunptio n that new foci o f economic devel opment cannot be found does not al l ow for some of the possib il ities o f technol ogical progress over the nex t f ew decades . Over centuries, the l imits of the earth itsel f may be expected to prevent unl imited growth; on the other hand, when one speaks in terms of centuries, one cannot entirel y rul e out the possibil ity of devel oping economic activities b eyond the pl anet earth itsel f. Certain aspects of economic and technol ogical growth depend on l arge numbers of

Page V-5 3

indegend ent random d isturhances, which, on the whole, may accumul ate and be subject to accurate prediction by statis tical procedures or the verbal equivalent ; however, the historic develo pment of nucl ear power, fo r exampl e, o r the future possibilities of elementary p articl e physics and nonequil f hrium (nonl ocal) thermod ynamics (see note 5 1) , involve a more sweepin� form of prio r ignorance, which translates into probabil ities f ar from one or zero and whose values may change according to government or even individual decisions . At any rate, it is possibl e that the ideas mentioned b y Webh m ay have appl ication to other parts of the historical data base, beyond the West.

( 7 } Hegel , "The Philoso phy of Histo r y", excerpted in The Phi l osophy of Hegel , Carl Friedrich , ed . , Modern library, N Y , 1 9 5 1i , p. 2 1 - 22: "The Principle of development contains further the notion that an inner destiny or determination , some kind of presup position, is at the base of it and is brought into existence . This final d etermination is essential . The spirit which has worl d history as its stage, its prope rty and its field of actual ization is not such as vrnuld move carelessly about in a game of external accidents, but is instead the absolute d etermining factor. " p.2 3: "Horld history presents therefore the stages in the devel opment of the principl e whose memory is the consciousnes s of free dor,1 • . • 1 1 S tages then l isted .

( 8 ) Schwinger, Julian, Particl es, Sources and Fields # Addison-Wesley, Reading , Mass. , 1 9 7 0 , Preface, especiall y paragraph two of preface . The most phenomenol ogical approach in basic physics, the "S-matrix" approach, restricts its attention to describinF, the S -matrix . This matrix is defined as the matrix which predicts .all scatterin g resul ts , fo r al l possible scattering ex periments in high-energy physics ; these, in turn, constitute the vast majority of high-ener�y data . (9) In q uantitative political science, the obvious reference here is to Blal ock , Hubert M . J r . , Caysal I nference .in Nonex perimental Research, u . of North Carolina Pre s s, Chapel Hil l, 1 9 6 4 . Hayward Alker also recommend s : Simon, Herbert, 11o,jel s of Man# l1iley, MY, 1 9 5 7 , Part I, and Dahl ,

Page V - 5 4

R • A • , 1 1 C a u s e a n d E f f e c t i n t h e S t u cl y of Po 1 i t i c s a n d D i s c u s s i o n " , i n C a u s e a n d E f f ec t , O . L e r n e r e d • , F r e e P re s s , N ew Yo r k , 1 9 6 5 •

( 1 0 ) H . R . T r e vo r - Ro pe r , t h e n o t e d t r a d i t i o n a l h i s t o r i a n ( an d a n t agon i s t o f To y n b e e ) , w r i t e s , o n p . v i . of H i s to r i ca l E s say s � Ha r pe r a n d Row , N Y , 1 9 6 6 : " I t i s p e r h a p s a n a c h ro n i s t i c t o w r i t e of a h i s to r i a n ' s p h i l o s o p h y . To rl a y mo s t p ro f e s s i o n a l h i s to r i a n s ' s pe c i a l i z e ' . T h e y c h oo s e a p e r i o d , sone t i me s a ve r y b r i ef p e r i od , a n d w i t h i n t h a t per i o d t h e y s t r i v e , i n - d e s p e r a t e con p e t i t i o n w i t h e v e r - ex pa n d i n g e v i d e n c e , t o k now a l l t h e f a c t s . T h u s a rn e d , t h e y c a n comfo r t a b l y s hoo t down a n y ama t e u r s w ho b l u n d e r o r r i v a l s w h o s t r a y i n to t h e i r h e av i l y f o r t i f i e d f i e l d ; a n d , o f cou r s e , k n ow i n g t h e s t r e n g t h o f mo�d e r n d e f en s i v e we a po n s , t h e y t h ens e l v e s k e e p p r u d e n t l y w i t h i n t h e i r own f ro n t i e r . T he i r s i s a s t a t i c wo r l d , a M a g i no t l i n e , a n d l a rge r e se r v e s wh i c h t h e y s e l dom u s e ; h u t t h ey ha v e no p h i l o s o p h y . Fo r a h i s to r i c a l p h i l o so p h y i s i n com pa t i b l e w i t h s u c h n a r row f ro n t i e r s . " N o t e 3 6 a n d n o t e 3 a r e a l so i n t e re s t i n g i n t h i s co n n e c t i o n ; a l s o , Pa g e s V- 3 4 a n d V- 3 5 o f o u r t e x t co n s i d e r i n t e r d i s c i p l i n a r y e f f e c t s i n s onewh a t mo r e d e t a i l .

( 1 1 ) R a i f f a , How a r d , Dec i s i o n A n a l ys i s : I n t ro d u c to r y L e c t u r e s Q.O. Ma k i ng C h o i c e s Under Uns;e r t a i nty, A d d i s o n - We s l e y , R e a d i n � , Ma s s . , 1 9 6 8 . T h i s boo k , a t l ea s t , i s c l e a r l y i n t en d e d t o conm u n i c a t e to a b ro a d e r comm u n i t y . T h e p h i l o s o p h i c a l f o u n d a t i o n o f t h i s v i ew o f p r o b a b i l i t y i s d e s c r i b e d i n Ky b u r g , H e n r y E . J r . , a n d Smo k l e r , H . E . , e d s . , S t u d i e s l.n S u h i ec t i ye Pro b a b i l i ty W i l ey , N Y , 1 9 6 4 . A n a t o l R a p o po r t h a s c r i t i c i z ed t h e a b u s e o f p ro b ab i l i s t i c co n c e p t s b y d e c i s i o n -ma ke r s w h o do n o t f u l l y u n d e r s t an d t h e m ; s ee h i s S t r a tegy a n d Co n s c i enc e, H a r pe r a nd Row , N ew Yo r k , 1 9 6 4 , e s pe c i a l l y C h a p t e r , 1 0 . Neve r t h e l e s s , a f a l s e e s t i ma t e o f u n ce r t a i n t y m a y b e l e s s d a n g e ro u s t h a n a fo r ce d c ho i ce o f a b s o l u t e ce r t a i n t i e s ; Ra i f f a , i n a m emo co - a u t ho r e d w i t h Ma r c A l p e r t , h a s d i s c u s s e d i n d e t a i l t h e p ro b l em o f e d u c a t i n g a n d " c a l i b r a t i n g 1 1 d ec i s i o n - ma k e r s ( i . e . com p e n s a t i n g f or t h e i r o v e r co n f i d e n ce ) , to e s t i m a t e p ro b ab i l i t i e s mo r e r e a l i s t i c a l 1 y• ( S e e 1 1 /\ Pro � r e s s R e po r t o n t h e T r a i n i n g o f P r o b a b i l i t y A s s e s s o r s , " av a i l ab l e i n 1 9 7 1 a s a n u n pu h l i s h e d m a n u s c r i p t f r om t h e o f f i c e o f Prof . Ra i f f a i n t h e L i t t au e r B u i l d i n g , H a r v a r d

PaP,e V- 5 5

u. ) Stil l , Rapoport ' s comme nts o n the hazar d s of m athematical appro aches are well worth not i n g, for those vs1ho \'JO U 1 d want to u s e s uch approaches .

(1 2 ) The programs E V A L, D I F F a n d D E L TA o f R aymond Hopk i n s were mad e available to us by Prof. Deutsch, Dept. of Government, Harvard. They have been des cribed in Ho n k i ns, R a yrio nd, " Pro j ectio n s o f Population Chanp;e b y Mob i l i z atio n and As sim i lat i o n, " R ehayioral Sci ence, 1 9 7 2, p. 2 s 1�. ( 1 3) Johnston, J. , Econometric Method s, McGraw-Hill , rJ Y, 1 9 7 2, Secon d Editio n, p . i x: "The purpos e of this boo k is to provide a f airly s elf - con t ained d evel opment and explorat i on of econometr i c method s • • • I t i s d i vided into two parts . Part 1 contains a f ull ex pos i tion of the norm al regre s s i o n model. Th i s serve s as an e s sen t i al bas i s f or the theory of econometr i cs in Part 2 . 1 1

(14) I t '.\l a s surpri s ing to f i nd the phras e 1 1 pat h anal y s i s " s o rare in boo ks up to 1 9 7 3. I n 1 9 7 1, we d i s cus sed the s u b j ect at length with Prof. Alker, at M I T, one of the ma i n expone nts of this approach, with Prof. Raymo n d Tanter then of the Center for Res earch in Conflict Resolutio n at the Univers i ty o f Mich i gan, and w i th s tudents ta k i n g "path a n aly s i s " a s a sub j ect in the I nter- Univers ity Consortium for Pol i tical Research . I n a 1 1 cases, it w a s c 1 ear that "pa th analy s t s " w a s intended as a k i nd of ref i ned "cau s al an al y si s ", using regre s sion coe ff icients (or time - ser i e s re gre s sion coef f i cients, i n the soph i s ticate d versions ? ) a s indices of the size and d i re ction of inf luence. Simon, Bl al ock and Boudon h ave also been as soc i ated with "path a n alysis, " in dis cus sions at var i o u s universities . (1 5 ) See Bl al ock, note 9 . Sinple cor relation w a s u sually u s e d i n 1 1 caus 21l anal y sis " b a sed on Bl al ock ; this i s the un i var i ate s pecial ca se of regres sion analy si s .

( 1 6 ) As an arbitrary ex ample, p i c ked from a good antholog y of p a p ers i n this field, consider Sin ger, .J. David, ed . , Quanti t a t i ve I nternat i onal Pol l tics, lns il'!:hts lill.d Evidence . Free Pres s, NY, 1 9 6 8, tables of re sults on p . 2 7 8 - 2 8 1, p . 1 1 2, p . 1 5 2 - 1 5 3 , p. 1 9 9, p . :r n s , p . 6 5, p. 2 3 2. Statis tical si g n ificance score s (1 1 p 1 1 ) he r e often run to 1 1 . 0 5 1 1 ,

Page V-5 6

or even to as poor as

( 1 7)


.10 1 1 •

lsard, Walter, Methods of B,eg i onal Analys i s : An I ntroduc t i on ..t.Q. R�glonal Sc i ence, MIT Press, Cambr i dge, Mass . , 196 3, Th i rd Pr i nt i ng, n . 2 2 .

(18 ) nuesenberry , .J . S., Frornr,, G . , Kle i n, L . R . , and Kuh, E . H . , e ds. , .I.he. Broo k i ngs Model : Some Further Results, Rand-Mc Nally, Ch i cago, 1 9 6 9, p.2 9 6-297. T h e full regressi on mod el , just i f i e d b y sol i d s i gn i f i cance i nd i cat i ons, d i d not perform as well as a " condensed " - dramat i cally reduced - rnod el, at f i r s t. Then "adjustr,en ts" were appl i ed, wh i c h d ra mat i cally i r,proved the f i t; these adjustments seer,ed to en ta i l mul t i ply i ng eac h coeff i c i en t by a constant, su i table to adjust the pred i c t i ons of eac h var i able to th e r i g h t level i n the f i rst-h alf test d ata. On p.2 9 8 th ey c au t i on, even sti 1 1, : " If the mod el does i ndeed suffer from om i ss i on of i mpor tan t but slowly-c h an� i ng var i ables, then i t i s probably not very useful for long-run analys i s or projec t i on . " E st i r,at i ng or adjust i ng coeff i c i en ts to max i m i z e pred i c t i ve power d i rectly. over th e tr i al d ata, i s th e essence of our proposal i n sec t i ons (v i i ) and (x i ) of Chapter ( I I ) of th i s thesi s. (1 9 ) From Chapter (I I I) , "0 " i s s i rnply the m atr i x 1 1 8 1 1 i n th e un i var i ate case ; g i ven past error levels, a (t-k ), i t i s the bes t bas i s even for pred i c t i ng x (t + l ) . However, loo k i ng at x (t+n) from x (t) only, we ge t a long �a,!.h of correlat i ons mult i ply i ng. out to �. • • thus to g e t the op t i mal pre d i c t i on of x (t+n) , one rnul t i pl i es one ' s pred i ct i on of x (t+n-1) by ¢ .

(2 0 ) See note 1 9 of Chapter (I I ) of th i s thes i s .

(2 1) See sec t i on (x i ) of Chapter ( I I ) of th i s thesi s, for a way to i ntroduce a k i nd of " i nterest rate " or "d i scount fac tor ", to pre d i c t i ons of more d i stan t t i mes i n the future. Suc h procedures may be unavo i d able w h en a small amount of process no i se does ex i st, an d does accumu late through ti r,e .

(22) Mc Crac ken, Harlan L . , Keynes i an Econom i cs l.o. .t.1::1.g_ S tream of Ec;onom i c Though t, Lou i s i ana S tate U n i vers i ty Press, 1 9 6 1, p.5 1 : " Perh aps one of the f i nest con tr i but i ons Keynes mad e to econom i c


V- 5 7

theory and economic pol icy has been on the s ubject o f inve s tment. According to previo u s clas sical analy sis, s av ings, inv e s tment, and the rate o f intere s t all f itted in to the s tandard pattern o f d emand, s up�ly and price. A high rate o f intere st increa s e d s avings and d ecreas ed inve s tment, while a low rate o f intere s t d e creas e d s av ing s and increa s e d inve s tment, so it was a f unction o f price - the rate o f interest - to gravitate to the equil ibrium point w h ere s aving e q ualled inve s tment. There would be no s uch thing as over-saving or und e rinve s tment { i. e. d epre s sion ), as the y were continuo u sly being bro u ght into bal ance by an automatic regula tor. For Keyne s cl as s ical intere s t theory was in error at two basic points. Firs t, while a priori reasoning l eads to the natural conclus ion that a high rate o f intere s t stimulate s s aving and a low rate reduce s s aving, a pos teriori evid ence • • • " Comments in parenthe s e s our own. Gard n er Ackley d e s crib e s e q uivalent ideas in Ma croeco nom i c Theo ry, McMillan, r� Y , 1 9 6 1 , (Twel f th Print ing 1 9 6 7 ) p. 1 5 4- l S S : "Wick s ell ' s anal y si s { the clas sical anal y s is > . • • gav e u s, as has been stre s s ed, a rudimentary theory o f the aggre gate d emand for goo d s. This d emand consists of two main division s : cons 11r1er d emand and inv e stment d emand. Each o f the se d emand s was conceive d to be intere st- elas tic : the l ower the intere s t-rate, the gre ate r the inve s tment demand ; and the greater the con s umer d emand, too { the l atter idea is, o f cours e, m erely a re statement o f the i d e a that s aving d epend s negativel y on the intere st rate ) . • • If either type o f d eriand d ecl ine d, the r e s ulting f al l in the rate o f in t ere s t woul rl s timulate them both, and s hif t re s o urce s to the one which had not rlecline d . If, hm-1 cv er, f or any reason { particularl y expansion or contraction o f the money s uppl y by the ban k s ) the rate o f intere st were prevente d from performing this regul atory f unction, aggregate d emand • • • woul d be altered • • . B u t if wag e s and price s should not d ecl ine (eno u gh ) • • • \vor kers wo uld become uneriplo y e d, and real a s \.\l ell a s mone y income woul d be cut. "

{ 2 3 ) The history o f the s e studie s is rather complex. The majo r initial s tu d y, by Simon Kuznets, Na tional Income : A S urve� Q.f. Findings, rr n E R, NY, 1 9 4 6 , u s e s technical language dif f icult to s ummariz e h ere. Elizabeth W. Gil bey, the E conomics

Pag e V-5 8

of Consumption, Randor, Hou se, NY, 1 � 6 8 , p. 2 5 : " l n attempting to test this hypothes i s, contradictory resul ts arose from the use of tine -series and cross-sectional data. Simon Ku znet ' s study of data going al l the way back to 1 8 7 0 showed that the percentage of aggre gate incone saved had in fact remained constant in the United States. " (The contradiction invol ved the distribution of saving across hous ehol ds. ) A more recent survey is Patinkin, Money, Interes t i!lli! Prices, Harper and Row, N Y, Second E dition, 1 9 6 5, p. fl 5 1- 6 6 1i. Patin kin discusses l argel y the "weal th eff ect", based on the " Pigou effect", a more recent attempt to re surrect cl assical ideas ; on p. 6 5 6 and 6 5 7 Patinkin cites numerous studies which measure w eal th as real assets times interest rates. On p. 6 6 3 he describes his own resul ts for "beta" and "al pha", the forner w h ich he e quates ..,, i th 1 1 Y L "(income), and the l atter, at the top of p. 6 5 9, defined as beta tines interest rate. Thus the l atter resul ts expl icitl y measure the hypothesis that interest rates affect the pe rcent §ge of incor,e save d, whil e the former do measure something cl osel y rel ated; al so, the studies of Gol dsmith cited by Patinkin r eaffirmed the idea of a "const ant saving-income ratio. " Patinkin describ es his own studies, anrl some of the previous studies, as showing l arge and significant effects by var i abl es d er i ved from interest rates ; howev er, the actual regression coefficients of al pha ran to . 0 4-. 0 8, at the most, much smal l er than the coeff icients of beta, which was al ready a l arger number to b e �in with.

(2 1J) Many woul d identif y the "l ib eral " Roosevel t with the "l iber al I I Keynes. However, U S G N P d ata indicate rather strongl y that the main recovery from the Depre ssion coincid e d with major mil itary spending induced, not by econonic theory ( though Keynes ' theory might hav e recommend e d it, given no al ternative spending options on the same scal e ), but by Worl d War II. Keynes i an theory, in many respects, was not ful l y acce pte d in the US until John Kennedy became presid ent . Schl esinger, Arthur F. , A Thousand Davs, Houghton-Miff l in, Roston, 1 9 6 5 , p • 1 0 0 5 : "The ( tax cu t ) bil l made sl ow progre ss through Congress. Pu bl ic reaction at first was muted. Kennedy used to inquire of the profe ssors of the Council what had happened to the several mil l i on col l e g e stud ents who had

Pa,P,;e V-5 9

presumably been taug ht the new eco nomics • • • Still • • . o n Septemb er 2 5, 1 9 6 3, the worst was over • • • The Yale speech h ad not been in vain ; and the American government, a generation after General Theory, h ad accepted the Keynesian revolutio n . " Reearrling Key nes an d Roo s evelt, Sch lesinger h as w r itten, in The Age of Roosevelt, Houghton-Mif flin, Ros ton, 1 9 6 6, Vol. Ill, p. 2 3 6 : "T h e First New Deal, in the main, d istru s ted s pen din g . Its co nse r v atives, like Jo h n so n and Moley, were orthodox in their f iscal vie\\/S and wanted a balanced bud get; and its liberals, like Tugwell and La Follette, d isliked spen din g as a drug which gave the patient a f alse sense of well-bein g before s urgery could be completed. " p . li 0 3: " S h ortly after Roo sevelt ' s inauguratio n (1 9 3 3) Keyn e s spo ke o n ce again in a brilliant pam phlet called ' The Means to Prosp e r ity . ' Here h e argued with new f o rce an d d etail f o r public spen ding as th e way out o f the depressio n . Em plo yin g the concept of the ' multiplier ' , i n t r o d u c e d b y h i s s t u r.l e n t • . • 1 1 p • 11 0 4 : " ' U n f ortun ately, ' Key nes wrote in A pril 1 9 3 3, ' It see ms impossible in the world o f today to find an ythin g between a government which does nothing at all and o n e which goes right o f f the deep en d ! ' " p. 4 0 5 : " • • • on May 2 8 , 1 9 3 4, Key nes came to tea at the Hhite House . T h e meetin g d oes not seem to h ave been a success . " p , l.f 0 6 : " • . • to Frances Perkins Roosevelt comnlained stran gely, ' He lef t a who l e rigamarole o f figures . He rrius t be a matheriatician rather than a political economist. ' "

(2 5 ) Solow, Robert M. , "Tech n ical Change an d the Aggregate Productio n Functio n ", Reyiew Q.f. Economic statist ics, 1 9 5 7, p . 3 1 2 - 3 2 o . So 1 ow wr ites : 1 1 Mot only is delta A ove r A (the percentage in crease in the autonomous terr,) uncorrelated with K/ L, but one migh t almost conclude from the grap h that d elta A over A is essentially a con s t ant in time, ex hibiting more or less ran doM f luctuations about a f ixed mean . " Loo kin g at f igure 3, o n e notes a pos sible excentio n to this, which Solow admits is a very tentativ e co nclusio n : the growth of th is term might h ave actually b e e n f aster, slightly, during the depths o f the depressio n (actually, laggin g it by three years in the graph ), th an un der normal con ditio ns .

Pa.�e V- 6 0

(2 6 ) E ssays in Sociolog�. f rom Max Weber, introduce d b y Talcott Parsons . Weber ' s con ce pt o f "ratio n aliz ation", as a tn�nd extendi n g from the 11 1 Conce pt I of Pl ato 11 to the "cage" o f modern machine civiliz ation, doe s amou nt to a long-term vision of history .

(2 7) See note 7 . { 2 8 ) See note 4 .

(2 9 ) See note 5 .

(3 0) McNeil, W . H . , The Rise of the \'JesL Mentor, NY, 1 9 6 5 . A s ide from the title, the themes are a bit too compl ex to summarize here . Man y of them are reminiscent of Turner, note 6 . Throu ghout the boo k, however, McNeil does kee p returning to the theme of human societie s adapte d to the p astoral niche as providing the soil on which new civil iz atio ns may develop or to which old civiliz ation s may spre ad, as the old heartl and decays . (3 1 ) E isenstadt, S . N . , The Poli tical Svstems Qf. f:npires, Free Pre ss of Gl encoe, 1 9 6 3 . Chapter 2 attempts to e xpl ain the "un ivers al states" of the Toyn bee and S pengler theories, al most the same societies . (3 2 ) See notes 4 and 5 .

{ 3 3) Lorenz, Konrad, On Aggression, H arcourt Rrace and Worl d, N Y, 1 9 6 6 , transl ate d b y M . W il s on • This source is al read y pop ular among some political scientists . d u s t as rel evant may b e Sim p son, George Gaylord, The Ma i or Features of Evolu tio n, Columbia University Pres s, N Y, 1 9 5 3, p . 3 9 1 : "The po pul ation s making a quantun shift (e . g. evol ution of human intelligence) do not l ose adaptation altogether ; to do so is to become extinct . I t is al so clear that .th.a direction Qf. change is adaptive, unless il � rn beginning • • • Yet the ve r y f act that selection pre s su re Is strong can onl y be a conconittant of movement from a more poorl y to a better adapte d status. Sele c tion LlQ.t. l inear h.Y..t. cent r ipe tal l:Lb,e.n adaptation perfected . I t is the ' stabilizing sel ection ' • • • The quantum chan�e is a b reak-through from one port i on of stabilizi n g sel ection to another . " The


Par,;e V-6 1

Indian caste systen, or the e arly C aribbean system of C arib pre d ators and Arawaks, are interesting e x am ple s of centripe t al d evelopment among humans in relatively static ecolo�ie s / economies. p. 3 9 2 : " Quantum evolution usually is and at some lev el it m ay always be involved in the o pening or so-called ' explosive ' ph ase of adaptive rad iation. T h e relative rapidity with which a variety of ad aptation zone s are t h en occu pied s eems q uite ine x plicable e xce pt by a series of (? ) and also, often, successive quantum s hifts into the varied zones. The rates thereafter slow dovm. 1 1

(34 ) See notes 3 7 , 3 and 1 0 . Also: Aron, R aymond , Main Sociological Though t, Vol. I , Basic C urrent� Rooks, 1 9 6 5 , trans lated by Harold Weaver, p . 4 : " American sociologists, in my own e xp erience, never tal k about laws of h is tory , first of all because th e y are not acquainte d with th em, and nex t because t h e y do not b elieve in th e i r e xistence. "


(3 5 ) T h e "multicausal ap proach " h as appeared t n h istorical rese arch , if our memory is correct, but th e sociologists - w ho find it h arder to retreat into simple narrativ e - h ave spoken much riore about the idea. See , for e x ample , Vernon, Glenn M. , Human Interaction: An Introduction ..:tQ. Sociology , Ronald Press , N Y , 1 9 6 5 , p. 3 0 and p. 8 0-8 1 especially . See also Mac i ver, Robert M. , Social Causation, Ginn and Co. , Boston, Mass. , 194 2.

( 3 6 ) Trevelyan, G.M . , Lill 8utob i ogra12hy gOQ Q ther Ess ay s , Longman, Green and Co. , 1 9 4 9 , London, p. 9 1: " T h e endles sly attractive game of speculating on th e might-have-beens of history can ne ver take us very far with sense or safety. For if one thing h ad been different, everything would th enceforth h av e been different - and in what way we cannot tell. • • As serious stud ents of history , ru.J. tl.e. m do is to watch and ..:tQ. inve s tigate, how t hing led ..:tQ. anoth er course act u ally ta ke n . T his pursuit is rend ered all th e more f ascinating and ronantic because w e know how very nearly it was all completely different. E x cept perh aps in terms of ph ilosoph y , no event \I/ as ' in e v itab le • ' 1 1 H is tor ians h ave of ten d is cu ss e d th e "turning points, " tiries w h en t h e susequent course of events v-muld h ave be en very d ifferent il

.Lo. � ™

.Lo. �

Page V-6 2

small events h ad wor k e d out dif f erently ; see, for example, Handlin, O s car, Choice QL Des t i ny : Turning Points ln American Histo ry,. A tlantic Monthly Pre s s, Boston, Mas s . , 1 9 5 4 . However, as th e las t two chapters of this example make clear, it is us ually not consid ered acceptable to imagine j us t how the s ubs e q uent ev ents mi�ht h ave been dif f erent, concretely .

(3 7 ) It has been s ugg e s ted that diff erences in be h avior of " high " and "low " situations may make such a treatment valuahle, or that an actual division of the 1;1Jorld into "hig h " and "lov.r" make s it d esirable. However, j us t because o ur s ample i s weak in the mirl