Toward a Formal Science of Economics: The Axiomatic Method in Economics and Econometrics [1 ed.] 0262192845

Toward a Formal Science of Economics provides a unifying way to look at the concept of economic science. It lays a found


Table of contents :
Toward a Formal Science of Economics
CONTENTS
1— Introduction
1.1— The Need for a Formal Unitary Methodological Basis for the Science of Economics
1.2— The Axiomatic Method and the Development of a Formal Science of Economics
1.2.1— The Rise of Formal Economics
1.2.2— The Rise of Formal Logic
1.2.3— The Development of a Formal Science of Economics
1.3— Formalism and the Unity of Science
1.3.1— The Unity of Science
1.3.2— Advantages of Formalism in Science
1.3.3— Formalism, Formalization, and the Scientific Method
1.4— Noteworthy Results
1.4.1— Parts I and II: Mathematical Logic
1.4.2— Part III: Consumer Choice
1.4.3— Part IV: Chance, Ignorance, and Choice
1.4.4— Part V: Nonstandard Analysis
1.4.5— Part VI: Epistemology
1.4.6— Part VII: Empirical Analysis of Economic Theories
1.4.7— Part VIII: Determinism, Uncertainty, and the Utility Hypothesis
1.4.8— Part IX: Prediction, Distributed Lags, and Stochastic Difference Equations
1.5— Acknowledgments
2— The Axiomatic Method
2.1— Axioms and Undefined Terms
2.2— Rules of Inference and Definition
2.3— Universal Terms and Theorems
2.4— Theorizing and the Axiomatic Method
2.5— Pitfalls in the Axiomatic Method
2.6— Theories and Models
2.7— An Example
I— MATHEMATICAL LOGIC I: FIRST-ORDER LANGUAGES
3— Meaning and Truth
3.1— A Technical Vocabulary
3.1.1— Names
3.1.2— Declarative Sentences
3.1.3— Constants and Variables
3.1.4— Functions and Predicates
3.2— Logical Syntax
3.3— Semantics
3.4— The Semantic Conception of Truth
3.5— Truth and Meaning
4— Propositional Calculus
4.1— Symbols, Well-Formed Formulas, and Rules of Inference
4.1.1— Symbols
4.1.2— Well-Formed Formulas
4.1.3— Rules of Inference
4.2— Sample Theorems
4.3— The Intended Interpretation
4.3.1— Tautologies
4.3.2— Theorems and Tautologies
4.4— Interesting Tautological Structures
4.5— Disjunction, Conjunction, and Material Equivalence
4.5.1— Either-Or and Both-And Sentences
4.5.2— Material Equivalence
4.6— Syntactical Properties of the Propositional Calculus
4.7— Proof of the Tautology Theorem
5— The First-Order Predicate Calculus
5.1— Symbols, Well-Formed Formulas, and Rules of Inference
5.1.1— The Symbols
5.1.2— The Well-Formed Formulas
5.1.3— The Axioms
5.2— Sample Theorems
5.2.1— Equality
5.2.2— The Quantifiers
5.2.3— Material Equivalence
5.3— Semantic Properties
5.3.1— Structures
5.3.2— Structures and the Interpretation of a First-Order Language
5.3.3— Tautologies and Valid Well-Formed Formulas
5.3.4— Valid Well-Formed Formulas and Theorems
5.4— Philosophical Misgivings
5.4.1— For-All Sentences
5.4.2— There-Exist Sentences
5.5— Concluding Remarks
II— MATHEMATICAL LOGIC II: THEORIES AND MODELS
6— Consistent Theories and Models
6.1— First-Order Theories
6.2— Proofs and Proofs from Hypotheses
6.3— The Deduction Theorem
6.4— Consistent Theories and Their Models
6.5— The Compactness Theorem
6.6— Appendix: Proofs
6.6.1— A Proof of TM 6.10
6.6.2— A Proof of TM 6.11
6.6.3— A Proof of TM 5.19
6.6.4— A Proof of TM 6.13
7— Complete Theories and Their Models
7.1— Extension of Theories by Definitions
7.1.1— Predicates
7.1.2— Functions
7.1.3— Valid Definitional Schemes
7.2— Isomorphic Structures
7.3— Elementarily Equivalent Structures
7.4— Concluding Remarks
7.5— Appendix
8— The Axiomatic Method and Natural Numbers
8.1— Recursive Functions and Predicates
8.1.1— Recursive Functions
8.1.2— Gödel's β Function
8.1.3— Recursive Predicates
8.1.4— Sequence Numbers
8.2— Expression Numbers
8.3— Representable Functions and Predicates
8.4— Incompleteness of Consistent, Axiomatized Extensions of T(N)
8.5— The Consistency of T(P)
8.6— Concluding Remarks
9— Elementary Set Theory
9.1— The Axioms of KPU
9.2— The Null Set and Russell's Antinomy
9.3— Unions, Intersections, and Differences
9.3.1— Unions
9.3.2— Intersections and Differences
9.4— Product Sets
9.5— Relations and Functions
9.6— Extensions
9.7— Natural Numbers
9.8— Admissible Structures and Models of KPU
9.9— Concluding Remarks
III— ECONOMIC THEORY I: CONSUMER CHOICE
10— Consumer Choice under Certainty
10.1— Universal Terms and Theorems
10.2— A Theory of Choice, T(H 1, ..., H 6)
10.2.1— Axioms
10.2.2— The Intended Interpretation of T(H 1, ..., H 6)
10.2.3— Sample Theorems
10.3— The Fundamental Theorem of Consumer Choice
10.4— The Hicks-Leontief Aggregation Theorem
11— Time Preference and Consumption Strategies
11.1— An Alternative Interpretation of T(H 1, ..., H 6)
11.2— The Time Structure of Consumer Preferences
11.2.1— Independent Preference Structures
11.2.2— Stationary Preference Structures
11.3— The Rate of Time Preference and Consumption Strategies
11.3.1— The Induced Ordering of Consumption Strategies
11.3.2— Irving Fisher's Rate of Time Preference
11.3.3— Stationary Price Expectations and Monotonic Optimal Consumption Strategies
11.3.4— Optimal Consumption Strategies and Age
11.4— Consumption Strategies and Price Indices
12— Risk Aversion and Choice of Safe and Risky Assets
12.1— An Axiomatization of Arrow's Theory
12.1.1— The Axioms
12.1.2— The Intended Interpretation
12.1.3— Sample Theorems
12.2— Absolute and Proportional Risk Aversion
12.2.1— The Absolute Risk-Aversion Function
12.2.2— Absolute Risk Aversion and Ordering of (μ, m) Pairs
12.2.3— Absolute Risk Aversion and Investment in Risky Assets
12.2.4— The Proportional Risk-Aversion Function
12.3— The Fundamental Theorems of Arrow
12.3.1— Risky Assets and Absolute Risk Aversion
12.3.2— Safe Assets and Proportional Risk Aversion
12.4— New Axioms
12.5— An Aggregation Problem
12.6— Resolution of the Aggregation Problem
12.6.1— Preliminary Remarks
12.6.2— The Separation Property
12.6.3— Arrow's Theorems and the Separation Property
12.7— Appendix: Proofs
12.7.1— Proof of T 12.1
12.7.2— Proof of T 12.2
12.7.3— Proof of T 12.4
12.7.4— Proof of T 12.5
12.7.5— Proof of T 12.6
12.7.6— Proof of T 12.7
12.7.7— Proof of T 12.8 and T 12.9
12.7.8— Proof of T 12.10
12.7.9— Proof of T 12.11
12.7.10— Proof of T 12.13
12.7.11— Proof of T 12.14
12.7.12— Proof of T 12.15
12.7.13— Proof of T 12.16
12.7.14— Proof of T 12.17
13— Consumer Choice and Revealed Preference
13.1— An Alternative Set of Axioms, S 1 ,
13.2— The Fundamental Theorem of Revealed Preference
13.2.1— A Rough Contour of S⁺(x⁰)
13.2.2— Salient Characteristics of the Lower Boundary Points of S⁺(x⁰)
13.2.3— Characteristics of Vectors in S⁺(x⁰) ∪ (Rⁿ₊ − S̄⁺(x⁰))
13.2.4— The Fundamental Theorem
13.3— The Equivalence of T(S 1 ,
13.3.1— A Counterexample
13.3.2— Homothetic Utility Functions and the Fundamental Theorem
13.3.3— Additively Separable Utility Functions and the Fundamental Theorem
13.4— Concluding Remarks
14— Consumer Choice and Resource Allocation
14.1— Competitive Equilibria in Exchange Economies
14.1.1— A Scenario for Commodity Exchange
14.1.2— Competitive Equilibria in E
14.2— Resource Allocation in Exchange Economies
14.2.1— Pareto-Optimal Allocations and Fair Allocations
14.2.2— Pareto-Optimal Allocations and Competitive Equilibria
14.3— The Formation of Prices in an Exchange Economy
14.3.1— On the Stability of Competitive Equilibria
14.3.2— Concluding Remarks
14.4— Temporary Equilibria in an Exchange Economy
14.4.1— Consumption-Investment Strategies
14.4.2— The Current-Period Utility Function
14.4.3— Current-Period Temporary Equilibria
14.4.4— Feasible Sequences of Temporary Equilibria
14.5— Admissible Allocations and Temporary Equilibria
14.6— On the Stability of Temporary Equilibria
IV— PROBABILITY THEORY: CHANCE, IGNORANCE, AND CHOICE
15— The Measurement of Probable Things
15.1— Experiments and Random Variables
15.1.1— Events
15.1.2— Random Variables
15.2— Belief Functions
15.2.1— Basic Probability Assignments and Belief Functions
15.2.2— Orthogonal Sums of Belief Functions
15.2.3— Support Functions
15.2.4— Additive Versus Nonadditive Belief Functions
15.2.5— Additive Belief Functions
15.3— Probability Measures
15.3.1— Finitely Additive Probability Measures
15.3.2— The Bayes Theorem
15.3.3— Posterior Probabilities and Conditional Belief Functions
15.3.4— σ-Additive Probability Measures
15.4— Probability Distributions
15.4.1— The Probability Distribution of a Random Variable
15.4.2— The Joint Probability Distribution of n Random Variables
15.4.3— Integrable Random Variables
15.4.4— Probability Distributions in Econometrics
15.4.5— Convergence in Distributions
15.5— Random Processes and Kolmogorov's Consistency Theorem
15.5.1— Random Processes
15.5.2— Kolmogorov's Consistency Theorem
15.5.3— The Measurement of Random Processes
15.6— Two Useful Universal Theorems
16— Chance
16.1— Purely Random Processes
16.1.1— Independent Events and Variables
16.1.2— A Purely Random Process
16.2— Games of Chance
16.2.1— The Absence of Successful Gambling Systems
16.2.2— The Arc Sine Law
16.2.3— The Classical Ruin Problem
16.3— The Law of Large Numbers
16.3.1— Tail Events and Functions
16.3.2— Kolmogorov's Strong Law of Large Numbers
16.3.3— The Central Limit Theorem
16.4— An Empirical Characterization of Chance
16.4.1— The Collectives of Von Mises
16.4.2— Church's Concept of Chance
16.5— Chance and the Characteristics of Purely Random Processes
17— Ignorance
17.1— Epistemic Versus Aleatory Probabilities
17.1.1— Risk and Epistemic Probability
17.1.2— Uncertainty and the Principle of Insufficient Reason
17.1.3— Modeling Ignorance à la Laplace and Edgeworth
17.1.4— Measuring Uncertainty with Entropy
17.2— The Bayes Theorem and Epistemic Probabilities
17.2.1— Learning by Observing
17.2.2— An Example
17.2.3— A Paradox
17.3— Noninformative Priors
17.3.1— Locally Uniform Priors
17.3.2— Exact Data-Translated Likelihoods
17.3.3— Approximate Data-Translated Likelihoods
17.4— Measuring the Performance of Probability Assessors
18— Exchangeable Random Processes
18.1— Conditional Expectations and Probabilities
18.2— Exchangeable Random Variables
18.2.1— Finite Sequences of Binary Exchangeable Random Variables
18.2.2— Sequences of Infinitely Many Binary Exchangeable Variables
18.2.3— Integrable Exchangeable Random Processes
18.3— Exchangeable Processes and Econometric Practice
18.3.1— Consistent Parameter Estimates
18.3.2— Finite-Sample Interval Estimates
18.3.3— Concluding Remarks
18.4— Conditional Probability Spaces
18.4.1— Conditional Probability Spaces
18.4.2— Renyi's Fundamental Theorem
18.5— Exchangeable Processes On a Full Conditional Probability Space
18.6— Probability Versus Conditional Probability
19— Choice under Uncertainty
19.1— The Decision Maker and His Experiment
19.1.1— The Decision Maker
19.1.2— The State of the World
19.1.3— Acts and Consequences
19.2— The Decision Maker's Risk Preferences
19.2.1— Risk Preferences
19.2.2— The Sure-Thing Principle
19.2.3— Constant Acts
19.3— Risk Preferences and Subjective Probability
19.3.1— Bets and Prizes
19.3.2— Qualitative Probability
19.3.3— Subjective Probability
19.4— Expected Utility
19.4.1— Savage's Fundamental Theorem
19.4.2— Measurable Utility
19.4.3— Expected Utility with a Finite Number of States of the World
19.5— Assessing Probabilities and Measuring Utilities
19.5.1— Assessing Subjective Probabilities
19.5.2— Measuring Utility Functions
19.5.3— A Test of Savage's Theory
19.6— Belief Functions and Choice under Uncertainty
19.6.1— Belief Functions and the Axioms of Savage
19.6.2— Belief Functions, Qualitative Probability, and Expected Utility
19.6.3— Belief Functions and Uncertainty Aversion
19.6.4— Examples
19.6.5— Concluding Remarks
V— NONSTANDARD ANALYSIS
20— Nonstandard Analysis
20.1— The Set of Urelements U
20.1.1— The Axioms for U
20.1.2— Structural Characteristics of U
20.2— A Model of the Axioms for U
20.2.1— Free Ultrafilters over N
20.2.2— An Ordered Field of Hyperreal Numbers *R
20.3— Elementarily Equivalent Structures and Transfer
20.3.1— Two Elementarily Equivalent Structures
20.3.2— Transfer
20.3.3— Transfer Versus Elementary Extension of Structures
20.4— Superstructures and Superstructure Embeddings
20.4.1— Superstructures W(·) over Sets of Urelements
20.4.2— The Superstructure over R
20.4.3— Superstructure Embeddings
20.5— Transfer and Superstructure Embeddings
20.5.1— Leibniz's Principle
20.5.2— Łoś's Theorem and the Validity of Leibniz's Principle
20.6— Internal Subsets of W(*R)
20.6.1— A Classification of the Elements of W(*R)
20.6.2— Elementary Properties of Internal Sets
20.6.3— Hyperfinite Sets in W(*R)
20.7— Admissible Structures and the Nonstandard Universe
20.7.1— Admissible Structures
20.7.2— Cardinal Numbers
20.7.3— An Admissible Model of T
20.7.4— Admissible Models of T and Superstructures
21— Exchange in Hyperspace
21.1— The Saturation Principle
21.1.1— The Saturation Principle
21.1.2— Useful Consequences
21.2— Two Nonstandard Topologies
21.2.1— The * Topology
21.2.2— The S Topology
21.3— Exchange in Hyperspace by Transfer
21.3.1— A Hyperfinite Exchange Economy
21.3.2— A Nonstandard Version of a Theorem of Debreu and Scarf
21.4— Exchange in Hyperspace Without Transfer
21.4.1— On Exchange in the S Topology
21.4.2— An Auxiliary Lemma
21.4.3— The Fundamental Equivalence
21.5— Concluding Remarks
22— Probability and Exchange in Hyperspace
22.1— Loeb Probability Spaces
22.2— Standard Versions of Loeb Probability Spaces
22.2.1— Examples
22.2.2— A Hyperfinite Alias of Lebesgue's Probability Space
22.3— Random Variables and Integration in Hyperspace
22.3.1— Random Variables in Hyperspace
22.3.2— Integration in Hyperspace
22.4— Exchange in Hyperspace Revisited
22.5— A Hyperfinite Construction of the Brownian Motion
22.5.1— Independent Random Variables in Hyperspace
22.5.2— Brownian Motion
22.5.3— The Wiener Measure
VI— EPISTEMOLOGY
23— Truth, Knowledge, and Necessity
23.1— The Semantical Concept of Truth Revisited
23.2— Truth and Knowledge
23.3— The Possibility of Knowledge
23.3.1— The Universe Is Not Empty, PE 1
23.3.2— Induction
23.3.3— The Uniformity of Nature, PE 2
23.3.4— Identity and the Closest-Continuer Schema
23.3.5— Analogy
23.3.6— The Principle of Limited Variety, PLV
23.4— Different Kinds of Knowledge
23.4.1— Knowledge of Logical Propositions
23.4.2— Knowledge of Extralogical Propositions
23.4.3— Knowledge of Variable Hypotheticals
23.4.3.1— Knowledge by Definition, Analysis, Intuition, and Enumeration
23.4.3.2— Accidental, Nomological, and Derivative Laws
23.5— Necessity and Modal Logic
23.5.1— A Modal-Logical System, ML
23.5.2— Sample Theorems in ML
23.5.3— The Intended Interpretation of ML
23.5.4— Salient Properties of the Intended Interpretation of ML
23.5.5— Universals, Nomological Laws, and Modal Logic
23.5.6— Concluding Remarks
24— The Private Epistemological Universe, Belief, and Knowledge
24.1— The Private Epistemological Universe
24.1.1— A Reformulation of PLV, PE 3
24.1.2— Epistemological Universes
24.1.3— A Private Universe for the Theory of Knowledge and PE 4
24.2— Logical Probabilities and Their Possible-World Interpretation
24.2.1— Additive Logical Probabilities
24.2.2— Superadditive Logical Probabilities
24.2.3— Concluding Remarks
24.3— An Axiomatization of Knowledge
24.3.1— The Symbols
24.3.2— The Logical Axioms
24.3.3— The Nonlogical Axioms
24.3.4— The Rules of Inference
24.3.5— The Intended Interpretation of EL
24.3.6— Salient Properties of the Interpretation of EL
24.3.7— Theorems of EL
24.3.7.1— Useful Properties of P(·|·)
24.3.7.2— Good Inductive Rules of Inference and the Properties of P(·|·)
24.3.7.3— The Existence of P(·|·)
24.3.7.4— Theorems Concerning Kn(·) and Bl(·)
24.3.7.5— Substitution in Referentially Opaque Contexts
24.3.7.6— The Epistemological Concept of Truth
24.4— Other Concepts of Knowledge
24.4.1— Peirce's Concept of Knowledge
24.4.2— Hintikka's Concept of Knowledge
24.4.3— Chisholm's Concept of Knowledge
24.4.4— Sundry Comments and a Look Ahead
25— An Epistemological Language for Science
25.1— Simple, Autonomous Relations
25.2— Analogy and the Generation of Scientific Hypotheses
25.2.1— Analogy and Inductive Inference
25.2.2— Models
25.2.3— Representative Individuals and Aggregates
25.2.4— Observations, Theoretical Hypotheses, and Analogy
25.3— Induction and Meaningful Sampling Schemes
25.4— Many-Sorted Languages
25.4.1— The Symbols
25.4.2— The Terms and the Well-Formed Formulas
25.4.3— The Axioms and the Rules of Inference
25.4.4— Sample Theorems
25.4.5— Structures and the Interpretation of Many-Sorted Languages
25.5— Semantic Properties of Many-Sorted Languages
25.6— A Language for Science
25.7— A Modal-Logical Apparatus for Testing Scientific Hypotheses
25.8— Appendix: Proof of the Completeness Theorem for Many-Sorted Languages
25.8.1— Predicate-Calculus Aliases of Many-Sorted Languages
25.8.2— The Completeness Theorem
VII— ECONOMETRICS I: EMPIRICAL ANALYSIS OF ECONOMIC THEORIES
26— Empirical Analysis of Economic Theories
26.1— Four Kinds of Theorems
26.2— The Structure of an Empirical Analysis
26.2.1— The Undefined Terms: S, (Ω, ℱ), and P(·)
26.2.2— The Axioms Concerning Ω
26.2.3— The Axioms Concerning P(·) and (Ω, ℱ)
26.2.4— Sample Theorems
26.2.5— Testing an Economic Theory
26.3— New Axioms and New Tests
26.3.1— New Axioms Versus New Tests
26.3.2— An Example
26.4— Superstructures, Data-Generating Mechanisms, the Encompassing Principle, and Meaningful Sampling Schemes
27— The Permanent-Income Hypothesis
27.1— Formulation of the Hypothesis
27.2— The Axioms of a Test of the Certainty Model: F 1,
27.3— Theorems of T(F 1,
27.4— Confronting T(F 1,
27.4.1— Budget Data Versus Time-Series Data
27.4.2— A Factor-Analytic Test
27.4.3— The Rate of Time Preference and the Human-Nonhuman Wealth Ratio
27.4.4— Concluding Remarks
27.5— A Test of the Uncertainty Version of Friedman's Theory
27.5.1— New Axioms
27.5.2— New Theorems
27.5.3— The Test
27.5.4— Concluding Remarks
27.6— Appendix: Standard Errors of Factor-Analytic Estimates
27.6.1— The Asymptotic Distribution of the Sample Covariance Matrix
27.6.2— The Asymptotic Distribution of Factor-Analytic Estimates
27.6.3— Bootstrap Estimates of Factor-Analytic Parameters
28— An Empirical Analysis of Consumer Choice among Risky and Nonrisky Assets
28.1— The Axioms of the Empirical Analysis
28.1.1— Axioms Concerning the Components of ω_T
28.1.2— Axioms Concerning the Components of ω_p
28.1.3— Axioms Concerning the Images of F
28.1.4— An Example
28.1.5— Axioms Concerning P(·) and ℱ
28.2— Arrow's Risk-Aversion Functions and the Data
28.2.1— The Data and the Axioms
28.2.2— Sample Theorems
28.2.3— An Indirect Test of SA 7 and SA 11-SA 17
28.2.4— A Test of Arrow's Hypotheses
28.3— Comparative Risk Aversion
28.3.1— One-Way Analysis of Variance: Theory
28.3.2— One-Way Analysis of Variance of the Data
28.3.3— Two-Way Analysis of Variance: Theory
28.3.4— Multiple-Classification Analysis of the Data
28.3.5— Education and Income
28.4— Concluding Remarks
VIII— ECONOMIC THEORY II: DETERMINISM, UNCERTAINTY, AND THE UTILITY HYPOTHESIS
29— Time-Series Tests of the Utility Hypothesis
29.1— A Nonparametric Test of the Utility Hypothesis
29.2— Testing for Homotheticity of the Utility Function
29.3— Testing for Homothetic Separability of the Utility Function
29.4— Excess Demand Functions and the Utility Hypothesis
29.4.1— Testing the Utility Hypothesis with Group Data That Satisfy GARP
29.4.2— Testing for the Homotheticity of Individual Utility Functions with Group Data That Satisfy GARP
29.4.3— A Characterization of Excess Demand Functions
29.4.5— Constructing a "Test" of the Utility Hypothesis When the Group Data Do Not Satisfy GARP.
29.4.6— Summing up
29.5— Nonparametric Versus Parametric Tests of the Utility Hypothesis and a Counterexample
30— Temporary Equilibria under Uncertainty
30.1— The Arrow-Debreu Consumer
30.1.1— Nature
30.1.2— The Consumer
30.1.3— Markets and Expenditure Plans
30.1.4— Concluding Remarks
30.2— The Radner Consumer
30.2.1— Notational Matters
30.2.2— The Consumer
30.2.3— Markets and Expenditure Plans
30.2.4— Concluding Remarks
30.3— Consumer Choice under Uncertainty
30.3.1— Definitional Axioms
30.3.2— The Intended Interpretation
30.3.3— Axioms Concerning the Properties of V(·) and Q(·)
30.3.4— The Fundamental Theorem of Consumer Choice under Uncertainty
30.3.5— Concluding Remarks
30.4— The Arrow-Debreu Producer
30.5— Entrepreneurial Choice under Uncertainty
30.5.1— Definitional Axioms
30.5.2— The Intended Interpretation
30.5.3— Axioms Concerning the Properties of g(·), V(·), and Q(·)
30.5.4— The Fundamental Theorem of Entrepreneurial Choice under Uncertainty
30.6— Temporary Equilibria under Uncertainty
30.6.1— Notational Matters
30.6.2— Axioms for a Production Economy
30.6.3— The Existence of Temporary Equilibria
30.6.4— Concluding Remarks
30.7— Appendix: Proofs of Theorems
30.7.1— Proof of T 30.1 and T 30.2
30.7.2— Proof of T 30.5
30.7.3— Proof of T 30.7
31— Balanced Growth under Uncertainty
31.1— Balanced Growth under Certainty
31.1.1— The Indecomposable Case
31.1.2— The Decomposable Case
31.2— Balanced Growth under Uncertainty in an Indecomposable Economy
31.2.1— Balanced Growth When n = 1
31.2.2— Balanced Growth When n ≥ 2
31.3— Balanced Growth under Uncertainty in a Decomposable Economy
IX— ECONOMETRICS II: PREDICTION, DISTRIBUTED LAGS, AND STOCHASTIC DIFFERENCE EQUATIONS
32— Distributed Lags and Wide-Sense Stationary Processes
32.1— A Characterization of Wide-Sense Stationary Processes
32.1.1— Examples
32.1.2— Orthogonal Set Functions and Stochastic Integrals
32.1.3— The Spectral Distribution Function
32.1.4— The Spectral Representation of a Wide-Sense Stationary Process
32.2— Linear Least-Squares Prediction
32.2.1— The Best Linear Least-Squares Predictor
32.2.2— Examples
32.2.3— Wold's Decomposition Theorem
32.2.4— Kolmogorov's Theorem
32.3— Distributed Lags and Optimal Stochastic Control
32.3.1— Distributed Lags
32.3.2— Examples
32.3.3— A Stochastic Control Problem
32.3.4— Rational Distributed Lags and Control
33— Trends, Cycles, and Seasonals in Economic Time Series and Stochastic Difference Equations
33.1— Modeling Trends, Cycles, and Seasonals in Economic Time Series
33.1.1— Trends
33.1.2— Cycles and Seasonals
33.1.3— Concluding Remarks
33.2— ARIMA Processes
33.2.1— The Short and Long Run Behavior of ARIMA Processes
33.2.2— An Invariance Principle and the Associated Wiener Measures
33.2.3— The Invariance Principle and the Long Run of ARIMA Processes
33.3— Dynamic Stochastic Processes
33.3.1— A Definition and Illustrative Examples
33.3.2— Fundamental Theorems
33.4— Concluding Remarks On Multivariate Dynamic Stochastic Processes
34— Least Squares and Stochastic Difference Equations
34.1— The Elimination of Trend, Cycle, and Seasonal Factors in Time Series
34.1.1— Basic Assumptions
34.1.2— Linear SCT-Adjustment Procedures
34.1.3— Notational Matters
34.1.4— Least-Squares Estimates of the Deterministic Components of a Time Series
34.1.5— Removal of Seasonal, Cyclical, and Trend Factors in Time Series
34.2— Estimating the Coefficients in a Stochastic Difference Equation: Consistency
34.2.1— Equations with Fixed Initial Conditions
34.2.2— Equations with Random Initial Conditions: Special Cases
34.2.3— Equations with Random Initial Conditions: The Fundamental Theorem
34.2.4— Concluding Remarks
34.3— Estimating the Coefficients in a Stochastic Difference Equation: Limiting Distributions
34.3.1— Equations with Fixed Initial Conditions
34.3.2— Equations with Random Initial Conditions
34.3.3— A Simulation Experiment
34.4— Concluding Remarks
NOTES
Chapter 1
Chapter 2
Chapter 3
Chapter 5
Chapter 7
Chapter 8
Chapter 9
Chapter 10
Chapter 11
Chapter 12
Chapter 13
Chapter 14
Chapter 15
Chapter 16
Chapter 17
Chapter 18
Chapter 19
Chapter 21
Chapter 22
Chapter 23
Chapter 24
Chapter 25
Chapter 26
Chapter 27
Chapter 28
Chapter 29
Chapter 30
Chapter 31
Chapter 32
Chapter 33
Chapter 34
BIBLIOGRAPHY
INDEX
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
W
Z


Toward a Formal Science of Economics The Axiomatic Method in Economics and Econometrics Bernt P. Stigum

 

The MIT Press, Cambridge, Massachusetts, 1990
LA 1  [p ⊃ [q ⊃ p]]

LA 2  [[p ⊃ [q ⊃ r]] ⊃ [[p ⊃ q] ⊃ [p ⊃ r]]]

LA 3  [[∼p ⊃ ∼q] ⊃ [q ⊃ p]]

Whatever their meanings, the expressions in these axioms are called formulas. They are well formed (wf) if and only if their being well formed follows from the following formation rules:

FR 1  A propositional variable standing alone is a wf formula (wff).

FR 2  If A is wf, then ∼A is wf.

FR 3  If A and B are wf, then [A ⊃ B] is wf.

It is easy to verify that the formulas in axioms LA 1-LA 3 are wf. For more complicated formulas, a method exists by which one may determine in a finite number of steps whether such formulas are wf. The procedure is described in Church 1956 (pp. 70-71, 121-123).
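The finite-step test for well-formedness can be pictured as a simple recursive procedure. The sketch below is an illustration only, not the book's or Church's algorithm; it assumes an ASCII encoding in which '~' stands for the negation sign, '>' for the conditional, and single lowercase letters for propositional variables.

# A minimal well-formedness checker for the propositional calculus of FR 1-FR 3.
# Assumed ASCII encoding: '~' for negation, '>' for the conditional,
# single lowercase letters for propositional variables, '[' and ']' as brackets.

def is_wff(s: str) -> bool:
    """Return True if s is well formed according to FR 1-FR 3."""
    if len(s) == 1:
        return s.islower()                      # FR 1: a variable standing alone
    if s.startswith('~'):
        return is_wff(s[1:])                    # FR 2: ~A is wf if A is
    if s.startswith('[') and s.endswith(']'):
        depth = 0
        for i, ch in enumerate(s[1:-1], start=1):
            if ch == '[':
                depth += 1
            elif ch == ']':
                depth -= 1
            elif ch == '>' and depth == 0:      # the main conditional of [A > B]
                return is_wff(s[1:i]) and is_wff(s[i + 1:-1])   # FR 3
        return False
    return False

if __name__ == "__main__":
    print(is_wff("[p>[q>p]]"))        # LA 1: True
    print(is_wff("[[~p>~q]>[q>p]]"))  # LA 3: True
    print(is_wff("p>q"))              # missing brackets: False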

4.1.3 Rules of Inference

The formation rules tell us how to produce new well-formed formulas (wffs) out of existing wffs. Some of the wffs are theorems. A theorem is either an axiom or derived from the axioms with the help of the rules of inference that we postulate. For the propositional calculus developed in this chapter, we assume the following two rules of inference:

RI 1  (Modus Ponens): Let A and B be wffs. From [A ⊃ B] and A, we infer B.

RI 2  Let B be a wff, and let b be a propositional variable. Moreover, let S^B_b A| be the wff that results from substituting B for each occurrence of b in A. From A, we infer S^B_b A|.

The meaning of RI 1 is clear, but the idea behind RI 2 might not be. In short, RI 2 insists that if we can assert A no matter what b expresses, we can also assert S^B_b A| no matter what B asserts. Examples of the way in which RI 1 and RI 2 are applied are given in section 4.2. In section 4.3 I show how one can determine in a finite number of steps whether a wff of the propositional calculus is a theorem.
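Both rules are purely mechanical, which a few lines of code can make concrete. The sketch below is an illustration under the same assumed ASCII encoding as the previous sketch; it represents S^B_b A| as a character-by-character replacement of the variable b in A.

# A minimal illustration of RI 1 and RI 2 in the assumed ASCII encoding.

def substitute(A: str, b: str, B: str) -> str:
    """RI 2: return the formula obtained from A by replacing each occurrence of b with B."""
    return "".join(B if ch == b else ch for ch in A)

def modus_ponens(implication: str, antecedent: str) -> str:
    """RI 1: from [A > B] and A, return B (raise if the shapes do not fit)."""
    if not (implication.startswith("[" + antecedent + ">") and implication.endswith("]")):
        raise ValueError("Modus Ponens does not apply")
    return implication[len(antecedent) + 2:-1]

# Example: from LA 1, [p>[q>p]], RI 2 yields [r>[q>r]] by substituting r for p.
la1 = "[p>[q>p]]"
print(substitute(la1, "p", "r"))    # [r>[q>r]]
print(modus_ponens("[p>q]", "p"))   # q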

4.2 Sample Theorems

There are many theorems in my symbolic language. Examples are the axioms LA 1-LA 3 given above and T 4.1 and T 4.2 given below. The latter theorems have been chosen to illustrate different aspects of how theorems can be proved in the propositional calculus.

T 4.1  [p ⊃ p]

Proof

[[p ⊃ [q ⊃ r]] ⊃ [[p ⊃ q] ⊃ [p ⊃ r]]]
[[p ⊃ [q ⊃ p]] ⊃ [[p ⊃ q] ⊃ [p ⊃ p]]]
[p ⊃ [q ⊃ p]]
[[p ⊃ q] ⊃ [p ⊃ p]]
[[p ⊃ [q ⊃ p]] ⊃ [p ⊃ p]]
[p ⊃ p]

The proof of this theorem is constructed in strict adherence to the definition of a proof given in chapter 3. We begin with LA 2. Then we substitute p for r and use RI 2. Thereafter we assert LA 1, apply LA 1 and Modus Ponens, substitute [q ⊃ p] for q, and use RI 2. Finally we apply LA 1 and use Modus Ponens to conclude [p ⊃ p]. Notice that there are no commas, periods, or semicolons in the proof; our symbolic language lacks these symbols.

Proofs in mathematical logic of even simple theorems are quite involved. To facilitate and streamline such proofs, logicians frequently use derived rules of inference. These rules are established in the metalanguage as theorems about the object language. In proofs they are used on equal footing with the postulated rules of inference. It is understood, however, that the derived rules can be dispensed with. In the next theorem we establish a derived rule of inference:

DRI 1  From a theorem T infer [q ⊃ T].

I label the theorem TM 4.1 rather than T 4.2 in order to make explicit that the theorem belongs to the metalanguage, not the object language.

TM 4.1  Let T be a theorem. Then [q ⊃ T] is a theorem.

Proof

T
[p ⊃ [q ⊃ p]]
[T ⊃ [q ⊃ T]]
[q ⊃ T]

In this proof we assert T and LA 1. We then substitute T for p and use RI 2. Finally we apply Modus Ponens to conclude [q ⊃ T]. Note that we omitted the sequence of formulas that constitute the proof of T. For cases in which TM 4.1 can be used, T often emerges at the end of such a sequence. Then TM 4.1 allows us to pass directly from T to [q ⊃ T] without the two intervening steps. In the proof of T 4.2 below we illustrate the usefulness of TM 4.1.


A variant of a wff A is a wff obtained from A by alphabetic changes of the variables such that any two occurrences of the same variable in A remain occurrences of the same variable and such that any two occurrences of distinct variables in A remain occurrences of distinct variables. A variant proof is a proof in which a variant of an axiom rather than the axiom itself is asserted in one or more steps. To establish the validity of a variant proof, it suffices to show that a variant of a theorem is a theorem. From this it follows that the final wff in a variant proof must be a theorem as well. The validity of a variant proof is an immediate consequence of the following metatheorem:

TM 4.2  Let b1, ..., bn denote n different variables and let S^{B1,...,Bn}_{b1,...,bn} A| denote the formula that results from the simultaneous substitution of the wffs B1, ..., Bn for b1, ..., bn in A. If T is a theorem, then S^{B1,...,Bn}_{b1,...,bn} T| is also a theorem.

A proof of TM 4.2 can be obtained in a finite number of steps by repeated applications of RI 2 (see Church 1956, pp. 82-83). The theorem therefore establishes the validity not just of variant proofs, but of a derived rule of inference:

DRI 2  From a theorem T infer S^{B1,...,Bn}_{b1,...,bn} T|.

Below we use both DRI 1 and DRI 2 to construct a variant proof of a theorem belonging to the object language:

T 4.2  [∼p ⊃ [p ⊃ q]]

Proof

[[∼q ⊃ ∼p] ⊃ [p ⊃ q]]
[r ⊃ [[∼q ⊃ ∼p] ⊃ [p ⊃ q]]]
[[r ⊃ [s ⊃ t]] ⊃ [[r ⊃ s] ⊃ [r ⊃ t]]]
[[r ⊃ [[∼q ⊃ ∼p] ⊃ [p ⊃ q]]] ⊃ [[r ⊃ [∼q ⊃ ∼p]] ⊃ [r ⊃ [p ⊃ q]]]]
[[r ⊃ [∼q ⊃ ∼p]] ⊃ [r ⊃ [p ⊃ q]]]
[[∼p ⊃ [∼q ⊃ ∼p]] ⊃ [∼p ⊃ [p ⊃ q]]]
[p ⊃ [q ⊃ p]]
[∼p ⊃ [∼q ⊃ ∼p]]
[∼p ⊃ [p ⊃ q]]

In this proof we first assert a variant of LA 3 and apply a variant of DRI 1. We then assert a variant of LA 2 and substitute [∼q ⊃ ∼p] and [p ⊃ q] for s and t in accordance with DRI 2. Thereafter we use Modus Ponens, substitute ∼p for r, and use RI 2. Finally we assert LA 1, substitute ∼p for p and ∼q for q in accordance with DRI 2, and apply Modus Ponens to infer [∼p ⊃ [p ⊃ q]].

4.3 The Intended Interpretation

The intended interpretation of my symbolic language is as follows: The variables p, q, r, s, ... have as their domain a set of declarative sentences which are written in some definite language, say English, and have as their range two truth values, truth and falsehood. Let t denote truth, f falsehood. A wff consisting of a variable a alone has the value t for the value t of a, the value f for the value f of a. For a given assignment of values to the variables of the wff A, the value of ∼A is f if the value of A is t. The value of ∼A is t if the value of A is f. Finally, for a given assignment of values to the variables of the wffs A and B, the value of [A ⊃ B] is t if the value of A is f or if the value of B is t. The value of [A ⊃ B] is f if the value of B is f and the value of A is t.

In the interpreted language, the symbol ∼ is used as "not" is used in English. Thus if A is a declarative sentence written in English, ∼A can be read "not A." The meaning of ⊃, the material implication sign, is less clear. Often [A ⊃ B] is read, "if A, then B." Whenever A denotes t, this makes sense, because in this case [A ⊃ B] denotes t if and only if B is true. However, when A denotes f, [A ⊃ B] is true whether B is true or false. This seems to stretch the meaning of "if ..., then." However, if one is committed to a two-valued logic and insists that theorems denote truth and contradictions falsehood, TM 4.1 and TM 4.3 and TM 4.4 below show that ours is the only possible interpretation of ⊃.

TM 4.3  If T is a theorem, [∼T ⊃ q] is a theorem.

TM 4.4  If T is a theorem, ∼[T ⊃ ∼T] is a theorem.

The validity of TM 4.3 and TM 4.4 can easily be established with the help of two useful theorems:

T 4.3  [p ⊃ ∼∼p]

T 4.4  [[p ⊃ [q ⊃ r]] ⊃ [q ⊃ [p ⊃ r]]]

I leave the proofs of T 4.3 and T 4.4 and TM 4.4 to the reader. To prove TM 4.3, we assert T 4.3, substitute T for p, and use RI 2 and RI 1 to deduce ∼∼T. Then we assert T 4.2, substitute ∼T for p, and use RI 2 to deduce [∼∼T ⊃ [∼T ⊃ q]]. Finally, we apply RI 1 to conclude [∼T ⊃ q].

Strange or not strange, our interpretation of ⊃ has an interesting property: it allows one to assign a definite truth value to a wide class of sentences that under any other interpretation might have been considered incomplete or nonsensical. E 4.1 illustrates such a case.

E 4.1  According to our interpretation, the sentence "If the weather is sunny and bright tomorrow, Borg will beat Tanner in the tennis finals" assumes the value f if and only if the weather tomorrow is sunny and bright and Borg does not beat Tanner. If the weather is bad, the prediction assumes the value t no matter who wins the match.

To avoid misunderstanding, we usually take [A ⊃ B] to read "A materially implies B" rather than "if A, then B."
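The two truth functions just described are easy to write down explicitly. The sketch below is only an illustration of the intended interpretation, under the same assumed ASCII encoding used earlier ('~' for negation, '>' for material implication); it evaluates a wff for a given assignment of t and f to its variables.

# Truth-value semantics of the intended interpretation:
# value(~A) = f iff value(A) = t;  value([A > B]) = f iff value(A) = t and value(B) = f.

def main_connective(s: str):
    """Split a bracketed conditional [A > B] into (A, B), or return None."""
    if not (s.startswith('[') and s.endswith(']')):
        return None
    depth = 0
    for i, ch in enumerate(s[1:-1], start=1):
        if ch == '[':
            depth += 1
        elif ch == ']':
            depth -= 1
        elif ch == '>' and depth == 0:
            return s[1:i], s[i + 1:-1]
    return None

def value(s: str, assignment: dict) -> bool:
    """Evaluate a wff; assignment maps variables to True (t) or False (f)."""
    if len(s) == 1:
        return assignment[s]
    if s.startswith('~'):
        return not value(s[1:], assignment)
    A, B = main_connective(s)
    return (not value(A, assignment)) or value(B, assignment)

# E 4.1 in miniature: with the antecedent false, [a > b] denotes t either way.
print(value("[a>b]", {"a": False, "b": False}))   # True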

4.3.1 Tautologies

At each point in its intended domain of definition, a variable p is a declarative sentence. From this and from the formation rules, it follows that once declarative sentences are substituted for variables in a well-formed formula, this formula becomes, in the intended interpretation of my symbolic language, a declarative sentence as well. In studying these formulas, we are interested in their abstract form, not their meaning. Our main intent is to ascertain what structure a declarative sentence must have in order that it denote truth no matter what it asserts. A wff that denotes truth no matter what it asserts is called a tautology.

We show in table 4.1 that the interpreted versions of LA 1 and LA 3 are tautologies. Similar arguments show that LA 2 is also a tautology. In table 4.1 the formulas are the column heads, and their values for different combinations of values of p and q are listed below them. As asserted above, [p ⊃ [q ⊃ p]] denotes t regardless of the values assumed by p and q. The same is true for [[∼p ⊃ ∼q] ⊃ [q ⊃ p]].

Table 4.1  Truth table for LA 1 and LA 3.

p  q  ∼p  ∼q  [q ⊃ p]  [p ⊃ [q ⊃ p]]  [∼p ⊃ ∼q]  [[∼p ⊃ ∼q] ⊃ [q ⊃ p]]
t  t  f   f   t        t              t          t
t  f  f   t   t        t              t          t
f  t  t   f   f        t              f          t
f  f  t   t   t        t              t          t


4.3.2 Theorems and Tautologies

It is easy to demonstrate that RI 1 and RI 2 preserve tautologies; i.e., if their premises are tautologies, so are their conclusions (Church 1956, pp. 97 and 127). From this and from the fact that the axioms are tautologies, it follows that, in the intended interpretation of the propositional calculus, all the theorems we can derive are tautologies. Thus, as the reader can verify, T 4.1, T 4.2, and all the other theorems stated in this chapter are tautologies. The converse of the preceding observation (i.e., a wff is a tautology only if it is a theorem) is also true and is proved in section 4.7. Consequently, we have TM 4.5, the Tautology Theorem.

TM 4.5  A wff A is a theorem if and only if it is a tautology in the intended interpretation of the language.

This result is startling. I have previously defined the term "theorem" as it applies to my uninterpreted symbolic language. In this section we have interpreted the language and used the interpretation to give a complete characterization of the language's theorems. Moreover, this characterization provides a foolproof method for checking in a finite number of steps whether a wff is a theorem: compute the wff's truth table.
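Computing a truth table amounts to enumerating every assignment of t and f to the variables of the wff. The sketch below is a minimal illustration of that procedure; it assumes the value function from the earlier sketch (my convenience, not part of the book's apparatus).

# TM 4.5 in practice: a wff is a theorem iff its truth table contains only t.
from itertools import product

def variables(s: str):
    """The propositional variables occurring in the wff s (assumed single letters)."""
    return sorted({ch for ch in s if ch.islower()})

def is_tautology(s: str) -> bool:
    """Check every assignment of truth values, using the evaluator sketched above."""
    vs = variables(s)
    return all(value(s, dict(zip(vs, row))) for row in product([True, False], repeat=len(vs)))

print(is_tautology("[p>[q>p]]"))                  # LA 1: True
print(is_tautology("[[p>[q>r]]>[[p>q]>[p>r]]]"))  # LA 2: True
print(is_tautology("[[~p>~q]>[q>p]]"))            # LA 3: True
print(is_tautology("[p>q]"))                      # not a theorem: False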

4.4 Interesting Tautological Structures

Sentences that denote truth because of their structure alone are interesting because they can be used to formulate derived rules of inference in mathematics. To see how, I shall next describe in English the characteristics of some of the most important tautologies in my symbolic language.

I begin with T 4.1. According to our interpretation of ⊃, [p ⊃ p] can be a tautology if and only if it is always the case that either p is false or p is true, i.e., if and only if no excluded middle exists. Thus in our interpreted language [p ⊃ p] provides a succinct statement of the law of the excluded middle. To obtain a statement of the law of contradiction, we first observe that p and ∼∼p have the same denotation. Thus both [p ⊃ ∼∼p] and [∼∼p ⊃ p] must be theorems in our language. Bearing that observation in mind, we can state the law of contradiction (T 4.5).

T 4.5  ∼∼[p ⊃ ∼∼p]

According to our interpretation, this law is a tautology if and only if the assertion, p is true and not not p is false, is always false, i.e., if and only if p and not p cannot ever both be true.

Our translations of T 4.1 and T 4.5 may seem unconvincing even though they accord with the received "doctrine." The reason is that any two theorems in my symbolic language, say S and T, are equivalent in the sense that both [S ⊃ T] and [T ⊃ S] are theorems, as witnessed by TM 4.1. It is hard to make the law of the excluded middle and the law of contradiction sound different when we (1) insist that the denotation of a name is unique, (2) allow only two truth values, and (3) insist that their symbolic prototypes, T 4.1 and T 4.5, mutually imply each other.

The interpreted version of LA 3 is called the converse law of contraposition. Its positive half asserts that, if not q materially implies not p, then p materially implies q. Thus to show that q is a consequence of p, it suffices to show that not q implies not p. From this we see that the interpreted version of LA 3 postulates the validity of one of the most applied methods of proof in mathematics. Axiom LA 3 has a converse: T 4.6.

T 4.6  [[p ⊃ q] ⊃ [∼q ⊃ ∼p]]

By combining the interpreted versions of LA 3 and T 4.6, we can deduce that p materially implies q if and only if not q materially implies not p. Or, said another way, [p ⊃ q] and [∼q ⊃ ∼p] have the same denotation. This is verified by table 4.2.

In addition to those just presented, we can derive several other laws which in their interpreted versions establish the validity of well-known methods of proof in mathematics. Two examples are the transitive law of material implication (T 4.7) and the law of reductio ad absurdum (T 4.8).

T 4.7  [[p ⊃ q] ⊃ [[q ⊃ r] ⊃ [p ⊃ r]]]

T 4.8  [[p ⊃ q] ⊃ [[p ⊃ ∼q] ⊃ ∼p]]

A third example is the law of assertion (T 4.9).

T 4.9  [p ⊃ [[p ⊃ q] ⊃ q]]

Table 4.2  Truth table for [p ⊃ q] and [∼q ⊃ ∼p].

p  q  ∼p  ∼q  [p ⊃ q]  [∼q ⊃ ∼p]
t  t  f   f   t        t
t  f  f   t   f        f
f  t  t   f   t        t
f  f  t   t   t        t

The names of T 4.7 and T 4.9 give good characterizations of the structures of the sentences expressed by these laws. In T 4.8 we insist that p cannot materially imply both q and not q unless p denotes f and hence ∼p denotes truth.

4.5 Disjunction, Conjunction, and Material Equivalence

In my propositional calculus we can assert "Per is a philosopher" and "Per is a physicist." We can also combine these two sentences into assertions such as "either Per is a physicist or Per is a philosopher" or "Per is a physicist and Per is a philosopher." The next section indicates how.

4.5.1 Either-Or and Both-And Sentences

To formulate either-or sentences, let us agree that, for any wff B,

A =df B

means that "A is an abbreviation for B" and implies that A may be substituted for B whether B stands alone or forms part of a longer wff. Let

[A ∨ B] =df [∼A ⊃ B].

In our interpreted language [A ∨ B] is true if and only if either A or B or both denote t; it is false if and only if both A and B denote f. Consequently, in our interpreted language [A ∨ B] can be read as if it declared "either A or B or both."

To formulate both-and sentences, we let

[A ∧ B] =df ∼[A ⊃ ∼B].

In our interpreted language [A ∧ B] denotes truth if and only if both A and B denote t. It is false if and only if either ∼A or ∼B or both are true. Consequently, the formula [A ∧ B] can be read as if it declared "both A and B." The following useful theorems describe some of the properties of ∨ and ∧:

T 4.10  [p ⊃ [q ∨ p]]

T 4.11  [[p ∨ p] ⊃ p]

T 4.12  [[p ∧ q] ⊃ q]

T 4.13  [[[p ∨ q] ∧ [∼p ∨ r]] ⊃ [q ∨ r]]

The meaning of these theorems is self-explanatory. For later reference it is interesting to note that T 4.10, T 4.11, and T 4.13, respectively, form the basis of three rules of inference which J. R. Shoenfield calls the expansion rule, the contraction rule, and the cut rule, i.e., infer [B ∨ A] from A, infer A from [A ∨ A], and infer [B ∨ C] from [A ∨ B] and [∼A ∨ C] (Shoenfield 1967, p. 21).
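Because ∨ and ∧ are abbreviations, they can be handled by expanding them into ∼ and ⊃ before evaluation. The sketch below is an illustration only; Or and And are helper names of my own, and is_tautology is assumed from the earlier sketch.

# [A v B] abbreviates [~A > B]; [A & B] abbreviates ~[A > ~B].
def Or(A: str, B: str) -> str:
    return "[~" + A + ">" + B + "]"

def And(A: str, B: str) -> str:
    return "~[" + A + ">~" + B + "]"

# T 4.11, the basis of the contraction rule: [[p v p] > p]
t411 = "[" + Or("p", "p") + ">p]"
# T 4.13, the basis of the cut rule: [[[p v q] & [~p v r]] > [q v r]]
t413 = "[" + And(Or("p", "q"), Or("~p", "r")) + ">" + Or("q", "r") + "]"

print(is_tautology(t411))   # True
print(is_tautology(t413))   # True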

4.5.2 Material Equivalence

In our interpreted language two sentences are said to be materially equivalent if they have the same denotation. We can express material equivalence in our language by letting

[A ≡ B] =df ∼[[A ⊃ B] ⊃ ∼[B ⊃ A]].

By consulting the truth table of the right-hand side of the definition of ≡, we see immediately that [A ≡ B] denotes truth if and only if A and B have the same denotation. It is false if and only if A and B have different denotations. Thus the expression [A ≡ B] can be read as if it declared "A if and only if B." From our definition of ∧, it follows that [A ≡ B] is a different way of writing [[A ⊃ B] ∧ [B ⊃ A]]. The sentence [A ≡ B] declares a relationship between two assertions A and B. Thus we can think of ≡ as a relation among declarative sentences. This relation is reflexive, symmetric, and transitive according to the following theorems:

T 4.14  [p ≡ p]

T 4.15  [[p ≡ q] ⊃ [q ≡ p]]

T 4.16  [[[p ≡ q] ∧ [q ≡ r]] ⊃ [p ≡ r]]

A relation with the properties displayed in T 4.14-T 4.16 is called an equivalence relation. Since this particular relation is constructed from material implication, ≡ is usually referred to as the relation of material equivalence. T 4.17-T 4.22 are examples of materially equivalent assertions.

T 4.17  [[p ∨ q] ≡ [q ∨ p]]

T 4.18  [[[p ∨ q] ∨ r] ≡ [p ∨ [q ∨ r]]]

T 4.19  [∼[p ∧ q] ≡ [∼p ∨ ∼q]]

T 4.20  [∼[p ∨ q] ≡ [∼p ∧ ∼q]]

T 4.21  [[p ∧ [q ∨ r]] ≡ [[p ∧ q] ∨ [p ∧ r]]]

T 4.22  [[p ∨ [q ∧ r]] ≡ [[p ∨ q] ∧ [p ∨ r]]]

Of these, the first two are usually referred to as the complete commutative and associative laws of disjunction, the next two as de Morgan's laws for declarative sentences, and the last two as the distributive laws of conjunction and disjunction. We also note that T 4.18 can be used to formulate J. R. Shoenfield's associative rule of inference: infer [[A ∨ B] ∨ C] from [A ∨ [B ∨ C]] (Shoenfield 1967, p. 21).

Besides possessing the three properties of an equivalence relation, ≡ also has a fundamental substitutive property which is described in the next two metatheorems:

TM 4.6  If B results from A by substitution of C for D at zero or more places, and if [C ≡ D] is a theorem, then [A ≡ B] is also a theorem.

TM 4.7  If B results from A by substitution of C for D at zero or more places, and if [C ≡ D] and A are theorems, then B is also a theorem.

These two theorems cannot be established simply by consulting the truth tables of the relevant wffs. However, since TM 4.6 is a special case of TM 5.15, which we prove in chapter 5, and since TM 4.7 is an easy consequence of TM 4.6, I omit their proofs.
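The laws T 4.17-T 4.22 themselves, by contrast, are checkable by truth table. A small illustration, reusing the Or, And, and is_tautology helpers assumed in the earlier sketches and adding an Iff helper for ≡ (my convenience, not the book's notation):

# [A ≡ B] abbreviates ~[[A > B] > ~[B > A]].
def Iff(A: str, B: str) -> str:
    return "~[[" + A + ">" + B + "]>~[" + B + ">" + A + "]]"

def Not(A: str) -> str:
    return "~" + A

# T 4.19 and T 4.20, de Morgan's laws, and T 4.22, a distributive law:
t419 = Iff(Not(And("p", "q")), Or("~p", "~q"))
t420 = Iff(Not(Or("p", "q")), And("~p", "~q"))
t422 = Iff(Or("p", And("q", "r")), And(Or("p", "q"), Or("p", "r")))

print(all(is_tautology(f) for f in (t419, t420, t422)))   # True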

4.6 Syntactical Properties of the Propositional Calculus

Logistic systems such as the one I have described in this chapter may or may not have certain properties that a user would require of a meaningful language. One is consistency: A language is consistent if it contains no sentence A with the property that both A and ∼A are theorems. This property is important, since if both A and ∼A are theorems, B is a theorem no matter what it asserts. This is demonstrated by applying T 4.2 and Modus Ponens:

[∼A ⊃ [A ⊃ B]]
[A ⊃ B]
A
B.

Our propositional calculus is consistent inasmuch as not all wffs are tautologies.

Another property the reader might want a language to have is completeness: A language is complete if there is no wff A that is not a theorem and at the same time can be added as an axiom without causing inconsistency in the language. This property expresses the idea that everything we might justly hope to assert in the language actually can be said. How desirable a property it is, is debatable. More about that later. Here we merely observe that the completeness of my propositional calculus is a simple corollary of TM 4.5 (see Church 1956, pp. 110 and 128).

A third property of a language that may be important is independence concerning the relations between its axioms, its rules of inference, and the theorems that can be derived from them. A set of axioms and a set of rules of inference are independent if it is impossible to delete one axiom or one rule of inference and still derive all the theorems that can be derived from the two original sets. This property is less vital than consistency and less interesting than completeness. A logician or mathematician might for convenience choose to work with a set of axioms and rules of inference that are not independent. Our LA 1-LA 3 and RI 1 and RI 2 are independent (see Church 1956, pp. 112-114 and 127).

4.7 Proof of the Tautology Theorem

In this section I shall outline a proof of the Tautology Theorem, i.e., TM 4.5. My proof makes use of two auxiliary theorems, T 4.23 and T 4.24. The first is an obvious strengthening of T 4.4.

T 4.23  [[p ⊃ [q ⊃ r]] ≡ [q ⊃ [p ⊃ r]]]

The second insists that p and ∼p cannot both imply q unless q is true.

T 4.24  [[p ⊃ q] ⊃ [[∼p ⊃ q] ⊃ q]]

I state both T 4.23 and T 4.24 without proof since the arguments needed to establish these theorems are easy to come by. We observed in section 4.3 that the axioms are tautologies and that the rules of inference preserve tautologies. From this and our inductive definition of a theorem, it follows that a theorem of the propositional calculus is a tautology. To prove the converse, i.e., that a tautology is a theorem, we begin by establishing the following auxiliary result.


TM 4.8  Let A be a wff and let p1, ..., pk be the propositional variables appearing in A. Moreover, let B be A or ∼A according as A denotes t or f and, for each i = 1, ..., k, let Bi be pi or ∼pi according as pi denotes t or f. Then [B1 ⊃ [B2 ⊃ [... ⊃ [Bk ⊃ B]...]]] is a theorem.

The proof of TM 4.8 goes by induction on the number of connectives occurring in A. If there is no connective in A, then k = 1, A is p1, B1 is B, and [B1 ⊃ B] becomes [B ⊃ B], which is a theorem by T 4.1 and RI 2. Suppose next that TM 4.8 is true for all wffs that contain less than n occurrences of connectives and let A be a wff that contains exactly n occurrences of connectives. Then there exist wffs C and D containing less than n occurrences of connectives such that A is ∼C or [C ⊃ D]. In either case, C and D are uniquely determined. (See Church 1956, pp. 122-123, for a proof of this fact.)

Suppose that A is ∼C. Then C contains n − 1 occurrences of connectives. Let C′ be C or ∼C according as C denotes t or f. Then B is ∼∼C′ or C′ according as C denotes t or f. Moreover, by the induction hypothesis, we can assert that [B1 ⊃ [B2 ⊃ [... ⊃ [Bk ⊃ C′]...]]] is a theorem. From this, the material equivalence of C′ and ∼∼C′, and TM 4.7 follows the validity of TM 4.8 for the present case.

Suppose next that A is [C ⊃ D] and let C′ and D′, respectively, be C or ∼C and D or ∼D according as C denotes t or f and D denotes t or f. Moreover, let pt1, ..., ptl and ps1, ..., psm be the propositional variables occurring, respectively, in C and D. Then, by the induction hypothesis, we can assert that both [Bt1 ⊃ [Bt2 ⊃ [... ⊃ [Btl ⊃ C′]...]]] and [Bs1 ⊃ [Bs2 ⊃ [... ⊃ [Bsm ⊃ D′]...]]] are theorems. From these theorems and repeated use of TM 4.1, T 4.23, DRI 2, and TM 4.7, we deduce that [B1 ⊃ [B2 ⊃ [... ⊃ [Bk ⊃ C′]...]]] and [B1 ⊃ [B2 ⊃ [... ⊃ [Bk ⊃ D′]...]]] are theorems as well.

There are several subcases. Suppose first that D′ is D. Then B is [C ⊃ D] and [D′ ⊃ B] is a theorem by LA 1 and TM 4.2. From this it follows by standard arguments that TM 4.8 is valid for the given case. Suppose next that C′ is ∼C and D′ is ∼D. Then B is [C ⊃ D] and, by T 4.2 and TM 4.2, [C′ ⊃ B] is a theorem. From this and the arguments used when D′ is D, it follows that TM 4.8 is valid in this case as well. Finally, suppose that C′ is C and D′ is ∼D. Then B is [C′ ∧ D′] and [C′ ⊃ [D′ ⊃ B]] is a theorem. From this and obvious arguments, it follows that TM 4.8 is valid now too and hence generally valid. So much for TM 4.8.

To prove the if part of the Tautology Theorem, we now let A and the Bi, i = 1, ..., k, be as described in TM 4.8 and assume that A is a tautology. Then [B1 ⊃ [B2 ⊃ [... ⊃ [Bk ⊃ A]...]]] is a theorem.


Since A is a tautology, it must also be the case that [∼B1 ⊃ [B2 ⊃ [... ⊃ [Bk ⊃ A]...]]] is a theorem. Consequently, by T 4.24 we deduce that [B2 ⊃ [B3 ⊃ [... ⊃ [Bk ⊃ A]...]]] is a theorem as well. Repeated use of the same arguments suffices to demonstrate that A is a theorem.

The proof of TM 4.8 was obtained by "induction on the number of connectives occurring in A." Such a proof may be unfamiliar to the reader. Hence I shall conclude the chapter with a remark on generalized inductive definitions and proofs by induction. A generalized inductive definition of a collection C of objects consists of a set of laws, each of which says that, under suitable hypotheses, an object x belongs to C. Examples are my definitions of theorems in section 3.2 and wffs in section 4.1. When we give such a definition, it is always understood that x ∈ C only if it follows from the laws that x belongs to C.

Suppose that a collection C of objects has been defined by a generalized inductive definition. To ascertain that all the members of C have a property P, it suffices to demonstrate that the objects having P satisfy the laws of the definition. Such a proof is called a proof by induction on objects in C. "The hypotheses in the laws that certain objects belong to C become, in such a proof, hypotheses that certain objects have property P; these hypotheses are called induction hypotheses" (Shoenfield 1967, p. 5). Our proof of TM 4.8 was obtained by induction on the length of wffs in accordance with FR 1-FR 3, the formation rules for wffs of the propositional calculus. In our propositional calculus, induction on the length of wffs amounts to induction on the number of connectives occurring in the wffs.
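A definition or proof by induction on the number of connectives mirrors the recursive structure of the formation rules themselves. The toy function below is only an illustration of that correspondence, under the assumed ASCII encoding and reusing the main_connective helper from the earlier sketch; it follows exactly the case split used in the proof of TM 4.8 (a variable alone, a negation ∼C, or a conditional [C ⊃ D]).

# Recursion on the generalized inductive definition of wffs (FR 1-FR 3).

def connectives(s: str) -> int:
    """Count occurrences of ~ and > in the wff s."""
    if len(s) == 1:                    # a propositional variable: no connectives
        return 0
    if s.startswith('~'):              # s is ~C
        return 1 + connectives(s[1:])
    C, D = main_connective(s)          # s is [C > D]
    return 1 + connectives(C) + connectives(D)

print(connectives("[p>p]"))            # 1
print(connectives("[[~p>~q]>[q>p]]"))  # 5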

5 The First-Order Predicate Calculus

The symbolic language I presented in chapter 4 is not rich enough to serve our purposes. For example, we can assert "All physicists are philosophers," but we cannot infer from this and the fact that Per is a physicist that Per is a philosopher as well. We cannot even infer from "All ravens are black" that there exists a black raven.

5.1 Symbols, Well-Formed Formulas, and Rules of Inference

In this chapter we shall fill the lacunae in my language. In fact, I shall present, and adopt as my own in this text, a language, the first-order predicate calculus, that is so rich it can be used to formalize all the mathematical theories of interest to us. This language need not contain any propositional variables. Yet it is an extension of the propositional calculus in the following sense: Any theorem of the propositional calculus, e.g., [p ⊃ p], becomes a metatheorem of the predicate calculus when the propositional variables are replaced by syntactical variables varying through wffs, e.g., [A ⊃ A]. Every value of the resulting metatheorem is a theorem of the predicate calculus. In presenting the predicate calculus, I shall list its logical and nonlogical symbols, describe its well-formed formulas, formulate its rules of inference, and postulate its axioms.

5.1.1 The Symbols

The nonlogical vocabulary of the predicate calculus consists of indexed sets of function symbols, {f_i}_{i∈I_n}, and indexed sets of predicate symbols, {P_j}_{j∈J_n}, n = 0, 1, .... For each n and every i ∈ I_n, f_i is an n-ary function symbol. Similarly, for each n and every j ∈ J_n, P_j is an n-ary predicate symbol. A 0-ary function symbol is a constant, and a 0-ary predicate symbol is a propositional variable. The I_n and the J_n may be empty, finite, or denumerably infinite. In the mathematical theories which we discuss in this book, J_0 is empty.

E 5.1  A scientist's choice of index sets I_n and J_n depends on his subject matter. For instance, if he studies natural numbers, he may work with one constant, 0; one unary function, S; two binary functions, + and ·; one binary predicate, ...
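E 5.1's choice of nonlogical vocabulary can be pictured as a small data structure. The sketch below is only an illustration of the index-set idea; the name Signature is mine, and the binary predicate < is an assumption, since the original sentence is cut off before naming it.

# A first-order signature: function and predicate symbols grouped by arity.
# 0-ary function symbols are constants; 0-ary predicate symbols would be
# propositional variables (none here, matching an empty J_0).

from dataclasses import dataclass, field

@dataclass
class Signature:
    functions: dict = field(default_factory=dict)   # arity -> set of function symbols
    predicates: dict = field(default_factory=dict)  # arity -> set of predicate symbols

natural_numbers = Signature(
    functions={0: {"0"}, 1: {"S"}, 2: {"+", "*"}},   # one constant, successor, + and ·
    predicates={2: {"<"}},                           # assumed: one binary predicate <
)

print(natural_numbers.functions[2])   # {'+', '*'}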

5.1.2 The Well-Formed Formulas

Note that ∀ is applied only to individual variables and that (∀a) is to be read as "for all a." Note also that if A is a formula in a given predicate calculus, there exists a method by which one may determine in a finite number of steps whether A is wf (see Church 1956, pp. 170, 70-71, and 121-123).

5.1.3 The Axioms

In stating the axioms of the predicate calculus we make use of syntactical variables having wffs as values. The expressions that present the axioms are so-called axiom schemata. They belong to the syntax language but not to the object language. Only their values are axioms. The first three axiom schemata generalize the three axioms of the propositional calculus:

PLA 1  [A ⊃ [B ⊃ A]]

PLA 2  [[A ⊃ [B ⊃ C]] ⊃ [[A ⊃ B] ⊃ [A ⊃ C]]]

PLA 3  [[∼A ⊃ ∼B] ⊃ [B ⊃ A]]

Since every one of the values of these expressions is an axiom, LA 1-LA 3 are axioms in any predicate calculus containing 0-ary predicate symbols.

The next two axiom schemata concern the meaning of ∀. To state them we need two new concepts, a free variable and a bound variable, and a syntactical symbol, Aa(b). An occurrence of an individual variable a in a wff A is bound in A if it occurs in a wf subformula of A of the form (∀a)B; otherwise it is free in A. We say that a is a free (or bound) variable of A if some occurrence of a is free (or bound) in A. The symbol Aa(b) denotes the wff that results from substituting the term b for all free occurrences of a in A.

Ideally Aa(b) ought to say the same thing about b that A asserts of a. This does not always happen. For example, if a and b are individual variables that vary over natural numbers, and if A is expressed by

∼(∀b)∼[[a = 2b] ∨ [(a + 1) = 2b]]

then A insists that a is either even or odd, while Aa(b) maintains that there is a value of b which equals 0 or 1. We shall avoid anomalies, such as the one exhibited above, by introducing a convention: Let us say that b is substitutable for a in A if, for each variable x occurring in b, no part of A of the form (∀x)B contains an occurrence of a which is free in A. Then we agree that whenever we assert Aa(b), it is implicitly assumed that b is substitutable for a in A.
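Substitutability is, once again, a mechanical check. The sketch below illustrates it on a toy representation of first-order formulas (nested tuples of my own devising, not the book's notation): it computes free variables and tests whether a term b may be substituted for a in A without capture.

# Toy formulas: ('var', a) terms are plain strings or ('fn', f, term, ...);
# formulas are ('atom', P, term, ...), ('not', A), ('imp', A, B), ('all', a, A).

def term_vars(t):
    if isinstance(t, str):
        return {t}
    return set().union(*(term_vars(u) for u in t[2:])) if len(t) > 2 else set()

def free_vars(A):
    tag = A[0]
    if tag == 'atom':
        return set().union(*(term_vars(t) for t in A[2:]))
    if tag == 'not':
        return free_vars(A[1])
    if tag == 'imp':
        return free_vars(A[1]) | free_vars(A[2])
    return free_vars(A[2]) - {A[1]}          # ('all', a, B) binds a

def substitutable(b, a, A) -> bool:
    """Is the term b substitutable for a in A (no variable of b gets captured)?"""
    tag = A[0]
    if tag == 'atom':
        return True
    if tag == 'not':
        return substitutable(b, a, A[1])
    if tag == 'imp':
        return substitutable(b, a, A[1]) and substitutable(b, a, A[2])
    x, B = A[1], A[2]                        # (∀x)B
    if a not in free_vars(A):
        return True
    return x not in term_vars(b) and substitutable(b, a, B)

# A is ∼(∀b)∼[a = 2·b]; b is not substitutable for a, but a fresh c is.
A = ('not', ('all', 'b', ('not', ('atom', '=', 'a', ('fn', '*', ('fn', '2'), 'b')))))
print(substitutable('b', 'a', A))   # False
print(substitutable('c', 'a', A))   # True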


Now the two axiom schemata:

PLA 4  Let A and B be wffs and let a be an individual variable that is not a free variable of A. Then [(∀a)[A ⊃ B] ⊃ [A ⊃ (∀a)B]].

PLA 5  Let A be a wff, let a be an individual variable, and let b be a term that is substitutable for a in A. Then [(∀a)A ⊃ Aa(b)].

The second of these is the predicate-calculus version of the dictum de omni et nullo.

Most mathematical theories treat equality, =, as a logical symbol. Therefore I have assumed that = is one of the logical symbols of the predicate calculus. I write [x = y] rather than =(x, y) and postulate that = satisfies the following axiom schemata:

PLA 6  If b is a term, [b = b].

PLA 7  Let A be a wff, let a be an individual variable, and let b and c be terms that are substitutable for a in A. Then [[b = c] ⊃ [Aa(b) ⊃ Aa(c)]].

5.1.4 The Rules of Inference

There are two rules of inference in our predicate calculus. One is the Modus Ponens:

PRI 1  Let A and B be wffs. From [A ⊃ B] and A, we infer B.

The other is the Rule of Generalization:

PRI 2  Let A be a wff. If a is an individual variable, from A we infer (∀a)A.

In E 5.2 we illustrate the use of PRI 2:

E 5.2  There once was a town with many men and one barber. The barber shaved everyone who did not shave himself and only those. He had a problem that we shall use to illustrate PRI 2. Let the universe of discourse be all men in a given town, and let P(·) and Q(·) be unary predicates that, for each x in the universe, declare x does not shave himself, and x is shaved by the barber, respectively. Moreover, let b denote the barber, let y denote any man in the universe different from b, and give ≡ its intended interpretation. Then, according to the story, for all y

(i) [P(y) ≡ Q(y)]

denotes t; i.e., y does not shave himself if and only if he is shaved by the barber. As to b, note that, if b does not shave himself, P(b) denotes t, Q(b) denotes f, and

(ii) [P(b) ≡ Q(b)]

denotes f. If b shaves himself, P(b), ∼Q(b), and (ii) denote f. Hence we cannot assert (∀x)[P(x) ≡ Q(x)]. However, if we let the barber shave himself, we can assert [P(x) ⊃ Q(x)] and use PRI 2 to infer (∀x)[P(x) ⊃ Q(x)].

The rules of inference of our predicate calculus do not include RI 2 of the propositional calculus. It is therefore important to note that in any predicate calculus with O-ary predicate variables, RI 2 is a derived rule of inference. For a proof see Church 1956, pp. 149-150. 5.2 Sample Theorems

In this section we shall establish several metatheorems concerning =, V, and ==. We begin with equality. 5.2.1 Equality

In mathematics = is an equivalence relation with a certain substitutive property. PLA 6 and TM 5.1, TM 5.2, and TM 5.4 below show that our = has the same characteristics. TM 5.1

Let a and b be terms. Then [[a

TM 5.2

Let a, h, and e be terms. Then

[[a

=

h]

:::>

[[b

=

e] :::> [a

=

= b]

:::>

[h

=

a]].

em.

To prove TM 5.1 we record first the predicate-calculus version of T 4.23: TM5.3

[[A:::> [B:::> C]]

==

[B:::> [A:::> C]]]

Then letting A assert [x = a], we observe that when we substitute for x in PLA 7, then by PLA 7, TM 5.3, PLA 6, and PRI 1, [[a

=

b] ~ [[a

=

a] ~ [b

= alll

[[a

=

a] ~ [[a

=

b] ~ [b

= alll

[a [[a

= =

a] b] ~ [b

=

a]].

The First-Order Predicate Calculus

73

To prove TM 5.2, let A assert [x = e] and substitute for x in PLA 7 to infer [[b

= a]

::J

[[b

= e]

[a

::J

= e]]).

Then TM 4.1 with [a = b] as q, PLA 2, TM 5.1, and two applications of PRI 1 suffice to establish [[a = b] ::J [[b = e] ::J [a = em, PLA 6, TM 5.1, and TM 5.2 show that = is an eqUivalence relation. TM 5.4 demonstrates that = also has the usual substitutive property. TM 5.4 Let [be an n-ary function symbol and let tj and terms. Then [[[t l

= 51]

/\ [[t 2

= 52]

/\ [... /\ [til

=5

11 ] •••

Sj,

]]]::::> [f(tl,···,t ll )

i

= 1, ... , n, be

= [(5 1, ... ,5

11 )]].

The proof of TM 5.4 is a generalization of the proofs of TM 5.1 and TM5.2. 5.2.2 The Quantifiers In interpreting our language, (Va) is read as "for all a." Thus we can use V to assert sentences such as "All ravens are black." To state sentences such as "There exists a raven," we must introduce a new symbol: (3a)A

=df ""

(Va) "" A.

In our interpreted language (3a) will read as "there exists an a." Next we shall state and prove several metatheorems concerning V and 3. In reading the proofs, note that only necessary details are spelled out, and that in every proof we make use of TM 5.5-the predicate-calculus version of T 4.7. TM5.5

[[A::::> B]::::> [[B::::> C]::::> [A::::> C]]]

Moreover, note that in the proof of TM 5.8 below we assert particular cases of the predicate-calculus versions of T 4.3 and T 4.6, i.e., TM 5.6 and TM5.7. TM5.6

[A::::> ",-,,,,-,A]

TM5.7

[[A::::> B]::::> [--B::::> --A]]

In the first theorem, which concerns 3, we ascertain that if we find a b satisfying A, we can be certain that (3a)A. TM 5.8 Let a be an individual variable and let A be a wff. In addition, let b be a term that is substitutable for a in A. Then

74

Chapter 5

[Aa(b) :::> (3a)A].

Proof

[(V'a)--A:::> --Aa(b)] [[(Va)--A:::> --Aa(b)] :::> [----Aa(b) :::> -(V'a)--A]] [ -- -- Aa(b) :::> -- (Va) -- A] [[Aa(b) :::> -- -- Aa(b)] :::> [[ -- -- Aa(b) :::> -- (V'a) -- A] :::> [Aa(b) :::> -- (Va) -- A]]] [Aa(b) :::> -- -- Aa(b)] [Aa(b) :::> -- (Va) -- A]

In this proof we first assert PLA 5 for ,...., A and a version of TM 5.7. Then we apply PRII to infer [,....,,...., Aa(b) :::l ,...., (Va)""" A]. Finally we use TM 5.5 and TM 5.6 and apply PRI 1 twice to conclude that [Aa(b) :::l (3a)A]. In PLA 4 and in TM 5.9 and TM 5.10 below, the distributive properties of V are described. TM 5.9

Proof

.[(Va)[A:::> B] :::> [(Va)A :::> B]] [(Va)[A:::> B] :::> [Aa(a) :::> Ba(a)]] [[(Va)A :::> Aa(a)] :::> [[Aa(a) :::> Ba(a)] :::> [(Va)A :::> Ba(a)]]] [(Va)A :::> Aa(a)] [[Aa(a) :::> Ba(a)] :::> [(Va)A :::> Ba(a)]] [[(Va)[A :::> B] :::> [Aa(a) :::> Ba(a)]] :::> ([[Aa(a) :::> Ba(a)] :::> [(V'a)A :::> Ba(a)]]

::::>

[(V'a)[A :::> B] :::> [(V'a)A :::> Ba(a)]]]]

[(Va)[A :::> B] :::> [(Va)A :::> Ba(a)]]

In this proof we assert PLA 5 for [A :::l B], a version of TM 5.5 and PLA 5. Then we apply PRII to infer [[Aa(a) :::l Ba(a)] :::) [(Va)A :::l Ba(a)]]. Finally we assert a version of TM 5.5 and apply PRI 1 twice to deduce [(Va)[A :::l B))

:::l

[(Va)A :::) B]].

TM 5.10

Proof

[(Va)[A:::> B] :::> [(Va)A :::> (v'a)B]]

[(Va)[A:::> B] :::> [(Va)A :::> B]] (Va) [(Va) [A :::> B] :::> [(Va)A :::> B]] [(V'a)[(V'a)[A :::> B] :::> [(V'a)A :::> B]] :::> [(V'a)[A :::> B] :::> (Va) [(Va)A :::> B]]] [(Va)[A :::> B] :::> (V'a) [(Va)A :::> B]] [(Va)[(Va)A :::> B] :::> [(Va)A :::> (Va)B]]

The First-Order Predicate Calculus

75

[[(Va)[A ::::) B] ::::) (Va)[(Va)A ::::) B]] ::::) [[(Va)[(Va)A ::::) B] ::::) [(Va)A ::::) ('v'a)B]] ::::) [(Va)[A ::::) B] ::::) [(Va)A ::::) ('v'a)B])]] [(Va)[A ::::) B] ::::) [(Va)A ::::) ('v'a)B]]

In this proof we assert TM 5.9, apply PRI2, and use PLA 4 and PRll to infer [(Va)[A ~ B] ~ (Va)[(Va)A ~ B]]. Then we assert a version of PLA 4 and a version of TM 5.5 and apply PRI 1 twice to establish [(Va)[A ~ B] ~ [(Va)A

~

(Va)B]].

It is clear from our definition of :I that :I must have the same distributive properties as V. I leave it to the reader to establish this fact and conclude our discussion of V and :I with TM 5.11. TM 5.11 Proof

[(Va)A::::) (3a)A]

[(Va)A::::) Aa(b)] [Aa(b) ::::) (3a)A] [[(Va)A ::::) Aa(b)] ::::) [[Aa(b) ::::) (3a)A] ::::) [(Va)A ::::) (3a)A]]] [(Va)A ::::) (3a)A]

In this proof we assert PLA 5, TM 5.8, and a version of TM 5.5. Then we apply PRI 1 twice to infer [(Va)A ~ (:la)A]. While reading the preceding theorems, it is helpful to note the following: Suppose that we give "" and ~ their intended interpretations and read (Vx) as "for all x" and (:Ix) as "there exists an x." Suppose also that in each relevant case we choose the universe appropriately. Then we can use PLA 5 and PRll to justify the inference from "All physicists are philosophers" and "Per is a physicist" to "Per is a philosopher." Similarly, from TM 5.11 and PRI 1 we can infer "There is a black raven" from "All ravens are black." We can also apply TM 5.9 and PRI 1 to infer from "All human beings are mortal" and "Everybody in the universe is a human" that "If y is a member of the universe, y will die." Finally we can use TM 5.10 and PRll to infer from "All human beings are mortal" and "Everybody in the universe is human" that "Everybody in the universe will die." We note in passing that, if all individuals are not ravens, the symbolic rendition of all ravens (R) are black (B) is (Vx)[R(x) ~ B(x)]. From this assertion, TM 5.11, and PRI 1, we can infer that (:lx)[R(x) ~ B(x)] and that if this x is a raven, it is black. We cannot, however, deduce that there is a black raven; i.e., we cannot deduce that (:Ix) [R(x) /\ B(x)].

76

Chapter 5

5.2.3 Material Equivalence In chapter 4 I showed that the material equivalence relation of the propositional calculus is an equivalence relation with a certain substitutive property. Next I shall demonstrate that the == relation of my predicate calculus has the same characteristics. The results are stated in TM 5.12TM 5.15. I leave the proofs of the first three to the reader and outline a proof of the fourth. The axiom schemata PLA 1-PLA 3 and PRI 1 can be used to establish TM 5.12-TM 5.14, the predicate-calculus analogues of T 4.14-T 4.16.

== AI.

TM 5.12

If A is a wff, [A

TM 5.13

If A and Bare wffs, [[A

TM 5.14

If A, B, and Care wffs, then

[[A

==

BI ::::> [[B

==

CI::::> [A

==

==

BI ::::> [B

== A]].

C]]].

Hence == is a material equivalence relation. This relation also has a substitutive property that we record in TM 5.15-a predicate-calculus analogue of TM 4.6. TM 5.15 Let A be a wff and let Bl , ... , Bn be wf subformulas of A. Also let A be a wff that is obtained from A by respectively substituting the wffs 81 , ... , 8n for Bl , ... , Bn at some of the occurrences in A of the latter formulas. Finally, suppose that, for every i = 1, ... , n, [B i == 8d is a theorem. Then [A == Al is also a theorem.

We shall prove this theorem by induction on the length of w/fs. The idea of such a proof is simple. Think of the wffs as being distributed in layers. In the first layer, 51' are all the atomic formulas. In the second layer, 52' we observe all the wffs that are constructed from atomic formulas with the help of "', ~,and V. In the third layer are all the wffs that are formed from the wffs in 51 and 52 with the help of "', ~, and V. And so on. We establish the theorem first for atomic formulas. Then we assume that the theorem is valid for all wffs in all layers up to and including some given level and show that it is valid for the wffs in the next layer as well. The antecedent clause in this last step is referred to as the induction hypothesis. In carrying out the last step in the proof described above, we make tacit use of the following facts, the proof of which I leave to the reader. (See Church 1956, pp. 180 and 122-123.)

A is either an atomic fonnula or there exist wffs B, C, and D and an individual variable a such that A is '" B, [B ~ el, or (Va)D. In each case, A is of that form in one and only one way. 1. Every wff

The First-Order Predicate Calculus

77

2. A wf part of '" B either is '" B or is a wf part of B. 3. A wf part of [B of both.

:::>

C] either is [B

:::>

C] or is a wf part of B or C but not

4. A wf part of (Va)D either is (Va)D or is a wf part of D. Now the proof: If A is A, then TM 5.12 implies that [A == A] is a theorem. So in the remainder of the proof we assume that A differs from A. Suppose first that A is atomic. Then A is Bi for some i, A is 13i , and [A == A] is a theorem by hypothesis. Suppose next that A is '" C for some wff C. If Bi occurs in A, either A is Bi or Bi is contained in C. If A is Bi , A is 13 i , and [A == A] is a theorem by hypothesis. If the occurrences of the Bi are contained in C and C is obtained from C by substituting 13i for Bi , i = I, ... , n, then by the induction hypothesis [C == C] is a theorem. From this, TM 5.13, and the fact that A is '" C, it follows by T 4.12, TM 5.7, and repeated use of PRI 1 that [A == A] is a theorem as well. A similar argument suffices for the case when A is [C :::> D] for two wffs C and D. Finally, if A is (Va)C for some wff C, A is Bi for some i, and the proof goes through as above, or the Bi are contained in C. If the Bi are contained in C, A is (Va)C, and [C == C] is a theorem by the induction hypothesis. Consequently, by TM 5.13, T 4.12, PRI2, TM 5.10 and repeated use of PRI I, it follows that [A == A] is a theorem as well. Theorem TM 5.15 is used in many ways, e.g., to facilitate the application of PLA 5 in proofs. To wit: In the predicate calculus a variant of a wff A is a wff B that is obtained from A by a sequence of replacements in each of which a part (Vx)C is replaced by (Vy)Cx(y), where y is a variable that is not free in C. When a term b is not substitutable for a variable a in a wff A, we replace A by a variant of A in which none of the variables in b are bound and apply PLA 5 to the variant formula. The justification for this procedure is obtained from TM 5.15 and from the metatheorem TM 5.16, whose proof I leave to the reader. TM 5.16 Let A be a wff and let a and y be individual variables. Moreover, suppose that A contains no free occurrence of y and that y is substitutable for a in A. Then [(Va)A

==

(Vy)Aa(y)].

Another way in which TM 5.15 is used in proofs is to justify replacing wffs by wffs in which the quantifiers appear at the front of the formulas. For example, if A is [B :::> (3x)C] and x is not free in B, TM 5.15 and TM 5.17 below justify replacing A by (3x)[B :::> C]. Similarly, if D is [(Vx)C :::> B] and x is not free in B, D can be replaced by (3x) [C :::> B].

78

Chapter 5

TM 5.17 free in B, [[B

:::>

Let Band C be wffs and let x be an individual variable. If x is not

(3x) C)

==

(3x)[B ::J C]].

I leave the demonstration of [(3x)[B ::J C) ::J [B ::J (3x) C)) to the reader. To prove the converse relation we proceed as follows: First, by applying in succession TM 5.7, PRI I, the tautology ['" '" B == B], and TM 4.7, we deduce from ['" B ::J [B ::J C)) that ['" [B ::J C) ::J B). From this it follows by application of PRJ 2, TM 5.9, PRI I, and TM 5.7 that (5) ['" B ::J (3x) [B ::J C)). By similar arguments, we deduce from [C::J [B ::J C)) that (T)[(3x)C ::J (3x)[B ::J C)). Then we apply TM 4.1 to 5 and T and infer (i) ([B::J (3x) C) ::J ['" B ::J (3x) [B ::J Cm

and [B ::J [(3x) C ::J (3x) [B ::J Cm. From the last assertion, PLA 2, and PRI I, it follows that (ii) [[B::J (3x) C) ::J [B ::J (3x)[B ::J Cm. But if that is so, then (i), (ii), TM 5.3, TM 4.7, and the predicate-calculus version of T 4.24 suffice to demonstrate that [[B ::J (3x)C) ::J (3x) [B ::J C)). 5.3 Semantic Properties

In this section I shall discuss some of the semantic properties of my first-order predicate calculus. In this discussion I use the term first-order language to denote a first-order predicate calculus with a specified set of nonlogical symbols. Two first-order languages differ only if they have different index sets In and In' i.e., if they do not have the same number of function and predicate symbols. 5.3.1 Structures

To interpret a given first-order language L, I first introduce the concept of a structure for L. A structure f!} for L is a quadruple (1f!}1, N.@, F.@, G.@), where I~ I is a set of individuals; N.@ is a set of names of the individuals in I~ I, with one name for each individual and different names for different individuals; F.@ is a family of functions from 1f!}1 to 1f!}1; and G is a family of predicates in 1f!} I. To each n-ary f in {P} i e In corresponds an n-ary f.@ in pC!2!, and to each n-ary Pin {PiLeJ n corresponds an n-ary p.@ in G.@, n = 0, I, .... The f.@ exhaust F.@ as f varies through the {P LeI n' and the p.@ exhaust G.@ as P varies through the {Pi}i e Jn' 1

79

The First-Order Predicate Calculus

E 5.3 Let L be a first-order language with 10' Jo, In' In' n ~ 2, empty, with five unary function symbols p, j = 1, ... , 5, and with six unary predicate symbols, pi, j = 1, ... , 6. We construct a structure @ for L in the following way: Let I@I = {Xl' X z }, N§i = {ai' a z } and observe that there are only four unary functions from I@! to I@/. They are described in table 5.1. There the value of k at XI is Xz, and the value of kat Xz is XI' Similarly, there are only four unary predicates in I@I. They are described in table 5.2. There a t in the Xl row and Pz column indicates that XI satisfies Pz ; and an f in the X z row and Pz column indicates that X z does not satisfy Pz. We obtain a structure @ for L by assigning XI to a I ; X z to a z ; f to f 1 and F; g, h, and k to [3, and [5, respectively; PI to pI; Pz to p z; P3 to p 3 and p4 ; and P4 5 6 to p and p .

r,

5.3.2 Structures and the Interpretation of a First-Order Language

Using the structure ~ for L, we obtain in interpretation of L in a manner described below. In this interpretation we first add the names in N§i to the O-ary functions of L and denote the expanded first-order language by L(~). Then we interpret (i) each a E N§i as the individual which it names, ~(a); and (ii) each variable-free term a E L (~) of the form f (ai' ... , an) as f§i (~(a I)' ... ,~(an»' n = 0, 1, ....

It can be shown that, if a is a variable-free term in

L(~),

then either a is a

Table 5.1 The unary individual functions in F~.

F X

f

g

h

Ie

XI

XI

Xl

.I)

Xl

Xl

.II

Xl

1"2

X)

Table 5.2 The unary predicates in G~.

P X XI X2

P,

P2

PJ

P4

80

Chapter 5

name or there exist an n, a uniquely detennined n-ary function symbol f, and variable-free terms a1' ... , an such that a is f (a 1, ... , an)' (See Shoenfield 1967, p. 19.) From this it follows that (i) and (ii) detennine the interpretation !2C(a) of all variable-free tenns in L(!2C). Next let b be a term in L with n free variables Xl' " ., Xn; let e l ' ... , en be names in Ny;; and let bxl .... ,x)e 1 , ••• , en) be the term we obtain by substituting i for Xi at each occurrence of Xi in b, i = 1, ... , n. Then bX1,oO.,x)e 1 , •.• , en) is a !2C-instance of b. Conditions i and ii determine the interpretation of every !2C instance of b. In that way they determine the interpretation of b as well. To interpret the wffs of L, we must first interpret the closed formulas in L(f0), i.e., the wffs in L(f0) in which no variable is free. This we do by induction on the length of wffs.

e

(iii) Let A be the closed wff [a = b), where a and bare tenns. Since A is closed, a and b must be variable-free. We interpret A by insisting that !2C(A) = t if f0(a) = !2C(b). Otherwise f0(A) = f. (iv) Let A be the closed atomic fonnula P(a 1, ... , an), where P is not =. Since A is closed, the aj are variable-free. We interpret A by insisting that f0(A) = t if P~(f0(a1)' ... , !2C(a n)). Otherwise f0(A) = f. (v) If A and B are closed wff, then f0( "" A) = f if !2C(A) = t and !2C( "" A) = t if !2C(A) = f. Also f0([A ~ BD = f if !2C(A) = t and !2C(B) = f. Otherwise !2C([A ~ BD = t. (vi) If C is a wff that contains only one free individual variable x, then f0((Vx)C) = t if and only if f0(C x (a)) = t for all a E N~. Otherwise f0((Vx)C) = f. It is clear that conditions iii-vi determine the interpretation of all closed wffs in L(fiJ) and hence in L. But if that is true, then those conditions determine the interpretation of the remaining formulas in L as well. To show how for a fonnula B with n free variables, let a f0 instance of B be a closed fonnula of the fonn Bxl ,... ,x)a 1, ... ,a n), where (al, ... ,an) is an n-tuple of names in NQr. Conditions iii-vi determine the interpretation of every f0 instance of B. In that way they determine the interpretation of B as well. In TM 5.18 we see that the interpretation of f0 instances of tenns and wffs described above is unambiguous. TM 5.18 Let f!/J be a structure for L; let a be a variable-free term of L(f!jJ); and let 0 E Nr:! be the name of f!/J(a). If b is a term of L(f!/J) in which no variable except x occurs, then ;?2!(b x (a)) = ;?2!(b x (O)). Furthermore, if A is a wff in L(f!jJ) in which no variable except x is free, then

The First-Order Predicate Calculus

81

We shall establish the validity of the theorem for terms by induction on the length of b. If b is a name, bx(a) and bx(8) are both b and there is nothing to prove. If b is a variable, bx(a) is a, bx (8) is 8, and ~(a) = ~(8) by our choice of 8. Finally, if there is an n-ary function symbol f and terms bi , ... , bn such that b is f (b i ' ... , bn), then by the induction hypothesis, ~(bx(a)) = ~(f(biX(a),

= ff!J (~(bix(a)),

, bnx(a))) , ~(bnx(a)))

= ff!J(~(biX((}))"'" ~(bnx((})))

=

~(f(bix(8), ... , bnx (8)))

= ~(bx((}))' To establish the theorem for wffs we proceed by induction on the length of A. If A is atomic, then there are terms band c such that A is [b = c] or there are terms bi , ... , bn and an n-ary predicate symbol P such that A is P(b i , ... , bn). In either case, ~(Ax(a)) and ~(Ax((})) declare equivalent sentences. Hence ~(Ax(a)) = t if and only if ~(Ax(8)) = t. If A is not atomic, there are wffs B, C, and D such that A is '" B or [B :::::> C] or (Vy)D. I leave it to the reader to use the induction hypothesis to establish the validity of the theorem when A is '" B or [B :::::> C]. When A is (Vy)D, we may assume that y and x are not the same variable, since otherwise Ax(a) and A x((}) are both A. Let ex be a variable that varies through the names in Nf!J. Then ~(Ax(a)) = t if and only if ~(Dx,y(a, ex)) = t for all (J. E NqJ. Hence, by the induction hypothesis, ~(Ax(a)) = t if and only if ~(Dx,y(8, ex)) = t for all (X E NqJ; i.e., £0(A x (a)) = t if and only if ~(Ax(8)) = t. 5.3.3 Tautologies and Valid Well-Formed Formulas

In presenting the propositional calculus, I insisted that a tautology be a wff that denotes truth no matter what it asserts. For the propositional calculus this amounts to demanding that a tautology be a wff that denotes truth because of the meaning of '" and :::::> alone. Then to 'ascertain whether a given wff is a tautology it suffices to check its truth table. In our predicate calculus the meaning of '" and :::::> is determined by PLA 1-PLA 3 and PRI1. Any two-valued interpretation of '" and:::::>, in which the values of PLA 1-PLA 3 denote truth, their negations falsehood, and PRI 1 is a valid rule of inference, must be as described in condition v above. In this interpretation a wff is a tautology if and only if it is a value of

82

Chapter 5

Table 5.3 Truth table for PLA 4. A

P

(V'x)[A::::> Pj

[A::::> (V'x)Pj

[(V'x) [A ::::> ::::> (V'x)Pll

PI

::::>

[A

PI PI P2 P2 P3 P3 P4 P4

one of our first three axiom schemata or it is a value of a theorem schemata that can be derived from PLA 1-PLA 3 with the help of PRJ 1. Examples of such theorem schemata are TM 5.3, TM 5.5-5.7, and TM 5.12- TM 5.14. Not all the theorems of our predicate calculus are tautologies. Hence to give a semantic characterization of the theorems of our language, I must introduce several new terms. We say that a wff B is valid in ?2 if and only if ?2(B') = t for every ?2 instance B' of B. It follows from this definition that a closed formula A is valid in ?2 if and only if ?2 (A) = t. E 5.4 Let L be as described in E 5.3, and let!!) be a structure for L such that I!!)I = {Xl' Xl}' The truth table for PLA4, table 5.3, shows that PLA4 is valid in all such !!).

As E 5.4 showed, for the given pair {Xl' x 2 }, PLA 4 was valid in every = {X l ,X2 }. Henceforth we shall say that a wff A is valid in 1?2,1 if it is valid in every structure for L with the same individuals as 1?21. A valid wff is a wff that is valid in every structure. It is clear that a tautology is valid. Hence the values of PLA 1- PLA 3 are valid wffs. We next show that the values of PLA 4 and PLA 5 are also valid wffs. A value of PLA 5 denotes f only if (Va)A denotes t and Aa(b) denotes f. According to our interpretation of V, however, (Va)A denotes t only if Aa(b) denotes t for all values of b. Hence the values of PLA 5 must be valid in every domain. To show that the values of PLA 4 are valid, we begin by assuming the contrary. Thus there exist a domain £, a structure ?2 for L with 1?21 = £, wffs A and B, and an individual variable a that is not free in A such that ?2([(Va) [A => B] => [A => (Va)B]]) = f. That can happen in ?2 only if (Va)

?2, with 1?21

The First-Order Predicate Calculus

83

[A ~ B] denotes t and [A ~ (Va)B] denotes f. [A ~ (Va)B] denotes f in !?2 only if A denotes t and (Va)B denotes f. But if A and (Va)[A ~ B] denote t in !?2, then, for all b, !?2([A ~ B]a(b)) = !?2[A ~ Ba(b)] = t and !?2(Ba(b)) = t. Consequently, !?2((Va)B) = t as well. This is a contradiction which demon-

strates that E, !?2, and the pair A, B with the required properties do not exist; i.e., the values of PLA 4 are valid, as was to be shown. The validity of PLA 6 is obvious. The validity of the values of PLA 7 is an easily demonstrated consequence of TM 5.15. To see why, let !?2 be a structure for L, let A be a wff in which a is a free variable, and let band c be terms that are substitutable for a in A. Then TM 5.18 implies that, for every !?2 instance of b, c, and A, either !?2(b) differs from !?2(c) or !?2(Aa(b)) = t only if !?2(Aa(c)) = t. Hence the given value of PLA 7 is valid in !?2. Since !?2 is arbitrary, the same value of PLA 7 must be valid in all structures of L. 5.3.4 Valid Well-Formed Formulas and Theorems

It is easy to verify that the rules of inference, PRI 1 and PRI 2, preserve validity; i.e., if the premises are valid formulas, so too are the conclusions. From this and the validity of the values of PLA 1-PLA 7, it follows that the theorems of my language are valid wffs. The converse-i.e., a valid wff is a theorem-is true and will be proved in section 6.6. Consequently, we have TM 5.19. TM 5.19

A wff A is a theorem if and only if it is valid.

This theorem is as remarkable as TM 4.5 of chapter 4. It uses one interpretation (of many possible) to characterize a class of wffs that was defined in terms of concepts of the uninterpreted predicate calculus. However, in contrast to TM 4.5, TM 5.19 does not provide an algorithm for checking whether a wff is a theorem. This is so because the number of individuals in the universe is unbounded. 5.4 Philosophical Misgivings

I believe that sections 5.1-5.3, together with our discussion of the interpretation of the propositional calculus, establish that a first-order language, such as the one presented above, can serve as a guide to sound scientific reasoning. There are, however, others who disagree. They have misgivings about the meaning and about the use of for-all and there-exist sentences in a formalized language. We look at those misgivings next.

84

Chapter 5

5.4.1 For-All Sentences

Frank Ramsey (1954, p. 241) would have disputed our treating sentences such as (Va)[A ~ B] as conjunctions. He considered that acceptable only when the universe D is finite. When D is infinite, he would have argued, such expressions may look like conjunctions but are not. They cannot be written out as conjunctions and asserted as such. And if they are not conjunctions, they are not declarative sentences. Instead they are rules for judging: if I meet an A, I shall regard it as a B. In particular, to Ramsey a sentence such as "all men are mortal" has no truth value; instead it is a prescription for behavior: if I meet a man, I shall regard him as mortal. To assert a finite conjunction,

Ax(a 1 )

and

Ax(a z )

and ... and Ax(a n ),

we must be able to identify the a/ s; i.e., we must be able to name the n objects concerned. Since we cannot write down the names of infinitely many objects, we cannot declare an infinite conjunction of sentences in the way we asserted the finite conjunction above. We must use an auxiliary device such as the quantifier V. To justify such use I shall give examples that show that we can assert, understand, and assign a truth value to (Va)A even in cases where we cannot name all the a's involved. I begin with a trivial sentence. Its validity is analytic in the sense that "the predicate is part of the subject of which it is asserted" (Russell 1976, p. 46). Hence we can both understand it and assign a truth value to it without knowledge of any instances of it. E 5.5 Let the universe consist of all human beings. According to Webster, no matter what value x assumes, "Either x is not a widow or x has had a husband." By PRI 2 it follows that "For all x, either x is not a widow or x has had a husband."

Next I give an example from elementary number theory. This sentence is also easy to understand. Moreover, we can infer its truth from just a few instances of it, and we can establish its validity by mathematical induction. E 5.6 1

Let the universe consist of all the positive integers. Then "For all x,

+ 2 + '" + x = x(x + 1)/2."

My third example is one of Russell's stock examples (Russell 1976, p. 62). His sentence is easy to comprehend and undeniably true. Yet we cannot mention an instance of it.

The First-Order Predicate Calculus

85

E 5.7 Only a finite number of pairs of integers have been or will be thought of by human beings. Hence there are pairs of integers that nobody ever will have thought of: All products of two integers, of which no human being ever will have thought, are over 100.

5.4.2 There-Exist Sentences An existence sentence is a disjunction of assertions concerning the individuals in the universe; e.g., if the universe contains n individuals, the sentence "There is an a such that Ax(a)" asserts that "Either Ax(a 1 ) or A x (a 2 ) or ... or Ax(a n )." According to our discussion in section 4.5, we can express finite but not infinite disjunctions in my language. Consequently, when the universe is infinite, we must use auxiliary devices, such as the quantifier 3, to express there-exist sentences. In this chapter we have used and interpreted 3-formulas as if they were negations of V statements. That is all right as long as the universe is finite, but in the eyes of the Dutch intuitionists it is an abuse of language when the universe is infinite. The infinite-according to them and to Aristotle (1980, p. 57)-is forever growing and exists only potentially, not actually. Thus an infinite totality, such as the set of positive integers, is to be viewed as being constructed step by step from the finite without hope of the construction ever being completed. The totality exists only as a possibility of an unbounded extension of the finite. This allows the assertion of V sentences, since they use only the potentiality of the infinite. It renders the negation of such statements meaningless since the negation refers to the actuality of the infinite. An 3 sentence-as I conceive it-is in the view of Dutch intuitionists not a declarative sentence. It is a propositional abstract resembling a document, which indicates the presence of a treasure without disclosing its location. The document is worthless until we discover where the treasure is hidden (WeyI1949, pp. 50-51).2 One consequence of the intuitionists' refusal to allow the negation of V sentences over an infinite universe is that in their logic the law of the excluded middle is not a generally valid principle. They insist that a declarative sentence denotes truth or falsehood according as it, or its negation, can be verified. If neither the sentence nor its negation is verifiable, the sentence is neither true nor false. Cases in point are sentences (9) and (12) of chapter 3. The first, which concerns the decimal expansion of 0, is at the moment neither true nor false, even though in time it may be

Chapter 5

86

proved true. The second, which concerns pebbles on a beach, is neither true nor false because verification is inconceivable. The important point to note here about the intuitionists' attitude toward the law of the excluded middle is that their notion of truth belongs to the theory of knowledge while mine belongs to semantics. Thus, in spite of their objections, I insist on the general validity of [A ~ A]. 5.5 Concluding Remarks

Church has shown that my language is consistent in the sense that it contains no wff A with the property that both A and '" A are theorems (Church 1956, pp. 182 and 283). He has also shown that my language is not complete-i.e., complete in the sense that there is no wff A that is not a theorem which can be added as an axiom without causing the language to be inconsistent (Church 1956, pp. 185 and 284). For some this result may be disappointing; for me it seems fortuitous, since it is the incompleteness of the language that enables us to use it as a framework for mathematical and economic theories. In this respect, it is interesting to observe that TM 5.19 is often referred to as Codel's Completeness Theorem. To see why, note that TM 5.19 insists that the predicate calculus is complete in the sense that it is impossible to derive more valid wffs by adding axioms and rules of inference to PLA 1PLA 7 and PRI 1 and PRI 2. Our axioms and rules of inference account for all the valid wffs there are. In the next three chapters I shall state some difficult theorems without proof. The proofs can be found in Joseph R. Shoenfield's book Mathematical Logic (1967). Shoenfield bases his arguments on different axioms and different rules of inference, so I conclude this chapter by showing Shoenfield's predicate calculus to be equivalent to my own. We observe first that the sentence-formation rules of the two calculi are equivalent. Hence a formula A is wf in my predicate calculus if and only if it is a wff in Shoenfield's predicate calculus. Next I will show that a theorem in Shoenfield's calculus is also a theorem in my calculus. This fact follows from the following observations. Shoenfield's axioms are the values of [A ~ A], [Ax(a) ~ (3x)A], PLA 6, and PLA 7. His rules of inference are the following: (i) Infer [B v A] from A. (ii) Infer A from [A v A]. (iii) Infer [[A v B] v C] ~rom [A v [B v Cll.

The First-Order Predicate Calculus

87

(iv) Infer [B v C] from [[A v B] 1\ ["" A v C]]. (v) If x is not free in B, infer [(3x)A ~ B]] from [A

~ B].

Theorems T 4.1 and TM 5.8 show that Shoenfield's axiom schemata are metatheorems in my predicate calculus. Furthermore, theorems T 4.10, T 4.11, T 4.13, and T 4.18 and the metatheorem TM 5.20 demonstrate that Shoenfield's rules of inference are derived rules of inference in my predicate calculus. TM 5.20

If x is not free in B, [[A::) B] ::) [(3x)A

~

B]]

Hence a theorem in Shoenfield's calculus is a theorem in my calculus, as was to be shown. Finally I show that a theorem in my predicate calculus is also a theorem in Shoenfield's. Shoenfield proves that every tautology is a theorem (see Corollary, Shoenfield 1967, p. 27). Consequently, PLA 1-PLA 3 are theorem schemata in his predicate calculus. Shoenfield also shows that PLA 4 and PLA 5 are theorem schemata in his calculus (see Shoenfield's V-Introduction Rule and Substitution Rule, 1967, p. 31). Finally, Shoenfield's Detachment Rule and Generalization Rule demonstrate that my rules of inference are derived rules of inference in his calculus (Shoenfield 1967, pp. 28 and 31). From these observations it follows that a theorem in my predicate calculus is a theorem in Shoenfield's calculus.

This page intentionally left blank

II

Mathematical Logic II: Theories and Models

This page intentionally left blank

6

Consistent Theories and Models

In this part we shall study the syntactical and semantic properties of theories developed by the axiomatic method. If Tis such a theory, we think of T as having been embedded in a first-order language; Le., I have assigned symbols to the undefined terms of T and used them and the logical vocabulary of the predicate calculus to formulate wffs that express the ideas of the axioms and the theorems of T. Moreover, to PLA 1- PLA 7 of chapter 5, I have added as new axioms the set r of wffs that express the axioms of T. In the expanded axiom system, a theorem is either a theorem of the predicate calculus or a symbolic rendition of a theorem of T. Moreover, a wff that expresses a theorem of T is a theorem in the expanded axiom system. Finally, the syntactical and semantic properties of Tare reflected in the syntactical and semantic properties of r. In this chapter and in chapter 7 we study the relationship between the syntactical and semantic properties of a given subset r of the wffs of my symbolic language. I show in this chapter that r is consistent if and only if it has a model, i.e., if and only if there is a structure f» in which all members of r are valid. In chapter 7 I show that, if r is consistent, r is complete if and only if every two models of r are elementarily equivalent, i.e., if and only if they assign truth to the same closed wffs. In chapter 8 we discuss some of the shortcomings of the axiomatic method. For instance, I show that any finite set of axioms for the natural numbers is incomplete. Hence, no matter how we choose the set of axioms r, there is a true assertion concerning natural numbers that is not a logical consequence of r. I also show that, if r is a set of wffs that can be used as axioms for the theory of natural numbers, my symbolic language with r added to the original axioms cannot be used to establish the consistency of

r. Finally, in chapter 9 I show how the Kripke-Platek theory of sets can be embedded in a first-order language.

Chapter 6

92

6.1 First-Order Theories

A first-order theory T(r) consists of a first-order language L r , a set of nonlogical axioms r, and all the theorems that by the rules of inference can be derived from r and the axioms of L r . The axioms and theorems of Lr are, respectively, the logical axioms and the logical theorems of T(r). The nonlogical axioms rare nontautological postulates pertaining to a particular theory under consideration. They are phrased in the vocabulary of Lr and single out a set V of function and predicate symbols that are to be the undefined terms of the theory. The theorems that can be derived with the help of both logical and nonlogical axioms are the nonlogical theorems of T(r). They concern properties of the elements of V and certain objects that are defined in terms of members of V. In the development of T(r), the members of the nonlogical vocabulary of L r that do not belong to V play no useful role. Therefore, when we specify the language of a first-order theory, we usually insist that the nonlogical vocabulary contain only symbols for the undefined terms of the theory. A simple example of a first-order theory is given in E 6.1. Observe that in specifying the nonlogical symbols, I adopt symbols that belong to number theory rather than f's and P's. Also, in stating the axioms and the theorems, I use the syntactical variable A to denote a wff. The syntactical variable is free to vary through all the wffs of L p . E 6.1 The classical axiom system for natural numbers, T(P), concerns five undefined terms: a constant, 0; a unary function,S; two binary functions, + and,; and one binary predicate, [x

+ 0)]

+ S(y)) = S(x + y)]

P4

[(x

P5

[0

P6

[(x- S(y))

P7

-[x

P8

[[x

P9

[[Ax(O) /\ (V'x)[A ::::> A x(5(x))]] ::::> A]

=


A] by T 4.1. Otherwise ~ [B :::> [A :::> B]] by PLA 1 and ~ [A :::> B]

Chapter 6

96

by PRI 1. Suppose next that C is a wff and that A ~ C and A ~ [C => B]. By the induction hypothesis, ~ [A => C] and ~ [A => [C => B]]. But then, by PLA 2, ~ [[A => [C => B]] => [[A => C] => [A => Bm, and by two applications of PRI 1, ~ [A => B]. Finally, suppose that C is a wff with one free variable x and suppose that B is (Vx) C. Suppose also that A ~ C. By the induction hypothesis, ~ [A => C] and by PRI 2, ~ (Vx) [A => C]. But then by PLA 4 and PRI 1, ~ [A => (Vx) C]; i.e., ~ [A => B], as was to be shown. We can prove TM 6.4 (a generalization of TM 6.1) by induction on n. TM 6.4 If A l , closed, then A l , f-[A l

::::>

... , "

.,

An and Bare wffs of a first-order language, and if the Ai are An f- B if and only if

[A 2 ::::> [ ••• [An ::::> B] .. . J]].

In passing we note that TM 6.1 and TM 6.4 imply the validity of TM 6.5. TM 6.5

[[A::::> C] ::::> [[B ::::> C] ::::> [[ --A ::::> B] ::::> CJ]]

The latter is a much less transparent assertion than TM 6.1. A direct proof of TM 6.5 would be lengthy, but theorems TM 6.1 and TM 6.4 provide a shortcut to establishing it. When we use TM 6.4 to prove theories in the object language, it is important to bear in mind in what way a proof from the hypotheses r differs from a proof in T(r). E 6.2 illustrates this point. E 6.2 Let A be a wff with one free individual variable x. The following sequence of wffs illustrate a proof ofAx(y) in T(A):

A (V'x) A [(V'x)A ::::> Ax(Y)] Ax(Y)·

Yet f- [A ::::> Ax(Y)] is false. That is as it should be. Otherwise TM 6.4 would imply an absurd result: that [Ax(z) ::::> Ax(Y)] is a theorem of the first-order predicate calculus. Note, therefore, that Ax(y) f- [Ax(z) ::::> Ax(Y)]

is true. So too is

f-

[Ax(Y) ::::> [Ax(z) ::::> Ax(y)]],

as it should in accordance with TM 6.4.

E 6.2 illustrates that, in a proof from hypotheses, PRI 2 can be used only when the variables generalized upon are not free variables in the given set of hypotheses.

Consistent Theories and Models

97

6.4 Consistent Theories and Their Models

We shall next use the Deduction Theorem to give a semantic characterization of Cn(r) when r consists of a finite number of formulas, r 1 , ... , r m • To do so we let M(r) denote the set of models of T(r). A model of T(r) is a structure ~ with the property that all the r i are valid in ~. Hence ~ E

M(r) if and only if r i is valid in ~

i

=

1, ... ,m.

It is easy to show that if B E Cn (r1 , ... , r m ) and if ~ E M(r1"'" r m ), then B is valid in ~. The converse is also true: if M(r1"'" rm ) is not empty and if B is valid in ~ for all ~ E M(r1,"" r m ), then BE Cn (r1 , ... , r m ). Thus we have TM 6.6. TM 6.6 If M(rI" .. , r m) is nonempty, then B E Cn(rI' ... , r m) if and only if B is valid in ~ for all ~ E M(rI"'" r m).

To see why the sufficiency part of TM 6.6 is true, suppose that the contain no free individual variables. Then BE Cn (r1 ,

... ,

rm )

if and only if r 1 ,

... ,

rm

~

ri

B.

Consequently, by TM 6.4, BE Cn (r1 , ~

rri

::::>

... ,

rm)

if and only if

[rz ::::> r... rrm - 1

::::>

rrm

::::>

B]J ... ]]].

From this and from TM 5.19 it follows that B E Cn (r1 , ..• , r m ) if and only if

[[r1

/\

[rz /\ [... rrm -

1 /\

r m ]···]]]

::::>

B]

is a valid wff. This last wff in tum is valid if and only if B is valid in ~ for all ~ E M(r1"'" r m ). When the r i contain free variables, we let t i denote the closure of r i ; i.e.,

i

=

1, ... , m,

where x l' ... , X k are the free variables of the r i . To establish the sufficiency part of TM 6.6 when the r i contain free variables, we appeal to TM 6.7 and use roughly the same arguments that we used in the special case above. TM 6.7

BE C,,(rI,'''' r m) if and only if

tl,

.•. ,

t m f- B.

The validity of TM 6.7 can be obtained by elaborating on the following observations: If t 1 , ... , t m ~ B, then PRI 2 and TM 6.4 imply that both the t i and rt1 ::::> r... rtm - 1 ::::> rtm ::::> B]] ... ]] belong to Cn (r1 , ... , r m ). By re-

Chapter 6

98

peated use of PRI I, it follows from this that B E Cn(rl , ... , r m ) as well. The converse is an immediate consequence of the fact that BE Cn(rl , ... , r m ) only if B E Cn(tl , ... , t m ), which can be intuited from PLA 5 and TM 5.10. In TM 6.6 we insisted that M(rI"'" r m ) be nonempty. That was not necessary. According to the next theorem, TM 6.8, if M(rI"'" r m ) is empty, then B E Cn(rl , ... , r m ) no matter what wff B denotes. TM 6.8

M(f'l"'" r m ) is nonernpty if and only if T(f'l' ... , r m ) is consistent.

To establish this, suppose first that M(rI"'" r m ) is nonempty and that fifi E M(rI"'" r m ). If T(rI"'" r m ) is inconsistent, there is a formula B without free variables such that BE Cn(rl , ... , r m ) and "" BE Cn(r1 , ... , r m ). But then fifi(B) = I and fifi( "" B) = I, which is absurd. Thus T(rI"'" r m ) must be consistent. Next, suppose that M(rI"'" r m ) is empty; then for all structures fifi, there is an i such that fifi( "" t i ) = I. From this and from TM5.19 follows f-[""tl v [""t2 v [... [""tm - I v ""tm ] .•• ]]]. But then both the t i , i = I, ... , m, and "" t m belong to Cn(rl , ... , r m ), which can happen only if T(rI"'" r m ) is inconsistent. Thus T(rI"'" r m ) is inconsistent if and only if M(rI"'" r m ) is empty. In proving TM 6.8, we observed the validity of one half of TM 6.9. TM 6.9

M(f'l"'" r m ) is inconsistent if and only if ~[-rl v [ - r2 v [... [ - rm- 1 v -rm] ... ]j].

The validity of the other half can be established as follows: If T(rI"'" r m ) is inconsistent, "" t m E Cn(rl , ... , r m ). Hence, by TM 6.7, t l , ... , t m f"" t m . From this, and from TM 6.4 we conclude that f- ["" t l V ["" t 2 v [ ... [""tm - I v ""tm ] .•. ]]]. 6.S The Compactness Theorem In section 6.4 we established an equivalence between the concept of a consistent theory and a theory with a model when the theory is based on a finite sequence of wffs, r 1 , .•. , r m • As TM 6.10 asserts, this equivalence holds for any first-order theory. TM 6.10 Let nonernpty.

r

be a set of wffs. Then T(r) is consistent if and only if M(r) is

TM 6.11 makes a similar assertion. TM 6.11 If r is a set of wffs, and if M(r) is nonernpty, then BE Cn(r) if and only if B is valid in f!fi for all f!fi E M(r).

Consistent Theories and Models

99

The preceding results, which are proved in the appendix, are remarkable not only because they establish an equivalence between a syntactical and a semantical concept, but because they suggest shortcuts to verifying difficult facts, as witnessed in E 6.3. E 6.3 It is impossible (as we shall see in TM 8.7) to establish the consistency of T(P I-P 9) as a theorem of T(P). It is possible, but hard, to prove this consistency within a larger axiomatic system. Yet the existence of the standard model of T(P) and TM 6.10 suffice to establish the consistency of T(P I-P 9).

In reading TM 6.11, note that, if B E Cn (r), there exists a finite sequence of formulas r i E r, i = I, ... , m, such that BE Cn(rl , ... , rm ). Thus if T(r) is inconsistent, there must be a finite subset of formulas in r, say r l , ... , rm , such that T(rl , ... , r m ) is incon~istent as well. Since the converse is obviously true also, we can deduce from TM 6.10 and TM 6.11 the validity of TM6.12. TM 6.12 Let r be a set of wffs. Then T(r) is consistent if and only if, for every finite sequence f i E r, j = 1, ... , m, M(rl,"" f m ) is not empty.

TM 6.12 is called the Compactness Theorem. It has many interesting applications. I illustrate one of them in E 6.4. E 6.4 Let Lp denote the first-order language of T(P I-P 9) and let 1] denote the standard model of T(P). In addition, let T(N) denote T(P I-P 8, NA), where

NA

[[x

< yl v .[[x = yl v

[y

< xlll

Since NA is a theorem of T(P) (see PT 1 of section 6.1), 1] is a model of T(N) as well. Finally, let LpN denote the language we obtain by adding a constant rx to the vocabulary of Lp , and, for each n = 0, I, ... , let APn assert: APn : [kn

< rxl,

where kn is defined inductively by

ko =df 0 kn

=df

S(kn -

l ),

n

=

I, 2, ....

We shall use the Compactness Theorem to show that the theory T(NS), whose language is LpN and whose nonlogical axioms comprise all the APn and all the closed wffs in Lp that are true in 1], is consistent. Let f N denote the collection of closed wffs in Lp that are true in 1] and let A be any finite subset of r N U {APl , AP2 , .•• }. There is an n such that, if APm is in A, then km < kn • Hence we obtain a model of T(A) if we interprete rx as n and let 1] interpret the remaining parts of LpN' Since A is arbitrary, we conclude from TM 6.12 and TM 6.10 that T(NS) is consistent and has a model.

In reading E 6.4, note that T(NS) is a theory of nonstandard natural numbers. A model of T(NS) has a universe that contains representatives of each

100

Chapter 6

and every natural number in addition to representatives of numbers that are larger than any natural number. If a first-order theory T(r) has one model, it is sure to have many. How these models differ is a topic we discuss in chapter 7. An astonishing theorem of T. Skolem, given here as TM 6.13, bears on this problem. TM 6.13 Let r be a finite or denumerably infinite set of wffs and suppose that T(r) is consistent. If r has a model with an infinite universe, there exists a ~ E M(r) whose universe of discourse I~I is denumberably infinite.

Skolem's theorem is an extension of a theorem of Lowenheim. Therefore, TM 6.13 is usually referred to as the Lowenheim-Skolem theorem. I sketch a proof of the theorem in the appendix. TM 6.13 has many paradoxical consequences. One of them is exemplified in the following observation: It is possible to formulate a first-order theory with countably many nonlogical symbols in which we can prove that the set of real numbers is uncountable. Yet, according to TM 6.13, any such theory has a model with a countably infinite number of individuals. This is paradoxical but not contradictory. To wit: The mapping used to enumerate the individuals in the universe does not belong to the model. Hence the enumeration does not invalidate the theorem that insists that there is no one-to-one mapping of the set of real numbers onto the set of natural numbers. 6.6 Appendix: Proofs

In section 6.4 we used TM 5.19 to establish TM 6.8-the finite version of TM 6.10. In this appendix we shall sketch a proof of TM 6.10 and show that the validity of TM 6.10 implies the validity of TM 5.19. We shall also sketch a proof of TM 6.13. The details left out of the proofs can be found on pp. 44-47 and pp. 78-79 of Shoenfield 1967. 6.6.1 A Proof of TM 6.10 The proof of TM 6.10 is obtained in steps. We begin with the sufficiency part. Let f» be a model of T(r) and observe that (i) If A is a closed wff, A E Cn(r) only if f»(A)

= t.

Since, for all closed wffs A, f»([A 1\ -A]) = I, it follows from (i) that A and '" A cannot both be theorems of T(r). Consequently, (ii) If T(r) has a model, T(r) is consistent.

Consistent Theories and Models

101

To establish the necessity part of TM 6.10, I introduce some new concepts and prove several auxiliary results. Let Land t be first-order languages. Then t is an extension of L if the vocabulary of L is contained in the vocabulary of L Next, let T(r) and T(t) be first-order theories. Then T(t) is an extension of T(r) if Lt is an extension of Lr and every theorem of T(r) is a theorem of T(t). Moreover, T(t) is a conservative extension of T(r) if (1) T(t) is an extension of T(r) and (2) any wff A of Lr that is a theorem of T(t) is also a theorem of T(r). Finally, T(t) is a simple extension of T(r) if Lr and Lt are identical and T(t) is an extension of T(r). One useful example of a conservative extension of a theory is described in TM6.14. Let Land r be, respectively, a first-order language and a consistent set of wffs of L. Moreover, let t be the language we obtain by adding a denumerable number of new constants to the vocabulary of L. Finally, let T and t be T(r) as developed, respectively, in Land L Then t is a conservative extension of T.

TM 6.14

To see why TM 6.14 is valid, we need only demonstrate that if A is a wff of L with n free variables, Xl' ... , x n ' then for any sequence of distinct new constants, el' ... , en' T ~ A if and only if

t

~ AXt ..... xJe l , ... , en)'

where T ~ A is short for "A is a theorem of T." It is obvious that T ~ A only if t ~ A. Consequently, by applying first PRI 2 and then PL 5 and PRI 1, we find that T ~ A only if

t ~

AXt,. .. ,xJe l , ... , en)'

Conversely, suppose that t ~ AXt,. .. ,xJe l , ... , en) and let Yl"'" Yn be distinct variables that do not occur in A or in the proof ofA xt ,... 'X n (e l' ... , en)' Then it is easy to verify that if we replace each occurrence of ei in the proof ofAxt, ... ,xJe l , ... , en) by Yi, i = 1, ... , n, we obtain a proof of T ~ AXt, ... ,xJYl"'" Yn)' From this, PRI2, PLA 5, and PRI 1 we deduce that

t

~ AXt, ... ,xJel, ... ,en) only if T ~ A.

Our first auxiliary result concerns the existence of complete simple extensions of consistent theories. A theory is complete if it is consistent and if every closed wff is either a theorem or the negation of a theorem. TM 6.15

If T(r) is consistent, then T(r) has a complete simple extension.

Let !F be a family of subsets of the wffs in Lr and suppose that A E !F if and only if T(r, A) is consistent. To establish TM 6.15, we begin by

Chapter 6

102

showing that $' is of finite character, i.e., that A E $' if and only if every finite subset of A belongs to $'. If A E $' and B c A, then BE$', since T(r, A) is an extension of T(r, B). If A ¢ g;, then T(r, A) is inconsistent. Hence there is a closed wff B such that BE Cn(r, A) and", BE Cn(r, A). If A c A and t c r together contain the axioms that are used in proving B and '" B, then T(r, A) is inconsistent. Hence A ¢ !F. Since $' is of finite character and obviously contains the empty set, we can use a theorem of Teichmiiller and Tukey (see Kelley 1955, pp. 33-34) to claim the existence of a maximal set in $', i.e., a set that is not a subset of any other member of $'. Let A be such a maximal set and observe that T(r, A) is a consistent, simple extension of T(r). Next, suppose that B is a closed wff that is neither a theorem nor the negation of a theorem of T(r, A). Then T(r, A, '" B) is consistent and, hence (A u { '" B} ) E $'. But this contradicts the maximality of A unless '" B E A. Since '" B is not a theorem, '" B ¢ A, and we have arrived at a contradiction which demonstrates that T(r, A) is complete. Our next auxiliary result establishes the existence of a useful conservative extension of T(r). To state the result we introduce a sequence of so-called special constants for Lr . The special constants of level n are defined by induction on n. Suppose that the special constants of all levels less than n have been defined, and let (3x)A be a closed wff formed with these constants and the symbols of Lr . If n > 0, suppose also that (3x)A contains at least one special constant of level n - 1. Then let C(3x)A be a special constant of level n and call it the special constant for (3x)A. Next we add all the special constants to the nonlogical vocabulary of Lr and denote the resulting language by L~. If (3x)A is a closed wff of L~, there is a unique special constant for (3x)A in L~. Let this constant be r. We shall refer to the formula [(3x)A ~ Ax(r)] as the special axiom for r. Moreover, we shall denote by TC (r) the theory whose language is L~ and whose nonlogical axioms consist of the wffs in r and the special axioms for the special constants of L~. Then (iii) TC (r) is a conservative extension of T(r). To establish (iii) we first let TC be the theory whose language is L~ and whose nonlogical axioms are the wffs in r. Then from TM 6.14 it follows that TC is a conservative extension of T(r). Since TC is a conservative extension of T(r), we can verify (iii) by showing that every wff of Lr that is a theorem TC(r) is a theorem of TC. Let A be a wff of L r that is a theorem of TC(r), and suppose that B1 , ••• , B,. are the special axioms which are used in the proof of A. Then, by TM 6.7, B1 , ... , B,. ~ A is a theorem in TC. Hence, by TM 6.4,

Consistent Theories and Models

TC

r-- [B

1

:::::>

103

[B2 :::::> [ ••• :::::> [~ :::::> A] . .. ]]].

To show that TC r-- A, we proceed by induction on k. If k = 0, there is nothing to prove. Therefore, suppose that k ~ 1 and that B1 is [(3x)C:::::> Cx(r)], where r is the special constant for (3x)C. We may suppose that the level of r is at least as great as the levels of the special constants for which B2 , ... , ~ are the special axioms. Then r does not occur in A and the Bi for i = 2, ... , k. But if that is so and if y is a variable that does not occur in the proof of A, we can replace r by y in this proof to obtain a proof in TC of

TC

r-- [[(3x)C :::::> Cx(y)] :::::> [B2 :::::> [ ... :::::> [1\ :::::> A] ... ]]].

From this, TM 5.20, and PRI 1, it follows that

TC

r-- [(3y)[(3x)C:::::>

Cx(y)]:::::> [B2 :::::> [ ... :::::> [~:::::> A] ... ]]].

Now, for any given wff C, [(3x)C :::::> (3x)C] is a value of theorem schemata of TC. Hence, by TM 5.15 and the obvious analogue of TM 5.16, TC r-[(3x)C:::::> (3y)Cx(Y)]. But, if that is so, TM 5.17 and TM 5.15 imply that TC r-- (3y)[(3x)C :::::> Cx(Y)]. Hence, by PRI 1, TC

r-- [B2 :::::> [ ... :::::> [~ :::::> A] .. . ]],

and then, by the induction hypothesis, TC So much for TC(r). Next we show that

r-- A, as was to be shown.

(iv) If T is a complete simple extension of TC(r), there is a structure ~ for L~ such that, for any closed wff A of L~, ~(A) = t if and only if T r-- A. The required structure is obtained in the following way. For any pair a, b of variable-free terms in Lf, let a "" b mean [a = b] can be proved in T-i.e., T r-- [a = b]-and observe that"" is an equivalence relation. We let I~I and N~, respectively, denote the set of all equivalence classes of "" and the names of equivalence classes. Moreover, for each variable-free term a, we designate the equivalence class of a by aO; and for every nand n-ary f and P in L~, we let ffJJ(a?, ... ,a~)

=

(f(a 1 ,

...

,an))O

and insist that P~(a?, ... , a~) if and only if T

Then ~

=

r-- P(a 1 , ... , an)'

(I~I, N~, F~, G~) is well defined. To wit: let ai'

bi , i =

1, ... , n,

104

Chapter 6

be variable-free tenns and let f and P be n-ary function and predicate symbols. If ap = bp, i = I, ... , n, it follows from TM 5.4, PLA 7, and T I- [a i = bJ, i = I, ... , n, that T

I- [f(a 1 , ..• , an) =

f(b 1 , ••• , bn)]

T

I- [P(a 1 , · · · , an) ==

P(b 1 , · · · , bn)]·

Hence (f(a 1 , ... , an))o = (f(b 1 , ... , bn))o and T I- P(a 1 , ... , an) if and only if T I- P(b 1 , ••• , bn)· To show that if A is a closed wff, ~(A) = t if and only if T I- A, we proceed as follows. First we observe that if a is a variable-free tenn, ~(a) = aO. Next we suppose that A is a variable-free atomic formula. If A is [a = b], then ~(A) = t if and only if ~(a) = ~(b), i.e., if and only if T I- A. If A is P(a 1 , ••• ,an), then ~(A) = t if and only if P~ (~(a 1 ), ... , ~(an))' i.e., if and only if T I- A. Hence, for variable-free atomic fonnulas, ~(A) = t if and only if T I- A. Finally we demonstrate (by induction on the lengths of wffs) that, for all closed wff, ~(A) = t if and only if T I- A. Let A be a closed wff of Lf. Then there exist closed wffs of Lf, B, C, and D and an individual variable x such that A is '" B, [B::::> C] or (Vx)D. Suppose first that A is '" B. Then ~(A) = t if and only if ~(B) = f. By the induction hypothesis, ~(B) = f if and only if B is not a theorem of T. Finally, by the completeness of T, B is not a theorem of T if and only if T I- '" B. Consequently, ~(A) = t if and only if T I- A. With only obvious modifications, the arguments used above can be applied to A when A is [B ::::> C]. Those details I leave to the reader. Suppose that A is (Vx)D. Then ~(A) = f if and only if there is an i E N~ such that ~(Dx(i)) = f. Now, i is the name of an individual aO. Hence, by TM 5.18, ~(A) = f if and only if there is a variable-free tenn a such that ~(Dx(a)) = f. By the induction hypothesis ~(Dx(a)) = f if and only if Dx(a) is not a theorem of T, and by the completeness of T, Dx(a) is not a theorem of T if and only if T I- '" Dx(a). Hence ~(A) = f if and only if there is a variable free tenn, a, such that T I- '" Dx(a). Now T I- '" Dx(a) only if T I- (3x) '" D. Also if e denotes the special constant of (3x) '" D, then T I- (3x) '" D only if T I- '" Dx(e). From this, the tautology ['" '" D == D], and TM 5.15, we deduce that T I- '" Dx(a) for some variable free tenn, a, if and only if T I- '" A. Consequently ~(A) = f if and only if T I- '" A and, by the completeness of T, ~(A) = t if and only if T I- A. It follows from (iv) that the ~ we constructed in the proof of (iv) is a model of T. But then ~ is a model of TC (r) as well. By omitting some of


the functions and predicates of 𝒟, we obtain a structure for L_Γ that is a model of T(Γ). This concludes the proof of

(v) If T(Γ) is consistent, T(Γ) has a model.

Assertions (ii) and (v) represent the two halves of TM 6.10. Thus the proof of TM 6.10 is complete.
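The construction in (iv) can be made concrete in a small computational sketch. The following Python fragment is purely illustrative and not part of the formal development: it assumes we are handed a finite list of variable-free terms together with the equations among them that the theory proves (in the proof itself these come from T ⊢ [a = b]), and it forms the equivalence classes a° and an interpretation of a one-place function symbol f on those classes. The terms, the proved equation, and the helper names (classes, find, f_D) are hypothetical.

    # Toy version of the term-model construction in (iv); the data below are hypothetical.
    terms = ["c", "f(c)", "f(f(c))", "d"]
    proved_equal = {("f(c)", "d")}          # stands in for: T proves [f(c) = d]

    # Union-find over terms: a ~ b iff [a = b] is provable.
    parent = {t: t for t in terms}
    def find(t):
        while parent[t] != t:
            t = parent[t]
        return t
    for a, b in proved_equal:
        parent[find(a)] = find(b)

    classes = {}                             # the equivalence class a° of each term
    for t in terms:
        classes.setdefault(find(t), []).append(t)
    print("universe |D| =", list(classes.values()))

    # Interpret the unary symbol f on classes: f_D(a deg) = (f(a)) deg, via a representative.
    # The text's argument from TM 5.4 and PLA 7 is what guarantees this does not depend
    # on the representative chosen.
    def f_D(cls):
        image = "f(" + cls[0] + ")"
        return classes.get(find(image), [image])
    print("f_D of", classes[find("c")], "is", f_D(classes[find("c")]))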

6.6.2 A Proof of TM 6.11

By appealing to PRI 2 and PLA 5, we can show that it suffices to establish TM 6.11 for closed wffs. Let B be a closed wff and suppose that B is a theorem of T(Γ). Then T(Γ, ∼B) is inconsistent and, by TM 6.10, possesses no model. Hence, for all 𝒟 ∈ M(Γ), 𝒟(∼B) = f and 𝒟(B) = t. Suppose next that, for all 𝒟 ∈ M(Γ), 𝒟(B) = t. If B is not a theorem of T(Γ), then T(Γ, ∼B) is consistent and possesses a model 𝒜 in which 𝒜(B) = f. Since 𝒜 is a model of T(Γ), this contradicts our original hypothesis. Hence B must be a theorem of T(Γ).

6.6.3 A Proof of TM 5.19

The validity of TM 5.19 follows from the validity of TM 6.11 by letting Γ be the empty set.

6.6.4 A Proof of TM 6.13

The proof of TM 6.13 is based on two facts that we may formulate as follows:

(vi) L^c_Γ contains a denumerable infinity of special constants.

(vii) TC(Γ) has a model 𝒟 such that each individual in |𝒟| is 𝒟(r) for infinitely many special constants r.

I shall sketch a proof of (vii) and leave (vi) to the reader. Let 𝒟 be the structure for T that we constructed in proving assertion (iv). Then every individual in |𝒟| is 𝒟(a) for some variable-free term a in L^c_Γ. By PLA 6 and TM 5.8, TC(Γ) ⊢ (∃x)[x = a]. Hence TC(Γ) ⊢ [r = a], where r is the special constant for (∃x)[x = a]. From this and (iv), it follows that 𝒟(r) = 𝒟(a) = a°. By replacing x with other variables, we find infinitely many other special constants with the same property, as was to be shown.

To prove TM 6.13, we first add a denumerable infinity of constants, e₁, e₂, ..., to L_Γ and denote the resulting language by L̃. Next, for each pair eᵢ,


eⱼ of distinct new constants, we add to the axioms of T(Γ) an axiom ∼[eᵢ = eⱼ] and denote the resulting theory by T̃. Then an easy application of the Compactness Theorem suffices to demonstrate that T̃ is a consistent theory. Finally, we form L̃^c from L̃ and T̃^c from T̃ in the way we formed L^c_Γ from L_Γ and T^c from T(Γ), and we obtain a language that contains a denumerable infinity of special constants and a consistent theory. From this and from (vii) applied to T̃^c, we deduce that T̃^c has a model 𝒟 with a denumerable number of individuals. Since 𝒟(eᵢ) ≠ 𝒟(eⱼ) when ∼[eᵢ = eⱼ], the universe of 𝒟 must contain infinitely many individuals. By eliminating some functions and predicates from 𝒟, i.e., by restricting 𝒟 to L_Γ, we obtain a model of T(Γ) with a denumerable infinity of individuals.
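The step "an easy application of the Compactness Theorem" rests on the observation that any finite subset of the new axioms ∼[eᵢ = eⱼ] mentions only finitely many of the new constants, and so can be satisfied by assigning those constants distinct individuals. A minimal Python sketch of that bookkeeping, with a hypothetical finite batch of constants and a hypothetical encoding of the axioms, runs as follows:

    from itertools import combinations

    # Only finitely many of e1, e2, ... appear in any finite set of the new axioms.
    mentioned = ["e1", "e2", "e3"]                             # hypothetical finite subset
    axioms = list(combinations(mentioned, 2))                  # each pair encodes ~[ei = ej]

    # Any universe with at least len(mentioned) individuals satisfies these axioms:
    interpretation = dict(zip(mentioned, range(len(mentioned))))
    assert all(interpretation[a] != interpretation[b] for a, b in axioms)
    print("this finite subset is satisfied by", interpretation)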

7

Complete Theories and Their Models

A first-order theory T(Γ) is complete if and only if there is no closed wff A in L_Γ such that (1) A does not belong to Cn(Γ) and (2) T(Γ, A) is consistent. In this chapter I shall give several model-theoretic characterizations of complete theories. We begin by studying a certain hierarchy of theories that we construct by means of definitions. We then discuss isomorphic and elementarily equivalent structures and establish the correspondence between characteristics of such structures and complete theories.

7.1 Extension of Theories by Definitions

Two first-order theories T and T′ can differ in many ways. We say that T′ is an extension of T if and only if (1) the vocabulary of T is contained in the vocabulary of T′ and (2) the theorems of T are theorems of T′. We say that T′ is a conservative extension of T if and only if (1) T′ is an extension of T and (2) any formula A which is wf in the language of T and is a theorem of T′ is also a theorem of T. Finally, T and T′ are equivalent if they are extensions of each other.

E 7.1   Let T(P) and T(N) be as described in E 6.1 and E 6.4. It follows from PT 1 of section 6.1 that T(P) is an extension of T(N). The theories are not equivalent since P 9 is not a theorem schema of T(N).

7.1.1 Predicates

One way in which conservative extensions of a theory are obtained is via definitions of new constants, functional constants, and predicate constants. We shall begin with predicates. Let T(Γ) be a first-order theory in which the formulas in Γ are the only nonlogical axioms; let x₁, ..., xₘ denote distinct individual variables, and


let A be a wff in L_Γ in which no variable other than x₁, ..., xₘ is free. In addition, insist that P be an m-ary predicate constant that satisfies AD 1.

AD 1   [P(y₁, ..., yₘ) ≡ A_{x₁, ..., xₘ}(y₁, ..., yₘ)]

Finally, let T(Γ, AD 1) denote the theory obtained from T(Γ) by adding the symbol P to the vocabulary of L_Γ and the assertion AD 1, called the defining axiom of P, to the other nonlogical axioms of T(Γ). Then T(Γ, AD 1) is a conservative extension of T(Γ). In fact, we now have TM 7.1.

TM 7.1   Let T(Γ), A, P, and AD 1 be as above. Moreover, let B be a wff in the language of T(Γ, AD 1); let A′ be a variant of A in which no variable of B is bound; and let B* be the wff in L_Γ obtained from B by replacing each part P(a₁, ..., aₘ) of B by A′_{x₁, ..., xₘ}(a₁, ..., aₘ). Then B* ∈ Cn(Γ) if and only if B ∈ Cn(Γ, AD 1).

In the predicate calculus, a variant of a wff A is a wff obtained from A by a sequence of replacements in each of which a part (∀x)B is replaced by (∀y)B_x(y), where y is a variable that is not free in B. Therefore, AD 1, TM 5.16, and TM 5.15 imply that if B and B* are as described in TM 7.1, then [B ≡ B*] ∈ Cn(Γ, AD 1). Hence B ∈ Cn(Γ, AD 1) if and only if B* ∈ Cn(Γ, AD 1). From this it follows that, to establish TM 7.1, it suffices to demonstrate that B* ∈ Cn(Γ) if B ∈ Cn(Γ, AD 1). This we shall do by induction on theorems.

We begin with the axioms. If B is a value of PLA 1-PLA 3, so is B*, and there is nothing to prove. If B is a value of PLA 4, say [(∀x)[C ⊃ D] ⊃ [C ⊃ (∀x)D]], then B* is [(∀x)[C* ⊃ D*] ⊃ [C* ⊃ (∀x)D*]], which is another value of PLA 4, since x is not free in C*. If B is a value of PLA 5, say [(∀x)C ⊃ C_x(y)], then B* is [(∀x)C* ⊃ C_x(y)*], C*_x(y) is a variant of C_x(y)*, and TM 5.16 and TM 5.15 imply that B* is a theorem of T(Γ). Ditto for B and B* when B is a value of PLA 7. Finally, if B is a member of Γ, B* is B; and if B is the defining axiom of P, B* is [A′ ≡ A], which, by TM 5.16, belongs to Cn(Γ).

Suppose next that B has been inferred from C and [C ⊃ B]. By the induction hypothesis, C* ∈ Cn(Γ) and [C ⊃ B]* ∈ Cn(Γ). Since [C ⊃ B]* is [C* ⊃ B*], we conclude, by PRI 1, that B* ∈ Cn(Γ). Finally, suppose that B is (∀x)C and that C ∈ Cn(Γ, AD 1). By the induction hypothesis, C* ∈ Cn(Γ). Hence, by PRI 2, (∀x)C* ∈ Cn(Γ). Since B* is (∀x)C*, this concludes the proof of TM 7.1.

The definitional scheme expressed in AD 1 differs from the definitional scheme used in chapter 4. To see how they differ, consider E 7.2.

E 7.2

The symbol

[x ~ Y]

=df

[[x


'" R(x)].

i.e., if x is not subjected to the test prescribed by P, x does not have the property R. To escape the unfortunate situations described in (ii) and (iv), Carnap (1936, pp. 441-444) suggested that R be defined by a so-called reduction sentence,

(v) [P(x) ⊃ [R(x) ≡ Q(x)]].

Then, if x is not tested, x need not satisfy R. But (v) provides only a partial definition of R. Hence, as a prototype of a general definitional scheme, (v) does not satisfy the conditions imposed on valid definitions in chapter 2.
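The sense in which (v) is only partial can be mimicked in a few lines of Python. In this sketch the predicates P, Q, R and the sample argument are hypothetical stand-ins; the point is only that a reduction sentence fixes R solely on the objects that P singles out.

    def P(x):          # "x is subjected to the test"
        return x >= 0
    def Q(x):          # "x passes the test"
        return x % 2 == 0

    def R_explicit(x):         # an explicit definition fixes R for every x
        return Q(x)

    def R_partial(x):          # the reduction sentence (v) fixes R only when P(x) holds
        if P(x):
            return Q(x)
        return None            # for untested objects R is neither affirmed nor denied

    print(R_explicit(-3), R_partial(-3))   # False None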

Difficulties such as those exhibited in E 7.4 are caused by the characteristics of material implication. Hence they are not particular to the problem of introducing scientific concepts by definition but pop up in many


areas of scientific endeavor, e.g., in the theory of evidence. For illuminating discussions of definitional schemes for science, I refer the reader to Hempel 1952 and to Przelecki 1969 (pp. 63-87).

7.2 Isomorphic Structures

In chapter 3 I insisted that a declarative sentence is true if the denotations of the names in the sentence satisfy the relations predicated of them by the sentence. I also suggested that a complete explication of truth for a given language would require prescribing for each and every relation of the language the objects that satisfy them. These ideas can be specialized to suit our present purposes as follows: Let L be a first-order language. A structure 𝒟 for L is a quadruple consisting of a set of individuals |𝒟| (the universe of discourse), a set of names of individuals, and functional and predicate constants that are related in a definite way to the functional and predicate symbols of our first-order language L. By adding the names of the elements in |𝒟| to the constants of L, we obtain a language which in section 5.3.2 we denoted by L(𝒟). If B is a wff in L, then B is valid in 𝒟 (i.e., is true) if the individuals in |𝒟| satisfy the relations which B predicates of them in 𝒟. We shall identify the concept of truth in 𝒟 with the triple (|𝒟|, L, Tr(𝒟)), where Tr(𝒟) designates the set of closed wffs in L(𝒟) that denote truth in 𝒟. As 𝒟 varies, the concept of truth varies, both because the individuals in the universe of discourse change and because the idea of a true sentence in L(𝒟) changes. Here is an example to aid our intuition.

E 7.5   Let T(G) be a first-order theory whose language L_G contains two nonlogical symbols: a constant c and a binary function symbol ·. The nonlogical axioms of T(G) are the following:

G 1   [((x·y)·z) = (x·(y·z))].

G 2   [(c·x) = x].

G 3   (∀x)(∃y)[(y·x) = c].

(Here (x·y) is a synonym for ·(x, y).) There are two structures for L_G, 𝒜 and ℬ, which are models of T(G) and which satisfy the following conditions:

(i) |𝒜| = R₊₊ and |ℬ| = R.

(ii) 𝒜(c) = 1 and ℬ(c) = 0.

(iii) If a and b are variable-free terms, then (a) 𝒜(·(a, b)) = (𝒜(a)·𝒜(b)) and (b) ℬ(·(a, b)) = (ℬ(a) + ℬ(b)).

Here R and R₊₊ denote, respectively, the set of real numbers and the set of positive real numbers. Also 1 and 0 are the real numbers usually named by these symbols. Finally, the function symbols on the right-hand sides of (a) and (b) denote the standard multiplication and addition signs.

In this section we shall study how the concept of truth changes from one structure to another. To do so, I introduce several new concepts. Let A and B be sets and suppose that φ(·): A → B. Then φ is injective if it is one-one and bijective if it is injective and onto.

E 7.6

Let R and R₊₊ be as in E 7.5. The mapping φ(·): R → R₊₊, defined by φ(x) = eˣ for x ∈ R, is bijective. Similarly, if N and M denote, respectively, the set of natural numbers and the set of even natural numbers, the mapping φ(·): N → M defined by φ(x) = 2x is bijective.
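For a finite stretch of the second mapping in E 7.6, injectivity and surjectivity can be checked mechanically. The following Python lines are only an illustration; the bound n = 20 is arbitrary.

    n = 20
    N_segment = list(range(n))                  # an initial segment of the natural numbers
    M_segment = [2 * k for k in range(n)]       # the corresponding even numbers
    phi = {k: 2 * k for k in N_segment}

    injective = len(set(phi.values())) == len(phi)     # one-one
    onto = set(phi.values()) == set(M_segment)         # onto the even numbers listed
    print("bijective on this segment:", injective and onto)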

Next let 𝒟 = (|𝒟|, N_𝒟, F_𝒟, G_𝒟) and 𝒜 = (|𝒜|, N_𝒜, F_𝒜, G_𝒜) be structures for L and suppose that φ(·): |𝒟| → |𝒜| is bijective. Then φ is an isomorphism of 𝒟 and 𝒜 if it satisfies the following conditions for all m-ary f and P and all aᵢ ∈ |𝒟|, i = 1, ..., m:

φ(f_𝒟(a₁, ..., aₘ)) = f_𝒜(φ(a₁), ..., φ(aₘ))   (7.1)

and

P_𝒟(a₁, ..., aₘ) ≡ P_𝒜(φ(a₁), ..., φ(aₘ)),   (7.2)

where m varies over 0, 1, .... Finally, two structures 𝒟 and 𝒜 are isomorphic if and only if there is a bijective mapping φ(·): |𝒟| → |𝒜| that satisfies equations 7.1 and 7.2.

There are several things to note about equations 7.1 and 7.2 above. First, the symbols = and ≡ must not be taken to be symbols of the object language. They are shorthand expressions for "equals" and "if and only if," respectively. Second, f_𝒟 and f_𝒜 correspond to the same function symbol in L. Similarly, P_𝒟 and P_𝒜 correspond to the same predicate symbol in L. Third, |𝒟| may consist of objects that are entirely different from the objects in |𝒜|. |𝒟| can also be either identical with or a subset of |𝒜|. When |𝒟| contains a finite number of individuals, |𝒜| must contain the same number of objects. In that case, if |𝒟| and |𝒜| contain the same elements, φ is a permutation of the individuals in |𝒟|.

The mapping φ(·): R → R₊₊, defined by φ(x) = eˣ, x ∈ R, is an isomorphism of the two structures in E 7.5. Another example is given in E 7.7.

E 7.7   Let L, f, g, h, k, and P₁, ..., P₄ be as in E 5.3. Furthermore, let 𝒟 be the structure obtained by assigning {x₁, x₂} to |𝒟|; f to f¹ and f²; g, h, and k to f³, f⁴, and f⁵, respectively; P₁ to P¹; P₂ to P²; P₃ to P³ and P⁴; and P₄ to P⁵ and P⁶. Finally, let 𝒜 be the structure obtained by assigning {x₁, x₂} to |𝒜|; f to f³; g to f¹ and f²; h and k to f⁴ and f⁵, respectively; P₁ to P¹; P₂ to P³ and P⁴; P₃ to P²; and P₄ to P⁵ and P⁶. Then the function ψ, defined by ψ(x₁) = x₂ and ψ(x₂) = x₁, is an isomorphism of 𝒟 and 𝒜.
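That φ(x) = eˣ is an isomorphism of the two structures in E 7.5 is just equation 7.1 with · read in 𝒜 and + read in ℬ, together with the constants matching up. A numerical spot check in Python, with arbitrarily chosen sample points, looks as follows; it is a sanity check, not a proof.

    import math

    def phi(x):
        return math.exp(x)

    samples = [(-1.5, 0.25), (0.0, 2.0), (3.1, -0.7)]
    for a, b in samples:
        lhs = phi(a + b)          # phi applied to B's operation (addition)
        rhs = phi(a) * phi(b)     # A's operation (multiplication) applied to the images
        assert math.isclose(lhs, rhs, rel_tol=1e-12)
    print("phi(0) =", phi(0.0), "maps B's constant 0 to A's constant 1")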

If 𝒟 is a structure for L, A is a set, and φ is a bijection from |𝒟| to A, we can use equations 7.1 and 7.2 to define a structure 𝒜 for L that is isomorphic to 𝒟 and satisfies |𝒜| = A.

E 7.8   Let T(N) and T(NS) be as in E 6.4, let η denote the standard model of T(N), and let η′ be a model of T(NS) such that |η| ∩ |η′| = ∅. In addition, let N* denote the set of all a ∈ |η′| such that a = η′(kₙ) for some n = 0, 1, ..., and let

A = |η| ∪ (|η′| − N*).

Finally, let φ(·): |η′| → A be such that φ(a) = a if a ∈ (|η′| − N*), and η(kₙ) = φ(η′(kₙ)), n = 0, 1, .... Then φ is a bijection. Hence the conditions

f_η*(φ(a₁), ..., φ(a_k)) = φ(f_η′(a₁, ..., a_k)),   P_η*(φ(a₁), ..., φ(a_k)) ≡ P_η′(a₁, ..., a_k),

for f_η′ ∈ F_η′, P_η′ ∈ G_η′, (a₁, ..., a_k) ∈ |η′|^k, k = 0, 1, ..., together with |η*| = A, N_η* = N_η ∪ (N_η′ − {k′ₙ, n = 0, 1, ...}), and k′ₙ being the name in N_η′ of η′(kₙ), define a structure η* for L that is isomorphic to η′.
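Equations 7.1 and 7.2 thus tell us how to transport a structure along any bijection of its universe; E 7.8 is one instance of that recipe. The following Python sketch transports a successor-like operation from one finite carrier to another. The structures, the bijection, and all names here are hypothetical; the only point is that equation 7.1 forces the transported operation.

    # A small structure D: universe {0,1,2,3} with a cyclic "successor".
    D_universe = [0, 1, 2, 3]
    def S_D(a):
        return (a + 1) % 4

    # A bijection phi from |D| onto a new carrier A.
    A_universe = ["a0", "a1", "a2", "a3"]
    phi = dict(zip(D_universe, A_universe))
    phi_inv = {v: k for k, v in phi.items()}

    # Equation 7.1 forces S_A(phi(a)) = phi(S_D(a)), i.e.:
    def S_A(x):
        return phi[S_D(phi_inv[x])]

    print([S_A(x) for x in A_universe])   # ['a1', 'a2', 'a3', 'a0']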

Let 𝒟 and 𝒜 be structures for L and let φ(·): |𝒟| → |𝒜| be an isomorphism of 𝒟 and 𝒜. Equations 7.1 and 7.2 do not mention N_𝒟 and N_𝒜. So for each a ∈ N_𝒟, we let a^φ ∈ N_𝒜 denote the name of φ(𝒟(a)). Moreover, if u is a term or a wff in L(𝒟), we let u^φ denote the term or wff in L(𝒜) in which each name a in u has been replaced by a^φ. With this notation we can assert TM 7.3.

TM 7.3   Let 𝒟 and 𝒜 be structures for L; let φ(·): |𝒟| → |𝒜| be an isomorphism of 𝒟 and 𝒜; and let a and B be, respectively, a variable-free term and a closed wff of L(𝒟). Then

𝒜(a^φ) = φ(𝒟(a))   and   𝒜(B^φ) = 𝒟(B).

Thus, if two structures 𝒟 and 𝒜 are isomorphic, the set of formulas in L that are valid in 𝒟 is identical to the set of formulas in L that are valid in 𝒜. However, the concept of truth in 𝒟 differs from the concept of truth in 𝒜 to the extent that the individuals and the function and predicate constants of 𝒟 differ from those of 𝒜.

To prove the theorem, let a be a variable-free term and proceed by induction on the length of a. If a is a name, there is nothing to prove. If a is f(a₁, ..., aₙ), then by the induction hypothesis,


φ(𝒟(a)) = φ(f_𝒟(𝒟(a₁), ..., 𝒟(aₙ))) = f_𝒜(φ(𝒟(a₁)), ..., φ(𝒟(aₙ))) = f_𝒜(𝒜(a₁^φ), ..., 𝒜(aₙ^φ)) = 𝒜(a^φ).

The second half of the theorem can be established in a similar way by induction on the length of wffs. We first show how for an atomic formula B, P(a₁, ..., aₙ), where the aᵢ are variable-free terms and P is not =. In this case,

𝒟(B) = t ≡ P_𝒟(𝒟(a₁), ..., 𝒟(aₙ)) ≡ P_𝒜(φ(𝒟(a₁)), ..., φ(𝒟(aₙ))) ≡ P_𝒜(𝒜(a₁^φ), ..., 𝒜(aₙ^φ)) ≡ 𝒜(B^φ) = t.

When B is [a = b] for some variable-free terms a and b, we observe that φ is injective and apply the same arguments as above. Suppose next that there exist closed wffs A, C, and D and an individual variable x such that B is ∼C, [C ⊃ D], or (∀x)A. Since (∼C)^φ is ∼C^φ and [C ⊃ D]^φ is [C^φ ⊃ D^φ], the validity of the theorem for the first two cases is an immediate consequence of the induction hypothesis. To deal with the third case, we observe first that 𝒟(B) = f if and only if 𝒟(∼B) = t and that B^φ is (∀x)A^φ. Then we note that since φ is surjective, for all j ∈ N_𝒜, there is an i ∈ N_𝒟 such that j is i^φ. Finally, we appeal to the induction hypothesis and deduce that

𝒟(∼B) = t ≡ 𝒟((∃x)∼A) = t
≡ 𝒟(∼A_x(i)) = t for some i ∈ N_𝒟
≡ 𝒜(∼A^φ_x(i^φ)) = t for some i ∈ N_𝒟
≡ 𝒜(∼A^φ_x(j)) = t for some j ∈ N_𝒜
≡ 𝒜(∼B^φ) = t.

From these arguments it follows that 𝒜(B^φ) = 𝒟(B).

If 𝒟 is a structure for L, 𝒟 is isomorphic to itself since the identity mapping on |𝒟| is a bijection. In addition, if 𝒟 and 𝒜 are structures for L such that 𝒟 is isomorphic to 𝒜, then 𝒜 is isomorphic to 𝒟. Finally, if 𝒟, 𝒜, and ℬ are structures for L such that 𝒟 is isomorphic to 𝒜, and 𝒜 is


isomorphic to ℬ, then 𝒟 is isomorphic to ℬ. Thus the relation of being isomorphic is an equivalence relation that partitions the set of all structures for L into equivalence classes. Some of these equivalence classes are of particular interest to us. Let Γ be a set of wffs, let T(Γ) be the corresponding theory, let E_L denote an equivalence class of isomorphic structures, let 𝒟 belong to M(Γ), and suppose that 𝒟 ∈ E_L. If 𝒜 ∈ E_L, then 𝒜 ∈ M(Γ) as well. Thus if one member of E_L is a model of T(Γ), every member of E_L is a model of T(Γ). How large E_L is depends on the number of elements in |𝒟|. If |𝒟| is finite, every permutation of |𝒟| determines a structure 𝒜 that is isomorphic to 𝒟 and belongs to E_L. If |𝒟| contains infinitely many individuals, every bijective mapping of |𝒟| onto a subset of |𝒟| determines a structure 𝒜 that is isomorphic to 𝒟 and belongs to E_L.

The fact that a consistent theory T(Γ) has many models suggests that it can be used to talk about many things. This is especially interesting to mathematicians because the theorems they establish for T(Γ) need not be proved anew for each different model of T(Γ). To scientists, on the other hand, it is cause for concern because they view the numerousness of the models of a scientific theory as a measure of how vague such a theory is. Note, therefore, that for a scientific theory the vagueness caused by the existence of structures that are isomorphic to its intended model cannot be avoided. It is a fact of life scientists must live with.

If T(Γ) has models with infinitely many individuals, it has models that are not isomorphic. If two models of T(Γ) are not isomorphic, the concepts of truth that they determine may differ because their respective universes, functions, and predicates differ. They may also differ in that the two sets of wffs in L, which correspond to the true formulas in the two models, are different. The latter situation is a characteristic feature of incomplete theories, as evidenced in TM 7.4.

TM 7.4   Suppose that Γ consists of a denumerable number of formulas and that T(Γ) has only models with infinitely many individuals. Suppose also that all models with a denumerable universe are isomorphic. Then T(Γ) is complete.

We showed previously that, if a theory such as T(Γ) has a model with an infinite universe, then it has a model with a denumerably infinite universe. Thus the conditions of the theorem are not empty. They provide model-theoretic conditions sufficient for a theory to be complete. Similar conditions suffice when Γ consists of nondenumerably many wffs too. The validity of TM 7.4 is easy to prove. Suppose that the conditions of TM 7.4 hold and that T(Γ) is incomplete. Then there is a closed formula B ∈ L such that B is not a theorem in T(Γ) and such that T(Γ, B) and


T(Γ, ∼B) are both consistent and possess, respectively, models 𝒟 and 𝒟*, which are also models of T(Γ). Hence these models have infinite universes of discourse. From this and TM 6.13, we conclude that there exist structures 𝒜 and ℬ with denumerably infinite universes that are models of T(Γ, B) and T(Γ, ∼B), respectively. Since these models are also models of T(Γ), they are isomorphic. But if that is so, then TM 7.3 yields ℬ(B) = 𝒜(B) = t while ℬ(∼B) = t, which is a contradiction. This demonstrates that T(Γ) is complete.

7.3 Elementarily Equivalent Structures

When 𝒟 and ℬ are structures for L with the property that the closed wffs in L which are true in 𝒟 are also true in ℬ and vice versa, they are said to be elementarily equivalent. Two isomorphic structures are elementarily equivalent. There exist, however, elementarily equivalent structures that are not isomorphic. To discuss elementarily equivalent structures, I must introduce several new concepts: Let 𝒟 and 𝒜 be structures for L that satisfy the following conditions:

(i) |𝒟| ⊂ |𝒜| and N_𝒟 ⊂ N_𝒜.

(ii) f_𝒟(a₁, ..., aᵢ) = f_𝒜(a₁, ..., aᵢ) for all i-ary f_𝒟 ∈ F_𝒟, i = 0, 1, ..., and all (a₁, ..., aᵢ) ∈ |𝒟|ⁱ.

(iii) P_𝒟(b₁, ..., bᵢ) ≡ P_𝒜(b₁, ..., bᵢ) for all i-ary P_𝒟 ∈ G_𝒟, i = 1, 2, ..., and all (b₁, ..., bᵢ) ∈ |𝒟|ⁱ.

Then 𝒜 is an extension of 𝒟. Moreover, 𝒜 is an elementary extension of 𝒟 if 𝒜 is an extension of 𝒟 and 𝒜(A) = 𝒟(A) for every closed wff A of L(𝒟). Finally, if 𝒜 is an (elementary) extension of 𝒟, we shall say that 𝒟 is an (elementary) substructure of 𝒜.

E 7.9   Let η be the standard model of T(N) and let η* be the model of T(NS) that we constructed in E 7.8. Then η is a substructure of η*. To see why, observe that, by construction, |η| ⊂ |η*| and N_η ⊂ N_η*. Furthermore, for n ∈ |η|, S_η(n)

= S"(1](kn )) = 1](S(kn )) = 1](kn +1) = fj>(1]'(k n +1)) = fj>(1]'(S(kn ))) = fj>(S'" (1]'(kn ))) = S""(fj>(1]'(kn ))) = S"'(1](kn )) = S""(n).

Similarly, for n, m ∈ |η| and +(·, ·), we find that

+_η(n, m) = η(+(kₙ, kₘ)) = η(kₙ₊ₘ) = φ(η′(kₙ₊ₘ)) = φ(η′(+(kₙ, kₘ))) = φ(+_η′(η′(kₙ), η′(kₘ))) = +_η*(φ(η′(kₙ)), φ(η′(kₘ))) = +_η*(η(kₙ), η(kₘ)) = +_η*(n, m).


With obvious modifications, the same arguments show that, when n, '''(n,m)

=

mE

1111,

·,,'(n,m),

and ""' [y E z]]]]. Axiom KA 4 sounds innocuous. To see that it is not, consider E 9.1. 1 Consider a universe that consists of all human beings who were, are, or will be; and suppose that (' [z E y]]]].

Then the tautology [[5(x) /\ 5( y)] :=> [5(x) /\ 5( y)]], an application of [[[B :=> C] /\ [A :=> [B :=> D]]] :=> [A :=> [B :=> [C /\ D]]]], and SD 1 suffice to establish [[x

=

y]:=> [[5(x) /\ 5(y)]:=> [x c y]]].

By similar arguments, [[y = x]

:=>

[[5(y) /\ 5(x)]

and by [[5(x) /\ 5(y)] [[y

=

==

:=>

[y ex]]],

[5(y) /\ 5(x)] and use of TM 5.15,

x]:=> [[5(x) /\ 5(y)]:=> [y ex]]].

Hence, if we apply TM 4.1, with q equal to [x = y], and use PLA 2 to ascertain that [[[x

=

y]

:=>

[y

=

x]]

:=>

[[x

=

y]

:=>

[[5(x) /\ 5(y)]

:=>

[y ex]]]],


we can, by TM 5.1 and PRJ 1, deduce [[x

=

y]

[[5(x) /\ 5(y)]

:::>

:::>

[y ex]]].

:::>

[[x c y] /\ [y ex]]]],

But, if that is so, then [[x

=

y]

[[5(x) /\ 5(y)]

:::>

and other application of TM 5.3 yields [[5(x) /\ 5(y)]

:::>

[[x

=

y]

:::>

[[x c y] /\ [y ex]]]].

Since KA 4 and TM 4.1, with q equal to [5(x) /\ 5(y)], imply that [[5(x) /\ 5(y)]

:::>

[[[x c y] /\ [y ex]]

:::>

[x

=

y]]],

we can conclude that [[5(x) /\ 5(y)]

:::>

[[x

=

y]

==

[[x c y] /\ [y ex]]]],

as was to be shown. To state the remaining axioms of KPU, I must first introduce the idea of a do formula: The collection of do formulas of L is the smallest collection Y containing the atomic formulas of L which is closed under the following operations: (1) if

[[5(x)

ST 6

[5(z)::::> (:ly) [5(y)

1\

(Vy)""'" [y E 1\

xll

(Vy)""'" [y EX]] ::::> [x c z]]] 1\ ,.....,

[y E zlll

To show how ST 4 is established, we first assert an instance of KA 8: [5(z)

::J

(3u) [5(u) /\ (\Ix) [[x E u]

==

[[x E z] /\ '" [x

=

x]]]]].

Next we assert [x = x]; use [A == '" '" A], TM 4.1, with q equal to [x and [A == '" '" A]; and conclude that

E

z],

'" [[x E z] /\ '" [x = x]].

Then TM 4.1, with q equal to [[x E u] == [[x E z] /\ '" [x = x]]], and use of [[[A == B) ::J ' " B] ::J [[A == B] ::J ' " A]] allow us to declare [[[x E u]

==

[[x E z] /\ '" [x

=

x]]]

::J ' "

[x E u]].

By PRI2 and with the use of TM 5.10, it follows that

Chapter 9

[(\lx)[[x E U]

152

==

[[x E Z] /\ "-' [x = x]]] :::J (\Ix) "-' [x E u]].

Hence, [[S(U) /\ (\Ix) [[x E u]

==

[[x E z] /\ "-' [x

=

x]]]] :::J [S(u) /\ (\Ix) "-' [x E u]]].

Next we apply PLA3, PRI2, TM5.IO, and PLA3 again to deduce [(3u)[S(u) /\ (\lx)[[x E u]

==

[[x E z] /\ "-' [x

=

x]]]]

:::J (3u) [S(u) /\ (\Ix) "-' [x E u]]].

From this we obtain [S(Z) :::J (3u) [S(u) /\ (\Ix) "-' [x E u]]]

by first applying TM 4.1, with q equal to S(z), and PLA 2 and then by asserting the noted instance of KA 8 and using PRI 1. But, if that is so, the theorem follows by an appeal to PLA 3, PRI2, PLA 4, PLA 3 again, [ "-' "-' A == A], TM 5.15, KA 2, and PRI 1. In the case of ST 5, we first use T 4.2, PRI2, and TM 5.10 to deduce that [(\ly) "-' [y E x] :::J (\ly) [[y E x] :::J [y E z]]]

and, hence, that [[S(x) /\ (\ly) "-' [y E x]] :::J [S(x) /\ (\ly)[[ y E x] :::J [y E z]]]].

By applying TM 4.1, with q equal to S(z), and TM 5.3 to this assertion we find that [[S(x) /\ (\ly)"-'[y E x]] :::J [S(z):::J [S(x) /\ (\ly)[[yEX]:::J [yEZ]]]]].

Since we also can assert [S(z)

:::J S(z)],

apply TM 4.1 to it, and establish

[[S(x) /\ (\ly) "-' [y E x]] :::J [S(z) :::J S(z)]],

we conclude that [[S(x) /\ (\ly)"-'[y EX]]:::J [S(z):::J [S(z) /\ [S(x) /\ (\ly)[[y EX]:::J [y E z]]]]]].

But if this is so, the tautologies [[A /\ [B /\ C]] == [[A /\ B] /\ C]] and [[A /\ B] == [B /\ A]], SD I, TM 5.15, and another application of TM 5.3 suffice to establish ST 5. To establish ST 6, we begin by asserting an instance of KA 8, [S(Z) :::J (3y) [S(y) /\ (\Ix) [[x E y]

and observe that

==

[[x E z] /\ "-' [x EX]]]]],

Elementary Set Theory

[(V'x)[[x E y]

==

153

[[x E z] /\ '" [x E x]]] ~ '" [y E yll.

We then use the observation to give a simple proof of (V'x) [[x E y]

==

[[x E Z] /\ '" [x E x]]] ~ '" [y E Z].

By the Deduction Theorem, it follows that [(V'x)[[x E y]

==

[[x E Z] /\ '" [x E x]]] ~ '" [y E

zll.

Hence, [[5( y) /\ (V'x)[[x E y]

==

[[x E Z] /\ '" [x E x]]]] ~ [5( y) /\ '" [y E Z]]].

But if that is so, we can use PLA 3, PRI2, TM 5.10, and PLA 3 again to establish [(3y)[5(y) /\ (V'x)[[x E y]

==

[[XEZ] /\ "'[x E x]]]] ~ (3y)[5(y) /\ "'[y E z]]].

From this assertion and the asserted instance of KA 8, the theorem follows by applying TM 4.1, with q equal to 5(z), and PLA 2. In the intended interpretation of L, ST 4 asserts that a set exists that contains no elements. According to ST 5, such a set is a subset of all sets. The third theorem states that no matter how we choose the set x, there is a set y in the universe that does not belong to x. Consequently, the universe is not a set. Hence we cannot substitute the universe for x in KA 8 and assert the existence of the set of all sets that are not elements of themselves; no such set exists. This result precludes Russell's antinomy in this elementary set theory. A set that has no elements is called a null set and is usually denoted by 0. I will introduce 0 in L as in SO 2. SD 2

[[z

= 0] ==

[5(z) /\ (Vy)[[y E z]

:J

-[y

=

y]]]]

Then ST 4, E 9.2, and TM 5.15 supply the existence condition for the introduction of 0; and KA 2, KA 4, and KA 8 imply the validity of the required uniqueness condition, ST 7. ST 7

[[5(x) /\ [(Vy) - [y EX] /\ [5(z) /\ (Vy) - [y E z]]]]

:J

[x

=

z])

That is, two sets that have no elements are equal. In order that 0 be an efficient means of communication, we must be able to assert ST 8. ST8

[5(0) /\ (Vy)-[y E 0])


We next show how SD 2 and ST 4 can be used to prove ST 8. Similar arguments suffice to establish ST 9. ST 9

[5(z):::>

[0

c z]]

To prove ST 8, we let B(z) abbreviate [5(z) from PLA 7 that

1\

(Vy) '" [y E z]], and deduce

Consequently, by an application of PLA 2, [[[z

= 0]

:::> B] :::>

[[z

= 0]

:::>

Bz (0)]].

Now SD 2, the material equivalence of (Vy) [[ y E z] :::> '" [y = y]] and (Vy) '" [y E z], and TM 5.15 allow us to assert [[z = 0] :::> B]. Hence, by PRI I, [[z

= 0]

:::>

Bz (0)].

From this we can, by applying in succession PLA 3, PRI2, PLA 4, and PLA 3 again, establish

Finally, appealing to ST 4 and PRI 1 allows us to deduce B_z(∅), as was to be demonstrated. We have sketched the proofs of ST 1, ST 3-ST 6, and ST 8 to illustrate how proofs of KPU theorems can be written as sequences of wffs in strict adherence to the definition of a proof given in chapter 3. These proofs are lengthy. Therefore, in the remainder of the chapter, I shall present only informal proofs of the theorems asserted. The details necessary to transform our proofs into sequences of wffs are left to the reader.

9.3 Unions, Intersections, and Differences

In the intended interpretation of L, KA 6 insists that there is a set that contains x and y. By KA 8, there is a set that contains only x and y. By KA 4, there can be only one such set. I denote this set by {x, y} and introduce the term {x, y} in L by SD 3.

SD 3

[[z

= {XI y} I ==

[5(z) /\ [[[x E zl /\ [y E z]] /\ (Vu E z) [[u

=

xl v [u

=

y]]]]]

Then arguments similar to those used to prove ST 8 can be used to prove ST 10.

Elementary Set Theory

ST 10

155

[5( {x, y}) /\ (Vu)[[u

E {x, y}]

==

[[u

=

x]

v [u = y]]]]

When [x = y], {x, y} contains only one element. We usually denote the

singleton {x, x} by {x}. This notation is unambiguous as witnessed in the following simple theorems, the proofs of which I leave to the reader. ST 11

[[{x} = {y}]

ST 12

[[{x}

ST 13

[[{ {x}}

==

[x = y]]

= {x, y}] == =

[x

= y]]

{{x}, {x, y} }]

==

[x

=

y]]

9.3.1 Unions

Axiom KA 7 introduces the existence of unions of sets in a subtle way. In the intended interpretation of the axioms, it postulates that if x is not empty, there exists a set z which equals the union of the elements of x. The presumption that x exists and is nonempty is important. If x l' ... , xn is a sequence of sets, the axiom alone does not allow us to infer that there is a set z that equals the union of these sets. Such an inference is possible only if we first establish the existence of a set w whose elements are the x/ s. Without this restriction on the existence of unions, we could prove that the universe is a set and thus establish a result that contradicts ST 6. We can introduce notation for the union of a pair of sets, x and y, by SD4. SD 4

~

[[5(x) /\ 5(y)]

[[z

=

(x U y)]

==

[5(z) /\ [(Vu EX) [u E z]

/\ [(Vv E y)[v E z] /\ (Vw E z)[[w E x] V [w E y]]]]]]]

To establish the existence condition for x u y we note first that the existence of {XI y} and KA 7 imply that there is a set i such that [(Vu E x)[u E i] 1\ (Vv E y)[v E i]l. Then we use i and KA 8 to form a set z such that [[u E z] =: [[u E i] 1\ (:3w E {x, y})[u E w)). This z is the union of x and y we are searching for. Its uniqueness is a consequence of KA 4. From this it follows that the union of a pair of sets is well defined by SD 4. Moreover, arguments similar to those used to establish ST 8 suffice to prove the validity of ST 14. ST 14

[[5(x) /\ 5(y)]

~

[5((x

u y)) /\ (Vu)[[u

E (x U y)]

==

[[u E x] V [u E y]]]]]

From ST 14 and the properties of v (see T 4.8, T 4.9, and T 4.15), we infer ST 15 and ST 16. ST 15

[5(x)

~

[x

=

(x

U

x)]]

Chapter 9

ST 16

156

[[S(x) A S(y)]

=

[(x u y)

~

(y U x)]]

In reading SO 4 and the theorems that followed, note that I provide only a partial definition of u. To introduce u as a function symbol in L, I must also specify the value of U on pairs x, y that are not sets; e.g., ["" [S(x)

1\

S(y)]

::J

[[z = (x u y)]

==

[z = 0]]].

Here, as well as in the remainder of the chapter, I shall leave it to the reader to add such details. 9.3.2 Intersections and Differences From the existence of the sets x and y and from KA 8, we can deduce the existence of the intersection of x and y and the difference of x and y. Since the uniqueness of intersections and differences is a consequence of KA 4, I can introduce notation for such sets by SO 5 and SO 6. SD 5

[[S(x) A S(y)] ~ [[z A

SD 6

[[S(x) A S(y)] ~ [[z A

= (x n

y)]

==

[S(z) A [('tu E x)[[u E y] ~ [u E zll

('tw E z) [[w E x]

=

(x - y)]

==

A

[w E yllllll

[S(z) A [('tu EX) ['" [u E y] ~ [u E z]]

('tw E z) [[w E x]

A

'"

[w E y]]]]]]

Then arguments similar to those used to establish ST 8 suffice to prove ST 17 and ST 18.

==

ST17

[[S(x) A S(y)] ~ [S((xny)) A ('tU)[[UE(xny)]

ST 18

[[S(x) A S( y) 1 ~ [S( (x - y)) A ('tu)[[u E (x - y) 1 == [[u E

From ST 17 and the properties of ~

=

n

ST 19

[S(x)

ST 20

[[S(x) A S(y)] ~ [(x n y)

ST 21

[S(x) ~

[x

(x

[0 =

(x

1\,

[[UEX] A [uEY]]lll

xl

A '"

[u E yll]]]

we deduce ST 19-5T 21.

x)]]

=

(y n x)]]

n 0)]]

9.4 Product Sets

As soon as notation for singletons {x} and unordered pairs {x, y} have been introduced, notation for ordered pairs can be introduced. I denote the ordered pair in which x is the first component and y the second by (x, y) and define it as in SO 7.

Elementary Set Theory

SO 7

[[z

= (x, y)] ==

157

[5(z)

1\

= {{x}, {x, y}}]]]

[z

Then KA 6 and the existence of {x} and {x, y} imply the existence of a set that equals {{x}, {x, y}}. Since {x} and {x, y} are well defined, we deduce from KA 4 that there is only one such set. Consequently, we can introduce the term (x, y) in L via SO 7, and we can show ST 21. ST 22

[5(x, y)

1\

[(x, y)

=

{{x}, {x, y} }]]

Ordered pairs, although they look strange, have two important properties. First, they are sets; i.e., they belong to the universe. Second, they possess all the properties usually associated with ordered pairs. One of these is given as ST 23. ST 23

[[(x, y)

=

(u, v)]

==

[[x

=

u]

1\

=

[y

v]]]

To establish ST 23, observe first that ST 3, x = u, and y = v imply that (x, y) = (u, v). Next suppose that (x, y) = (u, v). Then by ST 3, {x} = {u} or {x} = {u, v}. If {x} = {u}, x = u by ST 11. If {x} = {u, v}, x = u, or x = v. If x = v, x = u also by ST 12. Hence (x, y) = (u, v) only if x = u. By ST 3, we must also have {x, y} = {u} or {x, y} = {u, v}. If {x, y} = {u}, then {x, y} = {x}, and by ST 3, {u, v} = {x}. Hence, by ST 12, Y = x and x = v; that is, y = v. If {x, y} = {u, v}, y = u or y = v. If y = u, then y = x, {u, v} = {x}, and v = x by ST 12. So y = v again, and we have shown that (x, y) = (u, v) only if y = v. A product set is a set of ordered pairs. To he more precise, we proceed as follows: Let x and y be sets. The cross product of x and y is a set of ordered pairs denoted by x x y and defined by SO 8. SO 8

[[5(x)

1\

5(y)] ::::> [[z 1\

=

(x

x

y)]

==

[5(z)

1\

[('fu E x)('fv E y)[(u, v) E z]

('fw E z)(3u E x)(3v E y)[w

=

(u, v)]]]]]

It is not obvious that there is a set with the properties specified in SO 8. We shall sketch an outline of a proof that such a set exists. 2 In doing this, we illustrate the meaning of KA 9. To show that x x y exists, we must first show that if x and yare sets, there is a set c such that (i) (Vu E x)(Vv E y)(3z E c)[z

=

(u, v)].

Let u E x be given and let cp abbreviate [z and (Vv E y) (3z)cp. Hence, by KA 9, (ii) (3d)[S(d) /\ (Vv E y)(3z E d)[z

=

=

(u, v)]].

(u, v)]. Then cp is a ~o formula


Next let I/J abbreviate [5(d) /\ (Vv E y)(3z E d)[z formula and (Vu E x)(3d)l/J. Hence, by KA 9, (iii) (3e) [5(e) /\ (Vu

=

(u, v)]]. Then

I/J is a do

E x) (3d E e) I/J].

By KA 7, (3e)[5(e) /\ (Vd E e)(Vz E d)[z E ell. By combining assertions (ii) and (iii), we see that this e satisfies (i), as was to be shown. Let e be a set that satisfies (i). By KA 8 there is a set 1 that satisfies (iv) [5(/) /\ (Vz)[[z E f]

==

=

[[z E e] /\ (3u E x)(3v E y)[z

(u, v)]]]].

By combining assertions (i) and (iv), we see that, according to SD 8,

[I =

(x

x y)].

The uniqueness of (x x y) follows from KA 4. Hence, for pairs of sets, the term (x X y) is well defined by SD 8. Moreover, standard arguments suffice to prove ST 24. ST 24

[[S(x) /\ S(y)]

::::>

[S((x x y)) /\ (Vu)[[u E (x X y)]

==

(3v E x)(3w E y)[u

=

(v,w)]]]]

We can also show that if x is a set of ordered pairs, there is a pair y, z in the universe such that [x c (y X z)]; i.e., ST 25

[[S(x) /\ (Vu E x)(3e E u)(3v E e)(3w E e)[u ~ (3y)(3z)[[5(y) /\ 5(z)] /\ [x c

=

(v, w)]]

(y X z)]]]

Finally, ST 24 and the properties of /\ imply ST 26 and ST 27.

= 0]

ST 26

[[[S(x) /\ S(y)]

ST27

[[[x c u] /\ [y c v]]::::> [(x

::::>

[[x

v [y X

= 0]]]

y) c (u

X

::::>

[(x

X

y)

= 0]]

v)]]

9.5 Relations and Functions

A binary relation is a set of ordered pairs. Conversely, a set of ordered pairs is a binary relation. In symbols this becomes [rel(R)

==

[5(R) /\ (Vu E R)(3e E u)(3x E e) (3y E e)[u

=

(x, y)]]].

If R is a binary relation, then by ST 25 there are sets A R and BR such that R c (A R X BR). We use A R and BR to define the domain of R, dom(R), and the range of R, rang (R): [[z

= dom(R)] ==

[5(z) /\ [(Vx E z)(3y E BR)[(x, y) E R] /\ (Vx E AR)(Vy E BR)[[(x, y) E R] ~ [x E z]]]ll,

Elementary Set Theory

[[z

=

rang(R)]

==

159

[5(z) 1\ [(Vy E z)(3x E AR)[(x, y) E R] 1\

('Ix E AR)(Vy E BR)[[(x, y) E R]

::J

[y E z]]]]].

The pair (A R , BR ) is not uniquely determined. However, dom(R) and rang(R) are the same, no matter how we choose A R and BR • E 9.3 [[R t

C

Let x and y be sets and suppose that (x

x

y)] /\ (Vu) ('Iv) [[(u, v) E R t ]

==

[flu E x] /\ [v E y]] /\ [u E vl]]].

Then Rt is a set of ordered pairs. Hence it is a relation. Also (Vu)(Vv)[[(u, v) Rt ] == [u E v]]]. Next suppose that

E

(x x y)] ~ [[(u, v) E

[[R= c (x x y)] /\ (Vu)(Vv) [[(u, v) E R=]

==

[flu EX] /\ [v E y]] /\ [u

= vl]]].

Then R= is a set of ordered pairs. Hence it too is a relation. Also, (Vu)(Vv)[[(u, v) E (x X y)] ::J [[(u, v) E R=] == [u = vl]]. In concluding, we note that the existence of x, y and x x y, together with KA 8, ensure that Rt and R= denote objects in the universe.

For any given set x, a relation R in (x x x) is said to be reflexive if and only if (Vu E x)[(u, u) E R]. R is symmetric if and only if (Vu E x)(Vv E x)[[(u, v) E R]

::J

[(v, u) E R]].

R is antisymmetric if and only if (Vu E x)(Vv E x)[[[(u, v) E R] 1\ [(v, u) E R]]

::J

[u = v]].

Finally, R is transitive if and only if (Vu E x)(Vv E x)(Vw E x)[[[(u, v) E R] 1\ [(v, w) E R]]

::J

[(u, w) E R]].

We shall often refer to these properties of relations. Here we merely observe that, if [x = y] in E 9.3, RE need have none of the properties above, whereas R= has three of them: reflexivity, symmetry, and transitivity. A relation with the properties of R= is called an equivalence relation. A relation that is reflexive, antisymmetric, and transitive is called a partial order. One example of a partial order is the predicate ~ in (111 I x 111 I). A partial order R in (x X x) is a total order if and only if (Vu E x)(Vv E x)[[(u, v) E R)]

v [(v, u)

E

R]].

The predicate ~ is a total order in (111 I X 111 I). A unary function F is a binary relation with the following property: [[[ (x, y) E F] 1\ [(u, v) E F]]

::J

[[x

=

u]

If F is a unary function and (x, y)

::J

E

[y = v]]].

F, we usually write y

=

F(x). We

160

Chapter 9

also write F('): A -4 B to say that F is a function with dom(F) = A and rang(F) c B. Finally, we shall insist that [func(F)

== [rel(F)

=

[x

1\

1\

(Vz

E

F)(Vw E F)[[[[z

u]] ::::> [y

=

= (x, y)]

[w

1\

= (u, v)]]

v]]]].

9.6 Extensions

In the preceding sections we considered unordered and ordered pairs of sets and unions and intersections of pairs of sets. The results we obtained can be generalized to n-tuples of sets. We consider triples of sets here; I leave the general case to the reader. Unordered and ordered triples are defined so that [{Xl,Xl,X3}

=

({Xl,Xl}

U

and

{X3})]

=

[(X l ,Xl,X3 )

((X l ,Xl ),X3 )].

Similarly, unions and intersections of triples of sets are defined so that

Finally, the product of a triple of sets is defined so that [(Xl x Xl X x 3 )

=

((Xl X Xl) x X3)].

For later reference we note that the operations u and n are associative; see ST 28 and ST 29. ST 28

[[[S(x)

1\

S(y)]

1\

S(z)] ~ [((x U y) U z)

=

(x u (y u z))]J

ST 29

[[[S(x)

1\

S(y)]

1\

S(z)] ~ [((x

=

(x n (y

(J

y)

(J

z)

(J

z))]J

In addition, n distributes over u and u distributes over n as in ST 30 and ST 31. ST 30

[[[S(x)

1\

S(y)]

1\

S(z)] ~ [(x

(J

(y u z))

=

((x

(J

y) u (x

(J

z))]J

ST 31

[[[S(x)

1\

S(y)]

1\

S(z)] ~ [(x

U

(y

(J

z))

=

((x

U

y)

U

z))]]

1\

[[v

=

y]

1\

[w

(J

(x

Finally, ordered triples satisfy ST 32. ST 32

[[(u, v, w)

=

(x, y, z)]

==

[[u

=

x]

=

zJ]]]

Sets of ordered triples are ternary relations. Some of these in turn are binary functions. In the next section we shall find sets in the universe that can represent natural numbers, and we shall construct the binary functions that add and multiply them.
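The constructions of the next section, the null set, the successor 𝒮(x) = (x ∪ {x}), and addition defined by recursion, can be mirrored directly with Python frozensets. The sketch below is only an informal model of those definitions, not part of KPU itself, and the helper names are hypothetical.

    EMPTY = frozenset()                      # the null set, representing 0

    def successor(x):                        # S(x) = (x u {x})
        return x | frozenset([x])

    def natural(n):                          # the set representing the natural number n
        x = EMPTY
        for _ in range(n):
            x = successor(x)
        return x

    def add(x, y):                           # F(x, 0) = x,  F(x, S(u)) = S(F(x, u))
        if y == EMPTY:
            return x
        u = max(y, key=len)                  # the unique u with S(u) = y
        return successor(add(x, u))

    assert natural(3) == frozenset([natural(0), natural(1), natural(2)])
    assert add(natural(2), natural(3)) == natural(5)
    print("2 + 3 = 5 holds for the set-theoretic naturals")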


9.7 Natural Numbers

There are many ways of defining sets that can represent natural numbers in our universe. One of them is as follows. We begin by defining a unary functional symbol as in SD 9.

SD 9

=

[5(x):::> [[z

Y'(x)]

==

[5(z)

1\

[z

=

(x U {x} )]lll

For all sets x there is a set z that equals (x u {x}). Since this set is uniquely determined, !/ is well defined on sets by SO 9. We also have ST 33. ST 33

[5(x):::> [[y E Y'(x)]

==

[[y E x] V [y

= x]lll

Next we define two useful unary predicates, trans (.) and ord('), in SO 10 and SO 11. SO 10

[trans(x)

SO 11

[ord(x)

==

[5(x)

1\

== [trans(x)

('tty E x) ('ttz E y) [z E xlll 1\

('tty E x) trans(y)]]

SO 10 asserts that x is transitive, and SO 11 insists that x is an ordinal number. Since 5(0), '" [y E 0], and ['" [y E 0] ::::> [[y E 0] ::::> ('v'z E y)[z E 0]]],

we have trans (0), Similarly, trans(0), '" [y E 0], and ['" [y E 0] ::::> [[y E 0] ::::> trans(y))) imply ord(0). Also, [trans(x)::::> trans(!/(x))] and [ord(x)::::> ord(!/(x))] are easy consequences of SO 10, SO 11, and ST 33. For later reference, we record two of these observations in ST 34 and ST 35. ST 34

ord(0)

ST 35

[ord(x):::> ord(Y'(x))]

Finally, we define the unary predicate nat( . ) in SO 12. SO 12

[nat(x)

==

[ord(x) 1\

1\

[[x =

('tty E x)[[y

0] v

= 0]

[(3y E x)[x = Y'(y)]

v (3z E y)[y = Y'(z)llllll

In the intended interpretation of KPU, nat(x) asserts that x is a natural number. Evidently, we have ST 36 and ST 37. ST 36

nat(0)

ST 37

[nat(x):::> nat (Y'(x))].

Furthermore, since [ord(x) /\ [y ST 38.

E

xl] materially implies ord(y), we have

162

Chapter 9

ST 38

[[nat(x)

A

~

[y E x]]

nat(y)]

The natural numbers in KPU have many interesting properties. The first three (ST 39-ST 41) are easily proved. They show that the natural numbers in KPU with the obvious interpretation satisfy P I, P 2, and P 8 of T(P I, ... , P 9) (i.e., of the T(P) in section 6.1). ST 39

[nat(x) ~ "" [9'(x)

= 0]]

ST 40

[[nat(x)

A

nat(y)]

~

[[9'(x)

ST 41

[[nat(x)

A

nat(y)]

~

[[x E 9'(y)]

=

9'(y)]

==

~

[x

=

y]]]

[[x E y] V [x

=

y]]]]

The fourth property, described in ST 42, can be established with the help of KA 5. It asserts that the natural numbers in KPU satisfy the principle of

complete induction. ST 42

Let IjJ be a wff in which x is free. Then

[(V'x)[nat(x)

~

[(V'y E x)ljJx(Y)

~

1jJ]]

~

(V'x)[nat(x)

1jJ]].

~

In outlining a proof of ST 42, we illustrate the meaning of KA 5. We begin by observing that if we substitute"" [nat(x) :=> t/J] for qJ in KA 5, the assertion in KA 5 becomes materially equivalent to [(V'x)[(V'y E x)[nat(y) :=> t/Jx(Y)] :=> [nat(x) :=>

t/J))

:=>

(V'x)[nat(x) :=> t/J)).

Hence, TM 5.3 and TM 5.15 allow us to assert (i) [(V'x)[nat(x)

:=>

[(V'y

E

x)[nat(y) :=> t/Jx(Y)] :=>

Next, we observe that [[A :=> [B :=> C]] is a tautology and deduce, first, that

:=>

[A

t/J)) [[B

:=>

:=>

(V'x)[nat(x) :=> t/J]].

:=>

[C

:=>

D]]

== [B

[[nat(x) :=> [[ y E x] :=> nat( y))) :=> [nat(x) :=> [[[ y E x] :=> [nat ( y) :=>

==

:=>

D]]]]

t/Jx( y)))

[[y E x] :=> t/Jx(Y)]])]

and then by ST 38, PRI2, PLA 4, and TM 5.10, that (ii) [nat(x)

:=>

[(V'y

E

x)[nat(y) :=> t/Jx(Y)]

x)t/Jx(Y))).

:=>

[B

==

E x) [nat(y) :=> t/Jx(Y))))

==

[nat(x) :=> (V'y E x)t/Jx(Y))).

From assertion ii and the fact that [[[A isa tautology, we deduce that (iii) [[nat(x):=> (V'y

== (V'y E

C))

:=>

[[A:=> B]

==

[A :=> C]])

But, if that is so, (i), (iii), the tautology [[[A:=> B] == [A :=> C]] :=> [[A :=> [B :=> D)) == [A :=> [C :=> D]]], and TM 5.15 suffice to establish the validity in KPU of the principle of complete induction.


Theorem ST 42 and easy arguments demonstrate that the natural numbers in KPU also satisfy the principle of induction expressed in P 9 (section 6.1); see ST 43. ST 43

Let ljJ be a wff in which x is free. Then [[ljJx(0)

1\

("Ix) [nat(x) ::) [ljJ ::)

ljJxVI'(x))]]] ::) ("Ix) [nat (x) ::) ljJ]].

If ST 43 is false, (3y) '" [nat(x) PRJ 1, (3x)[nat(x) /\ [('v'y

E

~

1/1]. Hence, by ST 42, PLA 3, and

x)l/Ix(Y) /\ "'1/1]].

From this we deduce that either'" I/Ix(0) or (3x)[nat(x) /\ [1/1 /\ '" I/Ix(Y(x))], i.e., that the antecedent in ST 43 is false. Hence it must be true that if the antecedent in ST 43 is satisfied, then 1/1 holds for all natural numbers. To establish the KPU equivalents of P 3-P 6, we must show that there are functions in KPU that add and multiply natural numbers. We begin with addition. To characterize addition of natural numbers in KPU, we must demonstrate the existence of a function F(·) that, for all pairs of natural numbers x and u, satisfies the following conditions: (i) F(x,

0)

= x.

(ii) F(x, Y(u)) = Y(F(x, u)). The existence of such a function in KPU is not obvious, so a few remarks are called for. Let x and y be given natural numbers and suppose that we have constructed F(x, u) for u E y recursively, in accordance with the stated conditions. Then F(x, 0) = x, F(x, Y(0)) = Y(x), F(x, Y(u)) = Y(F(x, u)) for all u E y·and F(x, y)

=

Y(F(x, v))

for the v E y at which Y(v) = y. All these sets are well defined in KPU and satisfy F(x, z) E F(x, u) if z E u, and F(x, u) E F(x, y) for all natural numbers z, u E y. From the preceding observations we deduce first that (y x F(x, y)) and (u x F(x, u)) exist for every u E y. Then we denote by Fx ~ y and Fx ~ u the sets consisting, respectively, of all ordered pairs (v, F(x, v)) with v E y and v E u, and use KA 8 to demonstrate that Fx ~ y and Fx ~ u are well-defined in KPU for y and every u E y. Finally, we let z denote the union of all the elements of z and use KA 7 to ascertain that Fx ~ y and Fx ~ u and

U

U

U

164

Chapter 9

U(U Fx u

~ y) and

U(U Fx

~ u) are well defined in KPU for y and every

E y.

It is now easy to verify that

(iii) F(x, u)

=

(x u

(U (U Fx

~ u»)

for every u E y and that

From this it follows that any function F(· ) which, for a given pair x and y, satisfies conditions i and ii for all u E y must also satisfy conditions iii and iv. Since the converse is obviously also true, we may assert that, for any given pair of natural numbers x and y, a function F( . ) will satisfy conditions i and ii for all u E y if and only if it satisfies conditions iii and iv. The existence of a· function F(.) that satisfies conditions iii and iv for every pair of natural numbers x and y is no more obvious than the existence of an F(.) that satisfies conditions i and ii. Hence, the theorem STM 1 requires a proof. STM 1 There exists a function F(· ) in the universe of KPU that satisfies the condition (i) (Vx)(Vy)[[nat(x) /\ nat(y)]

::::>

[F(x, y)

=

(x

u

(U (U F

x

~ y)))]]

Moreover, the restriction of F(.) to pairs of natural numbers is uniquely determined.

We shall prove this theorem in several steps. We begin with the uniqueness part. Suppose that there are two functions F(.) and G(·) that satisfy condition i of the theorem. We shall use ST 43 to demonstrate that they must agree on pairs of natural numbers. To that end, let x and y be natural numbers. Then F(x, 0) = x = G(x, 0). So suppose that "" [y = 0] and that F(x, u) = G(x, u) for all u E y. Then F(x, y)

=

(x

u

(U (U Fx

~ y») =

(x

u

(U (U Gx

~ y») =

From this and from ST 43, we conclude that (Vy) [nat(y) ::::> [F(x, y) = G(x, y)]].

Since x was chosen arbitrarily, we must also have (Vx)(Vy) [[nat (x) /\ nat(y)] ::::> [F(x, y)

as was to be shown.

=

G(x, y)]],

G(x, y).


To establish the existence of F(·), we first define a do predicate in KPU: [P(x, u, z,f)

== [nat(x)

1\

[nat(u)

1\

[nat(z)

1\

[('v'v E u) [f(v) = (x u

1\

[z

1\

[[func(f)

(U (U f

1\

[dom(f)

=

u]]

~ v)))]

= (xu(U(Uf~u)))]]]]]]].

This predicate has several interesting properties. The first is an easy consequence of the uniqueness of F( .), which we established above: (ii) [[P(x,u,z,f)

1\

P(x,u,i,g)]:::> [[z

=

i] 1\ [f=g]]].

The second is an obvious consequence of the definition of P: (iii) [[P(x, u, z, f)

1\

[v E u]] :::> P(x, v,f(v),f ~ v)].

From these two properties of P we deduce that (iv) [[P(x, u, z,f) 1\ [P(x, v, i,g) 1\ [v E u]]] :::> [[w E v] :::> [f(w)

= g(w)]]].

Next we shall demonstrate that, for every pair of natural numbers x and y, there exists a unique z and an f such that P(x, y, z, f). Our proof is obtained by induction on y. Let x and y be given natural numbers and suppose that we have shown that (v) ('v'u E y) (3zJ (3fu)P(x, u, zu' fu).

Moreover, let v E y be such that Y'(v) defined by [cp(u, z)

==

[[[u E y] 1\ nat(z)] 1\ [[[u

=

= y, and let cp(.) be the do predicate v] 1\ [z

=

zv]] V P(x, u, z,fv ~ u)]]].

Then from (ii)-(v), it follows, first, that fu = fJ ~ u for all u E v, and then that ('v'u E y)(3z)cp(u, z)

and ('v'u E y)[[cp(u, z) 1\ cp(u, w)] :::> [z

=

w]].

But if that is so, we can use, first KA 9 to establish a set B( y) such that

t~e

existence in KPU of

('v'u E y)(3z E B(y))cp(u, z)

and, then, KA 8 to establish the existence of an f in KPU such that [[func(f)

1\

[dom(f)

=

y]] 1\ ('v'w) [[w E f]

==

[[w E (y x B(y))] 1\ cp(w)]]].

166

Chapter 9

It follows from the definition of cp(.) and from conditions ii-v above that Zu

=

for all u f(y)

E

f ~u

and

f(u)

fv ~ u

=

y. Consequently, we may extend f to !/( y) by

= (xu(U(Uf~Y)))

and deduce that (vi) P(x, y,f(y),f ~ y). From conditions ii, v, and vi and from ST 42, it follows that, for every pair of natural numbers x and y, there exist a unique z and an f such that P(x, y, z,f), as was to be shown. The preceding observations imply that there is a function F(·) in KPU that is well defined on pairs of natural numbers by (vii) [[nat(x) /\ nat(y)]

~ [[F(x,y)

= z]

==

(3f)P(x,y,z,f)]]

and satisfy condition i of STM 1. To wit: if x, y, and z are natural numbers and f is a function with domain y that satisfies P(x, y, z, f), then F(x, y) = (x u

(U (U f ~ y)))

and, by (ii) and (iii), F(x, u) = f(u) for every u E y. Hence, F(x, y) = (x u

(U (U F

x

~ y))).

So much for STM 1. Next we let F(·) denote a function in KPU that satisfies condition vii and hence condition i of STM 1. Moreover, we let R+(x, y, z)

=df

[nat(x) /\ [nat(y) /\ [nat(z) /\ [F(x, y)

Then the operation SO 13

+

=

zm].

on natural numbers is well defined by

[[nat(x) /\ nat(y)]

::::>

[[z

=

(x

+ y)] ==

R+(x, y, z)]]

To see why, consider E 9.4. Let 0 =df 0, 1 =df 9'(0), and (n + 1) =df 9'(n) for n 1 = {O}, 2 = {O, I}, and (n 1) = {O, 1, ... ,n}. Furthermore F(n, 0) = (n u 0) = n: E 9.4

F(n, 1)

+

= (n u ({O} u {O, F(n, O)})) = (n U ({O} U {F(n,O)})) = (n U {n}) = (n + 1)

=

2, 3, .... Then


= (n u (( {O} u {O, F(n, 0) }) u = (n U ( {n} U {n u {n} } )) = ((n U {n}) U {n U {n} }) = ((n + 1) + 1) = (n + 2).

F(n, 2)

( {I} u {I, F(n, 1) })))

But if our definition of + is well defined by SO 13, the KPU analogues of P 3 and P 4 become ST 44 and ST 45.

= (x + 0)]]

ST 44

[nat(x)::::> [x

ST 45

[[nat(x) /\ nat(y)] ::::>

[(x

+ Y'(y)) = Y'(x + y)]]

To establish the KPU equivalents of P 5 and P 6 we first record the theorem STM 2. There is a binary function G in the universe of KPU such that

STM 2

[Inat [G(X,y) =

.Y, (G(x,z) + X)]}

The restriction of G to natural numbers is uniquely determined. Here Z E y.

UZEY (G(x, z) + x) denotes the union of all the sets (G(x, z) + x) for When y = 0, this union is empty; and when v E y and Y(v) = y, the

union equals (G(x, v) + x). The proof of the existence of G(') is analogous to the proof of STM 1. Hence for brevity's sake, I leave the proof of STM 2 to the reader. Next I introduce a ternary predicate R. by

R. (x, y, z)

=df

[nat(x) /\ [nat(y) /\ [nat(z) /\ [G(x, y)

=

z]]]].

Then R. is well defined and the operation . on natural numbers can be defined by SO 14. SD14

[[nat(x) /\ nat(y)]::::> [[z

=

(ry)]

==

R.(x,y,z)]]

To see why, consider E 9.5. E 9.5

Let 0, 1, 2, and (n

G(n, 0)

=

+ 1) be as defined in E 9.4. Then

0

= (G(n, O) + n) = n G(n, 2) = ((G(n,O) + n) u (G(n, 1) + n)) = (n U (n + n)) = (n + n).

G(n, 1)

But if this is so, then the KPU equivalents of P 5 and P 6 can be stated, respectively, as ST 46 and ST 47.

168

Chapter 9

~

=

ST 46

[nat(x)

ST 47

[[nat(x) /\ nat(y)] ~ [(x- 9"(y))

[(x· 0)

0]]

=

((x- y)

+ x)))

From ST 36-ST 47 and the obvious equivalent of P 7, ST 48, we see that there is an interpretation of P 1, ... , P 9 in KPU in which 0, 5, and < correspond, respectively, to 0, !/, and the restriction of £ to the natural numbers in KPU, and in which + and· correspond to the operations + and· defined above. [nat(x) ~ "" [x E

ST 48

0]]

Specifically, if ~ is any model of KPU, we can use the individuals in I~ I that satisfy nat(·) to represent natural numbers and we can associate 0, 5, oo Ilx m - yll = 0, i.e., if and only if, for every e > 0, there is an m O so that, for all m > m O, II~ - yll < e. DO 2 Let x m ERn, m = 1,2, .... The sequence x m is convergent if and only if there is a vedor x E Rn such that x m converges to x. The vedor x is the limit of x m, and we write limm-->oo x m = X. DO 3 If X is a subset of R n, X is closed if and only if ev,ery convergent sequence of vedors in X has a limit in X. It is open if and only if its complement is dosed. Finally, it is bounded if and only if there is an integer N such that Ilxll :::; N for all x E X.

Note that conditions i and iii of UT 1 imply that the limit of a convergent sequence is uniquely determined. Note also that a finite union (intersection) of closed (open) sets is closed (open). Moreover, note that Rn as well as 0 are both closed and open, R~ is closed, and R~ + is open. In addition, for a, hER and a < h, the set {x E R : a < x < h}, denoted (a, h), is open, and the set {x E R : a ~ x ~ h}, denoted [a, h], is closed. UT 2 Suppose that x m ERn is a bounded sequence. Then x m contains a convergent subsequence. Furthermore, x m converges to x ERn if and only if the limit points of all convergent subsequences of x m equal x.

We shall encounter many functions on Rn to R. They can be characterized in various ways. DO 4 Suppose that X eRn and that f( . ): X ~ R. Then f ( . ) is increasing (decreasing) if x :::; y implies that f(x) :::; (~) f(y). It is strictly increasing (decreasing) if x :::; y, x =f. y imply f (x) < (> ) f (y). Finally, f is monotonic if it is increasing or decreasing.

Consumer Choice under Certainty

179

UO 5 Suppose that X c R" and that f (. ): X -. R. Then f (. ) is continuous at y E X if and only if, for every sequence x m E X such that lim m--> 00 x m = y, limm-->oo f(x m ) = f(y). Moreover, we say that f is continuous if it is continuous at every y E X. UO 6 If Xc R" and g(.): X -. Rm, g(.) is continuous if and only if gi(·) is continuous, i = 1, ... , m.

Since Illyll - Ilxmlll ~ IIY - xmll, it is clear that 1 /1 is a continuous function on R". It is also clear that a convergent sequence is bounded. From this and Ixy - xmyml ~ Ilyllllx - xmll + Ilxmlllly - ymll, it follows that the innerproduct function is continuous on Rn x Rn. Also, if X eRn, f( 0); X ~ R, g( 0): X ~ R, h( 0); X ~ Rm, and F("); {y E Rm ; y = h(x) for some x E X} ~ R k are continuous functions, then (f + g)(o)(=f(o) + g(")), (fg)(.) (= f(· )g( 0)), and F(h( 0)) are continuous on X, while (f/g)( 0) (= f( 0)/g( 0)) is continuous on {x EX; g(x) =1= a}. Finally, if A is a closed, bounded subset of X, then {y E Rm ; y = h(x) for some x E A} is closed and bounded, and if B is an open subset of Rm, {x EX; h(x) E B} is open. 0

UT 3 Let X be a closed, bounded subset of R". If f(·): X -. R is continuous, there exist y, z E X such that f(y) ~ f(x) ~ f(z) for all x E X.

Certain classes of sets and functions are particularly important in economic theory. We define some of them below. UO 7 Let X be a nonempty subset of R". Then X is convex if and only if x, y and A E [0, 1] imply AX + (1 - A)y E X.

E

X

UD 8 Let X be a nonempty convex subset of R^n and let f(·): X → R. Then f(·) is (strictly) quasi-concave if and only if x, y ∈ X, x ≠ y, and λ ∈ (0, 1) imply

f(λx + (1 − λ)y) ≥ (>) min(f(x), f(y)).

f(·) is (strictly) concave if and only if x, y ∈ X, x ≠ y, and λ ∈ (0, 1) imply

λf(x) + (1 − λ)f(y) ≤ (<) f(λx + (1 − λ)y).

The closure of a set X consists of the set X itself and all points that are limits of convergent sequences of points in X. The interior of a set is the complement of the closure of the set's complement. It is easy to show that both the closure and the interior of a convex set are convex. It is also easy to verify that if X ⊂ R^n is convex and f(·): X → R is quasi-concave, then the set {x ∈ X : f(x) ≥ f(y)} is convex for all y ∈ X. Finally, it is clear that a concave function is quasi-concave. The converse is untrue, as witnessed by X = R²_{++} and f(·): R²_{++} → R defined by f(x) = (x₁x₂)². This function is strictly quasi-concave but not concave.
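To make the contrast concrete, here is a minimal numerical sketch (mine, not the text's) using the function f(x) = (x₁x₂)² just cited: the defining inequality for concavity fails at the midpoint of two points on the diagonal, while the quasi-concavity inequality holds. The particular test points are arbitrary.

```python
import numpy as np

# f(x) = (x1*x2)^2 is quasi-concave but not concave on the strictly positive orthant.
def f(x):
    return (x[0] * x[1]) ** 2

x, y, lam = np.array([1.0, 1.0]), np.array([9.0, 9.0]), 0.5
z = lam * x + (1 - lam) * y                         # midpoint (5, 5)

# Concavity would require lam*f(x) + (1-lam)*f(y) <= f(z); it is violated here.
print(lam * f(x) + (1 - lam) * f(y), "<=", f(z))    # 3281.0 <= 625.0 -> false

# Quasi-concavity only requires f(z) >= min(f(x), f(y)); it holds (strictly).
print(f(z), ">=", min(f(x), f(y)))                  # 625.0 >= 1.0 -> true
```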


There is one property of convex sets and several properties of concave functions that we shall frequently use. We record them in UT 4-UT 6 for ease of reference.

UT 4 Suppose that X is a convex subset of R^n and that y ∈ R^n. There is a z ∈ R^n − {0} such that X is contained in {x ∈ R^n : zy ≥ zx} if and only if y does not belong to the interior of X.

UT 5 Suppose that X is a nonempty convex subset of R^n and that f(·): X → R is concave. Then f(·) is continuous on the interior of X.

A function on an open subset X of R to R is differentiable if its first derivative is continuous on X. It is twice differentiable if its second derivative is continuous on X. Twice-differentiable concave functions can be characterized as in UT 6.

UT 6 Suppose that X is a nonempty, open, convex subset of R and let f(·): X → R be a twice-differentiable function. Then f(·) is concave if and only if f''(x) ≤ 0 for all x ∈ X. It is strictly concave if f''(x) < 0 for all x ∈ X.

To conclude our discussion of universal terms and theorems, we look at three useful lemmas. The first, UT 7, concerns a property of certain sequences of sets.¹

UT 7 For each pair (p, A) ∈ R^n_+ × R_+, let Γ(p, A) = {x ∈ R^n_+ : px ≤ A}. Suppose (p^m, A^m) ∈ R^n_{++} × R_+ is a sequence that converges to (p^0, A^0) ∈ ((R^n_+ − {0}) × R_{++}) ∪ (R^n_{++} × R_+). If x^0 ∈ Γ(p^0, A^0), then there exists a sequence x^m such that x^m ∈ Γ(p^m, A^m) and lim_{m→∞} x^m = x^0.

The second, UT 8, establishes an equivalence relation among real-valued functions on R^n_+.

UT 8 Suppose that f(·): R^n_+ → R and g(·): R^n_+ → R are continuous, strictly increasing functions. Then there exists a continuous, strictly increasing function G(·): {range of g(·)} → R such that, for all x ∈ R^n_+, f(x) = G(g(x)) if and only if, for all y ∈ R^n_+, {z ∈ R^n_+ : f(z) ≥ f(y)} = {z ∈ R^n_+ : g(z) ≥ g(y)}.

The third, UT 9, describes necessary and sufficient conditions that a vector in R^n_+ solves a quasi-concave programming problem.² In the statement of the lemma a function on an open subset X of R^n to R is taken to be differentiable if its partial derivatives are continuous on X.

UT 9 Let A ⊂ R^n be open and suppose that R^n_+ ⊂ A. Moreover, let U(·): A → R and G(·): A → R be differentiable quasi-concave functions such that ∂G(x)/∂x_i ≠ 0 for all x ∈ {x ∈ R^n_+ : G(x) ≥ 0}, i = 1, ..., n. Finally, suppose that there is an x ∈ R^n_+ − {0} such that G(x) > 0. If x^0 ∈ R^n_+ and ∂U(x^0)/∂x_i ≠ 0 for some i for which x_i^0 > 0, then a necessary and sufficient condition that x^0 maximizes U(·), subject to the conditions x ∈ R^n_+ and G(x) ≥ 0, is that there is a λ^0 ∈ R_+ such that

∂U(x^0)/∂x_i + λ^0 ∂G(x^0)/∂x_i ≤ 0,   i = 1, ..., n,

Σ_{i=1}^n x_i^0 (∂U(x^0)/∂x_i + λ^0 ∂G(x^0)/∂x_i) = 0,

and λ^0 G(x^0) = 0.

10.2 A Theory of Choice, T(H 1, ... , H 6)

Next I present the axioms of the theory of consumer choice under certainty. The axioms chosen as a basis for the theory are less general than they could have been. However, they allow me to present the theory as simply as possible.

10.2.1 Axioms

The axioms of the theory of consumer choice under certainty concern the characteristics of various undefined terms named commodity bundle, price, consumer, and consumption bundle. These terms satisfy the postulates H 1-H 6.

H 1 A commodity bundle is a vector x ∈ R^n_+.

H 2 A price is a vector p ∈ R^n_{++}.

H 3 A consumer is a triple (V(·), X, A), where X ⊂ R^n_+, A ∈ R_+, and V(·): X → R_+.

H 4 A consumption bundle is a vector c ∈ R^n_+ which, for some pair (p, A) ∈ R^n_{++} × R_+, satisfies c ∈ X, pc ≤ A, and V(c) = max_{x∈Γ(p,A)} V(x), where Γ(p, A) = {x ∈ X : px ≤ A}.

H 5 X = R^n_+.

H 6 V(·) is continuous, strictly increasing, and strictly quasi-concave and has differentiable level sets in (R^n_+ − R^n_{++}) − {0}.³

In reading these axioms, note that UT 4 and the properties of V(·) postulated in H 6 ensure that every commodity bundle is a consumption bundle in some (p, A) situation; i.e., T 10.1.


T 10.1 If x ∈ R^n_+, there is a pair (p, A) ∈ R^n_{++} × R_+ for which x is a consumption bundle.

Example E 10.1 illustrates this fact.

E 10.1 Let n = 2 and let V(x) = (x₁ + 2)(x₂ + 2), x ∈ R²_+. If x^0 ∈ R²_+, it is a consumption bundle for p = (1, (x₁^0 + 2)/(x₂^0 + 2)) and A = x₁^0 + p₂x₂^0. To see why, let λ^0 = x₂^0 + 2 and apply UT 9 with G(x) = (x₁^0 − x₁) + [(x₁^0 + 2)/(x₂^0 + 2)](x₂^0 − x₂).
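As a quick numerical cross-check of E 10.1 (my own sketch, not part of the text), one can pick an arbitrary x^0, build p and A as in the example, and let a general-purpose constrained optimizer maximize V over Γ(p, A); the maximizer should coincide with x^0. The choice x^0 = (3, 1) and the use of scipy's SLSQP routine are assumptions of the illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Utility from E 10.1 and an arbitrarily chosen commodity bundle x0.
V = lambda x: (x[0] + 2.0) * (x[1] + 2.0)
x0 = np.array([3.0, 1.0])

# Prices and funds constructed as in E 10.1: p = (1, (x1+2)/(x2+2)), A = x1 + p2*x2.
p = np.array([1.0, (x0[0] + 2.0) / (x0[1] + 2.0)])
A = x0[0] + p[1] * x0[1]

# Maximize V on the budget set {x >= 0 : p.x <= A} (i.e., minimize -V).
res = minimize(lambda x: -V(x), x0=np.array([1.0, 1.0]),
               bounds=[(0.0, None), (0.0, None)],
               constraints=[{"type": "ineq", "fun": lambda x: A - p @ x}],
               method="SLSQP")

print(res.x)             # close to x0 = (3, 1)
print(V(res.x), V(x0))   # the maximal utility is attained at x0
```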

The properties of V(·) also imply T 10.2.

T 10.2 For each pair (p, A) ∈ R^n_{++} × R_+ there is one and only one consumption bundle.

To see why, observe first that p > 0 and the continuity of the inner product, respectively, imply that Γ(p, A) is bounded and closed. From this, UT 3, and the continuity of V(·) follows the existence of a consumption bundle for (p, A). The uniqueness of this consumption bundle is a consequence of the convexity of Γ(p, A) and the strict quasi-concavity of V(·). T 10.3 follows from T 10.1 and T 10.2.

T 10.3 Let f(·): R^n_{++} × R_+ → R^n_+ be so that, for every pair (p, A) ∈ R^n_{++} × R_+, f(p, A) is the consumption bundle corresponding to (p, A). Then f(·) is well defined and maps R^n_{++} × R_+ onto R^n_+.

It is usually difficult to describe the way a given consumer's consumption bundle varies with (p, A). However, as E 10.2 shows, it is easy for the consumer in E 10.1.

E 10.2 Let n = 2 and V(x) = (x₁ + 2)(x₂ + 2), x ≥ 0, as in E 10.1. In addition, let f(p, A) be the consumption bundle corresponding to (p, A). Then

f(p, A) =
  (0, A/p₂)  if p₁ ≥ p₂ + A/2, p₂ > 0, A ≥ 0;
  ((A/2p₁) + (p₂/p₁) − 1, (A/2p₂) + (p₁/p₂) − 1)  if 0 < p < (p₂ + A/2, p₁ + A/2);
  (A/p₁, 0)  if p₂ ≥ p₁ + A/2, p₁ > 0, A ≥ 0.

To see why, apply UT 9 with G(x) = A − p₁x₁ − p₂x₂ and λ = 2/p₂, [A + 2(p₁ + p₂)]/2p₁p₂, and 2/p₁ according as f₁(p, A) = 0, f(p, A) > 0, or f₂(p, A) = 0.
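The piecewise formula in E 10.2 is straightforward to put to work; the sketch below (mine, with arbitrary test values) codes it directly and checks the budget identity pf(p, A) = A and the invariance f(λp, λA) = f(p, A) that T 10.5 records below.

```python
import numpy as np

def demand(p, A):
    """Closed-form demand of the E 10.1/E 10.2 consumer, V(x) = (x1+2)(x2+2)."""
    p1, p2 = p
    if p1 >= p2 + A / 2.0:                            # corner: only good 2 is bought
        return np.array([0.0, A / p2])
    if p2 >= p1 + A / 2.0:                            # corner: only good 1 is bought
        return np.array([A / p1, 0.0])
    return np.array([A / (2 * p1) + p2 / p1 - 1.0,    # interior solution
                     A / (2 * p2) + p1 / p2 - 1.0])

p, A = np.array([1.0, 2.0]), 6.0
x = demand(p, A)
print(x, p @ x)                     # budget identity: p.f(p,A) = A
print(demand(3.0 * p, 3.0 * A))     # same bundle when (p, A) is rescaled
```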

10.2.2 The Intended Interpretation of T(H 1, ... , H 6)

One can find many different interpretations of T(H 1, ..., H 6) in the economic literature. We shall discuss several of these interpretations in this and the next chapter and others in later chapters.

Traditionally, the components of x in H 1 have been interpreted as denoting units of ordinary commodities such as apples, oranges, cheddar cheese, wheat flour, pencils, shoes, or hockey sticks. The units of measurement could be a pound for the first four, a pair for shoes, and the natural unit for pencils and hockey sticks. In modern treatments of consumer theory, the components of x can also denote such varied items as hours of leisure, gallons of gasoline, (driven) miles of a privately owned car, and hours of various recreational activities.

For each component of a commodity vector, there is a component of the price vector p in H 2 which denotes the number of units of account that is needed to purchase one unit of the commodity in question. The unit of account may be an American dollar, a Norwegian krone, a horse, or a cow, depending on which real-life situation is to be described.

A consumer is usually considered to be an individual living alone, or a family living together with a common household budget, who has available funds equal to A. This supply of funds may be taken to be the consumer's income during a certain period or, alternatively, the amount that the consumer has decided to spend on commodities.

The function V(·) in H 3 is the consumer's utility function. Economists used to insist that the value of V(·) at a vector x measured the utility which the consumer derived from consuming x. Today V(·) is taken to be a utility indicator which shows how the consumer ranks different commodity vectors. We shall later see that the latter is the only sensible interpretation of V(·).

The set X consists of all currently available commodity bundles. I insisted that X be R^n_+ in order to simplify the presentation of the theory. In mathematically more advanced discussions of consumer choice, X is taken to be a closed, convex subset of R^n whose shape or form reflects various unspecified constraints on what commodity bundles the consumer can consume. Some of these constraints may be physiological. Others may be technical. Some may be prescribed by law and others may be imposed by religious customs. What they are will differ from one model of the axioms to another.

Finally, in the traditional interpretation of T(H 1, ..., H 6), it is assumed that the consumer in the market will purchase the commodity bundle that maximizes the value of V(·) subject to his budget constraint, px ≤ A and x ≥ 0. Therefore, a consumption bundle in H 4 is a commodity bundle which the consumer would want to purchase in a certain (p, A) situation. Theorem T 10.2 showed that the consumption bundle is uniquely determined. Below, T 10.5 adds that the consumer's choice of c is independent of the unit in which we measure p and A and that the consumer always spends all his funds on c.


10.2.3 Sample Theorems

We can derive many theorems from H 1-H 6. In this section I present theorems that provide interesting characterizations of two new concepts, the consumer's set of consumption bundles and the demand function.

D 1 The consumer's set of consumption bundles, C(A), consists of all commodity bundles that are consumption bundles relative to some pair (p, Ā) ∈ R^n_{++} × R_+ with Ā = A.

D 2 Let f(·): R^n_{++} × R_+ → R^n_+ be such that, for each (p, A) ∈ R^n_{++} × R_+, f(p, A) is the consumption bundle corresponding to (p, A). Then f(·) is the demand function. Moreover, f(·, A): R^n_{++} → R^n_+ is the consumer's demand function.

According to T 10.3 there is only one function with the properties specified in D 2. Hence, the name "the demand function" is appropriate. T 10.3 also implies T 10.4.

T 10.4 C(A) = {x ∈ R^n_+ : x = f(p, A) for some p ∈ R^n_{++}}.

The demand function has many interesting properties, the most important of which are described in T 10.5-T 10.8.

T 10.5 Let f(·): R^n_{++} × R_+ → R^n_+ be the demand function. Then f(·) is continuous and satisfies

(i) pf(p, A) = A, (p, A) ∈ R^n_{++} × R_+; and

(ii) f(λp, λA) = f(p, A), λ > 0, (p, A) ∈ R^n_{++} × R_+.

The monotonicity of V(·) and the fact that Γ(λp, λA) = Γ(p, A) imply, respectively, the validity of conditions i and ii. Thus to establish T 10.5 we need only sketch a proof of the continuity of f(·) at a given pair (p^0, A^0) ∈ R^n_{++} × R_+. To that end, let (p^m, A^m) ∈ R^n_{++} × R_+ be a sequence of pairs that converges to (p^0, A^0). We must show that the sequence x^m = f(p^m, A^m) converges to x^0 = f(p^0, A^0). Since p^0 > 0, the x^m are uniformly bounded. By UT 2 there is a subsequence x^{m_k} and a vector x̃ such that lim_{k→∞} ... l_t. Also let z^{m_k} = u(l_t) for l_t < k ≤ l_{t+1}, t = 2, 3, .... Then z^{m_k} ∈ G(x^{m_k}) for all k and lim_{k→∞} z^{m_k} = y^0. To conclude the proof of the continuity of g(·), we let w = x or z according as y^0 = x^0 or y^0 ≠ x^0 and note that, by condition (vi), p^{m_k} y^{m_k} ≤ p^{m_k} w^{m_k}, since w^{m_k} ∈ G(x^{m_k}) and y^{m_k} = g(p^{m_k}, x^{m_k}). Hence p^0 ỹ ≤ p^0 y^0. The

last inequality, ỹ ∈ G(x^0), and condition (vi) imply that ỹ = y^0. From this and from UT 2 follows the continuity of g(·).

T 10.12 Let g(·): R^n_{++} × R^n_+ → R^n_+ be as specified in T 10.11 (i) and (ii), and let P = {p ∈ R^n_{++} : Σ_{i=1}^n p_i = 1}. If x ∈ R^n_+ and x ≠ 0, then {y ∈ R^n_+ : y = g(p, x) for some p ∈ P} = {y ∈ R^n_+ : V(y) = V(x)}.

By letting A vary with p in some determinate way, say A(·), we can also make f(·, A(·)) trace out the indifference surfaces of V(·). This fact, stated in T 10.13, is a simple corollary of T 10.11 and T 10.12.⁴

T 10.13 Let f(·): R^n_{++} × R_+ → R^n_+ be the demand function and let g(·): R^n_{++} × R^n_+ → R^n_+ be as described in T 10.11 (i) and (ii). Moreover, let A(·): R^n_{++} × R^n_{++} × R_+ → R_+ be so that

f(p, A(p, p^0, A^0)) = g(p, f(p^0, A^0)),   (p, p^0, A^0) ∈ R^{2n}_{++} × R_+.

Then A(·) is well defined and continuous. In addition,

(i) A(p, p^0, A^0) ≥ (≤) A^0 if p ≥ (≤) p^0, with equality holding only if f(p^0, A^0) = g(p, f(p^0, A^0)).

Finally, for every (p^0, A^0) ∈ R^{n+1}_{++},

(ii) A(·, p^0, A^0) is concave and homogeneous of degree 1.

To establish T 10.13, observe that T 10.5 and T 10.11 imply that A(p, p^0, A^0) = p g(p, f(p^0, A^0)) for all (p, p^0, A^0) ∈ R^{2n}_{++} × R_+. From this and from the continuity of g(·) and the inner product, it follows that A(·) is well defined and continuous. Next note that, for any triple (p, p^0, A^0) ∈ R^{2n}_{++} × R_+ such that p^0 ≤ p and f(p^0, A^0) ≠ g(p, f(p^0, A^0)), T 10.5 and T 10.11 imply that

A^0 = p^0 f(p^0, A^0) < p^0 g(p, f(p^0, A^0)) ≤ p g(p, f(p^0, A^0)) = A(p, p^0, A^0).

Similarly, when p ≤ p^0 and f(p^0, A^0) ≠ g(p, f(p^0, A^0)), A(p, p^0, A^0) = p g(p, f(p^0, A^0))




1].

, ... ,

C,,) E R+,

Suppose also that V(·, Pl) is strictly concave for every Pl

E

R~\.

Cn,l(r,A,Pl)

at any triple (r, A, Pl) in the region of R+ x

R~~l

where

Cn(r, A, Pl)

> O.

In our interpretation of ξ as the last period in the consumer's life, ξ is a measure of the number of periods left to live. This number decreases with the consumer's age. T 11.8 can therefore be interpreted as saying that if a consumer in two consecutive periods faced the same (r, A), his expenditures on commodity bundles in the first period would be less than in the second. It is important to note that the conclusion of T 11.8 is independent of the relative value of α and r. Note also that T 11.8 does not suggest that consumption increases with age. Hence it does not contradict the conclusion of T 11.7 (iii). Instead, T 11.8 implies that, if two groups of consumers with independent and stationary preference orderings of R^ξ_+ differ only in that the members of one group are older than the members of the other, then the older group will have larger expenditures for current consumption than the younger group. The validity of T 11.8 is a simple consequence of the following observation: In the region of R_+ × R^{(ξ−1)m+1}_{++} where C_ξ(r, A, P̄₁) > 0, the components of C_ξ(r, ·, P̄₁) are strictly increasing functions of A. This fact is due to the strict concavity of V(·, P̄₁), as is easy to verify. For brevity I omit the proof.

E 11.5 Suppose that W(C) = Σ_{i=1}^ξ α^{i−1} log C_i, C ∈ R^ξ_{++}. Then it is easy to show that

C₁(r, A) = A / Σ_{i=1}^ξ α^{i−1}

and

C_i(r, A) = α(1 + r) C_{i−1}(r, A),   i = 2, ..., ξ,

in accordance with both T 11.7 (iii) and T 11.8. Note also that, for ξ ≥ 3,

C₂(r, A) = α(1 + r) C₁(r, A) = α(1 + r) A / Σ_{i=1}^ξ α^{i−1} = (1 − 1/Σ_{i=1}^ξ α^{i−1}) A(1 + r) / Σ_{i=1}^{ξ−1} α^{i−1},

where (1 − 1/Σ_{i=1}^ξ α^{i−1}) A(1 + r) is the value of the consumer's net worth in period 2 if he consumed A/Σ_{i=1}^ξ α^{i−1} in the first period.
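A short numerical illustration of the plan in E 11.5 may be helpful (my own sketch; the parameter values and the budget convention Σᵢ Cᵢ/(1 + r)^{i−1} = A are assumptions of the illustration): it computes C₁ = A/Σᵢ α^{i−1}, iterates Cᵢ = α(1 + r)Cᵢ₋₁, and confirms that the plan exactly exhausts the consumer's funds.

```python
import numpy as np

# Consumption plan of E 11.5: W(C) = sum_i alpha^(i-1) * log(C_i).
alpha, r, A, xi = 0.9, 0.05, 100.0, 5          # illustrative values only

S = sum(alpha ** (i - 1) for i in range(1, xi + 1))
C = [A / S]                                     # C_1 = A / sum_i alpha^(i-1)
for i in range(2, xi + 1):
    C.append(alpha * (1 + r) * C[-1])           # C_i = alpha(1+r) C_{i-1}

print(np.round(C, 3))                           # plan grows/shrinks at rate alpha*(1+r)
pv = sum(C[i] / (1 + r) ** i for i in range(xi))
print(pv)                                       # discounted value of the plan equals A
```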

11.4 Consumption Strategies and Price Indices

We shall now answer the questions concerning price indices, current consumption, and the cost of living by establishing and commenting on a theorem whose idea originated with Robert Strotz (1957, pp. 269-285). Our proof of this theorem is based on the proposition T 11.9.

T 11.9 Let f(·): R^n_{++} × R_+ → R^n_+ be the demand function and suppose that V(·) is homothetic; i.e., suppose that, for all x ∈ R^n_+, all λ ∈ R_{++}, and all z ∈ R^n_+, V(z) ≥ V(x) if and only if V(λz) ≥ V(λx). Then there exists a continuous function g(·): R^n_{++} → R^n_+ such that

f(p, A) = g(p)·A,   (p, A) ∈ R^n_{++} × R_+.   (11.17)
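The content of T 11.9, that under homotheticity demand is linear in A, is easy to see numerically. The sketch below (mine, not the text's) uses a Cobb-Douglas utility as the assumed homothetic V and a generic constrained optimizer; both choices are purely illustrative.

```python
import numpy as np
from scipy.optimize import minimize

# A homothetic utility (Cobb-Douglas) used only to illustrate f(p, A) = g(p) * A.
a = np.array([0.3, 0.7])
V = lambda x: np.prod(np.maximum(x, 1e-12) ** a)

def f(p, A):
    """Numerical demand: maximize V subject to p.x <= A, x >= 0."""
    res = minimize(lambda x: -V(x), x0=np.full(2, A / (2 * p.mean())),
                   bounds=[(0.0, None)] * 2,
                   constraints=[{"type": "ineq", "fun": lambda x: A - p @ x}],
                   method="SLSQP")
    return res.x

p = np.array([2.0, 1.0])
g = f(p, 1.0)                                   # g(p) = f(p, 1)
for A in (1.0, 5.0, 20.0):
    print(np.round(f(p, A), 4), np.round(g * A, 4))   # the two columns agree
```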

The proof of T 11.9 is easy: Let g(p) = f(p, 1), p ∈ R^n_{++}, and pick an arbitrary pair (p, A) ∈ R^n_{++} × R_+. Then A^{−1} p f(p, A) = 1 implies that V(A^{−1} f(p, A)) ≤ V(f(p, 1)). Hence V(f(p, A)) ≤ V(A f(p, 1)). Similarly, A p f(p, 1) = A implies that V(A f(p, 1)) ≤ V(f(p, A)). From these two inequalities and the strict quasi-concavity of V(·), we deduce that f(p, A) = A f(p, 1). If V(·) is homothetic, it need not be true that, for all x ∈ R^n_{++} and λ ∈ R_{++}, V(λx) = λV(x); i.e., V(·) need not be linearly homogeneous.² However, there exists

a linearly homogeneous function G('): R~ ---+ R and a strictly increasing, continuous function H('): {range of G(')} ---+ R such that V(x) = H(G(x)). We use this observation in proving the next theorem,3 T 11.10. Suppose that n = ~m; let f('): R~+ x R+ ---+ R~ be the demand function; let J;('): R~+ x R+ ---+ R~, i = 1, ... , ~, be such that f(') = ([I (.), ... ,f~(.)); and, for each i = 1, ... , ~, let T 11.10

(11.18)

208

Chapter 11

Suppose also that V(·) is separable; i.e., suppose that there exist strictly increasing, continuous, strictly quasi-concave functions Uj ('): R~ ~ R and a strictly increasing, continuous function F( .) such that V(x)

=

F(U l (Xl)' ... ' U~(x~)),

X

E R~.

(11.19)

Finally, assume that the U j ( · ) are homothetic. Then there exist continuous functions, Pj ( · ) : R':\ ~ R++, k/·): R~+ x R+ ~ R+ and gj(.): R':\ ~ R~, that are, respectively, homogeneous of degree 1, 1, and - 1 and that satisfy the relations Cj(p, A)

=

kj(P l (Pi), ... , P~(p~), A)

(11.20)

and j;(p, A)

= gj(pJCj(p,A)

(11.21)

for all (p,A) E R~+ x R+ and i

=

1, ... ,~.

When the UJ') are linearly homogeneous, F(') must be strictly quasiconcave. Note, therefore, that we can without loss in generality assume that F(') and the U j have been chosen such that the U j ( ' ) are linearly homogeneous and satisfy Uj(O) = 0, i = 1, ... , ~. With this assumption made, we first sketch a proof that equation 11.19 implies equation 11.20: Let t(·): R~+ X R+ --+ R~ be the function that, for each pair (pj, A) E R~+ x R+, satisfies the following conditions: (i) t(pj,A) E rJpj, A), where rj(pj,A)

(ii) Uj(t(Pj, A))

= max

=

{y E R~: PjY ~ A}

Uj(y).

yeri(pi,A)

Then t(·) is well defined and continuous on R~+ x R+. Since U j ( ' ) is linearly homogeneous, there exists a continuous function gj('): R~+ --+ R~ such that, for all (pj, A) E R~+ x R+ and A E R++, (11.22)

From now on we consider equations 11.22 to hold for i = 1, ... , Next let pj E R~+ and i

=

1, ... , ~.

~.

(11.23)

Since the U j(·) and gj(') are continuous functions of their arguments, and since Uj(gj(pJ) > 0 for all i = 1, ... , ~, and pj E R~+, the Pj(') are well defined and continuous on R~ +. In addition, since the UJ') are linearly homogeneous and since gj(APj) = A-lgj(pj) for A E R++ and for all i, the ~(.) are linearly homogeneous. Thus the Pj ( ' ) have the properties that the theorem requires. To establish the existence of the kj ( ' ) , proceed as follows: Observe first that, for a given p, the function of (A l' ... , A~), F( Ui (g 1 (Pi )A 1 ), ... ,

Time Preference and Consumption Strategies

209

U;;(g;;(p;;)A;;)), is strictly quasi-concave, increasing, and continuous on Ri. Hence there exists a continuous function k('): Ri+ x R+ ~ Ri which satisfies the following conditions: ;;

(i)

I ki(P (Pl)' ... , P;;(p;;), A) = A, i=l l

(ii) F(U l (gl (Pl ))k l (Pl (PI)"'" P;;(p;;), A), ... , U;;(g;;(p;;))k;;(Pl (PI)"", max

P;;(p;;), A)) =

F(U l (gl (PI ))X l , ... , U;;(g;;(p;;))x;;).

xeR\.L1=1 x;=A

Next observe that, for all (p, A) E R~+ x R+, i

=

I, ... ,

~.

(11.24)

°

To see why, fix p and A and let A? = ki(PI (PI)'"'' P;;(p;;), A 0) and Ai = Pih(P, A 0), i = I, ... , ~. Let Yi = gi(Pi)A? and Xi = h(P, A 0), i = I, ~. 00"

Then PiYi = A? and hand, PiXi = A; and V(x)

=

If=1 A? = AO imply that V(y) ~ If=l A; = A ° imply that

V(x). On the other

F(UI(XI),o .. , U;;(x;;))

~ F(U l (gi (PI )A;),

, U;;(g;;(PI )A~))

~ F(UI (gi (PI )A?),

, U;;(g;;(PI )Ag)) = V(y).

Hence V(x)

=

V( y) and x

=

y, as equation 11.24 requires. But then

C/p, A) = ki(P1 (PI), ... , P;;(p;;), A),

(p, A) E R~+ x R+,

i = I,

00"

~,

since Pigi(Pi) = I, i = I, ... , ~. Hence equation 11.20 is valid too. The proof of T 11.9 can now be concluded as follows: Equations (11.20) and (11.24) imply the validity of equations 11.2 I. Moreover, the linear homogeneity of the Pi ('), equations 11.18 and 11.20, and T 10.5 imply that the ki (') are homogeneous of degree I. Finally, the relations in equation 11.22 imply that the gi(') are homogeneous of degree - 1. Hence the gi('), the Pi ('), and the ki (') have the properties which the theorem requires of them. Theorem T 11.10 insists that if a consumer's preferences are separable and homothetic, there exists a sequence of price indices, one index for each period, such that the consumer can deal with his budgeting problem in two stages. To wit: According to equation 11.20, knowledge of the values of the respective price indices suffices for him to determine his optimal consumption strategy; and according to equation 11.2 I, for each i = I, ... , ~, the funds he has allocated to consumption in period i and knowledge of the i-period prices suffice for him to determine his consumption bundle in period i.

210

Chapter 11

Thus T 11.10 provides sufficient conditions that there exist price indices such that the consumer upon knowledge of the values of these indices and current-period prices can determine his consumption bundle for the current period. To show that the given price indices can be used to measure the price of living in the respective periods, we must establish another theorem, Tl1.11. T 11.11 Let V('), F('), and the U i (') be as in T 11.10 and suppose that the U i (') are linearly homogeneous and satisfy Ui(O) = 0, i = 1, ... , ~. In addition, let gi('): R~'+ --+ R~ and Pi ('): R~+ --+ R++, i = 1, ... , ~, be as described in equations 11.22 and 11.23. Then there exists a continuous function h('): Ri+ x R+ --+ Ri that is homogeneous of degree 0 and satisfies

(i) h(ql, (ii) F(h(ql,

,q~,A) E {z E Ri: ,q~,A))=

it

qiZi::::; A};

max

F(z);and

ZE{YER~ :L~=lqiYi~A}

(iii) hi(Pl(Pl), ... ,P~(p~),A) = Ui(gi(pJCi(p,A)), i described in equation 11.18.

=

1, ... ,~, where Ci (') is as

It follows from T 10.5 that there exists a continuous function h(') that satisfies conditions i and ii. To establish condition iii we recall that F(') must be strictly quasi-concave and observe that, since the U i (') are linearly homogeneous, .;

.;

I Pi(Pi) Ui(gi(pJCi(p, A)) = i=1 I Ci(p,A) = A. i=1 Consequently, F(UI(gl(PI)CI(p,A)), ... , U.;(g.;(P.;)C.;(p, A))) ~ F(h(PI(PI), ... ,p.;(p.;),A)).

(11.25) Bya similar argument, based on equation 11.20 and the properties of ki (') and F( . ), it follows that F(UI (gl (PI ))PI (PI )h l (PI (PI)"'" p.;(P.;), A), ... , U.;(g.;(P.;))P.;(p.;)h,;(P1 (PI)"'" p.;(P.;), A)) ~ F(UI (gl (PI ))C I (p, A), ... , U.;(g.;(P.;))C.;(p, A)).

(11.26)

From equations (11.25), (11.26), and (11.23) and the strict quasi-concavity of F( . ), we conclude that condition (iii) of T 11.11 is true as stated. It follows from T 11.11 and Pi(Pi) = (1 + r)-(i-l)Pi(Pi) that Pi(Pi) is the unit price at Pi of i-period utilities. This observation and the definition of a

Time Preference and Consumption Strategies

211

price-of-living index (see section 10.3) show that, in terms of the cost of i-period utilities at p?, Pi(Pi)/Pi(p?) represents the price-of-living index at Pi' In fact, the value of this index equals the value of Pi(pJ/Pi(p?) and P/pJ Pi(p?)

=

P()U ( ( 0)) i Pi i gi Pi .

12

Risk A version and Choice of Safe and Risky Assets

In his Yrj0 Jahnsson Lectures, Kenneth Arrow proposed a simple and beautiful model of how consumers choose their equilibrium balance sheets (Arrow 1965, pp. 28-44). The model is simple because it requires so few assumptions. It is beautiful because it yields such interesting insights into choice under uncertainty. We shall develop a modified version of Arrow's model to be used as a theoretical basis for an empirical analysis of consumer choice among risky and nonrisky assets. To develop our version of Arrow's model, I begin by presenting Arrow's model as an axiomatic system. The axioms of this system include all the axioms of the standard theory of consumer choice. In addition, they specify that the consumer's utility indicator is an integral, i.e., an expected utility function. I show in the first half of the chapter that Arrow's and John Pratt's theorems concerning risk aversion and choice of risky and nonrisky assets (see Arrow 1965, pp. 43-44, and Pratt 1964, pp. 128, 135-136) are logical consequences of the axioms. Our system of axioms can be interpreted as describing a consumer's choice of an equilibrium balance sheet in a world in which there is one safe asset, one risky asset, and no debt instruments. Since consumers in the real world have numerous assets from which to choose, we conclude this chapter by discussing a second axiom system which permits consumers to choose among several risky securities. The second system, when properly interpreted, provides the modified version of Arrow's model that we shall use as a basis for our statistical analysis in chapter 28. Arrow's and Pratt's theorems establish a definite relationship between the structure of a consumer's demand functions for safe and risky assets and the structure of his risk preferences. At the end of the chapter we see that natural analogues of these theorems can be derived from the axioms of the modified version of Arrow's axiom system if and only if the consumer's utility function belongs to a class of functions that possess a certain separa-

Risk Aversion and Choice of Safe and Risky Assets

213

tion property which D. Cass and J. E. Stiglitz (1970, p. 128) have invented. Our results represent extensions of theorems due to Cass and Stiglitz (1970, p. 142) and to O. D. Hart (1975, pp. 615-621). In chapter 28 we shall analyze statistically data on the balance sheets of u.s. consumers at the end of 1962 and 1963. There we postulate that u.s. COnsumers are Arrow consumers in the sense that. their choices of risky and nonrisky assets can be described by the modified version of Arrow's model, and we assume that our balance-sheet data represent observations on the respective consumer's equilibrium balance sheets. Based on these assumptions, we establish plausible hypotheses concerning the structure of u.s. consumers' risk preferences. We also compare the relative risk aversion of individuals in different consumer groups. 12.1 An Axiomatization of Arrow's Theory

Arrow's theory of how a consumer chooses an equilibrium balance sheet can be viewed as a model of a simple axiomatic system. 12.1.1 The Axioms

The axioms of this system and the associated defined terms of T(H 1, ... , H 6), i.e., consumer, and the consumer's consumption bundle. These axioms of T(H 1, ... , H 6). For ease of reference

theorems involve the uncommodity bundle, price, terms satisfy the first four we rename them AI, ... ,

A4: R~.

A1

A commodity bundle is a vector x E

A2

A price is a vector p E

A3

A consumer is a triple (V(· ), X, A), where

X c R~; A4

A E R+; and

V(·):

R~+.

X ---+ R.

A consumption bundle is a commodity bundle c which, for some pair x R+, satisfies the conditions:

(p, A) E R~+

c E X, pc

~ A

and V(c)

=

max

V(x),

xer(p,A)

where r(p, A)

=

{x EX: px ~ A}.

The undefined terms also satisfy three axioms added by Arrow. These open up the somewhat sterile theory of consumer choice to new and interesting theorems:

214

Chapter 12

AS

n=2;X=R~;andp=(l,a)E{1}x R++.

A 6 There exists a nondegenerate probability distribution F(·): R+ ~ [0, 1] with compact support and a thrice-differentiable function U(·): R+ ~ R such that, for x E R~, V(X 1 ,X2 )

A7

= fooo

U'(·)

>

U(X 1

°and

+ x2 r )dF(r). U"(·)

< 0.

12.1.2 The Intended Interpretation

The above axiom system can be interpreted as describing different phenomena. Among the possible interpretations, there is a subset of intended interpretations. One of these can be described as follows: Name the first component of x J.1 and the second component m. Let m denote a risky asset such as shares, J.1 a nonrisky asset such as cash. Since the first component of p is "I," think of J.1 as the unit of account, and let a denote the number of units of account for which one unit of m exchanges. Finally, let "a consumer" denote a family unit which orders pairs (J.1, m), according to the values assumed by a function V( and which has a certain number of units of account A to spend on J.1 and m. With each pair (J.1, m), the family associates a random variable J.1 + mr with the probability distribution, F((o) - J.1)/m): [J.1, (0) ~ [0, 1]. The pairs (J.1,m) are for sale in the current period, and for each value of r, J.1 + mr denotes the number of units of account which (J.1, m) will command next period. The family chooses (J.1, m) so as to maximize the value of V( subject to the constraints J.1 + am = A and (J.1, m) ~ 0. With this interpretation of the axioms A I-A 7, the system describes a consumer's choice among risky and nonrisky assets. In a world where there is one safe and one risky asset and in which the consumer cannot borrow, this choice is the choice of an equilibrium balance sheet, with A representing the consumer's net worth. In fact, in such a world we can think of the consumer as having already made his current-period consumption-savings choice and as now allocating his end-of-period net worth ( == beginning-ofperiod net worth plus current-period savings) between J.1 and m. For each value of r and for a chosen pair (J.1, m), J.1 + mr represents the value of the consumer's net worth at the beginning of the next period. When we later speak about choice among risky and nonrisky assets and about consumer aversion to risk we shall always be referring implicitly to the interpretation sketched above. Arrow's model differs from this interpre0

),

0

)

Risk Aversion and Choice of Safe and Risky Assets

215

tation in that he insists that a == 1. This difference is important because, if we do not allow a to vary over R+, we can not justify interpreting Arrow's measures of risk aversion as measures of risk aversion in our system. We also cannot establish the analogues of Arrow's and Pratt's theorems concerning the relationship between a consumer's absolute and proportional risk aversion and his choice of j1 and m as functions of A. For ease of reference in chapter 28 and to simplify our subsequent discussion, we shall henceforth write (j1, m) for x. This change in notation permits us to talk about j1 as the safe asset and m as the risky asset without referring back to the interpretation of A I-A 7 given above. 12.1.3 Sample Theorems

To give a preliminary idea of what the axioms imply about consumer behavior, I conclude this section by stating two theorems. Proofs of these and other theorems in the chapter are given in the appendix, section 12.7. The first theorem T 12.1, shows that V(·) has the standard properties of a utility function (e.g., convex indifference curves between j1 and m) and that the consumer's demand functions for j1 and m are continuous and differentiable. Both U(·) and V(·) are strictly increasing, thrice-differentiable, strictly concave functions on R+ and Ri, respectively. Moreover, there exists a vector-valued function

T 12.1

(/1,m)(·):R++ xR+-+Ri

which is continuous everywhere, differentiable on {(a, A) E R+ + x R+: (/1, m)(a, A) > O}, and for each and every pair (a, A) E R++ x R+, satisfies c = (/1, m) (a, A), where c is the consumption bundle at (a, A).

The next theorem T 12.2, and corollary T 12.3, provide an example of some interesting implications of Arrow's system that cannot be derived from the axioms of the standard theory of consmer choice. Note that T 12.2 insists no matter how close a is to Er, if a < Er and if A > 0, m(a, A) > 0. 1 T 12.2 Let Ef(r) denote if a < Er. T 12.3

So' f(r) dF(r). For all (a, A) E R; +,

For all A E R+, lim m(a, A)

m(a,

A) > 0 if and only

= o.

aTEr

j1

When a consumer invests in a pair (j1, m) with m > 0, he gambles, since variable. This gamble is favorable, fair, or unfair according

+ mr is a random

216

Chapter 12

as /1 + mEr is >, =, or < /1 + am. Theorem T 12.2 shows that the consumer will acquire a pair (/1, m) with m > 0 only if this pair represents a favorable gamble. A consumer who refuses gambles that are either fair or unfair is said to be "risk-averse." Hence in our interpretation of T(A I, ... , A 7), T 12.2 insists that the consumer is risk-averse. E 12.1 Let U(A) = log A for A E R++ and assume that r takes on the values 1.6 and 1.4 with probability 0.75 and 0.25, respectively. Then, for all pairs (/l, m) that satisfy /l + am = A and (/l, m) ~ 0, we have EU(/l + mr) = 0.75 log(A + m(1.6 - a)) + 0.25 log(A + m(1.4 a)). It is easy to verify that the value of m that maximizes the right-hand side of this equation, subject to 0 ::::;; m ::::;; Ala, is given by 0 m(a, A)

=

1 ( a2

for a ~ 1.5 5 a - 1.55 _

AI a

Since Er

3a

+ 2.24

)

A

for 1.544828 ::::;; a


] Rpn(A) for all A E R+; (ii) Il.(a, A)/A ~ Iln(a, A)/A for all (a, A) E R++ [and > if 0 < Iln(a, A) < A].

x R+

For completeness and to justify later examples, we note here that T 12.5 and 12.6 have interval analogues. Specifically, they are true if the domain of R(·) (or Rp (·)) is taken to be an interval [h, d] with h ~ 0 and d > h, and if the arguments of m(') (or /1(.)/ A) are allowed to vary in a region in R++ x R+ where {x: x = A + m(a,A)(r - a) for some r E support of F(·)} is a subset of [h, d]. (See also in this respect Pratt's theorems 6 and 7, 1964, pp. 135-136.) 12.3 The Fundamental Theorems of Arrow

It is disappointing to have to insist that the two consumers in T 12.5 and the two in T 12.6 have the same probability distribution F( .), because in an interpretation of A I-A 7 that might have empirical relevance, F( . ) must be a subjective probability distribution that is likely to vary from one individual to the next. However, while we may be unable to use T 12.5 and T 12.6 as a basis for comparing different individuals' risk aversion, we can use them to determine how a consumer's optimal portfolio varies as A changes. 12.3.1 Risky Assets and Absolute Risk Aversion

To see how T 12.5 can be used to determine how a consumer's investment in risky assets varies with A, suppose that consumer I in T 12.5 has the ordering VII!', m) =

f'

U,(!'

+ mr) dF(r),

(/1, m) E R~

of (/1, m) pairs, and suppose that the second consumer has the ordering V n (!', m)

=

L~

U n (!'

+ mr) dF(r),

(/1, m) E R~

where Ull(A) = U.(y + A), A E R+, for some y > O. Then we can think of consumer II as consumer I with y more units of account, since Rn(A) =

Chapter 12

220

and mn(a, A) = mI(a, Y + A) for all pairs, (a, A) E R++ x R+ with mn(a, A) < A/a. When we think of consumer II this way, T 12.5 becomes a statement about the relationship between the monotonicity of R(') and that of m(a, '). Properly refonnulated for our purposes, this statement is T 12.7. R1(y

+ A)

For any Arrow consumer the following relations hold: In the set {(a, A) E > O}m(a, .) is a strictly increasing (constant, (strictly decreasing)) function of A if and only if R(') is a strictly decreasing (constant, (strictly increasing)) function of A on R+. T 12.7

R~+: (j.l, m)(a, A)

The sufficiency part of this theorem was established by Arrow (1965, p. 43), the necessity by Pratt (1964, p. 136). Consider the consumer in E 12.1. His U(A) = log A for A E R++, and r takes the values 1.6 and 1.4 with probability 0.75 and 0.25, respectively. Moreover, R(A) = A- t and R'(A) < 0 for all A E R++. Since, for 1.544828 < a < 1.55, a 2 - 3a + 2.24 < 0, we see that, for all (a, A) E R++ x R+ with 0 < m(a, A)


O.

12.3.2 Safe Assets and Proportional Risk Aversion

Theorem T 12.6 can also be made to describe the way a consumer's portfolio changes with A. To show this, we redefine Un (') as follows:

for some k > o. Then consumer II can be thought of as consumer I with a multiple of his original holdings of units of account, since Rpn(A) = RpI(kA) for all A E R+, and tln(a, A)/A = tlI(a, kA)/kA for all pairs (a, A) E R~+ such that 0 < tln(a, A) < A. When we think of consumer II in this manner, T 12.6 becomes a statement about how the monotonicity of Rp (') is related to that of the function tl(a, . )/(.) and, a fortiori, to that of the function tl(a, . )/am(a, '). We record this relationship in T 12.8. T 12.8

For any Arrow consumer, the following relations hold on the set > O}:

{(a, A) E R~+ : (j.l, m)(a, A)

(i) j.l(a,· )/am(a, .) is a strictly increasing (constant, (strictly decreasing)) function of A if and only if Rp ( .) is a strictly increasing (constant, (strictly decreasing)) function on R+; and (ii) j.l(a,· )/(.) is a strictly increasing (constant, (strictly decreasing)) function of A

if and only if Rp (') is a strictly increasing (constant, (strictly decreasing)) function on R+.

Risk Aversion and Choice of Safe and Risky Assets

221

Figure 12.1  m(1, ·) plotted against A (horizontal axis A from 0 to 500).

Example E 12.5 below illustrates how m(a, .) and jl(a, . )/am(a, .) might vary with A when both R' (.) < 0 and R~(') < O. E 12.5 Let U(A) = _e A -', A E R++, and suppose that r can take the values 1.1 and 0.9 with probability 0.55 and 0.45, respectively. Then Rp(A) = 2 + A-I, A E R H , and R~(') < O. Moreover, if we let a = 1, then the way the consumer's optimal portfolio varies with A can be characterized as in figures 12.1 and 12.2 below, where we have plotted m(l, .) and /1(1, . )lm(l, .) against A. 2

We have stated T 12.7 and T 12.8 in terms of strictly increasing (constant, (strictly decreasing)) functions instead of positive (zero, (negative)) derivatives because it is awkward to state them in terms of derivatives. To see why, note that, if R(') is strictly decreasing (constant, (strictly increasing)) on R+, then i3m(a, ')/i3A > 0 (=0,«0)) on {(a, A) E R~+: (jl, m) (a, A) > O}. Theorem T 12.7 shows that the converse is true; it does not, however, show that i3m(a, .)/i3A > 0 (= 0, ( < 0)) for the relevant pairs (a, A) implies R'(A) < 0 (= 0, (> 0)) for all A E R+. The most we can demonstrate is that if i3m(a, .)/i3A > 0 ( < 0) (or ~ 0, ( ~ 0)) for all relevant pairs (a, A), then there does not exist an interval [IX, Pl c R+ such that R' (A) ~ 0 (~O) for all A E [IX, Pl. Analogous remarks apply to T 12.8. Taking the preceding into consideration, we can state T 12.9 which is a corollary to T 12.8 (ii). T 12.9

For any Arrow consumer the following relations hold: In the set,

{(a, A) E R~+ : (/1, m)(a, A) > a}, (AI/1(a,A)) o/1(a,A)loA > 1 (=, « 1)) if and only if Rp (') is a strictly increasing (constant, (strictly decreasing)) function on R+.

Chapter 12

222

Figure 12.2  μ(1, ·)/m(1, ·) plotted against A (vertical scale roughly 0.98 to 1.12, horizontal axis 50 to 200).

Arrow established the if part of this corollary; i.e., he showed that (AIJ1(a,A))oj.t(a,A)IOA> 1 for all relevant pairs (a, A) if Rp (') is a strictly increasing fundion on R+ (see Arrow 1965, pp. 43-44). To Arrow this

result was significant for several reasons. First, according to him a consumer's U function must be bounded; and if it is, the consumer's Rp ( ' ) cannot have a limit above 1 as A tends to 0 and cannot have a limit below 1 as A tends to infinity. Therefore, "it is broadly permissible to assume that relative risk aversion increases with" A (Arrow 1965, p. 37). Second, studies "of the movements of cash balance holdings, wealth and income (taken as a measure of wealth) by Selden, Friedman, Latane, and Meltzer, by different methods and under different assumptions agree in finding a wealth elasticity of demand for cash balances of at least 1." From these two observations, one conclusion emerges: "The notion that security, in the particular form of cash balance, has a wealth elasticity of at least one, seems to be the only ... explanation of the historical course of money holdings" (Arrow 1965, p. 44). Arrow hypothesized that most consumers' Rp functions are strictly increasing fundions of A and justified the hypothesis, as noted above, partly on theoretical grounds and partly on empirical evidence. He also hypothesized that the R functions of most consumers are strictly decreasing functions of A. This hypothesis, he thought, seemed to be "supported by everyday observation" (Arrow 1965, p. 35). In our empirical analysis of consumer choice of risky and nonrisky assets we shall test these hypotheses.

Risk Aversion and Choice of Safe and Risky Assets

223

Theorems T 12.7- T 12.9 say little about oj1/oA (only that oj1(a, A)/ > 0 if R~(A) ~ 0 for all A E R+). They say nothing about oj1/oa and om/oa; only by involving T 10.16 can we infer from T 12.7 that om(a,A)/oa < 0 for all (a,A) E R~+ such that (j1, m)(a,A) > 0 if R'(A) ~ 0 for all A E R+. Even so, these theorems provide all we need know about Arrow consumers in order to carry out our empirical analysis. oA ~ 0 for all (a, A) E R~+ with (j1, m)(a, A)

12.4 New Axioms

In sections 12.1 - 12.3 we discussed a model of consumer choice in which the consumer allocated his net worth between a risky and a nonrisky asset. Since in reality consumers choose among many risky assets, we next discuss a model in which the consumer allocates his net worth between one safe and several risky assets. In doing so, we hope to establish analogues of T 12. I - T 12.9 that could be relevant to consumer choice in the actual world. Instead of considering a model in which the number of risky assets is finite but otherwise undetermined, we simplify the exposition by taking the number to be 2. With only obvious changes in notation, our results are valid for a model in which any number of risky assets exist. A quick look at A I -A 7 shows that we only need to rephrase A 5 and A 6 to obtain a model of consumer choice among one safe and two risky assets. The new versions of these axioms are A 5"" and A 6"". AS'"

n

=

3, X

=

R~,

and p = (l,a)

E

{I} x R~+.

A 6'" There exists a nondegenerate probability distribution F('): R~ ~ [0, 1] with compact support, and a thrice-differentiable function U(·): R+ ~ R such that, for x EX, V(X"X2' x,)

~ f.OO f.OO U(x, + x 2', + X '2)dF(", '2)' 3

Again we name the first component of x j1 and refer to it as the nonrisky asset. We name the last two components of x m = (m l , m 2 ) and refer to

them as the risky assets. In addition, we write am for a l m l + a2m2 and mr for m l r t + m2r2' The analogue of T 12.1 in the new axiom system is T 12.10. Suppose that A I-A 4, A 5"", A 6"", and A 7 hold. Then U(·) and V(·) are strictly increasing, thrice-differentiable, strictly concave functions on R+ and R~, respectively. Moreover, there exists a vector-valued function (fl, m) (.): R~+ x R+ ~ R~ which is continuous everywhere, differentiable in {(a,A) E R~+ x R+:

T 12.10

224

Chapter 12

(11, m) (a, A) > a}, and for each and every pair (a, A) E Ri+ x R+, satisfies c (11, m) (a, A), where c is the unique consumption bundle at (a, A).

=

We shall henceforth refer to the components of (/1, m) ( .) as the consumer's demand functions. They satisfy T 12.11, which is an analogue of T 12.2. Suppose that A I-A 4, A 5", A 6", and A 7 hold, and let (11, m) (.) be the consumer's demand functions. Then, for all (a, A) E Rt+, mea, A) = a if and only if E'1 ~ a 1 and E'2 ~ a2. Moreover, if'1 and'2 are independently distributed relative to F('), then, for all (a, A) E Rt+, mj(a, A) > a if and only if E'j > aj, i = 1, 2.

T 12.11

It may seem surprising that we can have Er 1 ~ a l' Er2 > a2, and m(a, A) > are not independently distributed relative to F(·). Here is an example. In reading it, note that the m 1 component of the optimal portfolio is positive when A E (51.85185185,200) even though Er 1 = 0.0875
a and R~(A) > a for all a < A ~ 200, and '1 and '2 are not independently distributed. If we also let a = (0.1, 0.2), then we can easily show that

a if a ~ A

11(0.1, 0.2,A)

= { 2A

_ 200

a m 1 (0.1, 0.2, A)

=

{

=

3.25 (200 - A)

175 {

a~A
0, then m(a, A) represents a favorable gamble; that is, m(a, A) < m 1 (a, A) Er 1 + m 2 (a, A) Er2' The last inequality is a consequence of U(A)


0 for A E [0,199.9998656) and 0. Finally, if the domain of U(·) is [- rxlf3, 00) and y E [-1,0), then R'(') < and R~(') < 0. Only in this last case are both R'(') and R~(') negative. Note therefore that our empirical work in Chapter 28 is based on hypotheses that are true only if these two derivatives are both negative. The preceding ideas are illustrated in E 12.8, which concerns a consumer in our first axiom system, i.e., in A I-A 7. Note that the utility functions in E 12.8 are translates of the utility function in E 12.1, and the probability distribution in E 12.8 is the same as the probability distribution in E 12.1.

°

°

°

°

E 12.8 Let £ > 0 and consider a consumer in A I-A 7 with utility function U(A) = log(£ + A) and probability distribution

F(r)

=

{

o

if 0 ~ r < 1.4

0.25

if 1.4

1

if 1.6 ~ r

~ r

< 1.6

His demand function for m in the set B = {(a, A) E R~ + : (/1, m) (a, A) > O} is given by A m(a,

)

=

1.55 - a (1.6 - aHa - 1.4) (£

A)

+ .

Note that R'(A) < 0 and R~(A) > 0 for all A E R+ and that om(a,A)/oA > 0 and o(m(a, A)/A)/oA < 0, as they should be according to T 12.7 and T 12.8. Note also that the utility function is obtained from T 12.14 (i) by letting'}' = -1, f3 = 1, ~ = £, and D = 1.

234

Chapter 12

Next consider an individual with utility function U(A) = log( - I ; + A) on A > 1;. To allow him to be a consumer in A I-A 7 we change the domain of definition of U(·) to [ 0 and o(m(a,A)/A)/oA > 0, in accordance with T 12.7 and T 12.8, since R'(A) < 0 and R~(A) < 0 for all A E R+. The utility function of the second consumer is obtained from T 12.14 (i) by letting y = -1, f3 = 1, ex = -e, and D= 1.

The sensitivity of R(') and the consumer's choice among safe and risky assets to translations of the argument of the utility function is remarkable. For that reason the following observation is in order: The utility function in T 12.14 (i) is a utility function with displaced origin and constant proportional risk-aversion function. If a consumer has a utility function that satisfies T 12.14 (i) for some values of D, IX., f3, and y, then the constant proportional risk aversion is the reason why his preferred risky-asset mix does not vary with his net worth. The displaced origin accounts for the fact that the consumer's mix of safe and risky assets varies with his net worth. To see how, let a E Ri+ and Ai E R++, i = 0, 1, be such that 0 < (j1, m) (a, A i), i = 0, 1, then 1

IX. IX.

_

m(a,A ) -

+ 1 ° + f3A f3A Om(a,A ).

The right-hand side of this equality need not equal (A 1 / A O)m(a, A 0). Theorem T 12.14 has a converse that is formulated in T 12.15. T 12.15 Let U(·): R+ ---+ R be thrice differentiable, with U'(·) > 0 and U"(·) < O. Furthermore, let fl' denote the set of all nondegenerate probability distributions F(.): R~ ---+ [0, 1] with compact support. Finally, for each F(') E fl', let (Il, m) (', F):

Rt+ ---+ Rt denote the demand function of a consumer in A I-A 4, A 5", A 6", and A 7 with utility function U(·) and probability distribution F(·). Suppose that, for each F( .) E fl' and for all (a, A) E {(a, A) E Rt + : (11, m)(a, A, F) > O}, there exists a constant k = k(a, F) which depends on (a, F) but not on A such that m 1 (a,A,F)

=

k(a,F)m 2 (a,A,F)

Then U,(·) must satisfy either condition (i) or (ii) of T 12.14.

When reading T 12.14 and T 12.15 note that the utility functions in E 12.1 and E 12.5 have the separation property. This property specifies a

Risk Aversion and Choice of Safe and Risky Assets

235

relation between U(·) and characteristics of (Ji, m) ( . ,F) that (Ji, m) ( . , F) must satisfy for all F E ff but only for those values of (a, A) where, for a given F('), (Ji, m) (a, A, F) > O. It is, therefore, not surprising that (Ji, m) (0.1,0.2, .) in E 12.6 does not display the required characteristics as A varies over [51.85185185, 100]. 12.6.3 Arrow's Theorems and the Separation Property It is true that if a consumer has a utility function with the separation property, his demand for Ji and m and his absolute and proportional risk-aversion function will satisfy T 12.7 and T 12.8-to wit T 12.16. T 12.16 Let (/1, m) (.): R~+ x R+ -+ R~ be the demand functions of a consumer in A 1-A4, AS", A 6", and A 7, and let 13 = {(a,A) E R~+ : (/1, m)(a, A) > O}.

Suppose that, for all (a, A) a but not on A, such that m1(a,A)

E

13 there exists a constant

k

= k(a),

which depends on

= k(a)mz(a,A).

Then in 13 the following relations hold: (i) m(a,') is a strictly increasing (constant, (strictly decreasing)) function of A if

and only if R(') is a strictly decreasing (constant, (strictly increasing)) function on R+; (ii) /1(a,' )/(.) is a strictly increasing (constant, (strictly decreasing)) function of A

if and only if Rp (') is a strictly increasing (constant, (strictly decreasing)) function on R+; and (iii) /1 (a, . )/m(a, .) is a strictly increasing (constant, (strictly decreasing)) function

of A if and only if R p (') is a strictly increasing (constant, (strictly decreasing)) function on R+.

Since m 1 (a, .) in T 12.16 is a constant multiple of m 2 (a, .) when m(a, .) in T 12.16 (i) is

(Ji, m) (a, .) > 0, it is clear that, what is asserted of true of m 1 (a, .) and m2(a, .) as well.

It is also true that if a consumer's utility function does not have the separation property, we can find a pair (a, F) such that his demand for f.1 and mand his absolute and proportional risk-aversion functions do not satisfy T 12.7 and T 12.8-to wit T 12.17. Let U(·): R+ -+ R be a thrice-differentiable function, with U,(·) > 0 and U" (.) < O. In addition, let :f' denote the set of all nondegenerate probability distributions F('): R~ -+ [0, 1] with compact support. Finally, consider the set of all consumers in A I-A 4, A 5", A 6", and A 7 with utility function U(·) and some distribution F(') E :f', and let their demand functions for (/1, m) be denoted by (/1, m)('): R~+ x R+ x :f' -+ R~, as in T 12.15. Suppose that there exist two

T 12.17

236

Chapter 12

triples, (a, A 0, F) and (a, AI, F), such that 0 < A 1, and m1(a,AO,F)

--=-----:c o-

m2(a,A , F)

=I

° < AI, 0 < (j.l, m)(a, Ai, F), i = 0,

ml(a,A1,F) l'

m2(a,A, F)

Then there exist distributions P(·), p*(.) E § and vectors a* and aJf.Jf. E R~+ such that 0 < (j.l, m)(a*, Ai, P), 0 < (j.l, m)(aJf.Jf., Ai, P*), i = 0, 1, m(a*,AO,P) ~ m(a*,A1,P)

and m(a**,A1,P*) ~ m(a**,Ao,P*).

Moreover, if m(a,AO,F) ~ m(a,At,F),

i.e., if am (a, A 0, F) ~ am(a, AI, F), then there exists a pair (a, F) E R~+ x § such that 0 < (j.l, m)(a, Ai, F), i = 0, 1, and m(a,A1,F)


0 there is a (j > 0 such that x, y E X and Ilx - yll < (j imply that I f(x) - f(y)11 < e. 12.7.1 Proof of T 12.1

Since F(.) is a probability distribution with compact support; since U(·) has continuous first, second, and third derivatives; and since a continuous function on a compact set is uniformly continuous and has compact rangeas witnessed in UT 3, the obvious inequalities show that V(·) is thrice differentiable. It is well known that the conditions specified for U(·) in A 7 imply that U(·) is strictly increasing and strictly concave. Since F(.) is nondegenerate, it follows that V(·) must be strictly incresasing. Moreover, the inequality, AV(JlO, mO) =

faoo

.,:; faoo =

+ (1

(AU(Jlo

U(AJlo

V(AJlo

+

- A) V(Jl 1 , m 1 )

+ mOr) + (1 + (1

- A)Jl'

- A) U(Jl'

+ m ' r)) dF(r)

+ (AmO + (1

(1- A)Jl 1 , ).,mo

+

- A)m')r)dF(r)

(1- ).,)m 1 ),

238

Chapter 12

which is valid for any two pairs (/10, mO), (/11, m1) E R; and for any A E [0,1], implies that V(o) is a concave function. If (/1°,mO) =1= (/11,m 1) and if A E (0, 1), the inequality is an equality only if, for all r in the set where F( 0) increases,

Since F( is nondegenerate, it is easy to check that such an equality cannot hold for all r in the set of increase of F( 0), and that V( 0) is strictly concave. The continuity of (/1, m) (0) on R++ X R+ is a consequence of the preceding results and T 10.5. The differentiability of (/1, m) (0) on B = {(a, A) E R++ x R+ : (/1, m)(a,A) > o} is shown as follows: For any (a, A) E R++ X R+, the pair (/1, m)(a,A) can be found by maximizing EU(A + m(r - a)) subject to O· ~ m ~ A/a to find m(a, A) and by then setting /1 (a, A) = A - am(a, A). In B, m(a, A) satisfies both the first-order necessary condition, 0

E(r -

)

a)U'(A

+ m(a,A)(r -

a))

=

0,

for a maximum and E(r - a)2U"(A + m(a,A)(r - a)) < O. From these conditions and the Implicit Function Theorem, UT 11, it follows that m( 0) is differentiable in B. Since /1 (a, A) = A - am(a, A), /1( 0) must also be differentiable in B. 12.7.2 Proof of T 12.2

Let (a, A)

E

R++

x

R+

be such that m(a, A) > 0, and let (/1, m) =

(/1, m)(a, A). It follows from A 4, the definition of (/1, m)( 0), the mono-

tonicity of V( 0), the strict concavity of U( 0), and the nondegeneracy of F( 0) that U(A) ~ EU(/1

+ mr) =

EU(A

+ m(r -

a))


0, and from U' ( 0) > 0, we deduce a < Er.

Conversely, suppose that (a, A) E R;+, and that a < Er. Also suppose that m(a, A) = o. Then the necessary condition for a maximum of EU(A + m(r - a)) at m = 0, E(r -

a)U'(A) ~ 0,

shows that Er ~ a, which is a contradiction. Thus a < Er and m (a, A) = 0 cannot both hold.

Risk Aversion and Choice of Safe and Risky Assets

239

12.7.3 Proof of T 12.4

It is clear that if H"(A)/H'(A) = U"(A)/U'(A) for all A constants d and b with b > 0 such that

E

R+, there exist

H(A) = d + bU(A),

In this case, therefore, W(tI, rn) = d + bV(tI, rn) for all (tI, rn) E R~. Clearly, the function G(t) = d + bt, t E {range of U(o)} u {range of V(o)}, is strictly increasing and thrice differentiable. To establish the converse, let N be so large that {support of F( c [0, N], and let 0

Ru(A)

=

- U"(A) -U-'(A-)-

and

RH(A)

=

)}

-H"(A) H'(A) ,

If Ru ( 0) =I RH ( 0), then there is an interval [a, {3] where one function is larger than the other. Suppose that (12.1)

Next, let A ° = (a + {3)/2 and pick rno > 0 so small that 2rnoN < ({3 - a)/2. Also pick aO < Er so large that 0 < rn(aO, A 0) < rno and tI(aO, A 0) > o. That such an aO exists follows from the definition of (tI, rn) ( 0), from the monotonicity of V( 0), and from T 12. I - T 12.3. Finally, define the functions,

and

If, as we now assume, V( 0) and W( 0) represent identical orderings of (tI, rn) pairs, V u ( 0) and V H( 0) will both assume their maximum value at rn(ao, A 0). Since, by choice of (aO, A 0), (tI, rn)(ao, A 0) > 0, we must have V~(rn(aO, A 0)) - V~(rn(aO, A 0)) = 0 - 0 = o. On the other hand, if equation 12.1 holds, then (see equation 20 in Pratt 1964, p. 129) U'(y) u' (z)

H'(y)

< H' (z)

for all z < y,

z, Y E [a, {3].

This inequality implies that, for rn E (0, rno),

240

Chapter 12

V~(m) -

_ -

V~(m)

_ ° (U'(A O + m(r E(r

a)

aD)) _ H'(AO

U'(AO)

+ m(r H'(A O)

aD)))




O} am (a, A)

E(r-a)U"(A + m(a,A)(r-a))

oA

E(r-a)2U"(A + m(a, A) (r-a)) E(R(A) - R(A + m(a, A)(r-a))) (r-a) U'(A + m(a, A) (r-a)) E(r-a)2U"(A + m(a, A) (r-a))

Since F(') is nondegenerate, it is easy to show that the last fraction is positive if R(. ) is a decreasing function in R+. (See Arrow 1965, p. 43.) 12.7.7 Proof of T 12.8 and T 12.9

Note first that

and

aII - a lam = (am) _ 2 { aml -} oA oA

= (am)-2 {A all aA

_ II}

=

II -II ( 1 - aII) } (am) _2 { (A - I l )aoA oA

= (~)2 am

1 a(1l A) .

aA

From these equalities, which hold for all (a, A) E Rt+ with (II, m) (a, A) > 0, it follows that, to prove T 12.8 and T 12.9, we need only prove T 12.8 (ii). We shall do so for strictly increasing Il(a, . )/(') and R p (') only. The other two cases can be handled similarly.

Risk Aversion and Choice of Safe and Risky Assets

243

Consider an Arrow consumer, and denote his U, R p , /1, and m functions by U1(·), R pl (·), /11 ( .), and m l (·). Assume also that /11(a, . )/(.) is an increasing function for all (a, A) E {(a, A) E R~ + : (/11' ml)(a, A) > o}, and that there exist A If. and A If.lf. E R+ such that A If. < A If.lf. and such that Rpl(A If.) ~ Rpl(A If.lf.). Consequently, there also exists an interval [iX, f3] c [A If., A If.If.] on which Rpl (·) is a nonincreasing function. Next consider a second Arrow consumer with U, R p , /1, and m functions UII (·), R pII (·), /111 ( .), and mIl ( .), respectively, where

for some constant k > 1 that satisfies kiX < f3. It is easily shown that and that, for all (a,A) E R++ with 0 < mIl (a, A) < A/a, mn(a, A) = k-1ml(a, kA), and /1n(a, A)/A = /11(a, kA)/kA. Moreover, since R pl (·) is a nonincreasing function of A on [iX, f3], it follows that RpII(A) ~ Rpl(A), A E [iX, f3k- 1], and hence RpII(A) = Rpl(kA)

U{(y) ~ U{I(Y)

f

-U' ) -.. ;: : -U' () I(Z

II Z

11 or a z < y,

Z,

Y E [iX, {3k- 1].

(12.3)

Finally, let A ° = ({3k- 1 + iX)/2; let 0 < 2moN < ({3k- 1 - iX)/2, where N is as in the proof of T 12.4; choose aO < Er so large that 0 < mIl (aO, A 0) < mO; and define the functions V i (·), j = I, II, as in the proof of T 12.5. It then follows from equation 12.3 that 8V1(m, aO, A 0) 8m -

Vn(m, aD, A 8m

0)

~ 0,

and hence that ml(ao, A 0) ~ mn(ao, A 0) = k-1ml(aO, kA 0). From this we deduce that aOml(aO, A O)/A ~ aOml(aO, kA O)/kA and that /11 (aO, A O)/A ~ /11(ao, kA O)/kA 0. The latter inequality contradicts our original assumption, and thus we have shown that /1(a, . )/(.) cannot be a strictly increasing function on the relevant set unless Rp (·) is an increasing function on R+. To establish the converse, we observe (dropping subscripts and writing Z for A + m(a, A) (r - a), that for all (a, A) E R~+ with (/1, m) (a, A) > 0,

°

8(/1(a, A)/A) 8A

=

A

-2

°

{A 8/1(a, A) _ ( A)} = !!- { ( A) _ A 8m(a, A)} 8A /1 a, A 2 m a, 8A

a Em(r - a)2 U" (z)

A2

°

+ EA (r -

a) U" (z)

E(r-a)2U"(z)

_ a E(Rp(A) - Rp(z)) (r - a) U' (z) - A2 E(r - a)2U"(z)

Chapter 12

244

It is easy to show that, if Rp (') is a strictly increasing function on R+, the last fraction must be positive. 12.7.8 Proof of T 12.10

The proof of the monotonicity, concavity, and differentiability properties of U(·) and V(·) can be taken verbatim from the proof of T 12.1. The same is true of the continuity of (/-1, m) ( '). The differentiability of (/-1, m) ( .) on 13 = {(a, A) E Rl+ : (/-1, m)(a, A) > O} is shown as follows: For any (a, A) E R; + X R+, the vector (/-1, m) (a, A) can be found by first maximizing EU(A + m(r - a)) subject to 0 ~ m and am ~ A to find m(a, A) and then setting /-1 (a, A) = A - am(a, A). In 13, m(a, A) satisfies the first-order necessary conditions, E(r j

-

aJU'(A

+ m(a,A)(r -

a))

=

0,

j

=

1,2,

(12.4)

and the inequality [E(r 1 - a 1 )(r2 - a 2 ) U"(A



Next, let Nand M be so large that {support of F(")} c [0, N] X [0, M]; let = (ex + f3)/2; pick mO > 0 so small that m?N + m~M < (f3 - ex)/4; and choose aO < Er so large that 0 =f. m(aO, A 0) < mO and ll(aO, A 0) > o.

AO

246

Chapter 12

That such an aO exists follows from the definition of (j1, m)('), T 12.1012.12 and the monotonicity of V(·). Finally, define the functions, EUI(Ao WI(m) =

+ m(r U{(A 0)

aD))

mER;

,

and

°+

_ EUn(A m(r Wn(m ) U{I(A 0)

aD))

,

If G(') exists, both WI (') and Wn (') take on their maximum value at Therefore, since j1(aO, A 0) > 0,

m(ao, A 0).

j

=

(12.7)

1, II.

On the other hand, if equation 12.6 holds, then U{(y) U{ (z)


0 and U" ( .) < o. Also for any function! (. ) that is integrable with respect to F('), let EF!= J~ J~ !(rl,r 2 )dF(r 1,r2)' Finally note that if U"(· )/U'(·) is a constant on R+, U'(·) must have the form specified in T 12.14 (ii). Hence for the purposes of this proof we may assume that U" ( . )/ U' ( . ) takes at least two values on R+. The proof of T 12.15 is obtained in several steps. First a few preliminary remarks: A consumer in A I-A 4, A 5'1-, A 6'1-, and A 7 with utility function U(·) and probability distribution F(·) will, when faced with a pair (a, A) E R~+ x R+, choose m so as to maximize EFU(A + m(r - a)) subject to o ~ m and am ~ A. We denote this choice of m by m(a, A, F). For each F(·), the theorem concerns only those values of (a, A) E R~+ with the property that 0 < m(a, A, F) and am (a, A, F) < A. With respect to such pairs, m(a, A, F) is the unique solution of EF(ri - aJU'(A

+ m(r -

a)) = 0,

i

=

1,2.

(12.8)

248

Chapter 12

Note that in equation 12.8 the random variable that enters is r - a and not r. Let z be the random variable (r - a), and suppose that z relative to F( . ) has the distribution (jl,O)

(-W l , -W2)

with (jl, v, Wl' W2 ) > 0, lllU - ll3Wl > F( . ), equation 12.8 can be written as U'(A

+ mlu) =

°and ll2V -

ll3 Wl) , ( lllU U (A - mw)

ll3W2

>

O. For this

(12.9)

and U' (A

+ m2 v) = (ci~2) U' (A -

(12.10)

mw).

If U(·) has the property specified in the statement of the theorem, we can find a constant k such that, for all A for which there is an interior solution to equations 12.9 and 12.10, there is a one-dimensional m = m (A) which satisfies U'(A

+ mu) = (ci~') U'(A -

m(w,

+ lew 2»)

(12.11)

and U'(A

+ kmv) = (ci~2) U'(A -

m(w,

+ lew 2 ).

(12.12)

These equations play an important role in our proof of T 12.15. Next let y = A - m(A)(w l + kw 2 ), x = A + m(A)u, and z = A + km(A)v and assume that y < x < z and that R(y) =j:. R(z). Then we can use equations 12. I I and 12.12 and the Implicit Function Theorem to show that dm(A)/dA satisfies the following two equations: dm(A)

~=

{

R(y) uR(x)

+

(w l

R(x)

+

kW2)R(y)

}

(12.13)

Risk Aversion and Choice of Safe and Risky Assets

249

and dm(A)

dA

{

=

R(y) - R(z) kvR(z)

}

+ (WI + kWz)R(y)

.

(12.14)

From equations 12.13 and 12.14 it follows that Adm(A)/dA - m(A) satisfies A dm(A) _ m(A) = { Rp(Y) - Rp(x) } dA uR(x) + (WI + kwz)R(y)

(12.15)

and A dm(A) _ m(A) = { Rp(Y) - Rp(z) } dA kvR(z) + (WI + kwz)R(y) .

(12.16)

Since R(y) i= R(z), equations 12.13 and 12.14 imply that R(x) i= R(y). Hence, we can divide 12.13 into 12.15 and 12.14 into 12.16 to get

from which we deduce that xR(x)

= _ {(Z

- Y)R(Y)R(Z)} R(z) - R(y)

+ {ZR(Z)

- YR(Y)} R(x). R(z) - R(y)

(12.17)

So much for preliminary remarks. In the first step of the proof of T 12.15 we shall show that, for any triple y, x, z E R+ which satisfies Y < x < z and R(y) i= R (z), R (x) must satisfy equation 12.17. This can be established as follows: Let y, x, z E R++ be numbers which satisfy Y < x < z and R(y) i= R(z). Then note that there are numbers A, m, k, u, v, WI' Wz, n l , n z, n 3 that are all positive and satisfy the equations A + mu = x; A + kmv = z; A - m(w l + kwz) = y; n l + n z + n 3 = 1, and equations 12.11 and 12.12. To show why, we first choose IX and f3 so that U'(x) U'(y)

Then 0 < so that WI U

=

IX

U'(z) U'(y)

and

--=IX

f3
1. Hence Xl QX4 and Xl is indirectly revealed preferred to x 4. By exploiting the structure of 13, we can extend the sequence of pairs and show first that

x 4Q(0, 1,0) and then that (0, 1, O)Qx l . Hence X4QX I , which contradicts T 13.1 (ii), and XIQX I , which contradicts T 13.1 (i).

13.2 The Fundamental Theorem of Revealed Preference

Axioms 5 1-5 11 constitute the first step in our search for an answer to Q 1. In this section we shall carry out the second step; i.e., we shall show that if 5 1-5 11 hold, then there is a function V(·) with the properties postulated in fI 6 such that the value of f (.) at any (p, A) E R~ + x R+ is the vector in r (p, A) at which V(·) takes its maximum value. To obtain this result we must first establish seven auxiliary theorems concerning properties of two families of sets that we define as follows: 02

Suppose XO

S-(xo)

= {y E R~

E R~.

Then 5+ (XO)

= {y E R~

: yQXO} and

: xOQy}.

The basic idea of our argumentation in step 2 is to find a function V(·) whose family of level sets coincides with the family of closures of 5+ (. ). Our seven auxiliary theorems show that if 5 1-5 11 hold, then, for any XO E R~ - {o}, the closure of 5+ (XO), denoted 5+ (xo), has the necessary characteristics of the level sets of a utility function that satisfies fI 6. 13.2.1 A Rough Contour of S+(XO)

We begin by sketching the outline of 5+ (XO) for some given but otherwise arbitrary vector XO E R~. Suppose that XO = f(pO, A 0) for some (pO, A 0) E R~+ x R+. Then the following assertions are valid:

T 13.2

(i) {x E R~ (ii) S+(XO)

: XO

~ x, x

is convex;

i=

XO} c S+(XO);

264

Chapter 13

(iii)

S+(XO)

C

{x E R~ : AO ~ pOx};

(iv) if XO E R~+, if a E some AE R++.

R~+

and

and S+(XO)

C

{x E R~ : axo ~ ax},

then a =

Apo

for

Here assertion (i) is an immediate consequence of S 4, S 6, and S 7, and assertion (iii) follows from 01 and S 9. Also, assertion (ii) can be established by referring to 01, S 7, and the fact that, if x, y, z E R~, if z = h + (1 - A)y for some A E (0,1), and if z = f(pZ,AZ), then either pZx ~ AZ or pZy ~ AZ or both. If x E 5+ (XO) and y E S+(XO), either one of these inequalities yields zQxo. Finally, assertion (iv) follows from the following arguments: Suppose that a # Apo for all A E R++ and let z = f(a, axO). Then S 7 implies that z # XO and 0 1 shows that z E 5+ (xo). Next let p(t) = pO + t(a - pO) and observe that the continuity of f(') implies that there is a to E (0, 1) such that x(p(t°), p(t°)XO) # xO. Finally, let w = x(p(t°), p(tO)XO) and observe that S 9 and the equality (1 - to)pOw

+

=

tOaw

(1 - to)pOXO

+

tOaxo

implies that aw < axo = az. Since by construction wE S+(XO), the last inequality contradicts the relation S+(XO) C {x E R~ : axo ~ ax}. We conclude that the assumption a # Apo for all A E R+ + is untenable. In reading T 13.2, note that T 13.1 (i) implies that XO f/= S+(XO). Also, assertions T 13.2 (i) and (ii) imply that XO E S+(XO) and that S+(xo) is convex. Other properties of 5+ (. ) are described in the next theorem. In the statement of T 13.3 below a lower boundary point of S+(xo) is defined as follows: D 3 ~ E S+(xo), then x E S+(XO)

and x

Xl is a lower boundary point of S+(XO) if and only if imply that x = Xl.

~ Xl

We shall see in T 13.3 that, if XO =j:. 0, then S+(XO) possesses numerous lower boundary points. Specifically, for all x E R~+, there is a unique positive constant A-depending on x-such that AX is a lower boundary point of S+(XO).

T 13.3

such that

If XO =j:. 0, and if a < x, then there exists a finite positive number A(X) A(x)x E S+(xo) and such that

(i)

AX E S+(XO)

if A >

(ii)

AX ~ S+(XO)

if A < A(x).

Also, either A(X)X =

A(X);

XO

and

or A(X)Xi
O. Finally we let ZO = A(X)X and claim that either ZO = XO or ZO ~ {y E R~ : y ~ XO}. A proof of this assertion is given below. Suppose first that XO = AX for some A > o. Then T 13.2 (i) and (iii) imply that ZO = xo. Suppose next that ZO =f. xo. Suppose that ZO ~ xo. We shall obtain a contradiction which shows that the last hypothesis is false. Let An, n = 1, 2, ... be a sequence of numbers such that An < A(X) and lim n --+ CXJ An = A(X). Moreover, for each n = 1, 2, ... , let yn E R~ and pn E R~+ be such that Anx = yn and yn = f(pn, pnyn). Finally, let pZ E R~+ be such that ZO = f(pZ, pZZO). We can, without loss in generality, assume that the pn's are chosen such that, for all n, II pn I = I pZ II. Then the pn contain a convergent subsequence, pn k , which, by S 8, S 7, and the fact that ZO E R~+, must converge to pZ. But if that is true, then ZO ~ XO and ZO =f. XO imply that pZzo > pZxo, and hence that there exists an m so large that, for all nk ~ m, pnkyn k > pnkxo. Thus, for all nk ~ m, yn k E 5+ (XO), which is impossible since Ank


A(x). This fact, however, is an immediate consequence of T 13.2 (i) and T 13.5 (i) below and needs no additional proof here. The set of lower boundary points of 5+ (XO) contains XO and the set {z E R~+ : z

=

A(X)X for some X E R~+},

where A('): R~+ ~ R++ is as described in T 13.3. The latter set together with XO and the boundary of R~ provide us with the rough contour of 5+ (XO) for which we searched. To get a better idea of the contour of 5+ (XO) we shall next establish several salient characteristics of the lower boundary points of 5+ (xo). 13.2.2 Salient Characteristics of the Lower Boundary Points of 5+ (xo)

Lower boundary points of 5+ (XO) have interesting characteristics. Theorems T 13.5 and T 13.6 will attest to that. But first a definition and a useful auxiliary theorem: 04

If yl = [(p,A I ) and y2 = [(p,A 2), then y 2Hy i if and only if Al < A2.

The relations Q and H are related in interesting ways-to wit T 13.4. T 13.4 If y, z E R~ and yQz, there exist u, v yQv and vHz.

E R~

such that yHu and uQz, and

266

Chapter 13

To prove this theorem suppose that y = f(pY,AY), Z = f(pZ,AZ) and pYy~ pY z . Then the existence of u is obtained by the following argument: Let p(t) = pZ + t(pY - pZ) and observe that, for some to E (0, 1) and w = f(p(to), p(to)z), w #- z. For this w, p(to)w = p(to)z and S 9 imply that pY w

= p(to)w +

(1 - to) (pY - pZ)w

< p(to)z +

(1 - to) (pY - pZ)z

Hence, if we let u = f(pY, pYw), then yHu and uQz. To show that v exists when pYy ~ pYz, proceed as follows: Let x = rxy + (1 - rx)z for some rx E (0, 1) and let pX E R~+ and AX E R+ be such that x = f(pX, AX). Then pYy ~ pYx and, by S 9 and the equality pXx = rxpx y + (1 - rx)pXz, pXx > pXz. Consequently, there is an AV > AZ such that if v = f (pZ, A V), then pXx > pX v . For this v we find that yQx, xQv, and vHz, and hence that yQv and vHz. The arguments used above can be generalized to establish the existence of u and v when y is only indirectly revealed preferred to z. Those details I leave to the reader. Suppose that xo, Xl E R~ and that XO =1= lower boundary point of S+(XO). Then

T 13.5

Xl.

Suppose also that

Xl

is a

(i) S+ (Xl) C S+ (XO); and

(ii) XO ¢ (S- (Xl)

U

S+ (Xl)).

To prove T 13.5 (i) we pick an x 2 E S+(x l ) and find an x 3 such that X2 QX 3 and x 3 Hx l . Since Xl E 5+ (XO), every neighborhood of Xl contains a vector u E S+(XO). The relation x 3 Hx l implies that there is a u E S+(XO) such that x 3 Qu. Hence X2 QX 3 , x 3 Qu, uQxo, and x 2 QXO, as was to be shown. To establish T 13.5 (ii) we must show that neither xlQXO nor XOQx l obtains. To do that, we first observe that if z E R~, D 3, S 4-S 7 and S 10 imply that zQxo and xlHz cannot both be true. Hence S+(XO) c {x E R~ : pIX ~ A I}. From this and T 13.4 it follows that it is not the case that Xl Qxo. Suppose next that XOQx l . Then by T 13.4 there exists x 2 E R~ such that XOQx 2 and x 2Hx l . This implies that there is a u E S+(xo) such that x 2Qu. But then XOQx 2, x 2Qu, and uQxo, which contradicts T 13.1 (i); i.e., XOQx l cannot 'obtain. Suppose that xO, Xl E R~ and that XO =1= Xl. Suppose also that Xl =1= 0 and is a lower boundary point of S+(XO).

T 13.6 Xl

Then S+(x l

)

=

S+(XO) and S+(x l

)

=

S+(XO).

Consumer Choice and Revealed Preference

267

A proof of this theorem is given in Stigum (1973, pp. 417-4ZZ). Since the proof is both lengthy and involved, I shall only mention here the main ideas of the proof when xo, Xl E R~+. Suppose that Xo = f(pO, A 0), Xl = f(pl, A I), and xO, Xl E R~+. Then 5+(x l ) C 5+(xO) by T 13.5 (i). Hence to establish T 13.6 we need only demonstrate that 5+ (XO) C 5+ (Xl ). The difficult part of the proof that 5+ (XO) C 5+ (Xl) consists in showing that XO E 5+ (Xl). To do that we begin by constructing several sequences of functions on [0, 1] to R~ and R+ with interesting properties. Let p(t) = pI + t(pO - pI), t E [0, I], and t;:' = (k/z m), k = 0, I, ... , zm, m = 0, I, .... Also let A m(t~)

=

p(t~)XI

=

=

p(t;:')!(P(t;:'-I)' Am(t;:'_l))'

plX I

and Am(t;:')

k = 1, ... , Zm.

Finally, let

k = 0, I, ... , Zm - 1 and

t E [0,1] For all m and t, xm(t) Qx l . Also, the A m( . ) are equicontinuous and contain a subsequence, A mj(.), that converge uniformly on [0, 1] to a continuous function A('): [0, 1] ~ R+. Hence, if we let x(t)

= !(p(t),A(t)),

t E [0, 1]

then x(t) is well defined and continuous on [0, 1] and satisfies

t E [0, 1]. To show that XO E 5+(x l ), it suffices to show that x(l) = xO. Evidently xOHx(l) cannot obtain, since then XOQx l . So to ascertain that x(l) = xO, we must show that x(I)HxO cannot happen either. For that purpose we first define a second pair of sequences of functions on [0, 1]: Let t;:' = k/z m be as above. Also let zm(t;:'), k = 0, ... , zm, be a sequence of vectors in R~ such that plX I = plzm(t'{'), p(t'{')zmwn = p(t'{')zmWn, p(t1'm-I)Zm(t1'm-l) = p(t1'm_l)zm(I), and such that zm(t;:') = !(p(t;:'), A-m(t;:')), where

Chapter

A-m(t)

268

13

=

p(t)zm(t;:'),

t;:'-l ~ t ~ t;:',

k = I, ... , 2 m .

Finally let

t E [0,1]. For all m and t, x 1Qz m(t). Also, the A-m(t) are equicontinuous and contain a subsequence, A -mj(t), that converge uniformly on [0, 1] to a continuous function A-(o): [0,1] -+ R+. Hence the function, z(t) =

f (p(t), A - (t)),

t E [0, 1]

is well defined and continuous on [0, 1] and satisfies

t E [0,1]. To show that x(I)Hx o cannot happen, it suffices to show that x(t) = z(t), t E [0,1]. This we do by showing that A(o) and A-(o) are solutions to the same differential equation, dA(t)

dt =

(pO -

t E [0,1]

p1 )x(t),

(13.1)

and by observing that, because of the Lipschitz condition we impose on f( 0) in S II, there can be only one solution to equation 13.I. I shall demonstrate that A( 0) satisfies equation 13.1 and leave it to the reader to prove that A - ( 0) satisfies the same equation. Now it is a fact and easy to verify that p(t)x(t) ~ p(t)x(s),

t,

5 E

(13.2)

[0, 1]

and that A( 0) is a concave function. From this and the continuity of A( 0), it follows that the right-hand and left-hand derivatives of A( 0) are continuous from the right and from the left, respectively, on [0, 1]. Consequently, it suffices to establish equation 13.1 for t E (0, 1). To that end, fix to E (0, I), let F(t) = p(to) xU) - x(to) t - to _ A(t) - A(to) t - to -

(0

P -

P

1) (t) X

,

(13.3)

and let A+(to) and A_(to) denote the right-hand and left-hand derivatives of A( 0) at to. Then equations 13.2 and 13.3 and the continuity of x(t) imply that

Consumer Choice and Revealed Preference

F(to +)

=

lim

~

F(t)

269

0,

(13.4)

F(t) ~ 0,

(13.5)

t>t o , t--+to

F(to-)

=

lim t 0, F(to - h) - F(to

p(to)

+ h) = -h- {(x(to) -

x(to - h)) - (x(t o + h) - x(to))} ~ O.

(13.6)

Similarly, from equation 13.3 and well-known properties of concave functions, it follows that (13.7)

But if equations 13.3-13.7 are valid, then equation 13.1 must be valid as well. To conclude the proof of T 13.6 for the case Xo > 0 and Xl > 0, we now pick an x 2 E S+(XO) and find an x 3 E R~ such that X2 QX 3 and x 3 Hxo. There is an mj so large that x 3 Hx mj (l) and x mj (1)HxO. Hence x 2 Qx mj (l) and x mj (I)Qx l , which proves that S+(XO) c S+(x l ). 13.2.3 Characteristics of Vectors in S+(XO) u (R~ - S+(XO))

So much for lower boundary points of 5+ (XO). Characteristics of the other points in S+(xo) are described in T 13.7. T 13.7 Then

Suppose that xo,

Xl E

(i)

Xl E

S+(XO) or

(ii)

Xl E

S+(XO) only if S+(x l

Xl

R~

and that XO

=1= Xl.

Suppose also that Xl

E S+(xo).

is a lower boundary point of S+(XO); and )

c S+(XO).

The validity of T 13.7 (i) is an immediate consequence of 0 3, T 13.2 (i), and T 13.5 (i). The validity of T 13.7 (ii) results from the following arguments: Suppose that xlQXO and that x 2 E S+(x l ). If X2 QX l , then x 2 QXO as well. So suppose that X2 QX l does not hold. Then, by the first half of T 13.4, there exists an x 3 E R~ such that Xl Hx 3 and x 3 QXO. By arguments analogous to the arguments we used to show that x(l) = XO in the proof of T 13.6, this x 3 must satisfy both X2 QX 3 and x 3 QXO. Hence x 2 E S+(XO), as was to be shown. We have shown that, if Xl E S+(XO) and if Xl is not a lower boundary point of S+(xo), then Xl ¢ S-(XO). Also, if Xl is a lower boundary point of

270

Chapter 13

S+(XO), then neither xlQXO nor XOQx l obtains. In T 13.8 we show that, if Xl ¢ S+(XO), then XOQx l . T 13.8

Suppose that XO

(i) S-(XO) n S+(XO)

(ii) S-(xo)

u

S+(XO)

E R~

= 0; and = S-(XO) U

- {O}. Then the following assertions are valid:

S+(xo)

=

R~,

where 5- (xo) denotes the closure of 5- (xo).

We begin by proving T 13.8 (i). Suppose that Xl E S-(XO). Then S 9 implies that Xl ¢ S+(XO), and T 13.5 (ii) implies that x 2 is not a lower boundary point of S+(xo). Hence x 2 ¢ S+(XO). To establish T 13.8 (ii) we let z = [(p, AZ) and consider the function [(p, .): R+ ~ R~. For A large enough, p[(p,A) »pxo. Therefore for A large enough, [(p,A) E S+(xo). Since [(p, .) is continuous, there is a smallest A, say AY, such that [(p, AY) E S+(XO). Let y = [(p, AY). Then the arguments used to show that x(l) = XO in the proof of T 13.6 can be used to prove the following assertions: (i) if z

= y, z E

S-(XO) n S+(xo);

(ii) if zHy, then z E S+(XO); and (iii) if yHz, then z E S- (XO). Since there is no need to elaborate, this establishes T 13.8 (ii). 13.2.4 The Fundamental Theorem

So much for auxiliary theorems. Next we define an ordering of commodity bundles and show that it is induced by a utility function. 05

For x, y

E R~,

x::S y if and only if S+(y) c S+(x).

It follows from T 13.2 (i) and from T 13.5- T 13.8 that -< is a complete, reflexive, and transitive ordering of R~. Also T 13.2 (i), T 13.6, and T 13.7 imply that

{y

E R~

: X -< y} = S+(x);

and T 13.2 (i) and T 13.6- T 13.8 imply that {y

E R~

:y

-< x} =

S-(x).

Since S- (x) and S+ (x) are closed, we can apply UT 10 (section 11.2) and deduce that there exists a continuous function V(·): R~ ~ R+ such that, for

Consumer Choice and Revealed Preference

all x, y E R~, x

-< y if and only if V(x)

~ V(y).

271

But if that is so, then T 13.2

(i), T 13.6, and T 13.7 imply that V(·) is strictly increasing, and T 13.2 (ii), T 13.6, T 13.7, and 59 can easily be seen to imply that V(·) is strictly

quasi-concave. Finally, T 13.2 (iv) and 5 7 show that V(·) must have differentiable levels sets in R~. The properties of V(·) and T 13.2 (i) and (iii) imply that f(') is a demand function relative to V(·). These results we record in the Fundamental Theorem of Revealed Preference, T 13.9. T 13.9 Suppose that S 1-S 11 hold, and let :$ be as defined in 05. Then there exists a strictly increasing, continuous, strictly quasi-concave function V(·): R~ -+ R+ with differentiable level sets in R~ such that x:$ y iff V(x) ~ V(y),

x, Y E R~

(13.8)

and V(j(p, A))

= max

V(x),

(p,A) E R~+ x R+.

(13.9)

xer(p,A)

V( . ) is uniquely determinded up to a monotone, strictly increasing transformation.

Here several mental notes are called for. The V(·) of T 13.9 is uniquely determined up to a monotone, strictly increasing transformation relative to -oo V(1 + (lin), 1 + (lin)) = 23. The demand function f(') : R~+ x R+ ~ R~ determined by V(·) satisfies 56-510, as witnessed by the following equations for (p, A) E {I} x R+ + x R+:

(~+ (~)(p - 1), (~) +(~)(; - 1)) A

A

3

3

if

0< 1- - < P < 1 + - , A 2 + 6A( P + 1) + 9 p2- 46p + 9 ~ 0;

(~ +

P - 1,

(~) +

(; - 1))

if 0 < 1 -

~


5

and - ~ - ~ - or 1 3 P 2 (A, 0)

if p

and A

(0, -A) P

~ 1

JP < A

0 and either p +

+ JP


Vk(X k ) for some

=

1, ... , m, and

k E {I, ... , m}}.

The definition of Q explicates in what sense a Pareto-optimal allocation in E cannot be improved upon. There are many allocations in Q. We shall discuss some of them to make sure that the optimality of an x E Q is not misunderstood. If x E A and Vi(X i ) < Vi(X i ) for some pair (i,j), then at x consumer i envies consumer j.l There are allocations in Q at which one consumer envies another; for example, x = (Xl, ... ,x m ), with xi = 0 for j t= i and Xi = w. In fact, envy is a characteristic feature of most allocations in Q-see T 14.2. T 14.2 If x E Q, there is a pair (i, k) such that consumer i envies nobody and nobody envies consumer k.

To see this, observe first that if everybody envies somebody, then there is a sequence {t l' ... , tk } C {I, ... , m} such that Vti(X ti ) < Vti(Xt;+I), i = 1, ... , k - 1, and Vtk(X tk ) < Vtk(X tl ). Furthermore, there is a z E A so that Zi = Xi if i E {1, ... ,m} - {tl, ... ,td, zt; = Xti +l , i = 1, ... , k - 1, and Ztk = xtl. But then Vi(zi) ~ Vi(x i ) for all j and Vti(zt i ) > Vti(X ti ) for i = 1, ... , k, which contradicts the Pareto-optimality of x. The existence of a consumer whom nobody envies can be established in the same way. When x E A and Vi(X i ) ~ Vi(X i ) for all j # i, i = 1, ... , m, x is equitable. If x E Q and x is equitable, then x is fair. T 14.3

There exist fair allocations.

To see why, let Wi = wlm, i = 1, ... , m, and let (p, Xl, ... , x m ) be a competitive equilibrium relative to the distribution of w given by (w l , ... , wm ).

284

Chapter 14

For all pairs (i,j), px i = px i . Hence Vi(X i ) ~ Vi(X i ) and VJ(x J) ~ VJ(x l ). From this and from the fact that competitive equilibria belong to Q, it follows that (Xl, ... , x m ) is a fair allocation. 14.2.2 Pareto-Optimal Allocations and Competitive Equilibria

The fact that competitive equilibria belong to Q is one of the fundamental building blocks of welfare economics. It also provides an explication of why a competitive equilibrium allocation of w cannot be improved upon. We record this fact in T 14.4. T 14.4 If (p, Xl, .•. , x m ) is a competitive equilibrium relative to some distribution of w, then (Xl, ... , x m ) E Q.

The proof of T 14.4 is easy: Suppose that there is a z E A such that Vi(zi) ~ Vi(x i ) for all j and Vk(Zk) > Vk(X k) for some k. The first inequality implies that pzi ~ px i and the latter implies that pZk > px k. Hence pw = L~l px i < pZi = pw-a contradiction that demonstrates that there is no z E A with the required properties. The optimality of competitive equilibria can be explicated in another interesting way. Let x be an arbitrary allocation and let 5 be a subset of {I, ... , m}. In the jargon of game theorists, the consumers whose indices are in 5 form a coalition. This coalition blocks the given allocation if there exist vectors Xi, i E S, such that Xi E Xi, Vi(X i ) ~ Vi(X i ) for all i E 5 with strict inequality for some i, and LieS Xi = LieS Wi. The set of all allocations that cannot be blocked by any coalition is called the core of E. Evidently the core of E is a subset of Q. Moreover, the competitive equilibria in E belong to the core of E-to wit T 14.5.

Lr=l

T 14.5 If (p, Xl, ... , x m ) is a competitive equilibrium in E, then belongs to the core of E.

(Xl, ... , x

m

)

The properties of core allocations ensure that once the consumers have exchanged commodities in accordance with a competitive equilibrium price, trading will cease. With only obvious modifications, the arguments used to establish T 14.4 can also be used to prove T 14.5. Therefore, I leave the proof of T 14.5 to the reader and conclude our discussion of Pareto-optimal allocations with another fundamental theorem of welfare economics, T 14.6. T 14.6 If i E Q, then there exists apE R~+ and a distribution of w relative to which (p, i) is a competitive equilibrium.

Consumer Choice and Resource Allocation

285

Thus any Pareto-optimal allocation can be sustained as a competitive equilibrium. To prove T 14.6, we let i be as specified in the theorem and define G c Rn by

G

=

{z

E

Rn: there exist Xi

Z

= it,

Xi, V' (x')

E

R~, i = 1, ... , m, such that

> V'(i ' ), and

Vi(X i ) ;;. Vi(ii), i

= 2, ... , m}.

It follows from the monotonicity and strict quasi-concavity of the V i (.) that (i) z E G, vERn and z ~ v imply v E G; and

(ii) G is convex. Moreover, it follows from the Pareto-optimality of i that w ¢ G. But then, by UT 4 (section 10.1), there is a pO E Rn - {a} such that pOz ~ pOw for all z E G. We observe that, by (0, po E R~. In addition, for all i= I, ... ,m,

pOii

=

min yE

pOy.

(14.2)

{XE R':-: Vi(X):;?; V(.ii)}

To ascertain the validity of equation 14.2, we suppose that there is a j and an xi such that pOx i < pOii and such that Vi(x i ) ~ Vi(x i ). Also we let 8 = pOii - pOx i and we let b E R++ be so small that, if zi = (x{ + b,xt ... ,x~), then pOii - pOzi ~ (2/3)8. Finally, we let Zk =i k for k f= j and 1 and, if j f= I, we let Zl = (x~ + b, xi, ... , x~). Then the monotonicity of the V i (.) implies that, if z = I.~1 Zi, then z E G and paz < pOw-a contradiction that establishes equation 14.2. Next we shall show that po > o. Suppose that p? = 0 and that pf > O. Let xi be such that xl > 0 and observe that we can find b1 , b2 E R++ such that if x = (x 1 '''''xn ) with Xk = xl- b1 , Xi = XI + b2 and Xl = XI for all 1 = I, ... , n; 1 f= i, k, then Vi(x) > Vi(x i ) and pox < pOxi-a contradiction, which shows that pO > O. To conclude the proof, we let iiJi = ii, i = 1, , m and observe that po > 0 and equation 14.2 imply that, for j = I, , m, Xi is consumer j's consumption bundle in the price-income situation (pO, pOiiJ i ). This and the fact that (Xl, ... ,i m ) is an allocation imply that (pO, i 1, ... m ) is a competitive equilibrium relative to (iiJ 1, ... , iiJm).

,x

Chapter 14

286

14.3 The Formation of Prices in an Exchange Economy

In section 14.1 we established the existence of competitive equilibria in E in order to establish the objective possibility of our exchange scenario for E. In section 14.2 we delineated various characteristics of resource allocation in E to determine the nomological adequacy of the same scenario. It remains to be seen whether there exists a scheme by which the consumers in E can find competitive equilibrium prices. We discuss that problem in this section. 14.3.1 On the Stability of Competitive Equilibria

Some hundred years ago, Leon Walras suggested that consumers in E could find competitive equilibrium prices by hiring an auctioneer who would call out a price p, record the resulting excess demand, and-without any trading having occurred-raise or lower his price quotations according as excess demand was positive or negative. Supposedly, after sufficiently many price quotes, the auctioneer would zero in on a price pO at which excess demand was zero. Only then would trading occur. The resulting allocation, together with pO, would constitute a competitive equilibrium in E. We shall describe the behavior of Walras's auctioneer symbolically by the following system of ordinary differential equations: dp(t) _

( )

dt - z p,

where z(·):

(R~

t ~ 0

(14.3)

- {OJ) --. Rn is as described in section 14.1.2; Le., for each

p E (R~ - {o}), m

z(p) =

I

(p(p, pw i ) -

Wi),

i=l

where p(p, pw i ) denotes consumer j's consumption bundle at (p, pw i ) when his choice is restricted to be in Xi = {x E Xi : X ~ 2 Ir=l Wi}. Walras thought that equation 14.3 would have a solution p(.): R+ --. R~+ such that p(t), as t --. 00, would converge to a pO E R~+ at which z(pO) = O. We next ask whether Walras's conjecture is reasonable. In reading our answer, it is important to keep in mind that, at any p E (R ~ - {O}), z(.) satisfies conditions i-iv listed in section 14.1.2; Le., z(·) is continuous, homogeneous of degree 0, and satisfies pz(p) = 0 and Zj(p) > 0 if Pj = 0, j = 1, ... , n. Moreover, Ilz(') II is bounded.

Consumer Choice and Resource Allocation

287

Walras's idea is useless for £' s consumers if equation 14.3 does not have a solution defined on all of R+. To help our intuition about the existence of such a solution, we next record and discuss two useful universal theorems, UT 13 and UT 14. Suppose that F('): R~+ -+ Rn is continuous. Moreover, suppose that there are constants M and K E R++ such that, for ally, y0 E R~+,

UT 13

IIF(y)11

~

M

(14.4)

and (14.5)

Finally, for each (~, x) E R+ X R~+, let J(~, x) be the interval (~ - a(x), ~ + a(x)), where a(x) = (minH;i,;;nx/M + 1). Then there exists a unique family y(t; ~,x) of solutions to the differential equation, dy dt

=

(14.6)

F(y),

defined for all y(~;~,x)

(~, x) E

R+ x R~+ such that

= x

and y('; ~,x) :J(~,x) -+ R~+.

Moreover, the functions on

{(t;~, x) E

R x R+ x R~+ : t E J(~, x)} to R~+ and Rn,

y( .) and dy( . )/dt, are continuous.

In the statement of the theorem, (~, x) detennines the initial condition, (~;~, x) = x, which insists that the solution to equation 14.6 pass through x at "time" t equal to ~. Moreover, the interval J(~, x) is chosen such that we can be sure that, for all t E J(~,x), y(t; ~,x) E R~+. To show why, we consider the integral equation, y(l) = x

+

I:

tE

F( y(s)) ds,

[~, a).

(14.7)

If y('): [~, a) ~ R~+ is continuous and satisfies equation 14.7, then = x and dy(t)/dt = F(y(t)), t E [~, a). Conversely, if y('): [~, a) ~ R~+ is a continuous solution to equation 14.6 that satisfies y(~) = x, then y(~)

y(t)-x=

t I

It

dy(s) ~ ~ds= ~ F(y(s))ds,

which demonstrates that y(') is a solution to equation 14.7 as well. Now equation 14.7 makes sense only for intervals [~, a) in which the values of y(') belong to R~+. Since, by equation 14.4, a solution to equation 14.7 must satisfy the inequalities

288

Chapter 14

I !I(t)

xII :;;;

-

f:

II F(!I(S)) II ds :;;; M(t -

e),

we see that [~, ~ + a(x)) is such an interval. Analogous remarks are valid for (~ - a(x), ~] as well. A function F('): R~+ ~ Rn which satisfies equation 14.5 is said to be Lipschitzian. The condition that F(.) be Lipschitzian is important for the conclusions of UT 13. If F(.) does not satisfy equation 14.5, the solution to equation 14.7 need not be unique and it need not vary continuously with the initial conditions (~, x). Suppose that n = 1 and let] Then consider the equation

E 14.2

=

R+ and F(y)

=

dy dt = (y - 1)2/3.

If b >

1 and ~

y(t; 0, b)

=

= 1+(

3,

(y - 1)2 / Y E RH

.

(14.8) 0,

t + 3(b -

3)3 ,

1)1 /

3

t

~

a

(14.9)

is the only solution to equation 14.8 through (0, b). If a < b :;:; 1, ~ andt = -3(b - 1)1/3 ~ 0, then, for each a> T, I y(t; 0, b)

=

+ ((t ~ 3(b -

1 when {

1

1)1 /3)/3)3 when

a :;:;

= 0,

t :;:; T,

t :;:; t < a

+ ((t -

a)/3)3 for a :;:;

t

is a solution to equation 14.8 through (0, b).

If F(·): R~+ ~ Rn is differentiable in an open neighborhood of XO E R~+, there is a compact neighborhood of XO in which F(·) is Lipschitzian. Also if F(·) is differentiable in R~ +, then F(·) is Lipschitzian on any compact subset of R~+. In a subset A of R~+ in which F(·) is Lipschitzian, UT 13 is valid with R~+ replaced by A; i.e., through any point in A there is one and only one solution in A to equation 14.6. Moreover, the solution varies continuously with the initial conditions. For example, in E 14.2, (y - 1)2/3 is not Lipschitzian in a neighborhood of y = 1. However, it is Lipschitzian in any compact connected subset of R++ that does not contain {1}. Note, therefore, that the solution in equation 14.9 varies continuously with b on {r E R+ + : r > 1}. According to T 10.5, z(·): R~+ ~ Rn is continuous. However, z(·) need not be Lipschitzian on R~+, and it need not be differentiable in any open subset of R~+. When z(·) is neither Lipschitzian nor differentiable, we need UT 14 below to characterize the solutions to equation 14.3. The

Consumer Choice and Resource Allocation

289

significance of the last half of the theorem will be borne out in T 14.7, where we insist that if a solution to equation 14.3 starts on a given sphere 5, it stays on the sphere for all relevant t. UT 14 Suppose that F('): R~+ ~ Rn is continuous. For each point (~,x) E R+ x R~+, there is a maximal interval (a, 13) with a < ~ < 13 on which there is a continuous solution to equation 14.6, y(';~, x): (a, 13) ~ R~+. If 13 < 00, then given any compact set A c R~+, there is atE (a, 13) at which y(t;~, x) ¢: A.

Thus if y('; ~,x): (oc, fJ) ~ R~+ and if fJ < 00, then as t ~ fJ either y(t; ~,x) tends to the boundary of R~+ or II y(t;~, x) II tends to 00 (or both). Proofs of UT 13 and UT 14 can be found in Graves (1956, pp. 152 and 159-160). It is surprising that there need not exist a solution to equation 14.6 that is defined for all t E (oc, 00). Here is an example of Hirsch and Smale's, which shows that this may be the case even when the function F(') is differentiable (Hirsch and Smale 1974, p. 171). E 14.3 Suppose that n equation

dy dt = 1

=

1 and let F(y)

=

1

+ y2, Y E R++. Then consider the

+ y2.

A solution to this equation is of the form y(t; ~,x)

=

tan(t - c),

where c is a constant determined by the values of ~ and x. Such a solution cannot be extended over an interval larger than (c - (nI2), C + (nI2)) since y(t;~, x) tends to ±oo as t tends to c ± (nI2).

According to UT 14 and the properties of z(·) there is, for each (( p) E R+ X R~+, a continuous solution to equation 14.3 that passes through p at t = ~ and is defined on some maximal interval (oc, fJ). In T 14.7- T 14.8 below we demonstrate that fJ = 00. It is awkward to write p(.; ~, p) for a solution to equation 14.3. Hence in the sequel we shall say that "p(.): (oc, fJ) ~ R~+ is a solution to equation 14.3 through (~, p)" if ~ E (oc, fJ) and p(t) = p(t; ~, p) for all t E (oc, fJ). T 14.7

Suppose that p('): (a, 13) ~ R~+ is a solution to equation 14.3 through R~+. Then, for all t E (a, 13), p(t)'p(t) = P'p.

(O,p) E R+ x

Hence a solution to equation 14.3 that begins on the sphere {p E R~+ :p' p = p' p} stays on the sphere for all t E (oc, fJ). The proof is easy: Since pz(p) = 0 for all p E R~+,

f

.L .

1=1

.(t) dpi(f) = 0

PI

dt

'

t E (oc, fJ) .

290

Chapter 14

Hence there is a constant k such that, for all t E (a, 13), p(t). p(t) = k. Evidently k = p.p. T 14.8 Suppose that p(.): (ct, 13) ~ R~+ is a solution to equation 14.3 through (0, p) E R+ x R~+. If (ct, 13) is the maximal interval corresponding to (0, p), 13 = 00.

The proof goes as follows: Suppose that 13 < 00, that pO E R~ is a limit point of p(t) as t approaches f3, and that p? = 0 for some i. Moreover, let ME R++ be such that IIz(p)11 ~ M for all p E R~ - {O}, and suppose that a < t < u < 13. Then (see equations 14.7) II p(u) - p(t)11 .,;

f

Ilz(p(s))11 ds ,;;; M(u - f).

From these inequalities it follows that if tk , k = 1, 2, ... , is a sequence of points in (a, f3) that tend to 13, then the corresponding p sequence, P(tk)' k = 1, 2, ... , is a Cauchy sequence. 2 Consequently, pO must be the only limit point of p(t) as t tends to 13. But if that is the case, p? = 0 is an impossibility since then for t close enough to 13, Zi(P(t)) > O. Suppose that p(.): R+ ~ R~+ is a solution to equation 14.3 through (O,p) E {o} x R~+. If z(p) = 0, p(t) = p for all t E R+. If z(p) f:. 0, there mayor may not exist a pO E R~+ such that z(pO) = 0 and such that limt ---. oo p(t) = po. We give evidence of this fact in E 14.4 and E 14.5. E 14.4

Consider the consumers in E when n = 2, and let and CZ = p' p. Then - rg(r) = Zl (1, r) and

r

= PZlpt,

g(r) = Zz (1, r),

dr dt =

c- 1 (1

+ r Z ) 3/2g (r).

(14.10)

Since g(') is continuous on R++, positive for small r, and negative for large r, it is easy to verify that Walras's conjecture holds for this economy; i.e., for each r E R++ there is an rO E R++ such that g(rO) = 0 and such that if r('): R+ ~ R++ is a solution to equation 14.10, through (O,r) E R+ x R++, then limt oo r(t) = rOo (This observation is due to Arrow and Hurwicz; theorem 6, --+

1958, p. 541.)

We may conclude from the preceding example that Walras's conjecture is valid for a two-commodity exchange economy. However, it is invalid for a three-commodity economy, as can be seen from an example by H. Scarf, E 14.5. In reading this example, note that the V i (.) do not satisfy H 6. I have simplified the structure of the utility function only for simplicity's sake. A more elaborate example can be found in Scarf 1960 (pp. 163172).

Consumer Choice and Resource Allocation

291

E 14.5 Consider a three-commodity, three-consumer economy in which the consumers' utility functions and initial resources are given by min(x"x2)' = min(x 2, X3), = min (x, , X3),

V'(x) =

V

2

(x)

V 3(x)

and w t = (1,0,0); and w 2 = (0,1,0); and and w 3 = (0,0,1).

x E Rt, x E Rt, x E

R;,

Assume that p' p = 3 and that Pt . P2 . P3 =I 1. Then z(p)

= ( __P_2_ + _P_3_,_P_t Pt

+ P2

Pt

+ P3

Pt

+ P2

P2

P_3_,_P_2 + P3 P2 + P3

Pt

P_t_). + P3

It is easy to verify that P = (1, 1, 1) is the only equilibrium price on the sphere {p: P' P = 3}, and that a solution to equation 14.3 through (O,p) E R+ x RL satisfies p(t) . p(t) = P' P and Pt (t) . P2 (t) . P3 (t) = Pt' P2 . P3' Hence, for this economy Walras's conjecture is false.

From E 14.5, it follows that the consumers' utility functions must satisfy stronger conditions than H 6 for W alras's conjecture to be valid in E. Various conditions have been suggested (see, for instance, Karlin 1959, pp. 312-313). In T 14.9 I suggest an additional condition on z(·). This condition would be satisfied in E if we could find positive constants aij and a strictly concave function, U(·): R+ ~ R+ such that n

Vi(X l

, ... ,

xn ) =

L

aijU(xj)'

i = 1, ... ,m.

j=l

The theorem is due to K. Arrow and L. Hurwicz (1960, p. 640), the proof to Bent Birkeland. Suppose that p(.): R+ ~ R~+ is a solution to equation 14.3 through R~+. Suppose also that, for all P" E R~+ and P E {p E R~: P' P = p'p}, if z(p"j = 0, p"z(p) > unless z(p) = 0. Then there is a pO E R~+ such that z(pO) = and lim, oo p(t) = pO.

T 14.9

(0, p)

E

{O} x

°

°

-+

To prove T 14.9 we pick a p'" E R~ + such that p".. p'" = p' p and z(p"') = O. Moreover, we let V(p): R~ ~ R+ be defined by V(p) = lip - p"'11 2 and observe that dV( p(t))/dt = - 2p"'z( p(t)), t E R+ +. Hence either z( p) = 0 and V(p(')) is a constant or V(p(')) is a decreasing function of t in some maximal interval [0, IX). If z(p) = 0 or if z(p) #- 0 and Z(p(IX)) = 0 for some finite IX, there is nothing to prove. In the remainder of the proof we therefore assume that IX = 00 and hence that z(p(t)) #- 0 for all t E R+. There is a sequence tk E R+ such that t k + 1 ~ tk+l' k = I, 2, ... , and such that limk-+ O. T15.5 Suppose that belt (.) and beI 2 (') are belief functions over 0, with basic probability assignments m l (. ) and m 2 (.) and focal elements AI' ... , A k and Bl , ... , B" respectively. Moreover, suppose that

L

m l (AJm 2 (Bj)


0 and P(B) > 0, T 15.15-a version of the Bayes theorem must be true. P(BIA)P(A)

If Bi E ff, i = I, ... , n; if Bi n Bj = 0 when i =I j, i, j §" and A c U~=l Bi ; and if P(Bi ) > 0, i = 1, ... , n, then

T 15.15

A

E

=

I, ... ,

n;

if

n

P(A)

=

L P(AIBi)P(Bi ),

i = I, ... , n

i=1

and (if P(A) > 0) P(BjIA)

=

P(AIBj)P(Bj )

l~ P(AIBi)P(Bi )·

(15.13)

In most applications of the Bayes theorem P(B;) is referred to as the prior measure of Bi , Le., the measure an individual would assign to Bi if he had no

The Measurement of Probable Things

329

observations to guide him in assigning probabilities to events. In addition, P(B)A) is referred to as the posterior measure of Bj , i.e., the measure that an individual with prior probabilities P(Bi ), i = 1, ... , n, would assign to Bj after having observed A. Finally, the conditional probabilities P(AI Bi ), i = 1, ... , n, measure the likelihood of A as a function of the Bi . Hence P(A ,.) is often called a likelihood function. In most applications the conditional probability measures, P(· IBi ), i = 1, ... , n, are known probability measures. E 15.16 Two indistinguishable urns B1 and B2 are standing in front of a blindfolded man. Each urn contains 100 balls that are identical except for color. Some are red; others are white. B1 contains ten red balls and B2 contains eighty red balls. The urns are shaken well and the blindfolded man picks a red ball from one of them. What is the probability that he sampled from B1 ? In this case our prior measures of the Bi are given by P(B1 ) = P(B2 ) = 112. Also P(red ball IB.)

=

0.1

=

0.8.

and P(red balllB2 )

Hence our posterior measure of B. is as follows: P(B.I red ball)

=

0.1' 0.5 0.1' 0.5

+ 0.8' 0.5

1 9

15.3.3 Posterior Probabilities and Conditional Belief Functions

It is interesting to contrast the way information is processed in T 15.15 with the way evidence is pooled in T 15.5. In T 15.15 the information we have and the information we gain are treated asymmetrically. We begin with the prior measures P(Bi ), i = 1, ... , n, observe A and compute the posterior probabilities P(~IA), j = 1, ... , n. Moreover, we never question whether we actually have observed A. In T 15.5 when we form the orthogonal sum of belt (.) and beI 2 ('), the evidence on which belt (.) is based and the evidence on which bel 2 ( .) is based are treated symmetrically. Moreover, neither belt (.) nor bel 2 ( .) insists that some particular subset of 0 must have happened. The preceding contrast notwithstanding, there are situations in which the orthogonal sum of two belief functions can be interpreted as a posterior belief function. For example: Let (0, ff') be an experiment that satisfies GS 1 and GS 2, and let belt (.): ff' -. [0, 1] be a belief function with basic probability assignment m t (.): ff' -. [0, 1]. In addition, let E E ff' be an

330

Chapter 15

event that has a nonempty intersection with the core of bell ('), and let bel2 (. ): ~ -+ [0, I] be a belief function whose probability assignment, m2('): ~ -+ [0, I], is given by m2(A)

=

g

if A = E otherwise.

Finally, assume that E is a proper subset of 0 and that bell (P) < I. Then if m(') denotes the basic probability assignment of (bello bel 2) ('), m (0) = 0 and for all nonempty A E !F, m(A)

=

B....J"E=A m, (B)

L

=

Be.9'.BnE=A

I(1 -

m l (B)/(1 -

Be ....

~E=0 m, (B))

bell (EC)).

Moreover, (bello bel2) (A) =

L

m(D)

DcA

= (I - bell (P))-l

L

L

0#D,DcA Be.9',BnE=D

L

= (I - bell (P))-l

m l (B)

m l (B)

Be.9',0#BnE,BnEcA

=

(1 - belt (Ec))-l

L BcAUEC,B¢EC

= (I - bell (P))-t (bell (A

m l (B)

u P) - bell (P)).

In this case (bell 0 bel 2)(.) assigns numbers to events as if the only possible outcomes in 0 were the outcomes in E. That is, (bell 0 bel2) (. ) assigns numbers to events as if it were a genuine conditional belief function, e.g., a posterior belief function. Our observations concerning posterior probability measures and orthogonal sums of belief functions suggest that we define conditional belief as follows: Let (O,~) be an experiment that satisfies GS I and GS 2, and let bel('): ~ -+ [0, 1] be a belief function. In addition, let A and E be events such that bel(P) < I. Then the conditional belief in A given E is defined by bel(AIE) = (I - bel(p))-I(bel(A u P) - bel(P)).

(15.14)

It is easy to verify that bel( -IE): ~ -+ [0, I] is a well-defined belief function and that bel(AIE) reduces to bel(A n E)/bel(E) whenever bel(') is additive. We shall return to conditional beliefs in chapter 24. I

The Measurement of Probable Things

331

15.3.4 a-Additive Probability Measures Probability measures-as we define them-need not be countably additive. Therefore, when:#' is a a field, we usually also require that P( . ) satisfy a third condition, AK 3.

=

Suppose that Ai E $', i = 1, 2, ... , and that Ai II A j 0 for i =1= j and i, 1, 2, .... Suppose also that $' is a a field, and that P(· ) is a probability

AK 3

=

j

measure on (0, $'). Then

Conditions EX 1-EX3 and AK 1-AK3 constitute A. Kolmogorov's axioms of probability (Kolmogorov 1933, pp. 2 and 13). A triple (O,:#" P(·)) which satisfies them is a probability space. An important example of a probability space is the space (O,a(:#'),jl(')) described in E 15.17 and E 15.18. E 15.17 Let 0 = [0,1); let 5 denote the family of all subsets of 0 of the form [a, b), where ~ a < b ~ 1; and let $' be the smallest field that contains all the sets in 5. Moreover, let Ps ('): 5 -+ [0, 1] be defined by

°

=

Ps([a, b))

O~a 0, lim P({w

E Q: Ixn(w) -

x(w)1

>

e}) = 0.

We also say that the xn (·) converge with probability I to a random variable x(·) and write

=

lim xn(w)

x(w)

(15.21)

a.e.

if and only if, for all e > 0,

lim

peV,

{w En: Ix.(w) - x(w)l

>

e)

=

o.

The "a.e." in equation 15.21 is short for "almost everywhere" and the latter is short for "everywhere except in a set of P measure zero." To clarify the two concepts of convergence, we make the following observations: Let

n u {w n=l k=n 00

A(e) =

00

En: Ixk(w) -

x(w)1

>

e}.

Then the x n(·) converge to x(·) with probability I if and only if P(A(e)) = for all e > 0. A(e) consists of all w with the property that infinitely many of the x n(·) satisfy Ixn(w) - x(w)1 > t:. Its complement satisfies

°

U n {w E Q: Ixk(w) 00

A(e)C

=

00

x(w)1 ~ e}.

n=l k=n

there is an integer no(w) such that for all n ~ no(w) If x(w) = ct for all w, convergence of the xn (·) with probability 1 implies that, for each e > 0, there is a fixed (I) set, A(E)C, such that the values the xn(w) for all w E A(t:Y eventually lie within t: distance of ct. If the x n (·) just converged in probability to ct, the most we could assert would be that, for all (j > 0, there exists an integer n(c5, c:) such that, for all n > n(j, r,), For all w

E A(e)C

Ixn(w) - x(w)1 ~ t:. Moreover, P(A(E)C) = I.

ot

P({w E Q: Ixn(w) - ctl

>

e})


a

Moreover, for any i = 1, ... , n

°= and

lim Fxl ..... xJa 1 , ... , an) aj-+-oo

(15.28)

The Measurement of Probable Things

1

= lim

337

(15.29)

Fxlo ... ,x.,(a1, ... ,an ).

ai -+ co i=l, .. . ,n

T 15.19 summarizes these observations. Let (O,:F, P(·)) be a probability space, and let x i ('), i = 1, ... , n be random variables. Moreover, let Fxt ,.... xJ·) be the joint probability distribution of the xi (·) as defined in equation 15.26. Then Fxt, ... ,xJ·) is monotone, nondecreasing, continuous from below, and satisfies equations 15.27, 15.28, and 15.29.

T 15.19

An example of a two-dimensional distribution will be given in E 15.20. In reading the example, observe that lim b ---+ oo F1A1B(a, b) = F1)a), where F1)') is as described in E 15.19. This is as it ought to be, since if Fxt ,... ,xJ') is as defined in equation 15.26 and if Fxt, .... Xj

I.Xi+I .....

xJb)

= P({WEQ:X1(W)


oo

1=1

a.e.

(16.12)

and n

lim n- 1 n-->oo

L IA(o)(x(t, w)) = F

XI

(a)

a.e., a E R.

(16.13)

1=1

Our discussion of E 16.1 shows that T 16.6 and T 16.7 can be considered as asserting that numerous observations on the values assumed by one random variable can be used to determine the intrinsic probability of observing anyone of the events we associate with the random variable. Such an interpretation of T 16.6 and T 16.7 is interesting; e.g., it suggests that, by repeating an experiment in which we pick a ball from an urn with an uncertain composition of red and white balls, we would, after sufficiently many repetitions, eventually be able to determine the true proportion of red balls in the urn. Similarly, by tossing a coin sufficiently many times, we could determine the true probability of heads. These observations sound good, but can be misunderstood. Let us, therefore, take another look at E 16.1 and interpret it as describing an experiment in which a number is drawn repeatedly (with replacement) from an urn containing ten identical balls marked with the numerals 0, I, " ., 9. The probability of picking a prescribed infinite sequence of numbers is zero; i.e., the likelihood that we will observe the sequence Wo = 3333 .... is as large as the likelihood that we will observe WI = 454545 .... or W2 = 012345678901234 .... Yet W o will yield 3 as a limit in equation 16.12 and F (a) = 0 if a ~ 3 and F (a) = 1 if 3 < a ~ 9 as a limit in equation XI

XI

Chance

363

16.13, whereas W2 will yield the right limits in both equation 16.12 and 16.13. Note in this respect that WI yields the right limit in equation 16.12 and the wrong limit in equation 16.13. Therefore the exceptional points in T 16.6 need not coincide with the exceptional points in T 16.7. A second remark: In E 16.4 we picked the values of the x(t, .) in accordance with the Pt (') in equation 16.3. Evidently, we could also have

picked the values of the x(t, .) according to any other set of probabilities, {Po, ... , P9}' with 0 < Pi' i = 0, ... , 9. Doing so would in no way alter the set of possible number sequences that we might observe. However, it would change the set of exceptional points in both equation 16.12 and 16.13. 16.3.3 The Central Limit Theorem

A third remark: The limits in equations 16.10 and 16.11 are completely independent of observations made on a finite number of the X n ( .), since for any fixed positive natural number No n

lim n- l

L xj(w) =

No

lim n- l

i=l

L xi(w) + lim ni=l

= 0

+ lim n- l

n

l

L

Xi(W)

i=N o +l n

L

xi(w).

i=No+1

Even so, when the x i ( .) are independently and identically distributed with finite mean and variance, the limit in equation 16.10 can, within prescribed bounds, be determined for sufficiently large n. The reason why is explained in T 16.8, which extends the result we described in E 15.22. T 16.8 Suppose that x i ('), i = I, 2, ... , are independently and identically distributed random variables on a probability space (0, ~, P( . )). Suppose also that they have finite mean J1 and variance (J2. Then the random variables t 2:7=1 i (·) - J1)/(J), = 1,2, ... , converge in distribution to a normally distributed random variable with mean 0 and variance 1.

j;;«n-

x

n

E 16.5 will aid our intuition concerning the significance of T 16.8: E 16.5 We are asked to measure the probability that a given coin comes up heads. Since we have no idea of the physical constitution of the coin, we decide to determine the probability experimentally. We toss the coin once and record the result, 1 if heads and 0 if tails. Thereafter we repeat the experiment 9999 times. By then the first three decimals of the estimated frequency of heads seem to have settled on 0.376. So we decide to let P(heads) = 0.376. In order to apply T 16.8, we observe that in this experiment J1 = p and (J2 = p(1 - p), where p denotes the probability of our coin coming up heads.

364

Chapter 16

Table 16.1 An excerpt of results.

Number of tosses 1000 2000

3000 4000

5000

6000

7000 8000

9000

10.000

Frequency 0.382 0.3745 0.379 0.3815 0.3774 0.3795 0.377 0.376 0.3763 0.3762 of heads This fad. a little algebra. and a table of values of the normal distribution suffice to ascertain that the probability that (10.000)-1 IJ:?t OO xj ( . ) will differ from p by less than ± 0.01 is close to 0.95. So much for the Law of Large Numbers. In the next section we shall use its corollary, T 16.7, and the ideas of Doob's theorem. T 16.2. to give an empirical characterization of chance.

16.4 An Empirical Characterization of Chance There are probabilists, called frequentists, such as L. von Mises and Kolmogorov. who believe that the most reasonable way to interpret probability measures is the one suggested by T 16.7; i.e., the probability of an event A in a given experiment equals the frequency with which A would occur in repeated performances of the experiment. Von Mises even insisted that the concept of probability should only apply to problems in which either the same experiment repeats itself again and again or a great number of uniform elements are involved at the same time (von Mises 1951, pp. 8-9). Examples are: 1. A game of chance such as heads or tails.

2. A carefully defined actuarial problem such as: What is the chance that a 40-year-old man insured before his thirty-ninth birthday and living in Norway will die before he is 41 years old? 3. A carefully described mechanical or physical phenomenon such as the random motion of colloidal particles. 16.4.1 The Collectives of von Mises To arrive at a meaningful probability measure for experiments such as those listed above, von Mises introduced the idea of a collective. A collective is a sequence of real numbers satisfying several conditions which we detail below. In stating these conditions we use certain symbols, R, T, = {q>i('); i E T} denotes a gambling system (relative to some unspecified probability measure P( . )); d denotes a family of gambling systems; K(· ) denotes n

K(n; q>, x) =

I

q>i(X),

q> E d and x E R T

(16.14)

i=O

!JB denotes the Borel field of subsets of R, and Q(.) denotes n

Q(A, x) = limn- 1 IIA(xJ,

(16.15)

i=O

Then x

E

R T is a collective in the sense of von Mises if and only if

(i) Q(', x) is well defined and finitely additive on !JB, (ii) for all A E !JB and all q> E d n

Q(A, x)

= lim K(n; q>, X)-l

I

IA(xJq>i(X),

i=O

(iii) q> E d if and only if limn-+ex! K(n, q>,x) = 00. When x is a collective and A E !JB, von Mises insisted that Q(A, x) measures the chance of observing an outcome in A in the experiment associated withx. In order that von Mises' characterization of chance be meaningful, each and every purely random process with finite mean must possess a sample path that satisfies the conditions of a collective. For example, if {x(t, w); t E T} is a purely random process on a probability space (0, fF, P( . )), with flx11 < 00 and sample space (0., #), there must be a OJ E 0. such that, for all A E !JB and A = {w EO: x(l, w) E A}, (16.16)

p(A) = Q(A, w).

Unfortunately, von Mises' collectives fail to satisfy this condition, as evidenced in E 16.6. E 16.6 Let T = {O, 1, ... } and n = [0, 1]. In addition, let f!4 denote the Borel subsets of n and let J1(' ) be the Lebesgue measure on (n, f!4). There exists no OJ E n T such that, for all A E f!4, n

lim n- 1

2:

fA (Wi)

=

J1(A).

i=O

To see why, suppose that OJ E nT is such a sequence and let A = U~O {Wi}' Then A E f!4 and J1(A) = 0. Yet lim n- 1 2:7=0 IA(w i ) = 1.

366

Chapter 16

The crucial property of (0, fIJ, J1( •)) in E 16.6 is that, for all 0) E 0, E fIJ and J1( {O)}) = O. Since any other probability space (0, ff', P( . )), with {O)} E ff' and P( { 0) }) = 0 for all 0) E 0, provides exceptions to equation 16.16 similar to the exception exhibited in E 16.6, the example suggests that von Mises' characterization of chance is meaningful for probability spaces with {O)} E ff' for all 0) E n only if P( {O)}) > 0 for at least one 0). Unfortunately, von Mises' ideas do not fare well then either-to wit E 16.7.

{O)}

E 16.7 Let T and 0 be as in E 16.6 and let r!J denote the Borel subsets of O. Furthermore, let P(·) be a a-additive probability measure on (0, f!J) such that, for some Wo E 0, 0 < P( {Wo}) < 1. Finally, suppose that w is a collective, with P(A) = Q(A, w) for all A E r!J. Then 6J contains infinitely many components that equal woo Hence we can find a gambling system qJ such that, for all i E T with wj = wO, qJj(6J) = 1, and such that qJj(6J) = 0 for all other i. Then n

P({WO}) < 1

=

limK(n;qJ,w)-l

L I{roO} (w;) qJj(w), j=O

which shows that

w cannot exist.

16.4.2 Church's Concept of Chance

The preceding examples suggest that von Mises' concept of a collective must be modified in several ways. Specifically, we must ensure that the experiments to which the concept of a collective applies are uncomplicated. We must also impose constraints on, the set of eligible gambling systems. We shall next consider modifications of von Mises' collectives that are due to A. Church (1940, pp. 130-135). To introduce Church's collectives we begin by letting T = {O, 1, ... }, o = {tX l , ... , tXm}, ff' = ,qJ(0), and n"" = U~l Oi. Here, as well as in the remainder of the chapter, we shall think of n as the range space of a purely with OT and ff with the smallest (J field random process and identify that contains all the cylinder sets in nT. Next we let y and dr' respectively, denote a vector in 0 T and a family of gambling systems. The latter is defined as follows: Let qJ = {qJi('); i E T} be a gambling system (relative to some unspecified probability measure on (0, ff')). Moreover, let Ij; (. ): n'" ~ {O, I} be given by

n

Ij;(x) = qJi(X, y),

x E Oi, i = 1, 2, ....

Then qJ E d r if and only if Ij; (.) is recursive. Finally, we let K(') and Q(') be as described in equations 16.14 and 16.15 with d, R, and !!J replaced

Chance

367

by dr' 0, and ff, respectively, and we insist that a sample path W a collective in the sense of Church if and only if

E OT

is

(i) Q(', w) is well defined and additive on (0, ff); (ii) for somej E {I, ... ,m}, 0 < Q({cxj},w) < 1; and (iii) for all A E ff and all qJ E d r with lim K(n; qJ, w) = 00, n

lim K(n;

L fA (WJ({Ji(W) = Q(A, w).

({J, W)-l

i==O

The collectives of Church have many interesting properties. Those that are important to us are recorded in T 16.9. Let (0, ff, P(·» be a probability space with 0 = {(XI'"'' (Xm}' P( {(Xj}) < 1 for some j. Moreover, let F(' ) be the product extension of P(·) to (0, ff); i.e., let F(' ) be a O"-additive probability measure on (0, :F) that, for all {t I' ... , tk } c T and At j E :F, j = 1, ... , k, satisfies T 16.9 :F

= ;?l'(O), and 0
E d

r

with limK(n, q>, w)

=

00,

n

P(A)

=

lim K(n; q>,W)-1

L IA(wJq>j(w), i=O

Then C(slr , P(·» #- 0 and P(C(dr , P(· ») = 1. Moreover, if wE C(dr , P(· », OJ is nonrecursive; i.e., the set {i E T: wj = (Xj} is nonrecursive, j = I, ... , m.

Thus if (0, ff, P( . )) is any probability space with finite 0 and nondegenerate P(·) and if xU, w) = W" t E T and WE.o, then {x(t, w); t E T} is a purely random process on (0, :J;, F( .)), with the property that almost all its sample paths are collectives in the sense of Church. Moreover, these collectives are nonrecursive and hence nonprogrammable in the sense that no finite computer program exists that can reproduce a collective. The proof of T 16.9 is obtained in two steps. In the first we compute the probability of C(dr , P(· )). Let Q = {w

E.o: P(A) = Q(A,

w) for all A

E ff}.

Since ff contains only a finite number of sets, it follows from T 16.7 and the additivity of P( .) that F(Q) = 1.

368

Chapter 16

Next, for each

qJ E

dr' let

BqJ,~p = {w En: limK(n; qJ, w) = 00, and n

P(A) =1= lim K(n; qJ, W)-1

L I A (Wi)qJi(W) i=1

for some A

E

.?}.

We shall show that P(BqJ,~p) = O. This follows from T 16.2 when qJ is a gambling system relative to P(·). When qJ is not a gambling system relative to P('), p( {w E () : lim K(n; qJ, w) = oo}) < 1. For such a gambling system, the equality in equation 16.5 becomes an inequality; Le., for all BE:#', P( {w: i(w) E B}) ~ P( {w E,Q: x(w) E B}). This and T 16.7 imply that P(BqJ,~p) = O. Thus, for all qJ E dr' P(BqJ,~p) = O. Finally, observe that -UqJEd'r BqJ, ~P C C(dr , P(·)) and that there are only countably many qJ in dr. From this and the preceding comments, it follows that P(C(dr , P(· ))) = 1. In the second step we show that the sample paths in C(dr , P(·)) are nonrecursive. Suppose that W E C(dr' P( .)) and let y be an arbitrary vector in ,QT. Moreover, for each pair (i,j) and x E ,Qi, j = I, "" m, i = 1,2, ... , let qJ!('): ,QT ~ {O, I} be defined by qJd(Y) = 0 and

n-

. = {I

qJ!(X, y)

0

if Wi = (Xi otherwise.

Then qJi = {qJ! ( . ); i E T}, j = I, ... , m, is a gambling system and, for some j, lim K(n; qJi, w) = 00. Moreover, if W is recursive, the functions qJi(X) = qJ!(x, y),

X E ,Qi,

i = 0, I, ... ,

j = I, ... , m

are recursive too. Since the recursiveness of the qJj(.) and the unboundedness of one of the K(n; qJi, w) contradict the assumptions of the theorem, w cannot be recursive. The two conclusions of the theorem have a bearing on our search for an empirical characterization of chance. To wit: The fact that Church's collectives are nonprogrammable captures the idea that the events that we may define on a collective are fortuitous in the sense that they happen by chance and they carry no information as to the likelihood of future events. Furthermore, the fact that P(C(dr , P(· ))) = 1 captures the notion of chance that we met in our discussion of games of chance: Gambling systems are of no strategic value in sequences of games performed under uniform conditions. This is so even though we restrict the gambler's choice of gambling

Chance

369

systems to recursive q> ( •)' s. Any gambling system of which a gambler might conceive is likely to be recursive. With the preceding observations in mind, we can propose two concepts of chance. The first is akin to C. S. Peirce's notion of probability as the "would-be" of a chance mechanism (Peirce 1955, pp. 164-173): Let the triple (0, ff, P( . )) be a mathematical idealization of an actual experiment and suppose that the chance of an event in ff happening is a dispositional property of the chance mechanism that we associate with the experiment. Then, for all A E ff, T 16.7 justifies our interpreting P(A) as the chance of A occurring; i.e., in symbols, chance(A)

=

P(A).

(16.17)

Our second concept of chance is analogous to von Mises' notion of probability (1951, pp. 18-20). To wit: Let (0, ff) be an experiment with a finite number of outcomes and suppose that the chance of an event in ff happening is an intrinsic property of an ever-unfolding sequence of outcomes of repeated performances of this same experiment. Then if W E Q is a collective in the sense of Church and A E ff, T 16.9 and W E C(~, Q(', w)) justify our interpreting Q(A, w) as the chance of A occurring as w unfolds itself; i.e., in symbols, chance (A, w) = Q(A, w).

(16.18)

This is so even though Q(', w) need not equal the dispositional property of the chance mechanism associated with (0, ff). 16.5 Chance and the Characteristics of Purely Random Processes

I believe that T 16.9 and equation 16.18 give an empirical characterization of chance; the resulting concept captures the idea of chance embodied in a purely random process. It is, therefore, important to note that this opinion might not be shared by others. One reason for disagreeing with me would be that Church's collectives need not satisfy the Law of the Iterated Logarithm, T 16.10. Suppose that {x(t, w); t E T} is a purely random process on a probability space (0, /IF, P( . »). Also suppose that x(l, .) has finite mean J1 and variance (J2. Then

T 16.10

Ln (xU, w) P

{

w EO: lim sup

/l)

I I )1/2 (J(2n og og n i=l

}

= 1

= 1.

370

Chapter 16

In fact,

J. Ville

(1939) has shown that, for any triple (O,§',P(')) with and §' = &'(0), there are sample paths 6J E C (dr' P( , )) such that, with p = P( {I} ),

o = {O, I} lim sup

I,t (w, -

p)/(p(l - p)2n log log n)1/21

= O.

Be that as it may, Ville's result does not demonstrate that my seond concept of chance is inadequate. It establishes only that Church's concept of a collective provides an unsatisfactory characterization of the typical sample path of a purely random process. Since the irrelevance of Ville's result for our empirical characterization of chance is important to us, we shall dwell on it for a moment. Consider an axiomatic system with undefined terms-merkmalraum, experiment, probability, place selection-and axioms MA I-MA 4. A merkrnalraum is a finite collection of objects, M

MA 1

= {a l ' ... , am}.

MA 2 Let T = {O, I, ... }, M t = M, t E T, and Jlt = &>(M), t E T. Then an experiment is a pair (0., #), where 0. = e T M and # = e T Jlt , the smallest (J field that contains the cylinder sets in o..

Dt

Dt

t

MA 3 A probability is a (J-additive function F('): # ~ [0, 1] which satisfies two conditions: (1) F(o.) = 1 and (2) there exists a finitely additive function P(·): f?lJ(M) ~ [0, 1] such that P(M) = I, 0 < P( {tIj }) < 1 for some j E {I, ... , m}, and for all cylinder sets,

A

D

=

i=l •... n

At; x

D

teT

M t,

F(A)

=

D

i=1 •...• n

P(At;)·

t#t;

i=l, ...• n

MA 4

A place selection is a recursive function 00

!!An + l and a universal theorem of Doob's concerning convergence of martingales (see Doob 1953, theorem 4.3, p. 331). For brevity's sake I shall not elaborate.

Exchangeable Random Processes

401

Now on to the y of T 18.4, ECx l , and the structure of P(·ly). First, y and EC x 1 : Theorem T 18.5 demonstrates that the random variable y(.) in equation 18.12 exists and that it is a.e. equal to ECXl' From this, T 18.1 (iv), and the fact that C c C and y is C-measurable, it follows that ECXl = EC (E Cx 1 ) = ECy = y

a.e. (Pc measure)

as asserted above. Next we shall show that, conditional upon y, the pendently and identically distributed.

Xi

in T 18.4 are inde-

T 18.6 Let x I ( . ), Xl (. ), ... , be a sequence of exchangeable random variables on a probability space (!l, !F, P(' )) and assume that the Xj(') only take the values and 1. Moreover, let C be as defined in T 18.5 and let y = Eex l . Then, for k = 1, 2, ... , and any n different positive integers, ti , j = 1, , n,

°

=

P({w E n:xti(w)

=

1, i

y k(1 - y)n-k

=

1, ... , kandxtk./w)

a.e.

(P~(y)

=

O,j

=

1,

,n -

k}IY)

measure)

and a.e. (P~(y) measure).

The proof of T 18.6, which we sketch below, is due to D. Kendall who attributes the theorem to A. Renyi and P. Revesz (Kendall 1967, pp. 319-325). We begin by observing that if 1 ~ t 1 < t 2 ~ n, and if, for each n = I, 2, ... , N(n) = I?=l Xi' then with probability I Egln(xt 1 . Xt2 )

=

Egln(x 1 . Xn )

=

Egln(xn EiJln-lx 1 )

= Egln(xn • (n -

1)-1N(n -

= Egln(xn , (n -

1)-1 (N(n) -

= (n - 1)-1 (N(n) -

since N(n - I) 1

~

t1
(') denotes the prior measure on subsets of e. Finally, we denote by 0,

406

Chapter 18

(18.28)

where Iia - bll designates a suitable measure of distance between a and b. A Bayes estimate of () can also be many different things. However, most of the time the Bayes estimate of () is a function of the observations, 6B (X 1 , .•• , xn ), that satisfies the equation OB(X l' .. "X.) =

f..

II dcp.(lIlx 1 ,· ." x.).

To a classical econometrician the Bayes estimate is consistent if and only if it satisfies equation 18.28, with 6c (') replaced by 6B (·). But in the setting of Bayesian econometrics, the classical notion of consistency makes little sense. So a Bayesian will usually define consistency differently. A Bayesian econometrician envisions that nature picks a point (}O E e in accordance with a chance mechanism which is governed by an alias of his own prior probability measure qJ('). To him the pair ((}O, qJn(')) is consistent if and only if, for all bounded continuous functions f(·): e ~ R and almost all x E f!(OO (Q6b measure), (18.29)

where 0 0 ( . ) denotes a Dirac measure that assigns numbers to subsets of e in accordance with if (}O E A otherwise. Thus, to a Bayesian, ((}O, qJn(')) is consistent if and only if, for almost all sequences of observations in f!(oo (Q6b measure), the posterior probability measure qJn ( . ) eventually shrinks to a point measure at (}o. In most parametric statistical inference of interest to econometricians, the consistency of ((), qJn(')) implies the consistency in the classical sense of 6B (X 1 , ..• ,Xn) as well; i.e., if ((), qJn(')) is consistent, then for the given(!) () and all 8 > 0, equation 18.28 holds, with 6c (') replaced by 6B (·). It is, therefore, interesting to observe that for most parametric statistical inference in econometrics ((), qJn ( . )) is consistent for almost all values of () (qJ measure). This observation is due to Doob (1949, pp. 23-27) and is stated precisely in T 18.8. T 18.8 Let f![, 0, Qo, ({In' and 8B (x l' ... , x n ) be as described above. Suppose that :!l' and 0 are, respectively, Borel subsets of Rand R k for some k ~ 1, and let fJ8

Exchangeable Random Processes

407

denote the Borel subsets of e. Moreover, suppose that there is a function e x PI" ~ R such that, for all y E PI" and A = PI" n (-00, y),

q('):

Q(I(A)

=

fA q(O,x)dx

and such that, for each y E Q(O, y)

=

Q(I(PI"

n

f!{,

the function

(-00, y))

is measurable with respect to f!l. Finally, let cp(.): f!l measure on f!l and suppose that (i) if 01 , O2 E

e

=

denote the prior

and 01 i= O2 , then Q(ll =I:- Q(l2; and

(ii) there is a function g( . ): cp(A)

~ [0, 1]

fA g(O) dO,

e

~

A

E f!l.

R such that

Then, for almost all 0 (cp measure) (0, CPn(')) is consistent. Moreover, if 00 and (0, CPn(')) is consistent, then

Se Og(O) dO < lim

8B(X 1 , ... ,xn ) = 0

a.e. (Q~ measure)

as well.

Since the statement of Doob's theorem is easy to understand and since the proof would not add to our understanding of the import of the theorem, I shall omit the proof for brevity's sake. Doob's theorem allows us to compare the classical and Bayesian notions of consistency. According to the classical notion, 8c (') is consistent if it satisfies equation 18.28 for all 8 E e. However, since the classical econometrician does not know the true value of 8, the "for all 8" accompanying equation 18.28 does not carry much weight by itself. It is the assumption that there is really just one relevant value of 8 that makes equation 18.28 meaningful as a definition of consistency. As to the Bayesian notion of consistency, equation 18.29, note that Doob's theorem provides no information about the null-8-set which contains all 8 for which (8, 1 be so large that 9 F (E) E "Wk(R). Then it is true in d R that

E Wk(R)

and

A

('ilx E Wk(R))[[x E 9 F (E)]

==

(3z E 9(E X N))(3y E N)B(z, x, y)].

Hence, by LP, it is true in d' R that ('ilx E "Wk(R) )[[x E "9F (E)]

==

(3z E "9(E x N) )(3y E "N)"B(z, x, y)].

Since A is a hyperfinite subset of "E that belongs to "Wk(R), it follows from the preceding observation that A E "9F (E) is true in d.Ro The A in T 20.16 may contain infinitely many members. Hence "9F (E) need not equal 9 F ("E). In fact, "9F (E) = 9 F ("E) only if E is finite. To intuit the reason why, consider the following example. E 20.5 Let R, "R, and ,.(.) be as described in section 20.4.3 and let E = [0, 1]. Then A E "PJ>F(E) if and only if there is a Bo/I E i(PJ>F(E)) so that A = j(Bo/i)' If { Bn } is a member of the equivalence class of Bo/I and if, for all n E N, Bn = {O, lin, 21n, ... , I}, then Bo/I E i(PJ>F(E)) and j(Bo/I) contains infinitely many members.

20.7 Admissible Structures and the Nonstandard Universe

Let T = T(KPU, KA 10, NSA 1-NSA 19) and let d R and d' R be the structures for L described in section 20.5. Neither d R nor d' R is a model of T. Thus to complete our description of the nonstandard universe we must show that d' R has an extension that is a model of T. We begin with a few remarks concerning admissible structures.

Nonstandard Analysis

481

20.7.1 Admissible Structures

In section 20.1 we described the language of T, L, as a single-sorted language with two constants, 0 and 1, two binary functions, + and " two unary predicates, U and 5, and two binary predicates, < and E. We insisted that 0 and 1 be urelements and described the extension of +, " and < in the set of urelements. We also postulated that U, 5, and E satisfy the axioms of KPU, KA 10, NSA 1, and NSA 19. Since the extension of +, " and < in the universe of sets is irrelevant as far as T is concerned, we may think of a model of T as a quintuple, (A, A, 0/1, ,'f, E), where (i) A is a structure for a first-order language with two constants, 0 and 1, two binary functions, + and " and one binary predicate, card(IJIII). Finally, let H(k)lvltl be defined by the following recursive scheme:

T 20.17

= 0; G(f3 + 1) = G(O)

G(A)

= U

{x

c

IJIII u G(f3) : card(x) < k};

G(f3) if Ais a limit ordinal;

pc;'

H(k)!vltl =

U G(P)· P card(W("R)), the model of T that we described in T 20.17 becomes an extension of d'R' with +, ., and < restricted to *R. In thinking about the model of T described in T 20.17 and the extension of d'R proposed above, several observations are relevant. To justify the construction of H(k)IAlI' we must assume the validity of the Axiom of Choice. Hence we cannot use T to establish the existence of a model of T. Moreover, to find a k that is large enough for the construction of an extension of d'R' we may need Tarski's axiom concerning the existence of inaccessible cardinals. For a discussion of inaccessible cardinals, see Drake 1974 (pp. 65-68). The need for additional axioms to establish the existence of models of T is analogous to our needing KA 10 to construct a model of KPU in section 9.8 (see ST 49 and the discussion that followed).

21

Exchange in Hyperspace

In chapter 14 we studied the objective possibility and nomological adequacy of a scenario in which a finite number of consumers meet in the market to exchange goods at competitive equilibrium prices. The scenario has one disquieting feature: All consumers act as price takers even though anyone of them acting alone could, if he chose to, influence the price at which trading occurs. In this chapter we shall use nonstandard analysis to study exchange in an economy with infinitely many consumers. Our purpose is (I) to model behavior in economies in which a single consumer by his own actions cannot influence the price at which trading occurs; and (2) to show that in sufficiently large economies a core allocation is a competitive-equilibrium allocation. 21.1 The Saturation Principle

To study exchange in hyperspace we must first impose a new condition on superstructure embeddings and discuss useful topological characteristics of hyperspace. Throughout our discussion L is the first-order language described in section 20.1. Also d R = (W(R), N.i'l R ' FR, C R) and dO R = (W("R), N.i'I.R' FOR, COR) are the structures for L delineated in section 20.5.1. Finally, ",(.): W(R) -+ W("'R) is a superstructure embedding that satisfies LP. 21.1.1 The Saturation Principle

The new condition that we impose on superstructure embeddings is called the Saturation Principle. We formulate it as follows: SP Suppose that Ai E W( 'OR), i E N, is nonempty and internal. Suppose also that A o ::::> At ::::> •••• Then '" [ni e N Ai = 0]; i.e., there is an internal object that belongs to all the Ai.

Exchange in Hyperspace

485

As evidenced in the following theorem, this is a meaningful principle. T 21.1

There are superstructure embeddings that satisfy both LP and SP.

In fact, the superstructure embedding j(i(.)) constructed in section 2004.3 satisfies SP as well as LP. This is shown as follows: Let Ai' i E N, be a decreasing sequence of nonempty internal sets, and assume that, for some k > 1 and all i E N, Ai E "'Wk(R). Then there are sets Bij E Wk(R), j E N, i EN, such that, if 13i = {Bij} min(U(t, x), U(t, y)).

d. U (t, .) is "'-continuous. In our intended interpretation of E, T is an index set, each point of which names a particular consumer in E. Different points in T name different consumers; and each consumer in E is named by some point in T. The set "'R~ represents the consumers' commodity space and the value of A(') at t describes the initial commodity bundle of the tth consumer. Finally, for each t, U(t, .) is the utility indicator of consumer t. To study resource allocation in E we must first formulate the properties of U(·) symbolically: The monotonicity of U(t, .) is expressed by the '" transform of

1/11 (t,

U)

=df

(Vx E R~ )(Vy E R~ )[[[x ~ y] /\ '" [x

::) [U(t, x)




[UU, y) ~ UU, XU))]]]]].

By T 14.1 this assertion is valid in dR' By LP its'" transform, which is a symbolic rendition of T 21.23, is valid in d'R' A ".-Pareto-optimal allocation is a ". allocation X( . ) for which there exists no other'" allocation Y(·) such that, for all t E T, UU, XU)) ~ UU, YU)) with strict inequality for some t. If we let k 1 be as in the proof of T 21.23 and define

the set of "'Pareto-optimal allocations, Q, is given by Q

=

{X E A: '" (3Y E A)[(Vt /\ (3t

E

T) [UU, XU))


G z ::::> •••• Hence, by SP (section 21.1.1), there is an internal F(.) E nmeN Gm. For awE n - UmeN ((Am - D m) U (Dm - Am))' F(w) E n {.~: f(w) E~} and (hence) f(w) = T(w). From this and pv(n - UmeN ((Am - D m) U (Dm - Am))) = 1 it follows that f(w) = °F(w) a.e. (Pv measure). Hence F(·) is a lifting of f(·). Next we shall establish a theorem that relates d -measurable functions on an internal probability space (n, d, v( . )) to random variables on one of the standard versions of (n, ff'.~, Pv (· )). T 22.8 Let (0, s1, v(·)) be an internal probability space where .91 contains all internal subsets of 0 and let (0, ~"', Pv (·)) be the associated Loeb probability space. In addition, let (X, °ff.s;f' Ilv(')) be a standard version of (0, ff.s;f' Pv(')) and suppose that f(·): X -+ R. Finally, suppose that the points in 0 are near-standard; i.e., suppose that Ow is well defined for all w E O. Then f(') is a random variable on (X, 0ff,,,,, Ilv(')) if and only if there is an internal function F('): 0 -+ 'OR such that f(Ow) = °F(w) a.e. (Pv measure). To prove the theorem, we let ft (.): n ~ R be defined by ft (w) = few), WEn. Then f1 (.) is well defined and, by T 22.7, f1 (.) is a random variable

Chapter 22

516

on (0, ~d' Pv (·)) if and only if there is an internal function F(.): 0 -+ 'FR on (0, d) such that fl (w) = °F(w) for almost all WE O-Pv measure. Consequently, we can establish T 22.8 by showing that f(·) is a random variable on (X, o~d' J-lv(·)) if and only if fl (.) is a random variable on (0, ~d' Pv (· )). But that is obviously true, since by condition ii of the definition of standard versions of Loeb probability spaces, {x EX: f(x) < a}, a E R, belongs to o~d if and only if {w E 0 :f(Ow) < a} E ~d and since {w E 0 :fl(W) < a} = {w E 0 :f(Ow) < a}. For ease of reference we say that F(.) is a lifting of f(·) if f(·) and F(·) satisfy the conditions of T 22.8. The fact that we have two concepts of lifting should not cause confusion later. In reading T 22.8 it is important to note that if F( . ) is an internal function on (0, d, v(· )), there need not exist a random variable f(·) on (X, o~sf' J-lv(·)) such that F(·) is a lifting of f(·). E 22.7 Let (0, s1, v(· )), (0, .Fsf, Pv(' )),-and ([0, 1], ~ J-l(.)) be as in T 22.6 and let C E 2 be such that /l(C) > 0. Furthermore, let C = {w EO: Ow E C} and let C be as in equation 22.2. Finally, let IA ( .) denote the indicator function of A,

and let FE(·) and Fe( '), respectively, be liftings of IE( .) and Ie( '). Then Fc(· ) is a lifting of Id'), but Fe(') is not.

22.3.2 Integration in Hyperspace

In this section we establish certain basic results concerning nonstandard integration that we shall need for our study of exchange in hyperspace. Let (0, d, v(·)) be an internal probability space, let F(·): 0 -+ 'FR be an internal function; and suppose that F( . ) is d -measurable. Then F( . ) is finite if and only if there is an n E N such that, for all WE 0, - n ~ F(w) ~ n. For finite functions the following relationship holds: Let (0, s1, v(·)) be an internal probability space and let (0, .Fsf' Pv(')) be its associated Loeb space. Suppose that F('): 0---+ 'FR+ is internal and dmeasurable. If F( .) is finite

T 22.9

°

(fA F(w) dv(w)) = fA °F(w) dPv(w)

for all A

E

(22.3)

d.

I prove equation 22.3 for A = 0 and leave the concluding details for the reader. The theorem and the arguments used are due to Peter Loeb (Loeb 1975, pp. 117-118). Let B = {r E R : Pv ( {w EO: °F(w) = r}) > o} and observe that B is either finite or countably infinite. Next fix (5 E R++ and let Yi' (= 0, ... , m, be such that Yi E R - B, 0 = Yo < Yl < ... < Ym' SUPwen °F(w) < Ym and Yi - Yi-l < (5/3 for 1 ~ i ~ m.

Probability and Exchange in Hyperspace

Also let S.V

s.P

=

517

L~=lYi-lV(F-l("'[Yi-l'Yi)))'

s:, = L~=lYiV(F-l("'[Yi-l'Yi)))'

1 = L~=l Yi-l Pv(OF- ([Yi-l,YJ)), and Sp = L~l YiPv(OF-1([Yi-l,yJ)). Then ~ F(w) dv(w) ~ and ~ °F(w) dPv(w) ~ Sp. Moreover,

S.V Sn S.v < (j/3 and Sp °F- 1 (( Yi-l' yJ)

C

s:,

s.P
b}) =

Pv({w EO: I °F(w) - °Fn(w) I > b})

= Pv( {w EO: ° I F(w) - Fn(w) I > b}) ~ Pv( {w EO: I F(w) ~ v({w EO: / F(w) -

Fn(w) I > b}) Fn(w)/

> b}).

From points 1 and 2, we deduce both that f(·) is integrable and that

L

lim

L

= lim

O(L

f(m)dPo(m) =

!.(m)dPo(m)

F.(m)dv(m)) =

O(L

F(m)dV(m»)

(22.4)

for A = o. The same arguments applied to f(W)IA(W) show that equation 22.4 is valid for any A E d.

Probability and Exchange in Hyperspace

519

To prove the converse, we suppose that f(·) is integrable and let fn(w) = n or - n according as f(w) > n, f(w) < - nand fn(w) = f(w) if If(w) I ~ n, n E N. Then the fn(·) are integrable random variables on (0, $'.9/, Pv (·)) such that lim I f(w) - fn(w)ldPv(w) = O. Next, for each n E N, let Fn(·): ~ ""R be an integrable lifting of fn(·) and observe that O(Sn IFn(w) Fm(w)1 dv(w)) = Ifn(w) - fm(w) I dPv(w) and that Ifn(w) - fm(w)1 dPv(w) tends to zero as n, m tend to infinity. By T 21.3, the internal sequence, Fn (·), n E N, can be extended to an internal sequence, Fn (·), n E 'fN. For some 11 E 'fN - N, F,,(·) is an d-measurable function on (0, d, v(·)) and lim O(Sn IFn(w) - F,,(w) I dv(w)) = o. Thus F,,(·) is S-integrable and, by the first half of our proof, OF" ( .) is an integrable random variable on (0, $'.9/, Pv(· )).

Sn

°

Sn

Sn

From this and the inequalities

f.,

I[(w) - °F,(w) I dPo(w) V(t, y(t))} belongs to 2.

To prove the theorem we note that, by T 22.12, there exist S-integrable random variables on (T, sI, v(· )), denoted Fx (') and Fy ('), such that a.e.-Pv measure, °Fx(t) = x(°f) and °Fy(f) = y(0f) and such that x(t) d/1(t) = O(I T Fx(t) dv(f)) and y(t) d/1(t) = O(I T F/f) dv(f)). Let A = {f En: V(f, x (f)) > V(f, y(f))}, A = {f E T: Of E A}, and A = {f E T: °U(f, Fx(f)) > °U(t, Fy(t))}. Then it is easy to verify that Pv ( (A - A) u (A - A)) = 0. Also, by T 22.6, A E 2 if and only if A E ~d' From these two observations and the completeness of Pv (') it follows that, to show that A E 2 it suffices to show that A E $'.~. The latter relation follows from the properties of Fx ('), Fy ('),

In

In

Chapter 22

522

and U(·), which imply that, for all n E N, {t E T: U(t, Fx(t)) > U(t, Fy(t)) n- 1 } E d, and from

A=

U {t E T: U(t, Fx(t)) >

U(t, Fy(t))

+

+ n- 1 }.

neN

Theorems T 22.13 and T 22.14 show that E satisfies the conditions of Aumann's economy in Aumann 1966 (pp. 2-17). From this and Aumann's Main Theorem (Aumann 1966, p. 4) we deduce the validity of theorem T 22.15: T 22.15 Let E = (V('), n X R~, w(·)) be as described in equations 22.4-22.8. Then there exists a competitive equilibrium (p, x( . ), K) in E.

Next, let (p, x(·), K) be a competitive equilibrium in Eand let Fx (·): T-+ be an S-integrable lifting of x(·). We shall show that (p, Fx (·)) is an 5 competitive equilibrium in E. To do that we show first that Fx (·) is an 5 allocation. It follows from px(Ot) ~ pw(Ot), x(Ot) = °Fx(t), and w(Ot) = °A(t) that *R~

(22.9)

and from

t

Sn x(t) df.i(t) = Sn w(t) df.i(t), T 22.11, and equation 22.8 that

Fx(l)dv(l) '"

t

A(I)dv(l).

(22.10)

Also from equations 22.7, 22.9, and p E R~+, it follows that on the set of t E T, where px(Ot) ~ pw(°t) and x(Ot) = °Fx(t), Fx (·) is uniformly bounded. Since this set has Pv measure 1 and since Fx (·) is determined up to a set of Pv measure 0, we can without loss in generality assume that there is an a E R++ such that f E T.

(22.11)

Finally, from equation 22.11 and the fact that Fx (·) is a lifting of x( .) that satisfies equation 22.10, it follows that Fx (·) is an 5 allocation in E. Next we show that on a T set of Pv measure 1, Fx(t) is maximal with respect to U(t, .) in the set {y E *R~: py :s pA(t)}. Let A E ff denote the set of t E [0,1] at which px(t) ~ pw(t) and V(t, x(t)) ~ V(t, y) for all y E R~ such that py ~ pw(t), and let A = {t E T - .AI : °t E A, x(Ot) = °Fx(t), w(Ot) = °A(t)}. Then A E ~'?/ and pv(A) = 1. For all tEA, py ~ pw(Ot) implies that °U(t, y) ~ °U(t, Fx(t)) and hence that U(t, y) :s U(t, Fx(t)). Consequently, since y E *R~ and py ~ pA(t) implies pay ~ pOA(t), we can use T 21.18 deduce for all tEA, y

E *R~

and py

:s pA(t) implies U(t, y) ;S U(t, Fx(t)).

(22.12)

Probability and Exchange in Hyperspace

523

From equations 22.12 and 22.9 it follows that in A Fx(t) is maximal with respect to U(t, .) in the set {y E "'R~ : py ~ pA(t)}. Finally, using standard arguments, we can find an internal set A E d such that A c A and v(A) ~ 1. From this, and from equations 22.11 and 22.10, we deduce that SA Fx(t) dv(t) ~ SAA(t) dv(t) and conclude the proof of the following theorem: T 22.16 Let E = (U('), T x "R~, A(')) be as described in assumptions i-v above. Then there is an 5 competitive equilibrium (p, Fx (')) in E.

From T 22.16 and T 21.30 it follows that the 5 core of E is nonempty. Also the 5 core of E and the set of 5 competitive-equilibrium allocations in E are identical. Since E differs a bit from Es ' several remarks concerning E and the import of T 22.16 are in order. First the condition imposed on U(·) in condition v: The existence of a function V(·) that satisfies equation 22.6 ensures that the preferences of the individuals in Es are not too different. To see why, just observe that for a U(·) which is S-continuous on all of (T x "'R~) we can define V(·) by equation 22.6 and demonstrate that the resulting V(·) is well defined and continuous on ([0, 1] x R~). Next condition v and the existence of competitive equilibria: Neither condition v nor conditions ivb-ivd (section 21.4.1) are necessary for demonstrating the existence of an 5 competitive equilibrium in a hyperfinite exchange economy. Using nonstandard arguments, Donald Brown has established the existence of an 5 competitive equilibrium in an economy populated by consumers whose endowments and utility functions need not satisfy condition v and whose utility functions are ",-continuous but need not be S-continuous and quasi-concave. For details concerning Brown's economy I refer the reader to Brown 1976 (p. 540). Here it is worth noting that in establishing T 22.16 we did not really use assumptions ivb-ivd. To wit: Aumann does not assume that his consumers' preferences are quasi-concave. Hence both T 22.15 and our proof of T 22.16 are valid without our insisting on the quasi-concavity of U(t, '). Also if U(·) satisfies conditions iii, iva, and v, it is easy to verify that, for all t E (T - f), V(t,') is continuous and increasing and U(t,') is S-continuous. Hence our proof of T 22.16 is valid without our insisting on the 5 continuity of U(t, .) in condition ivd. 4 We might also note that if U(·): (T x "'R~) -+ ("'R+ n Ns("'R)) is internal and satisfies condition v, then U(·) is said to be a uniform lifting of V(·). It is, therefore, interesting that if W(·): ([0, 1] x R~) -+ R+ and W(', x) is Lebesgue-measurable for each and every x E R~, then for almost all t E

Chapter 22

524

[0,1] (/1 measure) WU, .) is continuous on R~ if and only if W(·) has a

uniform lifting. For a proof of this fact I refer the reader to Albeverio et al. 1986 (pp. 136-137). 5 My reasons for imposing strong conditions on E are methodological. I insisted on conditions i-iv for the sake of uniformity with the conditions imposed on E and Es in chapters 14 and 21. I imposed condition v so that we could relate our economy to Robert Aumann's economy and use T 21.28, T 21.30, and T 22.16 to demonstrate that Aumann's fundamental result concerning the equivalence of core allocations and competitive equilibria is a topological artifact and not a characteristic of large economies. The topological aspect of the equivalence of core allocations and competitive equilibria that we have established is a general feature of most measure-theoretic characterizations of exchange economies. 6 Our result, therefore, does not detract from the unquestioned importance to mathematical economics of Aumann's two seminal papers on exchange economies with a measure space of agents. Instead, our result provides evidence for the fact that when interpreting an economic theory, it is not sufficient to assign names to undefined terms and to check the mutual consistency of the axioms. The originator of an economic theory owes his readers a description of at least one situation in which the empirical relevance of the theory can be tested. 22.5 A Hyperfinite Construction of the Brownian Motion

So much for exchange in hyperspace. Next we use the idea of a Loeb probability space to construct an interesting version of one of the most well-known random processes, Brownian motion. The construction delineated is due to R. A. Anderson and most of the arguments used are taken from Anderson 1976 (pp. 26-33). A Brownian motion is a function, P(·): [0,1] X n -+ R, on a standard probability space (n, $', P( . )) that satisfies the following conditions: (i) For each t E [0, 1], pet, .) is a random variable on (n, fF, P(· )). (ii) For each pair,s, t E [0, 1], such that 5 < t, the random variable, PU, .) - pes, '), is normally distributed with mean zero and variance t - s. (iii) For any n-tuple, (Sl,t 1 ), ••. , (sn,tn), in [0, 1] such that 51 < t 1 ~ 52 < t z ~ ... < Sn < tn' the random variables, PU 1 , .) - P(Sl' '),"', PUn' .) - P(Sn' '), are independently distributed. To construct such a process, I must introduce two new notions of stochastic independence and establish a hyperfinite central-limit theorem.

Probability and Exchange in Hyperspace

525

22.5.1 Independent Random Variables in Hyperspace

Let I c 'ioN be an index set that contains N as a proper subset and let Xi' i E I, be a family of ,9I-measurable fundions on an internal probability space (a, sl, v(· )). This family is *-independent if and only if every internal subcolledion {Xt l ' . . . , Xtm }, m E 'ioN, and every internal m-tuple, (a I ' ... ,

am)

E

*Rm, satisfies m

v({w E

a :xt1(w)
and W k = w~, k = I, ... , 1](t - ,1.t), imply B(t, w) = B(t, w'). (iii) If t < I, the conditional expectation of B(t + ,1.t, .) given the observed values of B(O,·), ... , B(t, '), E(B(t + ,1.t, . )IB(O, w), ... , B(t, w)) = B(t, w) for almost all W E O-v measure. (iv) If t < 1, E( (B(t + ,1.t, .) - B(t, . ))21 B(O, W), ... , B(t, w)) = ,1.t, and

°

maXw,t

IB(t

+ ,1.t, w)

-

B(t, w)1 ~ ,1.t 1/2 .

Since B(t + ,1.t, w) - B(t, w) = 1]-1/2W "t, these conditions are easily verified. I leave the proof to the reader. A function x( . ): T X 0 ~ R'f that satisfies the appropriate analogues of conditions i-iii above is a hypermartingale on (O,.scI, v(· )).7 Such a function is S-continuous if and only if there is an A E d such that v(A) = I and such that, for all triples (w, t,s) E A X T 2 , x(t, w) is near-standard and 5 ~ t implies x(s, w) ~ x(t, w). From these definitions, conditions i-iv above, equation 22.14, and the universal theorem of Keisler (next) it follows that B( . ) is an S-continuous hypermartingale. UT 21 Let (0, d, v(' » be an internal probability space and let T = {O, 11'1, ... , I}, where '1 = y! for some y E"N - N. Also let x('): T x 0 -+ IfR be a hypermartingale such that

Chapter 22

528

(i) x(O, w) is finite a.e.-v measure;

(ii) maxro,t IxU

+ ~t, w) -

(iii) there is a K E((xU

+ ~t,')

E R++

x(t, w)1 ~ 0; and

such that, for all

°:: :; t
(~(u), t&") Iq>(~(v), t&")) if this term is well-defined, and

p!f it'f(a)

=

it'f(P(ulv))

= {

(v) If a and (a

/3

o otherwise.

are variable-free individual terms, than we interpret (a)-1,

+ /3), (a' /3), and (a - /3) in the usual way, e.g., by letting it'f(a +

/3) =

it'f(a)

+

it'f(/3)·

It can be shown that if a is a variable-free individual term of EL(it'), then either a is a name or there exist variable-free individual terms, /3 and y, and variable-free propositional terms, u and v, such that a is either P(ul v), (/3)-1, (/3 + y), (/3' y), or (/3 - y). From this it follows that conditions i-v above determine the interpretation it'f(a) of all variable-free individual terms. Next we interpret the closed atomic formulas of EL(it'). If u and v are variable-free individual terms, we interpret the wffs [u = v] and (u ~ v) in the usual way. If u is a variable-free propositional term, we let it'(u) = t&"(~(u)), and we let it'(TCL(u)) = t or f according as ~(u) satisfies TCL!f ( .) or not. Similarly, it' (S(u)) = t or f according as ~(u) satisfies S!f(·) or not. Finally, it'(CL(u)) and it'(r(u)), respectively, equal t or f according as ~(u) satisfies CL !f (.) and r!f('). Once the variable-free terms and closed atomic formulas of EL(it') have been interpreted, the remaining terms and formulas of EL(it') and EL can be interpreted in the standard way. I need not repeat those details here. 24.3.6 Salient Properties of the Interpretation of EL

The interpretation of EL that we have described is unambiguous in the following sense: MTEL 1 Let !£Lr be a structure for EL; let rJ. and a, respectively, be a variablefree individual term and a variable-free propositional term in EL(!£); and let (J [[r(w) /\ [[8(u) /\ 8(a)] /\ 8(C(u, a)]] ::::> [P(wla) ~ P(wlu, a)]]].

When P(Nula) > 0, the inequality in ELT 7 becomes a strict inequality. The import of ELT 7 can be intuited from the next example: E 24.8 Let A(x) assert that either x lacks at least one of the characteristics C1, Cz, and C3 or x has the characteristics C4 and Cs ; and let w be (Vx)A. Also, suppose that we have observed y and found that y has the charaderistics C1 , ... , Cs . Finally, let u be A x ( y) and think of a as expressing our knowldge before we observed y. Then, if P(Nula) > 0, ELT 7 insists that our observing u must

increase the probability we assign to w. As long as we insist on a formal justification of inductive inference in science, ELT 7 is as far as we can go with EL. However, if we are willing to argue informally, we can generalize upon ELT 7 and E 24.8 to establish interesting sufficient conditions for the validity of inductive inference in scientific research. This we do in the next example. Suppose that L r contains infinitely many constants, be a wff in Lr with just one free variable x.

E 24.9

°

1,

0z, ... , and let A

In addition, let A be (Vx)A and suppose that 8( "" A). Finally, let 01'=1 Ax(OJ be short for [A X((}l) /\ [ . .. [A x(On-1) /\ Ax(On)] .. . ]], and suppose that 8(A x(OJ), i = 1, ... , n, and that 8(01'=1 Ax(Oi)). Then, for any v such that TCL(v), lim n--+oo

p("" n

AX(Oi)IA, v)

1=1

=

1

implies that

Easy informal arguments based on ELA 13, ELT 7, and ELT 8 (given below) suffice to establish this assertion. Hence I leave its proof to the reader and only note that the relevance of v will be explicated in ELT 8. In reading E 24.9, the reader should observe that lim n-+ oo P(O?=l A x (8JI '" A, v) = constitutes ]. M. Keynes's sufficient condition for the validity of his inductive rules of inference (see Keynes 1921, pp. 236-237). Our condition reduces to Keynes's condition when P( ·Iv) is additive. Now, if the scientific method is such that a false hypothesis eventually will be found out, then ELT 7 and E 24.9 provide the sought-for justification of inductive inference in scientific research. With that remark, our discussion of good inductive rules of inference has come to a happy ending.

°

The Private Epistemological Universe

597

24.3.7.3 The Existence of P( .,. )

Our interpretation of logical probabilities in section 24.2 demonstrated the meaningfulness of the idea of additive and superadditive probability measures on first-order languages. In this subsection we shall see that the idea of a family of conditional probability measures that satisfies ELA 11ELA 13 is equally meaningful. This we do by establishing an interesting analogue of A. Renyi's fundamental theorem on conditional probability spaces, T 18.9. Let v, vo, and w be propositional terms. Then

ELT 8

(i) [[l(w) /\ [TCL(v) /\ TCL(vo)]] :::> [P(wlv)

= P(wlvo)]];

Oi) [[TCL(v o) /\ [l(w) /\ 8(v)]]




[[P(Nvlvo)

1] /\ [P(wlv)

(1 - P(Nvlvo))-l (P(D(w, Nv)lvo) - P(Nvlvo))].

ELT 8 is an analogue of T 18.9 since it demonstrates the existence of a uniquely defined superadditive probability measure Q (.) on C" P(·I vo), such that, for all propositional terms wand v, /I

[[r(w) /\ 2(v)] ::::> [[Q(Nv)

=

(1 -


[P(Nvlvo)

=

0]].

Moreover, the easily established fact that [[r(w) /\ TCL(v)] ::::> TCL(E(w, D(w, Nv)))]

and ELT 3 (ii) imply that [[r(w) /\ [TCL(v) /\ TCL(vo)]] ::::> [P(wlvo) = P(D(w, Nv)lvo )]]'

From these two observations, MLT 3 (ii), ELA 13 (ii), and the transitivity of = follows the validity of ELT 8 (i). The validity of ELT 8 (ii) is an immediate consequence of MLT 3 (ii) and ELA 13 (i) and (ii) and needs no further comment. When interpreting ELA 8, it is interesting to make the following observations: Suppose that .PLr determines a model of EL. Then, for all ~(v) E TCL 5f', q>(~(v), C) =

{H EX; CRH}.

Consequently, for all ~(vo), ~(v) E TCL 5f' and ~(w) E

r5f',

598

Chapter 24

!l'f(P(wlv))

=

pff'(cp(!l't(W), C)lcp(!l't(v), C))

=

pff'(cp(!l't(W), C)lcp(!l't(vo ), C))

= !l'f(P(wlvo )) in accordance with ELT 8 (i). Moreover, if we define Qff' (.): Qff'(!l't(w))

=

pff'(cp(!l't(w), C) Icp(!l't(vo ), C))

for some !l't(vo ) E TCL ff', then for all !l't(w) Qff'(!l't(Nv)) < 1 and !l'f(P(wlv))

=

rff' ~ [0, 1] by

(1 -

E

rft'

Qff(!l't(Nv)))-l(Qff'(!l't(D(w,Nv))) -

and !l'(v)

E 8ft',

Qff'(!l't(Nv)))

in accordance with ELT 8 (ii). When Qff' ( .) is additive, the last equation reduces to

24.3.7 Theorems concerning Kn(') and Bl(') So much for P( ·IK). Next we shall establish some of the salient properties of Kn(·). We begin by observing that if u and v are propositional terms such that TCL(u) and 8(v), then Kn(u, v). Hence if u is a tautological consequence of ELA 1 and M I, then Kn(u, v). To wit, ELT 9. ELT 9

Let u and v be propositional terms. Then

[[TeL (u) /\ 3(v)] => Kn(u, v)].

From ELA 9 (ii), MLT 4, ELA 7, and ELT 2 (i), it follows that [[TCL(u) /\ 8(v)] ~ [P(ulv)

=

1]].

Moreover, from ELA 9 (i) and ELA 8 we deduce that [[TCL(u) /\ 8(v)] ~ u].

But if this is so, we can use ELA 14 (i), ELA 15 (i), and TM 5.15 to establish the theorem. It ought to be the case that if u is a propositional term, then we either know that u or we do not know that u. Moreover, we shall know that u only if we do not know that not u. Finally, if we know that u, we shall know that we know that u. In the next three theorems we demonstrate the validity of these suppositions. The predicate-calculus analogue of T 4.1 implies ELT 10. ELT 10

Let u and v be propositional terms. Then

[[r(u) /\ 3(v)] => ['" Kn(u, v) v Kn(u, v)]].

The Private Epistemological Universe

599

In addition, since [[f(u) /\ 8(v)] ::> [Kn(Nu, v) ::> [u ::> '" Kn(Nu, v)]], we can appeal to

::> '" u]]

and hence [[f(u) /\

8(v)]

[[[u /\ BI(u, v)]

::>

::> '" Kn (Nu,

u]

::>

[[u

::> '" Kn(Nu,

v)]

[[u /\ BI(u, v)]

::>

v)]]],

TM 4.1, ELA I, and TM 5.15 to establish the validity of ELT 11. ELT 11 [[nu)

1\

Let u and v be propositional terms. Then 8(v)l :::> [Kn(u, v) :::> '" Kn(Nu, v)

n.

Finally, since [[f(u) /\ 8(v)]

::>

[Kn(Kn(u, v), v)

::>

Kn(u, v)]],

ELT 12 follows from ELT 10 and ELA 15 (i) and (ii). ELT 12 [[nu)

1\

Let u and v be propositional terms. Then 8(v)] :::> [Kn(u, v)

==

Kn(Kn(u, v), v)]].

Both BI(·) and Kn(·) have interesting distributive properties that we record next. The first property is one that BI(·) and Kn(·) share with D. ELT 13

Let u, v, and w be propositional terms. Then

[[[nu)

r(v)] 1\ 8(w)] :::> [[BI(l(u, v), w) :::> [Bl(u, w) :::> BI(v, w)]]

1\

1\

[Kn(l(u, v), w)

:::>

[Kn(u, w)

:::>

Kn(v, w)]]]].

To prove this theorem we write H for [[f(u) /\ f(v)] /\ 8(w)] and observe that [H::> [[BI(u, w) /\ Bl(I(u, v), w)]

From this, the fact that [H::> [((P(ulw)

::>

[[P(ulw)

=

1] /\ [P(I(u; v)lw)

~ [[u /\ [u ::> v]] ::> v],

+ P(I(u, v)lw))

M I, and

- P(C(u, I(u, v))lw) ~ 1])

we deduce, first, that [H::> [[BI(u, w) /\ BI(I(u, v), w)]

::>

[P(C(u, I(u, v))lw)

::>

[P(vlw) = 1]]].

=

1]]]

and, then, that [H::> [[BI(u, w) /\ Bl(I(u, v), w)]

Similar arguments suffice to show that [H::> [[Kn(u, w) /\ Kn(I(u, v), w)]

::>

[v /\ [P(vlw)

=

1]]]].

=

1]]]].

600

Chapter 24

But if this is so, we can use TM 5.3, TM 5.15, ELA 14 (i), and ELA 15 (i) to conclude the proof of the theorem. I leave the writing down of those details to the reader. Next we observe that Bl(·) and Kn(·) distribute over conjunctions as in ELT 14. ELT 14 [[1(u) 1\

1\

Let u, v, and w be propositional terms. Then [1(v)

1\

8(w)]]

[Kn(C(u, v), w)

::J

[[BI(C(u, v), w)

== [Kn(u, w)

1\

==

[B1(u, w)

1\

Bl(v, w)]]

Kn(v, w)]]]].

That is, if u and v are propositional terms such that r(u) and r(v), then we know (believe) that both u and v is the case if and only if we both know (believe) that u and know (believe) that v. The proof is easy. Since ~ [[u 1\ v] :::::> u] and ~ [[u 1\ v] :::::> v], we can appeal to ELA 14 (i), ELT 4 (iii), and ELA 15 (i) to establish, first, [[r(u)

1\ [r(v) 1\ 3(w)]] =:> [81(C(u, v), w) =:> [81(u, w) 1\ Bl(v, w)]]]

and, then, [[r(u)

1\ [r(v) 1\ 3(w)]] =:> [Kn(C(u, v), w) =:> [Kn(u, w) 1\ Kn(v, w)]]].

Next we use ELA 11 (i) and (iii), ELA 14 (i), and ELA 15 (i) to infer, first, [[r(u)

1\ [r(v) 1\ 3(w)]] =:>

[[Bl(u, w)

1\ 81(v, w)] =:>

Bl(C(u, v), w)]]

and, then, [[r(u)

1\ [r(v) 1\ 3(w)]] =:> [[Kn(u, w) 1\ Kn(v, w)] =:> Kn(C(u, v), w)]].

That concludes the proof of ELT 14. Finally we observe that Bl(·) and Kn(·) do not distribute over disjunctions in the way they distribute over conjunctions. Instead, we have ELT 15: ELT 15 [[F(u) 1\

1\

If u, v, and ware propositional terms, then [1(v)

1\

8(w)]]

[Kn(D(u, v), w)

::J

::J ['"

[[B1(D(u, v), w)

::J ['"

B1(Nu, w) v 81(v, w)]]

Kn(Nu, w) v Kn(v, w)]]]].

That is, either we do not know (believe) that u or v is the case or we do not know (believe) that not u or know (believe) that v. This theorem is a simple corollary of ELT 13, and I leave the proof to the reader. Theorems ELT l-ELT 15 concerned the properties of P(·), 81(·), and Kn(·). The next two theorems concern the relationship between Bl(·) and Kn(·). Of these, the first is a consequence of

The Private Epistemological Universe

[[r(u)

1\

601

8(v)] :::> [Kn(BI(u, v), v) :::> Bl(u, v)]],

the predicate-calculus analogue of T 4.1, and ELA 14 (ii). ELT 16

Let u and v be propositional terms. Then

[[r(u) /\ 8(v)]

:::>

[Bl(u, v)

==

Kn(Bl(u, v), v)]].

The second theorem, ELT 17, formalizes a remark of Jaakko Hintikka (see Hintikka 1962, p. 83). Let u and v be propositional terms. Then

ELT 17

[[r(u) /\ 8(v)]

:::> "" Kn(C(u,

NBl(u, v)), v)].

That is, we cannot know both that u and that we do not believe that u is the case. This is obviously true since, by ELT 14, [[r(u)

1\

8(v)] :::> [Kn(C(u, NBI(u, v)), v)

== [Kn(u, v)

1\

Kn(NBl(u, v), v)]]]

since, by ELA 15 (i), [[r(u)

1\

8(v)] :::> [[Kn(u, v) 1\ Kn(NBI(u, v), v)] :::> [Bl(u, v) 1\ NBl(u, v)]]]

and since that [[r(u)

1\

t- '" [Bl(u, v)

1\

NBI(u, v)] and MRI2, MLT 3, and M 3 (i) imply

8(v)] :::> [[Kn(u, v) 1\ Kn(NBI(u, v), v)] :::> '" [Bl(u, v) 1\ NBl(u, v)]]].

t-

Then from [[u :::> v] :::> [[u :::> '" v] follows the validity of the theorem.

:::> '" u]]

and repeated use of ERll

24.3.7.5 Substitution in Referentially Opaque Contexts

According to the principle of the indiscernibilUy of identicals, given a true statement of identity, one of its two terms may be substituted for the other in any true statement and the result will be true. This principle has many exceptions. Here are three of them. The following statement is a fact: Oslo = the captial of Norway. However "the capital of Norway" cannot be substituted for "Oslo" in this statement: "Oslo" contains four letters. Similarly, we know the following for a fact: Hesper = Lucifer.

602

Chapter 24

However, "Lucifer" cannot be substituted for "Hesper" in Sappho's song: "Oh Hesper! Thou art, I think, an evening star, of all stars the fairest." Finally, the following can be stated as a fact: Frank believes that Oslo lies in Sweden. However, it is false that Frank believes that the capital of Norway lies in Sweden. The failure of the substitutivity of equality in these cases happens because the occurrence of Oslo in the first is not referential and the occurrences of Hesper and Oslo in the second and third examples are not purely referential. W. v. O. Quine characterizes contexts such as "is unaware that ... ," "believes that ... ," and knows that ... " as referentially opaque. We shall next study the substitutivity of material equivalence in the referentially opaque context of Bl( 0) and Kn( 0). ELT 18

Let u, v, and w be propositional terms. Then

[[r(u) 1\ [r(v) 1\ 8(w)J] ::::> [TCL(E(u, v)) ::::> [[81(u, w)

==

81 (v, w)]

1\

[Kn(u, w)

==

Kn(v, w)lll].

The proof goes as follows: Let u, v, and w be propositional terms such that r(u), r(v), and 8(w). Then ELA 7 and MLT 4 imply that [TCL(E(u, v))

:::J

TCL(I(w, E(u, v)))].

From this, ELT 2 (iii), and ELA 14 (i), we deduce, first, that [[r(u)

1\ [r(v) 1\ 8(w)]] :::J [TCL(E(u, v)) :::J

[Bl(u, w)

and then, by appeal to M 1 and [TCL(E(u, v)) [[r(u)

:::J

[r(v) 1\ 8(w)]]

:::J

[TCL(E(u, v))

:::J

:::J

== 81 (v, w)]]]

E(u, v)],

[Kn(u, w)

==

that Kn(v, w)]]].

Thus in the referentially opaque contexts of Bl( 0) and Kn( 0), the substitutive property of material equivalence is valid for propositional terms u and v that satisfy r(o) and (I) TCL(E(o,o)). The importance of TCL(E( 0,0)) in ELT 18 must not be overlooked. One interesting illustration of why is the following: In spite of ELA 15 (i) we cannot demonstrate that [r(u) 1\ 8(v)] materially implies that [Bl(C(u, Bl(u, v)), v)

== 81(Kn(u, v), v)].

This may sound strange, but it is in accord with our semantic analysis as well. To wit: Let 2 be the structure of EL described in section 23.3.5 and suppose that the interpretation of EL determined by 2 is a model of

The Private Epistemological Universe

603

ELA l-ELA 15. Then ~ maps [u /\ BI(u, v)] and Kn(u, v) into two wffs in I~I, [~(u) /\ BI.P(~(u),~(v))], and Kn.P(~(u),~(v)). These wffs need not be identical, but they must have the same tmth value in ~. If they are not identical, their truth values may differ in any other member of fll'; and that is the semantic reason why we cannot use ELA 15 (i) to justify substituting Kn(u, v) for C(u, Bl(u, v)) in Bl(', v). To see how the substitutivity of equality fares in the referentially opaque context of Kn('), we first note that (by ELT 18) if u and v are propositional terms such that r(u) and B(v), then ~ [u

==

v] implies ~ [Kn(u, w)

== Kn(v, w)].

From this and PLA 6 and PLA 7 of chapter 5 it follows that if v E B, if C is a wff in Lr with one free variable x, and if a and b are variable-free terms in Lr , then ~ [a

=

b] implies ~ [Kn(Cx(a), v)

==

Kn(Cx(b), v)].

Consequently, if [a = b] is a theorem of T(r) we can be certain that Kn(Cx(a), v) if Kn(Cx(b), v) and conversely only if the structures in fll' are models of T(r). However, if [a = b] is not a theorem of T(r) and/or if the structures in fll' are not models of T(r), it is possible that 2(Kn(Cx(a), v)) = f even though ~(Kn(Cx(b), v)) = t. This is illustrated in E 24.10. Assume that e, m, v are constants of Lr that in ~ denote, respectively, the evening star, the morning star, and Venus. Suppose that 2 Lr is a model of EL and that 2([v = e]) = t and 2(Bl([v = e], K)) = t, where 3(K). Then 2(Kn([v = e], K)) = t. If in addition, 2([e = m]) = t, it follows from the transitivity of = that 2([v = m]) = t but not that 2(81([v = m], K)) = t. Hence 2(Kn([v = m], K)) need not equal t. However, if both 2([e = m]) = t and 2(Bl([e = m], K)) = t, then we can show that 2(Kn([v = m], K)) = t as well. E 24.10

24.3.7.6 The Epistemological Concept of Truth

In section 23.2 we insisted that a proposition A is epistemologically true if it can be known and epistemologically false if it cannot be known. In the context of EL a propositional term u can be known if and only if there is a propositional term v in B such that Kn(u, v). I shall use this observation to give the following syntactic characterization of the epistemologically true propositional terms of EL. ELT 19

Let u and v be propositional terms. Then

(i) [CL(u)

::J

[u

::J

3(u)]];

(ii) [[CL(u) /\ 3(v)] (iii) [CL(u)

::J

[u

==

::J

[Kn(u, v)

Kn(u, u)]J.

::J

3(u)]];

Chapter 24

604

The proof is easy. First we prove (i): It follows from TM 5.5 that [[ONu :::J Nu] :::J [[Nu :::J '" u] :::J [ONu :::J '" u]]].

From this, M 3 (i), M 1 (i), and repeated use of ERI 1, we deduce that [ONu :::J '" u].

Next we observe that, by TM 5.7, [[ONu :::J '" u] :::J ['" '" U :::J '" ONu]]. Consequently, by ERI 1, ['" '" U :::J '" ONu]. Since also, by TM 5.5, [[u:::J '" '" u] :::J [[ '" '" U :::J '" ONu] :::J [u :::J '" ONu]]] and, by TM 5.6, [u :::J '" '" u], we can apply ERll twice and deduce that [u :::J '" ONu]. From this, M l(i), an obvious application of TM 5.7, and repeated use of ERI 1 we conclude that [u :::J NO Nu]. But if that is the case, then by TM 5.5, [[u :::J NONu] :::J [[NONu :::J 8(u)] :::J [u :::J 8(u)]]]

and by an application of ERI 1, it follows that [[N 0 Nu :::J 8(u)] :::J [u :::J 8 (u)]].

To the last wff we apply TM 4.1 (with q equal to CL(u)) and conclude the proof of ELT 19 (i) by appealing to ELA 1 and ELT 1 and by using ERI 1 twice to deduce [CL(u) :::J [u :::J 8 (u)]].

The validity of ELT 19 (ii) is an immediate consequence of ELT 19 (i), ELA 15 (i), and ELA 1 and needs no further proof. The proof of ELT 19 (iii) goes as follows: First we observe that, by ELA 12 (i), ELA 14 (i), and TM 5.15, [8(u) :::J [0 [u :::J u] :::J BI(u, u)]].

From this, TM 5.3, T 4.1, ERI3, and ERI 1 we deduce that [8 (u) :::J BI (u, u)]. Next we observe that, by TM 5.5, [[u :::J 8(u)] :::J [[8(u) :::J Bl(u, u)] :::J [u :::J BI(u, u)]]].

From this and the preceding observation, TM 5.3, and ERI 1 we obtain [[u :::J 8(u)] :::J [u :::J BI(u, u)]].

But if that is so, then

The Private Epistemological Universe

[CL(u)

::J

[[u

::J

S(u)]

::J

[u

::J

605

BI(u, u)]]].

and we can use ELA 1, ELT 19 (i), and ERI 1 to establish [CL(u)

::J

[u

::J

BI(u, u) ll.

The last assertion and the fact that [CL(u) validity,of ELT 19 (iii).

::J

[u

::J

ull suffice to establish the

24.4 Other Concepts of Knowledge

The main purpose of constructing EL was to give a mathematical characterization of the concept of knowledge that we discussed in chapter 23. A brief review of section 24.3 suffices to see that we have succeeded. To wit, axioms ELA 14 and ELA 15, ERI3, and theorems ELT 10 and ELT 13 show that the Kn(') of EL has the four properties of the concept of knowledge that we insisted on in section 23.2. Also from MTEL 3 and ELT 9 it follows that, for any u in CL and v in S, we can assert Kn(u, v) if u is either a nomological hypothetical or an assertion whose truth we can establish by a priori reasoning alone. Finally, if u is an accidental hypothetical that belongs to CL, ELT 7 gives sufficient conditions that our belief in u on v increases with the number of observations. Our concept of knowledge has controversial features. I shall comment on some of them. Throughout my comments, Lr denotes the language for the- theory of knowledge of a person A who belongs to some given group ~§ (see section 24.1.3); lfL [ is a structure for EL that provides us with an interpretation of EL which satisfies all the axioms of EL; .or is a set of models of the nonlogical axioms of !fr ; and g denotes the private epistemological universe of A. 24.4.1 Peirce's Concept of Knowledge I believe that C. S. Peirce's ideas of knowledge and belief can be represented by a model of ELA 1-ELA 15 (i). Peirce certainly would have agreed to ELA 1-ELA 13 with equality instead of inequality in ELA ll-ELA 13. I believe that he also would have agreed to ELA 14 and ELA 15 (i) for reasons that are detailed below. To Peirce belief is a "demi-cadence which closes a musical phrase in the symphony of our intellectual life" and has just three properties: (1) it is something of which we are aware; (2) it appeases the irritation of doubt in our mind; and (3) it involves the establishment of a habit in our nature (see Peirce 1955, p. 28). That is, if a person A believes that something, say u, is

Chapter 24

606

the case, then A believes that u is the case without a trace of doubt, A is willing to act on his belief if the occasion arises, and A is aware of the fact that he believes that u is the case. From this we conclude that Peirce's idea of belief is in accord with ELA 14. To Peirce truth is what can be known, and what can be known is something which is fated to be ultimately agreed on by all who investigate (see Peirce 1955, p. 38). If that is correct, a person A cannot know that something, say u, is the case unless u is the case and A believes that u is the case. Conversely, if something, u, actually is the case and A believes that u is the case, then eventually all those who investigate will agree that A is right about u. Consequently, A knows that u is the case. From this, we conclude that Peirce's idea of knowledge is in accord with ELA 15 (i). While Peirce might even have accepted all our axioms for the theory of knowledge as they are, he would have had serious reservations with respect to our description of the private epistemological universe in section 24.1.3. There we insisted that G's assignment of truth values to the wffs of Lr be in accordance with A's potential knowledge, as reflected in the factual knowledge of the totality of individuals in '§. Peirce agrees that reality can be meaningfully conceived only in relation to the possible interpretation of it by a community of intelligible beings. However, in contradistinction to us, he insists that the reference group must be without limits and extend to the whole communion of minds to which we belong (see, for example, Peirce 1955, p. 247). 24.4.2 Hintikka's Concept of Knowledge

Jaakko Hintikka has expounded his ideas about knowledge and belief in a treatise, Knowledge and Belief (Hintikka 1962), and in numerous other places (e.g., Hintikka 1969, pp. 87-111; Hintikka 1974, pp. 212-233). I believe that Hintikka's concept of belief is analogous to mine but that his notion of knowledge differs in an interesting way from my idea of knowledge. My reasons are detailed below. I begin with belief. Let P be a proposition and let BAP assert that A believes that p. Then according to Hintikka (1962, p. 94) we have the following:

BAP is true in a possible world ~ if and only if P is true in all of A's doxastic alternatives to ~. Here a doxastic alternative to ~ for A is a possible world which is compatible with what A believes in ~. In my theory the set of alternatives to ~ is the set {H E PI: ~RH}. Moreover, for any pair of propositions u and v such that r(u) and 2(v),

The Private Epistemological Universe

~(BI(u, v))

=

607

t if and only if pIf(q>(2',(u), ~)Iq>(2',(v),~))

=

1.

When 2',(u) is P and 2',(v) satisfies TCL If, the last expression becomes a symbolic rendition of the first. From this, and from the fact that Hintikka (1962, pp. 109-110) insists that

BAP:::> BABAP, we infer that Hintikka's concept of belief accords with ELA 14. Next, knowledge: Let KAP assert that A knows that p. Then Hintikka (see (CKKll') and (CK) in Hintikka 1962, p. 43) asserts the following:

KAP is true in a possible world ~ if and only if P is true in all of A's epistemic alternatives to ~. Here an epistemic alternative to ~ for A is a possible world which is compatible with what A knows in ~. Hintikka (1962, p. 51) assumes that a doxastic alternative to ~ is an epistemic alternative to ~. I shall add the assumption that an epistemic alternative to ~ is either ~ or a doxastic alternative to ~. In my theory I do not distinguish between doxastic and epistemic alternatives to ~. They are just alternatives to ~. Moreover, for any pair of propositions u and v such that r(u) and 8(v), ~(Kn(u, v))

=

t if and only if ~(2',(u))

=

t and ~(BI(u, v))

=

t.

Since Hintikka (1962, p. 43) insists that

KAP:::> p, the second assertion becomes a symbolic rendition of the first once we equate P with 2',(u) and assume that 2',(v) satisfies TCL!E. From this and from Hintikka's acceptance (see (CKB) in Hintikka 1962, p. 50) of

KA'P

:::>

BAKAP,

it follows that Hintikka's characterization of knowledge accords with ELA 15 This demonstration of the equality of my concept of knowledge and Hintikka's depends crucially on the assumption that an epistemic alternative to ~ is either ~ or a doxastic alternative to ~. When that assumption fails, we can show that

KAP

:::>

P /\ BAP

and that

Chapter 24

KAKAP == KAP

608

1\

BAKAP

are valid in Hintikka's theory, but we cannot establish the relation

24.4.3 Chisholm's Concept of Knowledge

According to Roderick Chisholm (see Chisholm 1977, p. 110) if P is a proposition and KAP is short for A knows that p, then we have the following;

KAP if and only if (1) it is the case that P; (2) A believes that P without a trace of doubt; and (3) P is nondefectively evident for A. To explicate this notion of knowledge, we shall translate the assertion "P is nondefectively evident" into.a statement about the wffs of Lr . We begin by translating "P is self-presenting for A." Roughly speaking, in Chisholm's theory of knowledge (see Chisholm 1977, pp. 135-136) a proposition P is self-presenting for A if and only if (1) it is the case that p; and (2) the fact that P is true necessarily leads A to believe without a trace of doubt that P is the case. When u is a propositional term such that 2[(u) E CL 5f', then C(2[(u)) = t implies that C(2[(NDNu)) = t. From this and ELT 1 it follows that 2[(u) E CL 5f' and C(2[(u))

= t implies that 2[(u) E 85f'.

Since Bl(v, v) for all propositional terms v such that 8(v), we conclude that Chisholm's idea of self-presenting propositions can be translated as follows: For any propositonal term u, 2[(u) is self-presenting if and only if (1) 2[(u) E 85f' and (2) ~(w)]]])

=

t,

where ~(u) has taken the place of p. If our translation of "p is a nondefectively evident proposition for A" is right, then we can characterize Chisholm's idea of knowledge, CKn ( .), as follows: Let 8(v) assert that v is self-presenting; i.e., let [8(v)

==

[v /\ 8(v)]].

Then [[r(u) /\ 8(v) J ::::> [CKn(u, v) ::::>

[DBl(w, v)

::::>

==

[[u /\ BI(u, v)] /\ [r(w)

w]]]]].

In words, this assertion insists that if u and v are propositional terms such that r(u) and 8(v), then A knows that u on the basis of v if and only if (1) it is the case that u; (2) A believes that u on the basis v; and (3) v is a base of a propositional term w only if it is the case that w. Chisholm formulated his notion of knowledge to deal with the wounds that E. L. Gettier had inflicted on the traditional theory of knowledge. Example E 23.2 describes a problematic situation which Gettier envisaged. By letting u be D(p, w), p be "Gary owns a Ford," and w be "Peter owns a Ford," we see that, according to CKn(·), we cannot claim in E 23.2 that we know that "Gary or Peter owns a Ford." Chisholm's idea of knowledge is complicated. It is, therefore, interesting to make the following observations: Suppose that we substitute CKn( . ) for Kn (. ) in our description of EL and postulate ELA 1-ELA 14 and our characterization of CKn(·) above. Then we can demonstrate that, for any propositional terms u, v, and p such that r(u), 8(v), and r(p), the following assertions are valid: [CKn(u, v)

::::> u],

(CKn(I(u, p), v)

::::>

[CKn(u, v)

::::> CKn(p, v)]],

Chapter 24

610

==

[TCL(E(u, p)) ::,) [CKn(u, v)

CKn(p, v)]).

However, we cannot demonstrate that [TCL(u) ::,) CKn(u, v)]! Hence, without assuming that v is a nondefectively evident proposition, we cannot assert that CKn(u, v) if u is a tautological consequence of ELA 1 and M 1. To us that is a drawback of the notion of knowledge determined by CKn(· ). 24.4.4 Sundry Comments and a Look Ahead

In concluding our discussion of EL we shall briefly comment on four characteristics of EL and fI! that ought not to be overlooked. They concern the completeness of Bl(·) and Kn(·), C's assignment of truth values,missing quantifiers, and partial knowledge in EL. First, the completeness of Bl(.) and Kn (.): Many epistemologists will object to the content of ELT 9 and to some of the consequences of ELT 13. Although they might admit that it is not wrong to suppose that an individual believes in a logical truth, they will consider it odd that it be provably the case (Jones 1983, p. 52). Similarly, they will think it odd that in EL it is provably the case that everyone knows the logical consequences of what he knows since such knowledge is beyond the information storage capacity of most humans (Jones 1983, p. 68). Well, it may be odd. Still I insist that it ought to be provable that an individual must believe in logical truths and know the logical consequences of all that he knows. Otherwise he would, sooner or later, find himself in an indefensible position (in court, on the job, or at home) from which he would gladly retreat. Next, C's assignment of truth values: We have seen that both Hintikka and Chisholm object to ELA 15 (i). Andrew Jones agrees with them for reasons that are of particular interest to us. Using Hintikka's notation for knowledge and belief, Jones defines knowledge by KAP

=df

[[BAP

1\

VAP]

1\

0A VAP],

where VAp asserts that "p is true according to the information available to A" and 0A VAp insists that "it is optimal relative to A's interest in being informed, that VAP." Jones also postulates that 0A VAp::,) p,

but he refuses to accept the converse. Hence KAP ::,) [p

1\

[BAP

1\

VAP]],

but the converse need not hold. For all that we know, A might not want to

The Private Epistemological Universe

611

know that p! This is interesting because of the light it throws on our discussion of private epistemological universes in section 24.1.3. If !E determines a model of ELA 1-ELA 15 and ~ represents the private epistemological universe of A, then we insist that ~ assign truth values to the wffs in I~ I in accordance with A's potential knowledge without regard to A's wishes. Then the missing quantifiers in EL: Note that in EL the V quantifier is applied to individuals in EL and not to individuals in Lr . Hence the problem of passing from Kn((3x)C, v) to (3y)Kn(Cx (Y), v) and conversely does not arise in the context of EL; e.g., we can assert "Erik knows that there are spies," but we cannot infer from this that "there is a spy whom Erik knows." Not tackling this problem leaves us with an important lacuna in our theory. I chose not to fill it because filling it would divert attention from the main issues of epistemology that we want to study. Finally, the absence of partial knowledge in EL: Note that there is no such thing as partial knowledge in EL. According to ELT 10, either we know that A or we do not know that A. Usually we do not know that A when A is a derivative law. So we must construct a different language for talking about scientists' search for knowledge concerning the applicability of derivative laws. That we will do in the next chapter.

25

An Epistemological Language for Science

In chapters 23 and 24 I discussed the possibility of knowledge, and I proposed a schema for classifying different kinds of knowledge. I also formulated a language for explicating our idea of knowledge. In this chapter I shall develop an epistemological language for science with which we can formulate and analyze tests of scientific hypotheses. These hypotheses may be accidental or nomological laws, in which case they can be falsified or known eventually. They may also be derivative laws. If they are, we can determine the circumstances under which, and the degree of exactness with which, they can be applied, but we cannot falsify or know such laws. I shall begin by discussing two criteria, simplicity and autonomy, which scientific hypotheses ought to satisfy. Then I treat inference by analogy and describe ways in which analogy can be used to generate useful hypotheses. I also treat inductive inference and discuss the treacherous business of confronting scientific hypotheses with data. Finally, I formulate a multisorted language, discuss some of its semantic properties, and use a modified version of it to construct the sought-for epistemological language for science. In concluding the chapter, I delineate a probabilistic framework within which we can formulate statistical tests of scientific hypotheses. 25.1 Simple, Autonomous Relations

When we postulate theoretical relationships in LKPU " we would like to formulate simple relationships that are valid for as large a collection as possible of the relevant models of KPU*. In Trygve Haavelmo's terminology (Haavelmo 1944, pp. 21-39), we would like our postulated relationships to be (1) simple and (2) autonomous with respect to as large a . set of relevant KPU* models as possible.

An Epistemological Language for Science

613

Here is an example to fix our ideas: E 25.1 Let B be a vector whose components denote the various assets and liabilities available to consumers in the universe. We are looking for a finite set of factors, F1 , ••• , Fn , and a vector-valued function, g( .), such that a u.s. consumer's choice of balance sheet in 1962 can be represented by B = g(F1 , ... , Fn )

+ u,

where g( . ) is common to all consumers and the values of F1 , ••• , Fn , and u vary from one individual to the next. There are many candidates to choose from. We would like to find one (n + 2)-tuple, (g, F1 , ••• , Fn , u) such that (in the u.s. population of 1962) each component of u has mean zero, finite variance, and zero covariance with the Fi •

Some of the factors we have in mind in E 25.1 are age, education, risk aversion, initial wealth, and current-period asset prices, interest rates, and disposable income. Other factors are listed in table 25.1, which summarizes the reasons for acquiring various assets and liabilities given by a group of u.s. consumers in 1962. The requirement that theoretical relationships be simple suggests that in E 25.1 we choose as few factors as possible. The requirement that theoretical relationships be autonomous with respect to as large a set of relevant KPU'" models as possible suggests that we include in our list of sel~cted factors those most likely to have a significant effect on consumers' choice of equilibrium balance sheets. Note, therefore, that there might not have been one consumer in the u.S. population who in 1962 was influenced by all the factors in table 25.1. Note also that the factors were not equally important in determining the sample consumers' balance-sheet choices. The presentation of a scientific hypothesis in LKPU ' ought to include a description of the class of KPU'" models relative to which the hypothesis is believed to be autonomous. When applied to economic matters, this maxim requires that economic hypotheses be presented with a specification of a class of economic structures relative to which the hypotheses are autonomous. Delineating such classes is difficult and economists have worked hard to prove theorems which were required for that purpose. Cases in point are T 10.17, T 11.9, and T 12.14- T 12.18. The first theorem considers economic structures in which the extension of "consumer" is not restricted but commodity prices must vary in certain prescribed ways to ensure that consumers' demand for composite commodities can be treated as if the composites were ordinary commodities. The second theorem and the third group of theorems consider various economic structures in which prices

Table 25.1

n ::r

Assets associated with investment objectives (percentage distribution of consumer units).

~

""d

rt "'1

Investment objectives Maximum current cash return

Assets

Safe, steady return

Growth of capital through appreciation

Safety of capital

Liquidity or marketability

Minimizing income taxes

Cash and savings accounts

6

II

1

12

20

9

+ others)

52

54

54

45

55

38

24

27

12

39 5

Securities (stock

Investment in real estate

27

18

Other assets

7

II

5

8

9

No asset mentioned

6

5

15

7

5

9

L

100

100

100

100

100

100

Number of units

191

579

699

521

231

151

N V1

Source: Projector and Weiss 1966.

Q\ H ,j:::.

An Epistemological Language for Science

615

vary freely but the extension of "consumer" is restricted so that consumer demand for certain specified aggregates of commodities can be treated as if the aggregates were ordinary commodities. In the epistemological language presented below the interpretation of the modal operator 0 specifies the set of structures relative to which a given hypothesis is assumed to be autonomous. More specifically, the interpretation, (f!(, ~, R, qJ), singles out a structure ~ in which the hypothesis is valid and insists that it be valid in all structures in f!{ that are in the relation R to ~. 25.2 Analogy and the Generation of Scientific Hypotheses There are many ways to generate useful scientific hypotheses. In this section we shall discuss one of them, inference by analogy. 25.2.1 Analogy and Inductive Infere~ce

While discussing the possibility of knowledge in chapter 23 we defined inference by analogy as a process of reasoning in which objects or events that are similar in one respect are judged to resemble each other in certain other respects as well. We also suggested that induction may be thought of as an argument in which we single out two sets of positive analogies, Pi' ... , Pk and Qi' ... , Qm' for an observed group of individuals and hypothesize that the next individual we meet either will not satisfy some of the P's or will satisfy all the Q's. For our present purposes it is important to note that such an inductive argument concerns properties of individuals in one and the same universe; e.g., all the individuals x that we have observed have either not been ravens, '" R(x), or have been black, B(x); hence chances are that (\fx)[R(x) ~ B(x)). The characteristic feature of inference by analogy that we intend to stress in this section is that it is an argument by which we derive properties of individuals in one universe from the properties of individuals in some other universe. 25.2.2 Models

Arguments by analogy appear in different disguises in economic theory. We used such arguments in chapters 11 and 12 when we treated optimal expenditure strategies and equilibrium balance sheets as consumption bundles. In chapters 21 and 22 we used transfer to establish nonstandard

Chapter 25

616

analogues of well-known properties of standard economies. Linguistic limits to such analogies were discussed in sections 7.3 and 7.4. The preceding examples illustrate different aspects of the role that models and arguments by analogy play in the construction of new economic theories. This role is as important in other sciences as it is in economics. For example, in physics Huygens used the familiar view of sound as a wave phenomenon to develop his wave theory of light. Similarly, Black's experimental discoveries concerning heat and Fourier's theory of heat conduction were motivated by their conception of heat as a fluid. In each case "the model served both as a guide for setting up the fundamental assumptions of a theory, as well as a source of suggestions for extending the range of their application" (Nagel 1961, pp. 108-109). The epistemological language I present below provides a vehicle for systematic use of models and arguments by analogy in scientific research. In this language inferences by analogy from the properties of one kind of individuals to those of another are drawn in accordance with axioms that delineate the characteristics of members of an observational vocabulary and describe the relationship between the symbols of the observational and the theoretical vocabulary. These axioms determine both what kind of inferences the language allows and specify the conditions under which such inferences can be drawn. For example, some axioms may relate demand for single commodities to demand for aggregates of commodities. Others may explicate the relationship between single-vintage production functions and aggregate production functions. 25.2.3 Representative Individuals and Aggregates

We also use inference by analogy in many ways in econometrics. The idea of one of these ways dates back to A. Quetelet-the creator of the abominable l'homme moyen. Quetelet claimed that what related to the human species, considered en masse, was of the order of physical facts. "The greater the number of individuals," he said, "the more the individual will is effaced and leaves predominating the series of general facts which depend on the general causes, in accordance with which society exists and maintains itself. These are the causes we seek to ascertain, and when we shall know them, we shall determine effects for society as we determine effects by causes in the physical sciences." (Stigler 1965, p. 202) Quetelet's dictum contains a postulate and a prescription. The postulate asserts that in the human race there are positive analogies worth knowing. The prescription tells us how to look for these analogies.

An Epistemological Language for Science

617

Table 25.2 Gradations of income and relative surplus. Gradations ($)

300-500

Number of families 10

Their earnings

Their expenses

($)

($)

4,308

4,466

Average yearly surplus or debt ($) -15.8

500-700

140

86,684

86,023

4.72

700-900

181

143,281

138,747

25.05

900-1100

54

52,708

49,720

55.33

1100-1300

8

9,729

8,788

117.63

> 1300

4

6,090

5,241

212.25

397

302,800

292,987

24.72

Total

Source: Wright 1875, p. 380

E 25.2 In 1875 the Massachusetts Bureau of Labor Statistics under the supervision of C. Wright conducted a survey of incomes and expenditures of families of the state's workers. Table 25.2 gives a summary statement of the workers' incomes and savings. The relationship between savings and incomes, which we observe in the table, and Quetelefs dictum gave Wright the idea of the following law: The higher the income of a family, the greater is the amount it will save, actually and proportionately.

Today few believe in Quetelet's statistical methods, as exemplified in Wright's use of the data in table 25.2. However, econometricians believe in his dictum and use it in the way we shall use it in the applied econometric part of the book, i.e., in chapters 26-28. There we postulate the existence of a representative consumer and describe how other consumers differ from him. Then we use data that pertain to many different individuals and test whether the positive analogies for consumers that are exhibited in the behavior of the representative consumer exist. Our representative consumer is the econometric counterpart of the economic man. He is also the personification of Quetelet's l'homme moyen in our econometric axiom system. In this context it is interesting to note that the analogue of the representative consumer in our epistemological language is the ~ structure in the interpretation of SEL. The relation R and the set of structures PI determine the extent to which inference by analogy from one structure to another is permitted.

Chapter 25

618

25.2.4 Observations, Theoretical Hypotheses, and Analogy

Sometimes observations, theoretical hypotheses, and inference by analogy can combine to produce spectacular results. One example is Keynes's General Theory (Keynes 1936). Keynes used economic theory concerning the behavior of individual consumers and firms, observations, and inference by analogy to derive his macroeconomic relations, one of which was the aggregate version of Wright's Law. Another example, which we shall describe next, is Newton's derivation of his general law of gravity. In Newton's case the observational inputs were supplied by Tycho Brahe's observations on planetary motion and the following three empirical laws that Johannes Kepler derived from them. 1. Each planet moves around the sun in an ellipse, with the sun at one focus. 2. The radius vector from the sun to the planet sweeps out equal areas in equal intervals of time.

3. The orbital periods of any two planets are proportional to the 3/2 power of the lengths of the semimajor axes of their respective orbits.

The theoretical inputs were provided by Newton's three laws of motion: 1. Principle of Inertia: An object that is left alone remains still if just standing still, and continues to move with constant velocity in a straight line if originally moving. 2. A force is needed to change the velocity and the direction of motion of an object. The time rate of change of the mass times the velocity of the object is proportional to the force.

3. The action (Le., force exerted) of one object on another object equals the

reaction (Le., force exerted) of the second object on the first. Newton used his two first laws to show that the motion of the planets would satisfy Kepler's Law of Areas if and only if the forces that acted on the planets were directed exactly toward the sun. Then he showed that Kepler's first and third laws held if and only if the forces that acted on the planets varied inversely as the square of the distance of the respective planets from the sun. Finally, Newton used his third law and an argument by analogy to show that "all objects attract each other with a force directly proportional to the product of their masses and inversely proportional to the square of their separation" (Cohen 1981, pp. 123-131). Newton's argument by analogy was the following: "And since the action of centri-

An Epistemological Language for Science

619

petal force upon the attracted body, at equal distances, is proportional to the maHer in this body, it is reasonable, too, that it is also proportional to the maHer in the attracting body. For the action is mutual, and causes the bodies by a mutual endeavor [by the second law] to approach each other, and accordingly it ought to be similar to itself in both bodies" (Cohen 1981, p. 128). The assertion which resulted from this argument and Newton's third law is called Newton's General Law of Gravity and is formulated in symbols in equation 24.3. In presenting his theory, Newton distinguished between absolute space, which is immutable and immovable, and relative space, which is a movable representation of absolute space (see Newton 1968, pp. 9-18). His bodies moved about in absolute space while Tycho Brahe's planets circled the sun in relative space. It is, therefore, interesting to us that the positive analogies Newton's laws prescribe for celestial motion in absolute space do playa role in the analysis of celestial motion in relative space that is analogous to the role which our representative consumer plays in the characterization of individual behavior in a sample population. 25.3 Induction and Meaningful Sampling Schemes

Inductive rules of inference pass from assertions concerning characteristics of our observations to assertions concerning characteristics of observations to come. In section 24.3.7.2 we saw that the inductive rules of inference, which we delineated in the ELA ll-ELA 13 of EL in section 24.3.3, determine rules of inference for EL that are appropriate analogues of the Modus Ponens of our first-order predicate calculus. We also found sufficient conditions on the scientific method that these rules can be used to justify inductive inference in scientific research. The last result might have given the reader the impression that with "sufficiently many" observations we can determine the truth value of any variable hypothetical. But what do we mean by sufficiently many"? JJ

E 25.3 (log

So

n-

Let 1

n

(x) denote the number of primes less than x, and let l(x) =

dt. Gauss and other prominent mathematicians conjectured that

n (x) < l(x). This conjecture was not only plausible, but was supported by the evidence. The primes up to 10 7 and their numbers at intervals up to 10 9 were known and the inequality held for all x for which data were available. Yet Littlewood proved in 1912 that the conjecture is false: that there are infinitely many values for which the inequality must be reversed. Later Skewes demonstrated that there is a number x less than 10101034 which does not satisfy the inequality.

620

Chapter 25

Even though Skewes' number may be reduced by refining his arguments, it is unlikely that we shall ever know an instance of Littlewood's theorem. The example, therefore, shows that we cannot, in a given case, count on the number of observations alone to determine the correct generalization. 1 The moral of E 25.3 is not that ELT 7 and E 24.9 are irrelevant and that our inductive rules of inference are not good. Rather it is that, in processing evidence, we must consider both the data-generating process and the size of our sample of observations. In other words, in epistemology in general and econometrics in particular, it is not sufficient to introduce axioms concerning the characteristics of the universe and postulate properties of our inductive rules of inference. We must also hypothesize about the way our data are generated; e.g., the data may constitute a random sample or be generated by a stratified random sampling scheme. C. Broad suggested in Broad 1928 (pp. 15-18) that the necessary hypothesis concerning the data-generating processes in nature could be introduced once and for all in the form of a fundamental causal premise. I do not choose to do so because the data-generating mechanisms we face in econometrics are much more varied than those Broad envisioned. Instead I shall introduce the postulates we need as they become relevant for the applied-econometric studies I present later. In each case the assumptions will be such that we can appeal to probabilistic limit theorems and show that our rules of inference are good in the situation considered. To aid our intuition and to guide us in selecting the assumptions concerning the relevant data-generating processes that I introduce in subsequent chapters, I shall next introduce the idea of a meaningful sampling scheme. To that end, let (O,!#', P(·)) be a probability space; let Xi('): 0 -+ R, i = 1, ... ,n, be random variables on (0, !#'); and let Xi denote the range of x/'), i = 1, ... , n. Moreover, let 2 = Il?=l Xi; let 11(2) denote the smallest (J field that contains all sets of the form Il?=l Ai, with Ai C Xi and {w EO: Xi(W) E Ai} E :F, i = 1, ... , n; and let Q(') be a probability measure on (2, 11(2)). Then (2, 11(2), Q( . )) is the sample space of an experiment in which we, in accordance with Q('), obtain a sequence of n observations on the values assumed by the Xi(·). We shall say that (2,11(2), Q( . )) is a meaningful sampling scheme if and only if, Q(A) = P( {w EO: (Xl (W), ... , Xn(W) E A}),

A

E

11(2).

If the x i ( .) are independently distributed relative to P( .) and if (2,11(2), Q(')) is a meaningful sampling scheme, then relative to Q an outcome of the experiment (2,11(2)) is a random sample. It is not always easy to design an experiment in which we sample in accordance with a given probability measure. We illustrate this in E 25.4.

An Epistemological Language for Science

621

E 25.4 Fifty sharks were observed in Oslo fjord during the summer of 1976. Twenty were blue sharks. Can we infer that about 40 percent of all sharks in Oslo fjord are blue sharks? If the observations constitute a random sample, 0.4 is a reasonable estimate of the true proportion of blue sharks in Oslo fjord, and the answer is "yes." If not, 0.4 is an estimate of the mean of the sampling distribution. The mean can differ from the true proportion of blue sharks inasmuch as areas with either relatively few or relatively many blue sharks may have been oversampled. For nonrandom samples the suggested inference cannot be justified.

For the shark sample in E 25.4 to constitute a random sample, the sharks would have had to be sampled in the way we sample with replacement from an urn with blue and white balls and with as many balls as there were sharks and as many blue balls as there were blue sharks in Oslo fjord in the summer of 1976. It is unlikely that the sharks were observed in accordance with such a scheme. Analogous remarks apply to most nonexperimental observations obtained from nature; e.g., any black ravens we observe are sampled from a limited area over a time horizon that excludes all future periods and all periods in the distant past. Even in situations in which we have numerous observations and we have sampled in accordance with a meaningful sampling scheme, the probabilities that we may assign to the validity of the associated variable hypothetical might be low. Take E 25.5, for instance. E 25.5 Consider an urn with a large number, M, of balls that are identical except for color. Assume that each ball is painted with one color and that the choice of color was made at random from v different colors. We do not know the color composition of the urn, and we sample balls at random without replacement. The balls drawn are all red. If we assume that P( 'IK) is additive over the relevant propositions and drop K in the formulas for convenience, our rules of inference imply that

P{lst ball red}

=

M

L P{lst ball redls red balls}' P{s red balls}

s=o

st (~)(~)(;Y (1 - t)M-S = o

P{lst k balls red}

v-

1;

Ms!(M - k)! (M) (I)S ( l)M-S = = s-k ~ I( _ k)1 1- M.s. 5 V v

v-

k



Consequently, P(all balls redl1st n balls red)

=

v-(M-n).

The latter probability converges to 1 as n goes to M. However, for reasonable values of n, v-(M-n) will be near to 0 if M is large. 2

Chapter 25

622

The preceding examples illustrate some problematic aspects of inductive inference. There are many others. We shall later discuss those that are particular to tests of scientific hypotheses. But first an introduction to many-sorted logic. 25.4 Many-Sorted Languages

A many-sorted language is a first-order language with more than one set of individuals. In this section we shall discuss some of the properties of such languages. 25.4.1 The Symbols

Let L be a many-sorted language with k disjoint sets of individuals. The logical symbols of L are ""' :::> [

](,)V =

and k nonoverlapping infinite lists of individual variables,

The nonlogical vocabulary of L consists of a selection from the following lists of constants, function symbols, and predicate symbols: 1.

k disjoint sequences of constants,

rxd3 1 Yl ... ; rxd32Y2 ... ; ... ; rx k f3k Yk .... 2. Indexed sets of predicate symbols, {pi}iEJn ' n every j E In' pi is an n-ary predicate symbol.

=

1,2, .... For each nand

3. Indexed sets of function symbols, {po. i I ••..• in LEI n' and indexed sets of predicate symbols, {Rit ..... inLeJn. For each n = I, 2, .,. and every i E In' po. i I •.•. ,in is an n-ary function symbol whose values are to be individuals of the ioth kind and whose n arguments are to be individuals of kind im , m = I, ... , n. Similarly, for each n = I, 2, : .. and every j E In' Rit .... .in is an n-ary predicate symbol whose n arguments are to be individuals of the jmth kind, m = I, ... , n. E 25.6 In presenting nonstandard analysis we used a single-sorted language. We could have used a two-sorted language instead. Then our logical vocabulary would have contained two sets of variables, Xl Yl Zl ... and X z YzZz ... , the first for urelements and the second for sets. Also our nonlogical vocabulary would have consisted of two constants, (Xl' PI (i.e., 0 and I), two binary function symbols,

An Epistemological Language for Science

fl. 1. 1 and g 1.1.1 (i.e., (Le., e and

[P(wl C(u, a))

=

(P(C(u, w)la)/P(ula))]].

Moreover, it is obvious that the real-number theorems of EL are theorems of SEL. Finally a structure for SEL is an eight-tuple 2(L t ,p) = (2J , 2" X, R,~, qJ, Ro ' p!i'),

whose components are as follows: 1. Lt,p is the language we described in section 25.6, and ~ is a structure for Lt,p that is intended to represent the world as it is.

2. !l!J = (12J I, N.st'f' F~f, G.st'f) is a structure for a first-order language whose nonlogical vocabulary consists of the individual constant, function, and predicate symbols of SEL.

3. 2, = (12,1, N!i'" F.st'" G.st',) is a structure for a language whose nonlogical vocabulary consists of the propositional constant, function, and predicate symbols of SEL. Specifically, F!i', contains the interpretations in 2, of 0, N, I and the propositional constants. For simplicity we shall denote the interpretation of 0 by 0 rather than O!i',. Moreover, G!i'l contains the interpretations in 2, of TCL, 2, CL, and r which we denote, respectively, by TCL !i',(,), 2!i',(·), CL !i'l(') and r!i'l(.). Finally, 12,1 is a set of wffs that contains the closed wffs of Lt,p(~) and contains"" A, [A ::::> B] and oA whenever it contains A; and N!i', consists of all the names of the elements of 12, I-one name for each wff and different names for different wffs. We insist that

(i) the domain of TCL!i', contains the closures of the logical theorems of Lt,p and the closures of the logical consequences of r t and r p in Lt,p(~); (ii) the domain of CL !i',(.) consists of all the closed wffs of Lt,p(~); and

An Epistemological Language for Science

635

(iii) the domain of r YI ( .) is the smallest set of wffs that contains all the closed wffs of Lt.p(~) and contains "" A, [A :::> B] and DA whenever it contains A and B. For simplicity we also insist that the domain of r Y ' ( . ) is all of I~I. 4. PI is a set of structures for Lt,p(~); R is a reflexive, transitive relation on PI; and the expansion of ~ to a structure for Lt,p(O, which we denote by ~e' is a member of PI. We assume that PI and ~ satisfy the following conditions:

(i) The structures in PI are nonisomorphic models of T(r(1

r p , Dr,p).

(ii) ~ and the structures in PI that are in the relation R to ~e have universes that contain only a denumerable infinity of individuals of each kind. (iii) The structures in PI that are in the relation R to ~e have the same universe as ~ and name the individuals in the universe by the same names as ~. 5. Let CL .£1', denote the domain of definition of CL .£I' ('). Moreover, for each HE PI, let H('): CL .£1'1 ~ {t, f} be such that H(A) denotes the truth value of A in H. Then qJ('): CL .£1', x PI ~ f!J(PI) is defined by qJ(A, H)

=

{H'

E

PI : HRH' and H'(A)

6. Ro c PI x f!J(PI) and (H, E) {H'

E!!{:

E

= t}.

R o if and only if H

E

PI and E =

HRH'}.

7. p:t' (. I' ) : ;Y>(:i£) X f1J(X) ~ [0, 1] is a a-additive conditional probability measure on ,J/(:£) with admissible conditions f1J(.£).

8. There is a distinguished member K of S!I' that describes our knowledge before the test and is such that p5.P(qJ(K, ~e)I{H E ,£: ~eRH}) > O.

Before we can use !f to interpret SEL we must, for each H E PI, extend the domain of definition of H(') to I~I, and we must extend the domain of definition of qJ(') to I~I x PI. This we do by induction as follows: Let r i , i = 0, 2, ... , be an increasing sequence of sets of wffs that satisfy the conditions, (i) r o = CL .£1', and (ii) if A E r i for some i ~ 1, then there are wffs C and D in r i - 1 such that A is C, "" C, [C :::> D], or DC. Then our characterization of I~I implies that I~I = U~o rio Next observe that, for each H E PI, H(') is defined on r o and qJ(') is defined on r o x PI. Suppose that we have extended the definition of the H(') and qJ('), respectively, to rn - 1 and rn - 1 x PI. If A Ern' there are wffs C and D in r n - I such that A is C, "" C, [C ::::> D] or DC. If A is C, we let H(A) = H( C). If A is "" C, we let H(A) = t or f according as H( C) = f or t. If A is [C :::> D], we let H(A) = t if H(C) = f or H(D) = t and we let H(A) = f otherwise. If A is Dc' we let H(A) = t if (H, qJ(C, H)) E RD·

Chapter 25

636

Otherwise we let H(A) = f. Finally, we let 0,

+ r)]M-rx),

the consumer's (26.1)

where (M - IX)Yte = L~lrx [1/(1 + r)]iYt+i' We conclude: if H 1-H 7 hold and x, p, n, and A are interpreted as above, the consumer's demand for current consumption satisfies equation 26.1.

Besides theorems that provide bases for testing the empirical validity of an interpreted theory, there is another group of theorems: 2. There are theorems that can be used to obtain factual information concerning

properties of the theoretical constructs of the theory. One example of such a theorem is T 13.12 in the standard theory of consumer choice; this theorem establishes necessary and sufficient conditions on a consumer's demand function that his utility function be homothetic. A second example of such a theorem is T 12.7; it establishes on a consumer's demand for risky assets necessary and sufficient conditions that his absolute risk-aversion function be a decreasing function of net worth. Strictly speaking, theorems such as T 13.12 and T 12.7 mentioned above cannot be used to establish conclusively that a given theoretical construct (e.g., the consumer's utility function) has a specific structural property. They can only be used in conjunction with data to show that certain hypotheses concerning the structural properties of the theoretical constructs in question are empirically untenable. For instance, the data we use to test Arrow's theory suggest that Arrow's conjecture that a consumer's proportional risk-aversion function increases with his net worth is false. There is a third group of theorems that provide neither testable hypotheses nor indirect information about the properties of theoretical constructs of the theory.

Empirical Analysis of Economic Theories

649

3. There are theorems that guide the economist in his search for good ways to

confront his theory with data. For instance, if we use regression analysis to estimate a consumer's demand function, the values of the parameters we estimate must satisfy several linear equalities and nonlinear inequalities. Some of these are easily derived from the equality, pf(p, A) = A. Others can be deduced from T 10.16. The econometrician, besides heeding linear and nonlinear restrictions on the values of the parameters he estimates, must also worry about problems of aggregation. These occur in many disguises. For instance, while economic theories of choice usually concern choice of well-defined commodities, such as sugar and salt, and specific securities, such as shares in IBM or Ford Motor, the econometrician's data often refer to expenditures on groups of commodities and groups of securities. Some theorems are sensitive to aggregation. Others are not. The Hicks-Leontief Aggregation Theorem in the standard theory of consumer choice, T 10.17, shows that the main theorems of that theory are true of composite commodities, if the prices of the components of the composite commodities vary proportionately within the respective groups. However, Arrow's and PraH's theorems concerning the monotonic properties of the absolute and proportional risk-aversion functions and the consumer's demand for safe and risky assets are untrue if the risky asset is interpreted as an aggregate of many risky securities. Therefore, one of the main problems we face in testing Arrow's theory, is to delineate the class of utility functions that would render his consumer's demand for risky assets insensitive to such aggregation problems. Theorems T 12.14-T 12.18 provide the required characterization of utility functions. In the cases cited, the existence of the relevant aggregates is obvious. In other situations the existence of the aggregates that are needed for the empirical analysis may be hard to establish. Cases in point are meaningful index numbers in consumer theory and capital aggregates in production theory and in macroeconomic theory. Below I give an example of a theorem that establishes necessary and sufficient conditions on a firm's singlevintage production functions for the existence of a capital aggregate. Prescriptions for constructing capital aggregates-when they exist-are given in Stigum 1967 (pp. 349-367). E 26.2 Consider a firm that utilizes n vintages of a single capital good, kj , i = I, ... , n, and assume that the production opportunities available to this firm

can be represented by n single-vintage production functions of capital and labor, fj('): Ri -+ R+,

i

=

I, ... , n

Chapter 26

650

that are nondecreasing, continuous, and satisfy (i) fi(O, L;)

= fi(k i, 0) = 0 for (ki, Li) E R~, and

(ii) fi(k i, .) is a concave function for each fixed ki E R+. Then the firm's multivintage production function, H('):

H(k, L)

R~+1 --+

R+, is given by

n

=

max

L fj(k j, Lj),

Lj=1 Li~L.Li~O i=1

where k = (k 1 , ••• , kn ). Moreover, j('): R~ --+ R+ is a capital aggregate of H(') if and only if there exists a function, H·(·): R~ --+ R+, such that H·(j(k), L) = H(k, L) for

all (k, L) E

R~+1 .

We can show: Let jj('), i = 1, ... , n, be nondecreasing, continuous functions from R+ onto R+. Then there exists a capital aggregate j(') such that n

j(k)

= L jj(ki), i=1

if and only if there exists a nondecreasing, continuous function, F('): R; --+ R+, that is homogeneous of degree 1 and satisfies fi(k i, Li ) = F(ji(k;t L;), (k i, L;) E R;, i = 1, ... , n. For a proof see Stigum 1967 (p. 356).

The three groups of theorems described above, constitute the main body of an economic theory. There is a fourth group: 4. There are theorems that are relevant to the import of factors for which the

theory does not account. These are usually obtained by adding simplifying assumptions to the original set of axioms. For instance, when the standard theory of consumer choice is interpreted as in chapter 11, we can determine the effect of age on current consumption if the consumer's utility function has a simple additive structure (see T 11.8). Similarly, we can determine the import of education if we make assumptions about how education affects the consumer's income stream. For later reference, I give an example of the simplifying assumptions concerning consumer behavior that Modigliani and Brumberg proposed in Modigliani and Brumberg 1955 (pp. 394-397). Other examples can be obtained with the help of T 11.7 and T 11.9. E 26.3 Modigliani and Brumberg suggested several simplifying assumptions for the analysis of their consumer's behavior. Two of them were

r = 0 andgi(P) gip)

=

(1

+ r)i- j

I,. ] .

=

0, ... , M - ex.

If they hold, equation 26.1 becomes Ct

= (M -

ex

+ 1)-1[At_1 + Yt + (M -

cx)yn.

Empirical Analysis of Economic Theories

651

With respect to confronting theory with data, the fourth group of theorems is open-ended. The economist may prove some such theorems to determine what kind of information he can extract from real-life observations. He may prove other such theorems after he has analyzed his data; in that case his objective is to find assumptions that can be used to rationalize certain characteristics of the data. Whatever an economist's motives for establishing the theorems in the fourth group are, such theorems play an important role in the dynamics of theory formation. 26.2 The Structure of an Empirical Analysis

We have sorted our theorems into groups. Next we interpret them and put them in four boxes T1 , T2 , T3 and T4 , according to the order in which we discussed them. In that way we complete the left side of the schematic process illustrated in figure 26.1. The top half of this figure repeats the relevant part of the schema presented in figure 2.1: three boxes, one with the undefined terms, one with the axioms, and one with the theorems. On the right side of the schema, we interpret the undefined objects and axioms of the original theory and distribute them on three boxes with universal names. We then add a box of axioms concerning the interpreted objects that contains the simplifying assumptions of the theorems in T4 to complete the right side of the schema. In looking at figure 26.1, one observation is especially important: What is a theoretical construct in the context of one empirical analysis need not be a theoretical construct in another. For example, when we confront T(H 1, ... , H 6) with data, V(·) will always designate a theoretical construct. On the other hand, if we were to add to H 1, ... , H 6 an axiom A that insists that V(x) = f]?=l Xfi, then in an empirical analysis of T(H 1, ... , T 6, A), V(·) would denote a theoretical construct only if the (Xi were not estimable. The (Xi may be estimable in one situation and not in another. It is also important to observe that whatever the interpreted versions of the undefined objects may be, it is about them that the theorems in T1 , ••• , T4 talk, and it is they that we put in the boxes for"observable objects" and "theoretical constructs." With that in mind yve continue the construction of our axiomatic superstructure. First we use information stored in a "design" box to single out the objects that are to play the role of undefined terms in the theory-data confrontation. Some of these terms will denote objects in the "observable objects" and "theoretical constructs" boxes. Others will denote objects for which the "design" box insists that we have records. Next we use ideas

Chapter 26

652

bservable Objects

,

--(

, \= ,

"

:~/~

'"T~

r- ....

r--.L---,

;.~--,

I

I

1

I

I

1

-::;:~~--:-

r---'---=:,.:..--'--=r-....J..:~::"'=:...,.:;:-.-•• r T_

- -'-

~ ~- -

1.. __ r- _

...l.._ 1-J

1-_ _....I.

I

L;:

:

...I

- - -- - _

J

. . i..,

r--- - - -- --:---- -..t'1." x ~ - - - - -i - - - - -- - i .J ,..--- ---, L

L

:

I

'----r----..I

't'x' 1-----,i --f--- - --\.. .~)-- -- -- L

.-------, ~

, __ J I

L Figure 26.1

-, I

J

l

Empirical Analysis of Economic Theories

653

contained in the ~ boxes and information from the "design" box to write the axioms of the empirical analysis. Finally, we place the axioms into an "axioms" box and proceed to use them and the necessary universal theorems to derive useful theorems for the theory-data confrontation. We call such theorems sample theorems and store them in a box with this name. The "sample theorems" box puts the finishing touch on our axiomatic superstructure. To use it in empirical analysis, we must first interpret the undefined terms and collect data. When that is done, we use the data and the sample theorems to test the validity of the proposed model of the original theory. (See figure 26.2.) We have described briefly an axiomatic superstructure that can be used as a basis for testing an economic theory. Our superstructure is pictured in figures 26.1 and 26.2. We shall next describe in more detail the contents of the "axioms," "sample theorems," and "tests" boxes in figure 26.2. Desjgn

Definitions

Inlerpretation

Data

Tests

Figure 26.2

654

Chapter 26

26.2.1 The Undefined Terms: 5, (0, IF), and P(· )

In the schema depicted in figure 26.2, the undefined objects are the sample population 5, the sample space (0, IF), the sampling distribution P( '), and two vectors, W T and W p . In the intended interpretation, = 0T X Op, WT E 0T' and W p E Op. Also, interpreted versions of the undefined terms of the original theory are components of WT; components of W p denote the objects of the empirical investigation. Hence the subscripts T and P may be read, respectively, as theory and population. There are components in W T for each of the objects in the "observable objects" and "theoretical constructs" boxes and for other objects as well. How many components there are depends on the kind of empirical analysis we envisage. For instance, if we were to use budget data to obtain information concerning characteristics of a consumer's demand function, W T would have one component for each of the components of the vector, (p, x, A, V), where the respective letters refer to prices, commodities, income, and the utility function. The vector might also have one component for each of the factors age, profession, education, race, and region, and several components of error terms. If instead of cross-section data, we were to use time-series data to study consumer behavior, W T would have components for each of the components of the vector, (A(M - 1), p(M), y(M), ... , p(K), y(K)), where [M, K] denotes the time interval over which we have observations and A(i - 1), p(i), and y(i) denote, respectively, net worth at the beginning of period i and prices and income in period i. Also, W T would have components for error terms and whatever factors we think might be relevant for our analysis. How many components Wp must have will depend on the characteristics of the empirical analysis undertaken. Certainly, W p will have components which the econometrician associates with specific components of WT' For instance, in a budget study of consumer behavior, W p will have one component for each component of p and components for such aggregates as expenditures on cheese, shoes, and skis~ In addition, W p will have components that represent instrumental variables and other extralogical observable variables that playa role in the empirical analysis. That is so even though these variables are not mentioned in the interpretation of the theory. Finally, W p may have components that denote unobservable factors such as risk and entrepreneurship, factors that need not have any representative components in WT'

°

E 26.4 The preceding comments concerning 5, (0, ff), P('), and (w T , w p ) are reflected in the first six axioms of the axiomatic superstructure that we shall use

Empirical Analysis of Economic Theories

655

to test Modigliani and Bromberg's theory of the consumption function, which we described in E 26.1: M1

Let # S denote the number, of elements in S. Then 5

M2

0 c R 16 and :F is a (J field of subsets of O.

M3

P( . ) is a probability measure on (0, :F).

M4

wT

M5

= wp =

(r,

CfI

E {I, 2, ... }.

Yt' Yte , At-I' M, IX, U, v, '1, c5),

(r, C~, Yt, At-I' Ii), and (w T , wp) E O.

There is a one-to-one mapping,

F(·):S~O.

In the interpretation of these axioms that we intend, the components of W T represent the interpreted versions of r, Ct , Yfl Yte , At-I, M, and IX in Modigliani and Bromberg's theory and four error terms: u, v, '1, and c5. Also, the components of W p represent the observed counterparts of r, Ct , Yt, and A t - 1 and an age group.

Note that in the preceding example the observed values of W p will depend both on the sample design and on the population from which the sample is drawn. We shall test Modigliani and Brumberg's theory using data collected by the Federal Reserve Board. 1 These data pertain to economic choices made by a group of U.S. consumers during 1962 and 1963. For each consumer in the sample, the data provide information on 1962 end-of-year net worth, which we take to be the denotation of At-I; on 1963 disposable income, which will be the denotation of fit; on the change in his net worth during 1963; and on the age of the head of the household in 1963, which we take as the age of the consumer. In our test we let the denotation of c~ equal Yt minus the change in the consumer's net worth during 1963. The values of c~ that we observe are different from the values we would have observed in a budget study of the same consumers. 26.2.2 The Axioms concerning

n

For our testing the empirical relevance of an economic theory, there are two sets of axioms in the "axioms" box of figure 26.2. First we have two sorts of axioms concerning n. One kind specifies the denotation of the components of W = (w T , w p ). These axioms may assert that some of the components of W can take on one value only, that others can take on values in a subset of R or in certain abstract spaces. For instance, in a study based on budget data of consumer-demand functions, the econometrician may assume that prices do not vary over his sample of consumers. If he does, he will postulate that the price components of W T equal some arbitrary con-

6S6

Chapter 26

stant vector. Similarly, if the econometrician denotes a consumer by a household and the age of the consumer by the age of the head of the household, and if the latter is recorded in years, he will postulate that the age component of W T can take on only a certain number of integral values. The second kind of axioms concerning n specifies the functional relationships that hold between the components of w. For instance, some axioms may define aggregation operators which transform components of WT representing, say, ages, different kinds of corporate shares, and different kinds of shoes into components of W p which, when interpreted, will denote age groups, investments in stocks, and expenditures on shoes. Still other axioms may postulate the existence of "errors-in-variable" operators and "partial-adjustment" operators. In the simplest cases an operator of the former kind uses an error-term component, v, of W T to transform a component, a, of W T into a component, h, of W p according to the equation, h = a + v. An operator of the latter kind generally uses a subvector, Z = (zo, ... , Zk)' of W p , an error-term component, 11, of WT' and a (k + 1)tuple of constants, (Cl o,.'" Cl k ), to transform a component, d, of W T into a component, e, of W p according to the equation, k

e = Clo(d - zo)

+L

Cl l

Zi

+ 11·

i=l

Finally, there are axioms that postulate functional relationships between the components of WT; these can be derived from the T1 - T4 boxes. Examples of such axioms are given in E 26.5 below. E 26.5 In E 26.4 we recorded the first five axioms of the superstructure we shall use to test Modigliani and Brumberg's theory. Here we present seven more. Together with M 2 and M 4, these provide a complete characterization of the components of (COT' COp). In reading these axioms, note that the COT components of F(s) must satisfy equations 26.2-26.6 for all 5 E 5 with F(s) E .Q(ct). Note also that equation 26.3 is the sample analogue of equation 26.1; and finally note that, if we were to use equation 26.6 to define v, the values assumed by v would, for a given pair (v,f(ct)), depend on Yre but not on A r- 1 and Yr' The importance of this observation will become apparent later.

M6

ct E {IS, ... , 100}.

M7

There is a one-to-one mapping

G('): {IS, ... , 100} -+ {76, ... ,200}

such that, for all (COT' COp),

M = G(ct). M8

Let Q(&) = {(COT' COp) E Q: ct =

There exists a function

(26.2)

&},

& E {IS, ... , 100}.

Empirical Analysis of Economic Theories

H(·): R++ x {15, ... , lOa}

such that, for all (w T , w p ) Ct

=

H(r, ex) {A t - 1

M9

R++

E n(ex),

+ Yt + (M if ex

=

5(i

{

+ 6) + 2

70

Then for all ex




0;

0 ~ C(y)

and




0, as asserted in F 17, (27.46)

From equations 27.46, 27.24, and 27.36 it follows that if A is valid, then for large n. k~ is close to c(a)/y(a) if and only if a l = 0. But a l = if F 7 and F 8 are true. Thus-as was to be shown-when we record the values of (k~ - (c(a)ly(a)), we are checking the validity of A and F 7 and F 8. The two-stage test of Friedman's hypothesis, which we described above, can be formulated as a statistical test in several ways. The simplest way is the following: For a given a, let A, o-Y P ' and .[I be the factor-analytic estimates of A, (JyP' and t/J, and let b be the corresponding estimate of b(a). Then in the first step of the test the null hypothesis is

°

H lo

A, A

=

A, (JyP

=

o-YP

and

t/J = .[I,

and the alternative hypothesis is H ll

f3

=1=

b.

If Friedman's hypothesis passes the first step of the test, then in the second step the null hypothesis is H zo

A, B, A

= A,

(JyP

=

o-Yp

and

and the alternative hypothesis is H Zl

al

=1= 0.

t/J

=

.[I,

The Permanent-Income Hypothesis

697

From the values of (b(a) - !J(a)) we see that F 1-F 6, F 9-F 17, and the subsidiary condition on the ~i passed the first test with flying colors. However, F 1-F 6, F 9-F 17, the subsidiary conditions on the ~i' and F 7 and F 8 performed miserably on the second test. To see how badly Friedman's hypothesis fared on the second test we recorded the values of al for the various groups. These values with the corresponding standard errors in parentheses below are displayed in table 27.1. They show that, for all groups, a l is significantly different from zero at the 0.05 level of confidence. From this we conclude that our data do not satisfy the certainty version of Friedman's permanent-income hypothesis. In reading our test, note that we have used our estimate of al , al , as a test statistic rather than (k ii - c(a)/y(a)). The implications of this can be seen as follows: We can without loss in generality assume the following for the U.S. population of 1962-1963: (i) For anyone of the relevant groups, the probability of obtaining a

sample in which y(a) < 1 is zero. If that is true and if H 2o is valid, then in terms of the sampling distribution of ka.' c(a), y(a), and a l (a), we have a further condition: (ii) For any c > 0, the probability that than the probability of lall ~ 8.

Ikii

-

c(a)/y(a)\ ~

c is not larger

Consequently, if condition i is true, then at any level of significance the probability of rejecting H 2o with H 2l as specified is not smaller than the probability of rejecting H 2o if the alternative hypothesis were to insist that kii =1= Eiic/Eiiy·

27.4.3 The Rate of Time Preference and the Human-Nonhuman Wealth Ratio The factor-analytic estimates of the parameters in equations 27.26 and 27.27 enable us both to propose tests of the permanent-income hypothesis and to estimate other interesting parameters in Friedman's model. Specifically, from the estimates of the third component of A, A 3 , and from

A3

1

+r

=---, r

we can find the implicit values of r for various groups in the sample population. These estimates of r can be interpreted as the mean rate of time preference of the groups studied. Similarly, from

n :J""

Table 27.1 Test of the certainty version of the permanent-income hypothesis.

III

""Cl

(;

Estimates

'"1

a

2 1O- 8 uYp

1O- 8 uy,2

b

P

kd

ely

0, i = I, ... , 5, and the conditions of SF 16 ensure that the solution to equation 27.60 is unique. The uniqueness of the solution to equation 27.60 can be established as follows: Let B be a nonsingular 2 x 2 matrix that satisfies B'B = M and let

0 from the corresponding element of cov(vhz(O" - 0)) can be made arbitrarily small by choosing m, n, and N large enough. The estimates described in equations 27.98 and 27.99 are bootstrap estimates of 0 and (~(f})'S-l ~(f}))-1, respectively. Petter Laake generated Table 27.5 Factor-analytic estimates and their bootstrap standard errors: The certainty case. Estimates Age groups

A

6Y2

0,

i

=

1, ... , q.



of Y x B, and let

Consumer Choice among Risky and Nonrisky Assets

725

SA 13 Let {G I , ... , Gq } be as in SA 12 and let P('I 0G) denote the conditional probability measure on (!1, iF) given that we are in 0G' Moreover, let EGf denote the expected value of f with respect to P( 'IOG)' Then

EG;£

= EG;(j = 0,

i

= 1, ... , q.

SA 14 Let {G I , ... , Gq } be as in SA 12. The variables A, £, independently distributed relative to P( 'IOG)' i = 1, ... , q.

(j,

and 11 are

SA 15 Let {G I , ... , Gq } be as in SA 12. The variables am, A, )11 A, have finite positive variances with respect to P( 'IOG), i = 1, ... , q.

£, (j,

and 11

In reading these axioms, note that if r§v is the partition {G 1 , ... , Gq } in SA 12, then the 11 in SA 14 and SA 15 is the 11v(') of equation 28.10. Note also that the inequalities which the first eleven axioms impose on E and b are not such that S I-S 13 are inconsistent. For example, if a = 2, then the [; and b of E 28.1 satisfy the inequalities E ~

-b,

- (b

+ A)

~

b

+ E,

and

-A

~

b,

none of which contradicts SF 13 since A ~ 1 and bE (-1, -i]. Finally, note that the intended interpretation of P(·) is like the interpretation we gave to P(·) in M 1-M 19. Specifically, let Q(F- 1 (0)) = P(O n range of F(' ))IP(range of F(' )), 0 E $'; and let !Fs denote the (J field of subsets of 5 that consists of all subsets of 5 that are inverse images under F(') of sets which belong to $'. Then Q(') is a probability measure on (5,!Fs). As in chapters 26 and 27 we interpret Q(F- 1 (0)) as the probability which the samplers assign to the chance of abserving a consumer with a (WI" w p ) vector in 0 (e.g., a consumer with at least twelve years of education or a consumer with a yearly income of more than $100,000). The data we intend to use to test Arrow's theory were obtained in accordance with a stratified random sampling scheme in which consumers were stratified by their 1960 income. Some of the characteristics of this sampling scheme are described in the next three axioms. SA 16

Let a r be one of the numbers $3000, $5000, $7500, $10,000, $15,000, $25,000, $50,000, and $100,000, with a l < a 2 ••• < a B • Moreover, let I r be defined by

= {(W7" Ir = {(W7" 19 = {(W7" II

w p ) EO: y

< al },

w p ) EO: a r - I ~ Y


0, r

=

1, ... ,9.

726

Chapter 28

SA 17 There are N observations with n r observations from I r , r probability distribution of the sample is given by 9

fl

=

1, ... , 9. The

(P( . IIr))n r •

r=l

Axioms SA I-SA 17 are all the axioms we need to carry out the tests we have in mind. Of these axioms, only SA 7 and SA 8 pertain to Arrow's theory per se. The others concern properties of P( . ) and the way consumers in the sample population differ among themselves. Note, therefore, that the implications of our empirical analysis for Arrow's theory and for his hypothesis concerning the monotonic properties of a consumer's riskaversion functions stand and fall with the validity of axioms SA 9-SA 17. 28.2 Arrow's Risk-Aversion Functions and the Data

In this section we describe a test of Arrow's hypothesis that a consumer's absolute risk-aversion function is strictly decre~sing and that his proportional risk-aversion function is strictly increasing. The test is based on several theorems concerning characteristics of the sample population that are easy consequences of the axioms and of well-known theorems in mathematical statistics. 28.2.1 The Data and the Axioms To test Arrow's hypothesis we shall use data that were collected by the Federal Reserve Board in two reinterview surveys of consumer finances (see Projector and Weiss 1966 and Projector 1968). In these a consumer was taken to be a "coI\sumer unit" as defined by the Bureau of the Budget Le. either a family living together and having a common budget or a single individual living alone. For each such consumer the surveyors recorded the value of his assets and liabilities at the end of 1962 and 1963, the level of his income in 1962 and 1963, and the values of many of his most important characteristics, such as age and education. Th~ assets comprised major financial assets plus others such as equity in farm and nonfarm sole proprietorships and company savings plans. The debts consisted of debt secured by a home and / or investment assets, including life insurance, installment debt, and unsecured loans to doctors, hospitals, and banks. This extraordinary supply of information on consumer finances leaves us with a serious problem of interpretation which we resolve as follows: We identify Arrow's consumer with the two surveys' consumers and we adopt our intended interpretation of y, J1, A, and the pairs (ai, mi), i = 1, ... , n.

Consumer Choice among Risky and Nonrisky Assets

727

Moreover, we let y and j1 denote, respectively, the consumer's income in 1963 and the value of his liquid assets (as specified in Projector 1968, pp. 45-46) at the end of 1963; we identify m with the end-of-1963 value of those of the consumer's investment assets that were covered by the two surveys (see Projector 1968, pp. 45-47 and our interpretation of J in section 27.5.1) minus the value of debt secured by these assets; and we define A to equal fi + m. Finally, we interpret b, b, 8, and 13 in such a way that the inverse images under F( . ) of the sets of the various ~ partition the sample population into renters and homeowners, self-employed and employed by others, whites and nonwhites, consumers whose head of the household has ~ 8, 9-12, or > 12 years of education, and consumers whose head of the household was < 35, 35-44, 45-54, 55-64, or ~ 65 years old in 1962. Our interpretation satisfies axioms SA I-SA 5, SA 10, SA 12, and SA 15-SA 17 since they are true by design. It satisfies SA 6, SA 8, and (except for equation 28.7) SA 9 as if by definition, because in delineating the denotation of the components of (w T , w p ) we made sure that these axioms were satisfied. Whether our interpretation satisfies axioms SA 7, SA 11, and SA 13-SA 14 is harder to say. Since these axioms involve so many unobservables, we can test their validity only in indirect ways. 28.2.2 Sample Theorems To obtain an indirect test of SA 7, SA 11, SA 13, and SA 14 we next record two sample theorems. The first theorem concerns the monotonic properties of h(a,·, b). We have postulated that this function is an increasing function of A. The theorem asserts that, if this is so, any interpretation of the axioms which satisfies SA 8-SA 17 must agree that the covariance of mand A is positive. T 28.1

Let {G 1 , ••. , Gq } be one of the partitions rtJv of Y x B, and let

m = (t.G j +

f3GjA

+u

(28.12)

denote the regression relation of i

=

1, ... ,

mon A relative to P('I QG). Then

q.

(28.13)

In reading the theorem, note the existence of a pair (cx Gj , (3G) and a variable u which satisfy equation 28.12 and the additional conditions EGju

=

EGj(A - EGjA)u

=

0

follows from the fact that both

mand A have finite mean and variance. The

728

Chapter 28

inequality in equation 28.13 is a consequence of the monotonicity of hea,', b) and SA 12-SA 14. To wit: Fix v and i and observe that EG/(m - EG/m)(A - EG/A)

= EG/(h(a, A, b) -

EG/m) (A - EG/A) + EG/(t:((A - EG/A) + (c5 - EG/c5))

= EG/(h(a, A,fv(GJ) - h(if, EG/A,fv(Gi ))) (A - EG/A) - EG/lv(A - EG/A) = EG/(h(a, A,fv(Gi )) - h(a, EG/A,fv(Gi ))) (A - EG/A) > o.

Our next theorem concerns the monotonic properties of g('). Axiom SA 7 insists that g(a, " b) is an increasing function of A. If that is correct, then (according to the theorem) any interpretation of the axioms which satisfies SA 8-SA 17 must agree that the covariance of ([il A) and A is negative. Let {G l , ... , Gq } be anyone of the partitions

T 28.2 (j1/A)

f§v

of Y x B and let

= 'YG; + G;A + ~

(28.14)

denote the regression relation of (filA) on A relative to P(·IOG). Then G

j


'\'

II53.2109

Within groups

310778.8727

Total

3II932.0836

I

1349

--

Total Grand mean Between groups Within groups Total F

II53.2I09 230.3772

'< ~

~

0-

Z

0

1350

F ANOVA analysis of '1/ Racial group

Nonwhite

(l) "'1

1280

Grand mean

White

3

n ::r

ANOVA analysis of C; Racial group White

c

~

~.

5.006

Vl

:>'\'

'
Vl Vl

1280

0.05

70

0.02

--

~

Vl

1350 0.05 0.0739

I

0.0739

17.3186

1349

0.0128

---

--

17.3925

1350 5.756

"'l ,j:::o

Vl

Chapter 28

746

variable. Still we may infer from the tables that in all partitions the between-groups variation is a significant source of variation in the dependent variable. The tables and the columns of cell means underneath them, therefore, suggest the following inductive generalizations: IG 1 Absolute risk aversion increases with age and education. Moreover, th absolute risk aversion of the self-employed, homeowners, and white consume] tends to be higher than that of consumers who are, respectively, employed by others, renters, and nonwhite. IG 2 Proportional risk aversion tends to increase with age and education. Moreover, proportional risk aversion tends to be higher among the selfemployed, homeowners, and white consumers than among those who are, respectively, employed by others, renters, and nonwhite.

These inductive generalizations are interesting. So a few remarks concerning their validity are in order. Psychologists believe that people become more conservative as they grow older. Hence our finding that absolute risk aversion increases with age seems reasonable. Our observation in IG 2 that consumers' proportional risk aversion increases with age is reasonable for the same reason. However, the strength of the statistical result is surprising. Since the income and expenditure streams of younger people are less predictable than the income and expenditure streams of older people, everything else being equal, the demand for liquid funds should decrease with age. If it does, our statistics indicate that this decrease in demand has been swamped by the import of changes in the proportional risk aversion of consumers in the sample. It is difficult to understand why self-employed individuals (SEls) should have a higher absolute risk aversion than those who are employed by others (EOls). To explain why, Richard Manning has ventured the following hypothesis: In the business world only a few are very successful and many fail. Chances are that the predominant portion of those who survive, survive because they act conservatively. If that is true, a random sample of SEIs is likely to contain a large proportion of individuals whose investment behavior is characteristic of people with a relatively high absolute risk aversion. Richard Manning's hypothesis is plausible. However, equally reasonable scenarious suggest that our findings concerning the relative absolute risk aversion of SEIs and EOIs are due to a statistical artifact: SEIs function in a riskier environment than EOIs do; e.g., the SEIs face more variable income streams and possess more highly levered portfolios than EOls do. If that is

Consumer Choice among Risky and Nonrisky Assets

747

true, our fundamental assumption concerning the similarity of subjective probability distributions is false, and our observations that 5Els have a higher absolute risk aversion than EOls do cannot be maintained. A second statistical artifact might be the reason why homeowners seem to have a higher absolute risk aversion than renters. Even though cars and homes are not financial assets, a consumer's choice of financial assets might depend on the cars and houses he owns. For example, the extent to which a consumer has mortgaged his house and used loans to finance his car purchase will be an important determining factor of his optimal mix of safe and risky assets. 3 The same mix' will also vary with the degree of uncertainty he faces with respect to future repairs and renovations of his house. Finally, it seems unreasonable that absolute risk aversion increases with years of education. So we hypothesize that our findings concerning education and absolute risk aversion are due to a statistical artifact: Income expectations increase with education. If that is true, we should have added an estimate of human capital when we calculated the value of As for the consumers in our sample. The upshot of the preceding remarks is that we cannot accept IG 1 and IG 2 without further tests. Two such tests are described in sections 28.3.4 and 28.3.5. 28.3.3 Two-Way Analysis of Variance: Theory

In this section we shall prepare the ground for a five-way analysis of variance of our data by discussing the ideas behind a two-way analysis of variance of ~. The same ideas carry over to 'II as well. Consider two partitions of Y X B, {C;, ... , C;} and {Dt, ... , D;}, and write ms as 5 E

where, for i

=

1, ... , q - 1, j

=

1, ... ,

5

k, and for i = q and j

(28.30)

=

1, ... ,

k - 1, G.IJS =

{I

0

if 5 E F- 1 (nG~ n nD~) J otherwise. 1

Then obtain the least-squares estimates of the parameters in equation 28.30-a; a ij , i = 1, ... , q - 1, j = 1, ... , k; aqj' j = 1, ... , k - 1; and {3-and use the estimated value of {3 to construct ~ in accordance with 28.22.

Chapter 28

748

Next let and Then G1 , ... , Gq and 0 1 , •.• , Ok are partitions of S. Suppose that we have N ij observations in Gi n OJ' i = I, ."., q, j = I, .. ", k, and let q

and

N=

L ~*'

i=1

Suppose also that Nij =

(~". . N".)/N,

i

=

I, ... , q,

j

=

I, ... , k

and let ~,(i,j) denote the lth observation of I, ... , k Finally, let

~

in Gi n OJ' i

(28.31)

=

I, ... , q, j

=

and ~JfJf = N- 1

q

k

N ij

L L L ~I(i,j)·

i=1 j=1 1=1

Then it can be shown that q

k

N ij

L L L [~I(i,j) -

i=1 j=1 1=1

~".".f

(28.32)

In the last equation, the first term on the right-hand side records that part of the variation in ~I(i, j)'s that is due to variation in ~I(i,j) within groups.

Consumer Choice among Risky and Nonrisky Assets

749

The next two terms represent, respectively, that part of the variation in ~l(i, j) that is due to variation in the group means of the two partitions. Finally, whenever

for some pair i, j, the last term on the right-hand side differs from zero and measures that part of the variation in ~l(i,j) that is due to interaction between the two factors which determine the partitions of 5 we are considering. In standard statistical terminology, the first term on the right-hand side of equation 28.32 is referred to as the residual or within-class effect; the next two terms are called the main effect, or the between-class variation; and the last term is called the two-way interaction effect. When we consider more than two partitions of 5, there will be one residual term as above, more main effects, several two-way interaction effects, one or more three-way interaction effects, etc. The significance of the various effects is measured by an F statistic. For instance, the significance of the between-class variation over the Gi is measured by F = N - qk ( q_ 1 I

.f Ni~(~i~ - ~H)2

q

1"=1 k

N i)

1

_ '

'i~ j~l l~ [~l(i,j) - ~ijJ2

and the significance of the two-way interaction effect is measured by

Under the hypothesis that the respective effects are nonexistent, these statistics are approximately F-distributed with (q - 1, N - qk) and ((q - 1)(k - 1), (N - qk)) degrees of freedom'. Since the preceding tests are not easy to understand, we shall be a bit more specific. As we did in the one-way analysis-of-variance case, we begin by supposing that there are constants, fi, ai' bj , and Yij and independently distributed random variables cl(i,j) such that, for I = 1, ... , N ij , ~l(i,j)

= fi + ai + bj + Yij +

cl(i,j),

i

=

1, ... ,

q,

j

=

1, ... ,

k

(28.33) (28.34)

750

Chapter 28

k

q

L Ni~

= 0,

Yij

j = 1, ... , k,

L

and

N~jYij = 0,

i = 1,... , k

j==l

i==l

(28.35)

and Es1U,j)

=

0

and

Es 1U,j)2

=

(J2,

i

=

1, ... ,

q,

j

=

1, ... , k.

(28.36)

Then we observe that

q

+ L ~~((~i~ i==l

~n) - ai )2

k

+ L N~/(~~j j==l

~n) - b)2 (28.37)

and deduce that the least-squares estimates of the constants in equations 28.33-28.35 are given by i = 1, ... , q j = 1, ... , k

and

i = 1, ... , q,

j = 1, ... , k.

(28.38)

Next we let Yn , Yn ' and Y n , respectively, denote the minimum value of the left-hand side inaequation Y28.3 7 under no additional restrictions on the constants in equations 28.33-28.35, when the ai are taken to be zero and when the Yij are presumed to be zero. Then it is easy to see that

and

Under the null hypothesis that equations 28.33-28.36 are valid and all

Consumer Choice among Risky and Nonrisky Assets

751

ai equal zero, (N - qk) (9"0 9"o)/(q - 1)9"0 equals Fi and has for large N approximately the F distribution with (q - 1) and (N - qk) degrees of -

0

freedom. The null hypothesis is to be rejected for large values of the statistics. Similarly, under the null hypothesis that equations 28.3328.36 are valid and all the Yij are zero, the statistic (N - qk) (9"Oy - 9"0)/ (q - 1) (k - 1)9"0 equals Fij and is for large N approximately F-distributed with (q - 1) (k - 1) and (N - qk) degrees of freedom. Again the null hypothesis is to be rejected for large values of the statistic. The preceding tests provide us with a way to measure the relative risk aversion of consumers who belong to different groups in a given partition of S. To see that they do, we need only delineate the relationship between the least-squares estimates of the coefficients in equation 28.30 and the values of il, ai' bj , and Yij in equation 28.38. We do that in equations 28.39-28.44: k

a + {l = I i

j=l

aq + {l

k-l

=

I

j==l

(N~/N)&ij (N~/N)&qj

+ (& + /JA 0),

i = 1, ... , q - 1

+ (& + /JA 0),

(28.40)

j = 1, ... k - 1

bk + {l

q-l

=

I

i=l

(~~/N)CXik

(28.39)

+ (& + pAO),

(28.41)

(28.42)

and Yij

= (& + pA 0) + &ij (i, j

Yqk = (&

i=

ai -

bj -

{l,

i = 1, ... , q, j = 1, ... , k (28.43)

(q, k)

+ pA 0) - aq - bq -

{l.

(28.44)

These equations explicate the import of the two-way analysis-of-variance tests described above. 28.3.4 Multiple-Classification Analysis of the Data

In our statistical analysis we consider five factors-age, education, tenure, profession, and race. For us the analogue of equation 28.31 is not satisfied. Hence the details of the decomposition of the variance of ~,(i, j) and (with the obvious notation) t/!,(i, j) differ from the natural extension of equation 28.32 for five factors. The ideas of it, however, carry over to our case.

752

Chapter 28

Therefore I will be content to present our results next and refer the reader to a text on the analysis of variance for further details concerning both the theory and the computational aspects of the analysis (see, for instance, Scheffe 1959, pp. 90-145). Our results are presented in tables 28.7 and 28.8. Table 28.7 supports the conclusions of IG 1 with respect to age, profession, and education; i.e., absolute risk aversion increases with age and education, and individuals who are self-employed have a higher absolute risk aversion than those who are employed by others. However, differences in the characteristics of tenure and race provide little if any information as to the relative degree of consumer's absolute risk aversion. Table 28.8 confirms the conclusion of IG 2 that proportional risk aversion tends to increase with age and education. The table also upholds the conclusion of IG 2 with respect to profession; i.e., the proportional risk aversion tends to be higher among the self-employed consumers than among those who are employed by others. Finally, table 28.8 suggests that the conclusion of IG 2 that homeowners and white consumers have a higher proportional risk aversion than renters and nonwhite consumers cannot be maintained. 28.3.5 Education and Income

Our data do not allow us to test whether the expected income streams of SEls in 1963 were more variable than the expected income streams of EOls. However, we can test whether our result concerning education and absolute and proportional risk aversion is a statistical artifact. This we do by introducing four new covariates in our statistical analysis. They are defined as follows: I s1

=

5

E

F- 1 (12

U

13 ) or not;

Is2 = 1 or 0 according as 5

E

F- 1 (14

U

Is) or not;

I;

=

1 or 0 according as

1 or 0 according as

I; = 1 or 0 according as

5

5

E F- 1 (16 U 17 )

or not;

1

E F- (1s u 19 ) or not.

When we include these covariates in our analysis, we first use all our data to regress ~ and t/J on I;, i = 1, ... , 4. Then we perform an analysis of variance of the resulting error terms. Thus we adjust our observations on m and jil A for the effect of both income and net worth and perform the analysis of variance with respect to age, profession, tenure, education, and race of the adjusted observations.

Table 28.7 Five-way analysis of variance of ~ with age, employment, tenure, education, and race. Sum of squares

Source of variation

(') 0 ::l

~

Mean square

D.F.

Significance of F

F

c::

3rtI ""l

(')

::r

48160.963

9

5351.218

18.587

0.000

20888.989

4

5222.247

18.139

0.000

n· rtI

8641.104

1

8641.104

30.014

0.000

3

39.917

1

39.917

0.139

0.710

OQ

7469.153

2

3734.576

12.972

0.000

:;:tl

530.241

1

530.241

1.842

0.175

'
u(x). Let our data consist of pairs (Xi, pi) conditions are equivalent:

T 29.1

E R~

x R~ +. Then the following

(i) There exists a nonsatiated utility function u('): R~ --+ R which rationalizes the

data. (ii) The data satisfy GARP.

(iii) There exist pairs of numbers, (U i, Ai) 1

~

E

R x R++, such that, for all pairs

i,j ~ m,

U i ~ Ui

+ Aipi(x

i

-

xi).

(29.5)

(iv) There exists a nonsatiated, continuous, concave monotonic function u('): R~ --+ R which rationalizes the data.

It is obvious that condition iv implies condition i. Moreover, UT 22 insists that condition iii follows from condition ii. Therefore, to prove the theorem we need only show that condition i implies ii and that condition iv follows

from iii. We begin by showing that condition i implies ii: Suppose that u(·) rationalizes the data and that xiQX i . Then u(x i ) ~ u(x i ). Suppose also that pixi > piXi . Then, by local nonsatiation, there is ayE R~ such that pixi > piy > piXi and u(x i ) < u(y) ~ u(x i ), which contradicts u(x i ) ~ u(x i ). Hence XiQX i and pixi > piXi cannot happen if condition i is valid. To show that condition iv follows from iii, we define u(·): R~ --+ R by u(x) =

min {U i

+

Aipi(X - Xi)},

XE R~.

(29.6)

l~i~m

Then u(·) is continuous. To show that u(·) is nonsatiated and monotonic, we pick x, y E R~ such that x ~ y and x i= y and assume that (29.7)

and

Time-Series Tests of the Utility Hypothesis

+

u(y) = Ui

765

).ipi(y - xi).

(29.8)

Since pi E R~ +, it follows from equations 29.6-29.8 that u(x) ~ Ui

+

)Jpi(x - xi)


0 and j = I, ... , m, u(x i ) ~ u(x) for all x E R~ such that qixi ~ qix implies that u(xi/a) ~ u(y) for all y E R~ such that (aqi)(xi/a) ~ (aqi)y.

Consequently, since 1 = (siqi)(xi/s i ) = (siqi)(x'/s'), we have u(xi/s i ) ~ u(x'/s'). Continuing in this way, we deduce at last that and

(29.12)

From equation 29.12 and the fact that u(·) is nonsatiated follows 1 = qkxk

~

qk(SkXi) = Sk(qkXi)

and the validity of condition ii in T 29.2. To show that condition ii implies iii we let Ui =

min

{(qix')(q'x t ) ••. (qkXi)},

{i", ... ,k,i}

i = 1, ... ,m

where the minimum is over all sequences of integers in {I, ... , m}, starting somewhere and terminating in i. By condition ii we need only consider sequences without cycles. Hence the U i are well defined. Moreover, they satisfy the inequalities in equation 29.11, since if U i = (qix')(q'x t ) •.• (qkXi)

and Ui = (qrxs)(qsx O) ... (qex i ),

then the defining equation for U i , i = 1, ... , m, implies that U i = (qix ') (q'x t ) ••• (qkXi) ~ (qrxs) (qSX O ) • •• (qe x i ) (qi Xi ) = Uiqix i .

In order to show that condition iii implies iv we let u(x) =

min Uiqix ,

xE R~.

l~i~m

Then it is a routine matter to verify that u(·) satisfies all the requirements of condition iv. Since condition iv obviously implies i, our proof of T 29.2 is complete. With T 29.2 assured, we can ascertain the implication of T 29.1 for our tests of the permanent-income hypothesis. Theorem T 29.1 demonstrates that cross-section data on the expenditures of consumers who face the

Chapter 29

768

same price vector cannot be used to test the utility hypothesis. However, if we accept the utility hypothesis, T 29.2 shows that we cannot use such data to determine whether the utility function is homothetic either. The reason is that such data always satisfy condition (ii).l It is, therefore, interesting to recall that in chapter 27 we carried out factor-analytic tests of the homotheticity of Friedman's utility function with different kinds of data-data consisting of triples (c i , yi, Ai) and quintuples (ci,ji, K i , yi, Ai) that were marred by errors of observation. 29.3 Testing for Homothetic Separability of the Utility Function

In theorems T 11.10 and T 11.11 we characterized the behavior of a consumer whose utility function is homothetically separable. Next we shall propose a nonparametric test to determine whether a consumer's utility function is homothetically separable. T 29.3 Suppose that n = ~k and that our data consists of m ~-tuples ((xL pO, ... , (x~, p~)), i = 1, ... , m, where (xj, pj) E Rt x Rt +, j = 1, , ~. Moreover, let

CJ = pJxJ

qJ

and

=

pJICj,

j

=

1, ... ,~,

i

=

1,

, m.

Then the following assertions are equivalent: (i) There exist nonsatiated, continous, concave monotonic functions, V(·): Ri -+ R, u/·): Rt -+ R+, j = 1, ... , ~, such that the uj (·) are linearly homogeneous and V(u 1 ( • ), ••• , u~( . )) rationalizes the data. (ii) There exist m ~-tuples, ((UL Pi), ... , (U~, P~)), i following conditions:

(UJ, PJ) E R; +, b. Uj ~ UJqJxJ, c. PJ = Cp UJ,

j

a.

=

1, ... ,

e,

1 ~ i, r ~ m, j

=

1,

d. The ~-tuples ((UL PD,

,~,

i = 1,

, (U~, P~)),

, ~.

m.

,

i

=

1, ... , m, which satisfy the

, m.

j = 1,

i = 1,

=

1, ... , m, satisfy GARP.

For brevity I shall only sketch a proof of the theorem. Suppose first that condition i is true. Then it must be case that, for each j, j = 1, ... , ~, u/') rationalizes the m pairs (xj, pj), i = 1, ... , m. Consequently, since the u/') are linearly homogeneous, we can appeal to T 29.2 and deduce the existence of numbers Uj, j = 1, ... , ~ and i = 1, ... , m, which satisfy condition iib. In fact, we can show that the Uj can be chosen so that they satisfy both condition iib and i = I, ... , m,

j = I, ...

I

~.

(29.13)

Time-Series Tests of the Utility Hypothesis

769

To establish equation 29.13 we fix i and j and use the properties of uj (,) and a standard theorem in concave programming (Karlin 1959, p. 201) to deduce that there exists a A E R+ + such that xE R~.

(29.14)

From equation 29.14 and the homogeneity of Uj('), it follows that a(Uj(x) -

ApJX) ~ u(xJ) - ApJXJ,

x

E R~

and a

E

R+.

Consequently, (29.15)

and xE R~.

(29.16)

From equations 29.15 and 29.16 it follows that we can choose the UJ so that they satisfy equation 29.13. Suppose that we have chosen the UJ in accordance with equation 29.13 and defined the PJ by condition ii-c. Then the pairs (UJ, PJ), i = I, ... , m, and j = 1, ... , ~, satisfy condition ii-a. Moreover, V(·) rationalizes the vectors ((UL PD, ... , (U~, P~)), i = 1, ... , m. To establish this fact we let

pi = (PL ... , P~),

and

i = I, ... , m

and we intend to show that, for each i = I, ... , m,

piU i ~ piU, U E

Ri, implies V(U i ) ~

V(U).

(29.17)

To begin, we fix i and let ~

A =

L

CJ.

j=l

Then we use the optimality of xJ, j = 1, ... , Uj(') to show that V(u 1 (x~ / Cf) Cf,

and the homogeneity of the

... , U~(X~/ CD C~) V(U 1 (xi/CDA1 , ... , u~(x~/C~)A~).

max

Aj~O,j=l""'~'Lj=l

Ri

such that piU O = piU i and

V(U).

max U E Ri,piV=PiVi

Then

LJ=l (PJUP) =

(29.18)

Aj=A

Finally, we choose U O E

V(Uo) =

~,

A and equation 29.18 imply that

(29.19)

770

Chapter 29

V(UO) = V(Ul (xUCf)P~ Up, ... , u~(x~/cDP~ug)

~ V(Ul (xUCf)cL ... , u~(x~/c~)C~) = V(U i );

(29.20)

and equations 29.19 and 29.20 imply that (29.21)

From equations 29.19 and 29.21 follows the validity of equation 29.17 for the given i. Since i was chosen arbitrarily, we conclude from the preceding arguments that V(·) rationalizes the ~-tuples, ((U~, P~), . .. , (U~, PD), i = 1, ... , m, as was to be shown. To conclude the proof that condition i implies condition ii, we use the preceding result and theorem T 29.1 to show that the ~-tuples, ((UL PD, ... , (U~, P~)), i = 1, ... , m, satisfy GARP. In order to establish the converse we first define U/x) =

min Ujqjx,

j = 1, ... , ~,

X E

R~

l~i~m

and observe that the u/ .) are continuous, nonsatiated, concave, and monotonic linearly homogeneous functions which rationalize the respective data subsets, (xj, pj), i = 1, ... , m, for each j = 1, ... , ~. Evidently, u/xj) = Uj,

j = 1, ... ,

~,and

i = 1, ... , m.

Next we let U i = (UL ... , UV and pi = (Pf, ... ,PI), i = 1, use T 29.1 and condition iid to find pairs, (Vi, Ai) E R;+, i = 1, satisfy 1

~

i, r

~

, m, and , m, that

m.

Finally, we define V(·): R~ ~ R by V(UJ

= min

{Vi

+ Aipi(U -

U i )}.

l~i~m

Then it is easy to see that

i = 1, ... , m and that V(·) is a nonsatiated, continuous, concave, monotonic function. It is also evident that, for each i = 1, ... , m, piU i ~ piU implies that V(U i ) ~ V(U).

This fact and the properties of the uj (·) suffice to give a simple proof that

Time-Series Tests of the Utility Hypothesis

771

V(UI (.), ... , u~(·)) rationalizes our data. Those arguments I leave to the reader. The preceding theorem is an analogue of a theorem of Varian's (see Varian 1983, theorem 5, p. 107). Varian also proposes a test for additively separable utility functions (see Varian 1983, p. 107-108).

29.4 Excess Demand Functions and the Utility Hypothesis

Above we discussed the use of time-series observations on single consumers to test the utility hypothesis. In the process I presented several nonparametric tests of the empirical relevance of the standard interpretation of H 1-H 6. Next we shall discuss the use of time-series observations on the income and expenditures of groups of consumers to test the empirical relevance of the standard interpretation of a slightly modified version of H I-H 6. In this version a commodity bundle is a pair (x, y) E R~-l X [0, T], and a price is a pair (p, w) E R~-) x R+ +. Also, the.A in H 3 is replaced by a vector, (w, T) E R~-l X R+ +, and pw + wT is substituted for the A in H 4 so that r (p, A) becomes r(p,w,pw

+ wT)

= {(x,y) E R~-l

x

[0, T] :px

+ wy

~ pw

+ wT}.

In the intended interpretation of the axioms, the components of x denote so many units of ordinary commodities and y denotes hours of leisure time. Also, w is taken to be the consumer's initial holdings of commodities and T measures the total amount of leisure time available to the consumer. If we adopt the version of H 1-H 6 described above, the consumer's behavior can be characterized by the relation (z, - L) = h(p, w),

where L denotes so many hours of labor, h('): R~ + by h(p, w) = [(p, w, pw

+ wT) -

-+

Rn-l

X

R is defined

(w, T),

and [(.): R~+ x R+ -+ R~ is the demand function. As we shall see, the significance of this representation of consumer behavior stems from the fact that h(') satisfies the following conditions: (C) h(.) is continuous; (H) h(.) is homogeneous of degree zero;

772

Chapter 29

(W) h(') satisfies Walras's Law; Le., (p, w)h(p, w) = 0; and (B) h(') is bounded from below. Any function which satisfies (H) and (W) is called an excess demand [unction. Suppose now that we have obtained a time series, (Zi, Li, pi, Wi), i = 1, ... , m, of observations on consumer behavior. If our data represent the choices which a single consumer made during the time from i = 1 to i = m, we can apply T 29.1 and construct a test of the utility hypothesis. If our data pertain to the choices which a group of consumers made during the same time period, the analysis of sections 29.1-29.3 does not apply. We shall next discuss some of the alternatives open to us. 29.4.1 Testing the Utility Hypothesis with Group Data That Satisfy

GARP Let G denote a group of consumers and assume that #G~n.

Moreover, let Zi and L i, respectively, denote the total purchases of the components of z, which we assume to be positive, and the aggregate supply of hours of labor in period i by consumers in G. Finally, let pi and Wi, respectively, denote the price of z and the wage rate which the same consumers faced in period i. We assume that our data consist of the quadruples, (Zi, Li, pi, Wi), i = 1, ... , m, and we want to determine how we can use these data to test the utility hypothesis. It seems reasonable that if we have obtained information on the number of hours which the consumers in G supplied in period i, we should be able to determine the value of T for each consumer and hence the value of T for the group as a whole. It is also reasonable to assume that the total amount of leisure time available to the consumers in G does not change from one period to the next. So from now on we assume that we have determined the value of T for the group as a whole and define yi E R+, i = 1, ... , m by i = 1, ... , m.

Then, for each period i, yi measures the total amount of leisure time of which the consumers in G actually disposed. Suppose now that the sequence of data (Zi, yi, pi, Wi), i = 1, ... , m, satisfy GARP. Then we can find a nonsatiated, monotonic, continuous, concave function, u(·): R~ ---+ R, which rationalizes our data.

Time-Series Tests of the Utility Hypothesis

773

The existence of u(·) shows that it is possible that the Z and y aggregates for the given group are chosen as if the group as a whole maximized utility subject to the group's budget constraint. That is interesting, but it does not help us determine whether each consumer in the group is a utility maximizer. And it is the latter problem about which the utility hypothesis is concerned. To test the utility hypothesis we proceed as follows. We let N = # G and A = liN and observe that if our data satisfy GARP, the sequence (AZ i, Ayi, pi, Wi), i = 1, ... , m, must satisfy GARP as well. Consequently, there exists a nonsatiated, monotonic, continuous, concave function, u(·): R~ ~ R, which rationalizes (AZ i, Ayi, pi, Wi), i = I, ... , m, and is such that if each consumer in G chose his consumption bundle by maximizing u(·) subject to the constraint, pZ

+ wy

~

w(2T),

Z

E R~-1

and

y E [O,2T]

then for each i = I, ... , m, the quadruple (Zi, yi, pi, Wi) would belong to the graph of the aggregate demand correspondence of the consumers in G. 2 The relationship between the u(.) of the members of G and the u(.) of G is not uniquely determined. A possible choice of u(·) can be intuited from the following arguments: Consider a group of (H I-H 6) consumers, (R~, ~('), AI)' I = I, ... , N. Moreover, let P(·): R~ + x R+ ~ R~ denote the demand function of consumer I, I = I, ... , N, and assume that the aggregate income of the group, A, is always divided so that the income of each consumer equals AIN. Finally, assume that the utility functions of the consumers are identical. Then the aggregate demand function of the group, F('): R~+ x R+ ~ R~, which is defined by N

F(p,A) =

L P(p,AIN), 1=1

is the demand function of a consumer, (R~, V('), Lf=1 Ai)' whose utility function satisfies the relation, V(x) = V 1 (xIN),

xE R~.

Now the u(·) of G need not satisfy H 6. Still we may chose u(·) such that u(xIN) = u(x),

xE R~

and demonstrate that (Zi, yi, pi, Wi) belong to the graph of the aggregate demand correspondence of the consumers in G. Those details I leave to the reader.

774

Chapter 29

From the preceding arguments it follows that a sufficient condition that our group data be consistent with the utility hypothesis is that they satisfy GARP. 29.4 Testing for the Homotheticity of Individual Utility Functions with

Group Data That Satisfy GARP

i = 1, ... , m qi

= (pi, w')

let.

pJzJ

+ w iy }

i = 1, ... , m.

Moreover, suppose that the pairs (Xi, qi), i = 1, ... , m, satisfy condition ii of T 29.2. Then we can find a nonsatiated, continuous, concave, homothetic, and monotonic function, u(·): R~ ~ R, which rationalizes our data. From this and from arguments like those used above, we deduce that our data are consistent with the hypothesis that the consumers in G determine their consumption bundles by maximizing one and the same homothetic utility function u('): R~ ~ R+, subject to their respective budget contraints. When the u(·) of G is homothetic, the u(·) of the members of G can be taken to equal u(·). This fact can be intuited from the next theorem: T 29.4 Consider a set of H 1-H 6 consumers, (R~, ~('), A,), 1 = 1, ... , k, and assume that the ~(.) satisfy the conditions of H 6. Moreover, let fl(.): R~ + x R+ -+ R~ denote the demand function of the lth consumer, 1 = 1, .0., k, and let F('): R~ + x R~ -+ R~ be defined by k

F(p,A t , .. o,Ak )

= L !'(p,AJ I=t

Then there exists a function H('): R~ + x R+ A k ) E R~+ x R~,

H(P' f

I=t

-+ R~

such that, for all (p, At, ... ,

Ai) = F(p,At,···,A k )

if and only if the

~(.) (1)

(29.22)

are homothetic and (2) induce the same ordering of R~.

This theorem is due to G. Antonelli (see Antonelli 1971, pp. 344-345) and the proof is as follows: If the ~(.) are homothetic and induce the same ordering of R~, then T 11.9 implies that there exists a continuous function g('): R~ + ~ R~ such that !'(p, A) = g(p)' A,

(p, A) E R~ + x R+, 1 = 1, ... , k

from which it follows that F(.) satisfies equation 29.22 with

(29.23)

Time-Series Tests of the Utility Hypothesis

775

(29.24)

Conversely, suppose that there is a function H('): R~ + X R -+ R~ which satisfies equation 29.22 for all (p, A l , ••• , A k ) E R~ + x R~. Then

1 = 1, ... , k.

H(p,A) = P(p,A),

Moreover, for all (A l , A 2 )

E

(29.25)

R; and p E R~ +,

from which it follows by standard arguments that there is a continuous function g('): R~ + -+ R~ which satisfies equation 29.23. But if this is true, then equations 29.23 29.25 and T 13.12 imply that the V,(.) are homothetic and induce the same ordering of R~ . 29.4.3 A Characterization of Excess Demand Functions

To test the utility hypothesis with group data that cannot be rationalized by a utility function, we must generate new ideas. The most interesting of those ideas is described in the next theorem, which is due to D. McFadden, A. Mas-Colell, R. Mantel, and M. K. Richter: T 29.5 A function h('): R~ + ~ Rn satisfies conditions (H), (W), and (8) if and only if there exist n preference-maximizing consumers whose excess demand functions satisfy (8) and sum to h(·).

It is obvious that if the excess demand functions of n preference-

maximizing consumers satisfy (B) and sum to h('), then h(') will satisfy (H), (W), and (B). Hence we need only prove the converse. The proof of the necessity part of T 28.5 which McFadden et al. gave in McFadden, Mas-Colell, Mantel, and Richter 1974 (pp. 364-366) is informative. Hence I shall repeat it here: We begin by envisioning a sphere with center - q E {x ERn: X < o} and radius 211 q I and by observing that the intersection of this sphere with R~ +, i.e., Q

= {p E

R~+:

lip + qll = 211qll},

has the following interesting properties: (i) for each p E (ii) for each p E that p = Ar; (iii) q E Q;

R~ +, R~ +,

there is a unique A E R+ + such that Ap E Q; there is a unique pair (r, A) E Q x R+ + such that

776

Chapter 29

(iv) if p, r E Q and p :f. r, then (r + q)(p + q) < 411q112; and (v) if p E Q, p. q ~ q . q and p. (p + q) ~ 2q . q. Since these properties are easily established, I leave their verification to the reader. Next we pick a q E R~ + such that, for all p E R~ +, h( p) + q > 0, and define f3,(.): R++, 1 = 1, ... , n, by n

h(p)

+ P+

q=

L f31(p)e ' ,

(29.26)

1=1

where e' is the lth unit vector, 1 = 1, ... , n. Since ph(p) = 0, n

p. (p

+ q) = L f31(p)pe ' . 1=1

Consequently, for all p E Q, n

h(p) =

L f31(p)e ' -

(p

+ q)

1=1

n

=

L f3,(p)[I -

((p

+

q)p' /p. (p

1=1

+

q))]e '

where I is the n x n identity matrix and h'(p) = f3,(p) {I - [(p

+ q)p'/p. (p + q)]}e ' ,

pE Q,

1=

1, ... , n. (29.27)

For each 1 = 1, ... , n, the function h ' (·): Q ~ Rn has three interesting properties: (i) ph'(p) = 0;

(ii) h' (. ) is bounded below; and (iii) h ' (·) satisfies the Strong Axiom of Revealed Preference. The validity of conditions i and ii is obvious. Hence it suffices to prove condition iii. Let r, p be two vectors in Q. In order that h'(r) be directly revealed preferred to h'(p), we must have h'(r) :f. h'(p) and

o ~ rh'(p) = f31(p){re ' - [r(p

+ q)/p(p + q)]pe l }.

(29.28)

Time-Series Tests of the Utility Hypothesis

777

Since pel> 0 and (see property iv of Q)

+ q) -

r(p

p(p

+ q) =

(r

+ q)(p + q) -

(p

+ q)(p + q)
0, there is a c5 > 0 such that, for all closed subsets B of G and for all z, u E ~t, II z - u II < () implies that I Q(z, B) - Q(u, B)I < £.6 (iv) For each pair (Nv Nil) flit

n {(P'l' W,

(I.,

E R~ +/,

there is an a

E

R+ + such that, for all t E T,

Pm' d, N v Nil) E R~+2(l +k+rn) : (Nv Nil)

=

(Nv Nil)}

~ ({u E R~" +"., k+~~+. u, = a} nR~+.1+".) x R~

v

x {(N N.)).

Then condition ii and Parthasarathy's theorem (Parthasarathy 1967, theorem 6.7, p. 47) ensure that the closure of the family {q(z); z E flit} is a compact subset of At (flit + 1 ). From this, from the boundedness of V(·), from the fact that G C R~+/(l+k+m), and from conditions iii and iv above follows the existence of the required extension of V(·) in MCBS 2 for each value of (Nv Nil). That is all that is required for the purposes of T 30.7.

30.6.4 Concluding Remarks

The price system and dividends of CBS 5 and the pair (Q( .), V(·)) of CBS 6 are undefined terms of CBS I-CBS 12. Similarly, the price system of FBS 5 and the pair (Q('), V(·)) of FBS 6 are undefined terms of FBS I-FBS 12. Therefore, the remarks made above about various characteristics of the agents in the economy of T 30.7 concerned both the relevance of T 30.7 and the possibility of finding nomologically adequate models of CBS 1CBS 12 and FBS I-FBS 12. For the purpose of our concluding remarks, we shall now assume that we are dealing with an uncertainty economy for which T 30.7 is relevant. Specifically, we assume that a vector of prices and dividends and an allocation of resources in the uncertainty economy is a temporary equilibrium for that economy if and only if it is a temporary equilibrium in the economy of T 30.7. Our intent is to check whether such equilibria have the same properties as competitive equilibria in a deterministic environment. We begin with stability. The problem of the stability of a temporary equilibrium in an economy populated by CBS I-CBS 12 consumers and FBS I-FBS 12 entrepreneurs must be resolved by determining the stability of temporary equilibria in £, the economy of T 30.7. In that respect, the following observations are relevant: 1. The temporary equilibria in £ cannot be more stable than competitive equilibria in a deterministic environment. Example E 14.5, therefore, demonstrates that we cannot be sure that the set of temporary equilibria in £ is stable.

2. Since the consumers in £ have no initial quantities of consumer goods, the excess demand function of £ is not continuous on all of R~+h+I+l+k+S -

Chapter 30

826

{o}. Consequently, neither T 14.8 nor T 14.9 have analogues that are valid for E. Our inability to establish the stability of temporary equilibria in our uncertainty economy is as much a concern to us as was our failure in chapter 14 to establish the stability of competitive equilibria in a deterministic environment. However, here as in chapter 14 we are left with the feeling that our inability might be a reflection of deficiencies in the dynamic model we use to model the price-adjustment process. Therefore we shall not pursue the matter any further. Next, the characteristic feature of resource allocation in an economy populated by CBS I-CBS 12 consumers and FBS 1- FBS 12 firms: For such an economy an allocation of its current-period resources is admissible if no feasible reallocation of them would increase one agent's expected utility without simultaneously decreasing somebody else's. We demonstrated in T 14.17 that a temporary equilibrium in a deterministic environment achieves an admissible allocation of resources. This is not true of the economy we now consider. The reason is that in the present economy the expected utility which each agent associates with his sales and purchases of goods and securities depends in part on current prices. Therefore, if is possible that one set of equilibrium prices might provide a higher level of satisfaction for all agents in the economy than some other set. In other words, if the economy could attain more than one temporary equilibrium, it is possible that one and not the other might be admissible. In reading the preceding observation, we should make several mental notes. First, in the definition of an admissible allocation of resources, the expected utilities of entrepreneurs count as much as the expected utilities of consumers. This is controversial since the standard notion of a Paretooptimal allocation of resources-under uncertainty as well as certainty-is explicated in terms of production possibility sets and consumer preferences only. We have no qualms with the standard optimality crite(ia as such. They are the natural criteria to apply to the economies about which the standard certainty and uncertainty theories are concerned. However, I believe that they cannot be applied to the kind of uncertainty economy I have in mind. My reasons for that are essentially the reasons I gave for introdUcing an entrepreneurial utility function in section 30.5 and need not be repeated here. Second, even though a temporary equilibrium need not be admissible, it still allocates resources efficiently in the usual sense. To wit: Fix prices and dividends as prescribed by some given temporary equilibrium. The corresponding equilibrium allocation of goods and securities is obviously

Temporary Equilibria under Uncertainty

827

efficient in the sense that there exists no feasible redistribution of goods and securities that will provide some (consumers and entrepreneurs) with higher utility without causing a loss of utility to others. Third, if one temporary equilibrium is admissible and another is not, the reason need not be that the conditional expectations of consumers and entrepreneurs in one equilibrium is "better" than in the other by, for example, corresponding more closely to the true distribution of future prices. Thus there is no trace of the idea of a rational expectations equilibrium underlying my notion of an admissible allocation of resources. Inasmuch as the expectations of the agents in our uncertainty economy are multivalued with respect to the states of nature, there cannot be such a thing as a rational expectations equilibrium in our economy. Pertinent references in this respect are Radner 1979 (pp. 655-677), Jordan 1985 (pp. 257-276), Kihlstrom and Mirman 1975 (pp. 357-376), and Stigum 1974 (pp. 98-105). 30.7 Appendix: Proofs of Theorems

In this appendix we shall sketch proofs of theorems T 30.1, T 30.2, T 30.5, and T 30.7. We begin with T 30.1 and T 30.2, which we treat as if they were different parts of one and the same theorem. 30.7. I Proof of T 30. I and T 30.2

Our proof of T 30. I and T 30.2 is obtained in several steps, the first seven of which concern the existence of V(·). Throughout the proof, we let x(t) = (q, L) (t). Also, with z short for (p, W, :X, Pm' d, N v N/4)' we let r(z, r)

=

{(q, L,J.l, m)

E

R~ x R_ X R k X R~ : (L,J.l) ~ -(Nv N/4) and

(p q, w, a, Pm)(q, L, J.l, m) ~ r}.

Finally, we shall write z(t), a(t), Pm(t), d(t), y(t), and A for the values assumed, respectively, by (Pq,W,a,Pm,d,Nv N/4)(t,'), a(t,'), Pm(t,·), d(t,·), )JU,'), and A('), t = I, ... , M + 1; and we shall argue as if M > 2. Step 1: It is clear that r (.) is convex. It is equally clear that r (.) is compact if (Pw w, a, Pm) > O. Finally, it follows from UT 7 and only obvious additional arguments that r(·) is continuous on the set {(z, r)

E R~+l +k+l+l+l +k X

R: (p q, w, a, Pm) > 0, (Nv N/4) > 0 and r ~ -wNL

-

aN/4}'

828

Chapter 30

Step 2:

For each

10 (~, A, y(~), x(I),

= E{ U, (X(I),

~ = I, ... , M,

, x(~),

jl,

let

m)

, X(.;'), 1'1

+ :~ Cl,(.;' + 1)I'i+l + (Pm + d)(.;' + l)m) IA, y(.;'),.;'}. It follows from CBS 9 and CBS 11 (i) and (iii) that, for each value of ~, lo(~' .)

is a well-defined, bounded, continuous function on the set

{range of (A, y(~))} x (R~ x R_)~

X

Rk

X R~,7

and that, for each (A, y(~)) E {range of (A, y(~))}, lo(~, A, y(~), .) is a strictly increasing, concave function on the set (R~

x

R_)~

x Rk

X R~.

Moreover, if the conditional distribution of (I, cx I (~ + I, '), ... , CX k - 1 (~ + I, '), (Pm + d)(~ + I, ')), given (~,A,y(~)), is nondegenerate, then/o(~,A, y('), .) is a strictly concave function on (R~ x R_)~ X R k X R~. Hence, by CBS 12, lo(t, A, y(t), .) is a strictly concave function on (R~ x R_ Y X R k X R~, t = I, ... , M.

Step 3:

Let

h*(M, A, y(M), x(I), ... , x(M -

max

I), r)

10(M, A, y(M), x(I), ... , x(M),

j-l,

m).

(x(M),Il, m) E r(z(M), r)

Then it follows from step 1 and C. Berge's theorems VI. 3.1 and VI3.2 (Berge 1959, pp. 121-122) that N(M, .) is a well-defined, bounded, continuous function on the set {(A, y(M), u, r) E {range of (A,y(M))} x (R~ x R_)M-I X

R: r ~ -w(M)NL(M) - cx(M)NIl(M)}.

Moreover, for each (A,y(M)) E {range of (A,y(M))}, ft*(M,A,y(M),') is strictly increasing and strictly concave on the set

Step 4: In this step we shall define two functions, It*(~, M, and 11 (M - I, .) and derive their properties. Let

'),

~ =

M - I,

Temporary Equilibria under Uncertainty

ftf(e, A, y(M -

1), x(I), ... , x(M -

fo(~, A, y(M -

829

1), jl, m)

1), x(I), ... , x(M - 1), jl, m)

if

e= M

- 1

k-l

E { ft ( M, A, y(M), x(l), ... , x(M -

1), jll

+ i~

lY. i (M)jli+l

+ (Pm + d)(M)m)!A.Y(M -l).~} if ~ =

M.

Moreover, let

it (M -

1, A, y(M - 1), x(I), ... , x(M - 1), jl, m)

= E{K"'(~,A,y(M -

1),x(I), ... ,x(M -

1),jl,m)IA,y(M -

1),

~~M-l}.

Then the properties of fo(M - 1, .) and ft(M, '), together with CBS 10 (ii), CBS 11 (iii), and CBS 12 imply that, for e= M - 1 and M, ftJf(e, .) is a well-defined, bounded, continuous function on the set {(A, y(M X

Rk

X

1), U, jl, m) E {range of (A, y(M -

R~: jl ~ -N/l(M -

I»} x (R~ x R_ )M-l

I)}.

Moreover, for each (A, y(M - 1» E {range of (A, y(M - I»}, KJf(e, A, y(M - 1), .) is a strictly increasing, strictly concave function on the set (R~ x

R_ )M-l

X R~N~(M-1) X R~.

From the properties of NJf(·) and CBS 11 (ii), it follows that it (M - 1, .) is a well-defined, bounded, continuous function on the set {(A,y(M - 1),u,jl,m) E {range of (A,y(M -

x Rk

X R~:jl~

I»} x (R~ x R_)M-t

-N/l(M-l)},

and that, for each (A,y(M - 1» E {range of (A,y(M - I»},ft(M - I,A, y(M - 1), .) is a strictly increasing, strictly concave function on the set (R+

x R_ )M-t x

R~N~(M-t) X R~.

Step 5: Note that, conditional upon the observed value of (A, y(M - 1», ft (M - 1, A, y(M - 1), x(l), ... , x(M - 1), jl, m) represents the maximum expected utility which the consumer can obtain over the remainder of his life if he chooses the vector (x(M - 1), jl, m) in period M - 1 after having chosen x(l), ... , x(M - 2) in the preceding periods.

830

Chapter 30

Step 6: Since it (M - I, .) has the same properties as fo (M - I, '), it is evident that, by working backward on the tree structure of events facing the consumer, we can use the same arguments used above to construct real-valued functions h(M - s, .) on the respective sets {(A,y(M - S),u,Il,m)

x Rk S

X R~:

E

{range of (A,y(M - s»} x (R~ x R_)M-s

Il ~ -NIl(M - s)},

=

2, ... , M - I, with the following properties: (i) h(M - s, .) is bounded and continuous;

(ii) for each pair (A, y(M - s» E {range of (A, y(M - s»}, h(M y(M - s), .) is strictly increasing and strictly concave on the set (R~ x

R_ )M-s

X

5,

A,

R_Np(M-S) X R~;

(iii) for each vector (A, y(M - s), x(I), , x(M - s), Il, m) in the domain of h(M - s, '), h(M - s, A, y(M - s), x(I), , x(M - s), Il, m) measures the maximum expected utility which the consumer, conditional upon the event (g ~ M - s} n {(A,y(M - s»(w) = (A,y(M - s»}), could obtain over the remainder of his life if he chose (x(M - s), Il, m) in period M - S after having chosen x(I), ... , x(M - S - 1) in the preceding periods.

Step 7:

From step I-step 6 it follows that if we let

V(A, y(I), x(I), Il, m)

=

fM-l

(I, A, y(I), x(I), Il, m),

then V(·) has all the properties required of V(·) in T 30.2 (i). To demonstrate that V(·) satisfies the condition described in T 30.2 (ii), we assume that the consumer has a consumption-investment strategy, which we denote by

CIS = {(4, L, A, m)(t, W), t = 1, ... , M}, and proceed as follows:

Step 8: set

Let (q, L, Il, m) (M, .) be the real vector-valued function on the

{(A, y(M), x(l), ... , x(M - I), r) X

R:

r ~

E

{range of (A, y(M»} x

(R~

x R_ )M-l

-1X(M)NIl(M) - w(M)NL(M)},

which at each value of its arguments solves the associated maximum problem in step 3. It is an easy consequence of the properties of r(·) and fo(M, .) and C. Berge's Theoreme du Maximum (Berge 1959, p. 122) that

Temporary Equilibria under Uncertainty

831

(q, L, 11, m) (M, .) is a continuous function of its arguments. Consequently, when we substitute (q, L) (t, .) for x(t), t = 1, ... , M - 1, and P1 (M 1, .) + I~':-l (Xj(M)Pj+l (M - 1, .) + (Pm + d)(M)m(M - 1, .) for r in (q, L, 11, m) (M, '), we obtain a function, (q, [, jl, m)(M, .) : n -+ R~

x R_

X R

k

x RL

which satisfies the condition $'((q, [, ii, m)(M» c $'(A, y(M), I~~M)' But if this is so, it is a routine matter to verify that (q, [, ii, m) (M, w) = (q,

Step 9:

Let

f{(M -

1, A, y(M -

L, It, m) (M, w)

a.e.

1), x(I), ... , x(M -

max (x(M-1),J.!,m) E r(z(M-1),r)

ft (M -

2), r)

1, A, y(M -

1), x(I), ... , x(M -

1),11, m).

Then it follows from the properties of r (.) and 11 (M - 1, .) and from C. Berge's theorems VI. 3.1, VI. 3.2, and Theoreme du Maximum that the following assertions are true: (i) f{ (M -

t =

1, .) is well defined and continuous on the set

{(A, y(M - 1), x(I), ... , x(M - 2), r) E {range of(A, y(M - I»)}

x

(R~

x R_ )M-2

X

R:

r ~ - (X(M -

I)NJ.!(M -

1)

- w(M - I)NL (M - I)}. Moreover, for each value of (A, y(M - 1), 12*(M - 1, A, y(M - 1), .) is a strictly increasing, strictly concave function. (ii) The vector-valued function on t, (q, L, 11, m) (M - 1, '), which at each value of its arguments solves the associated maximum problem in the definition of f{ (M - 1, '), is well defined and continuous. From steps 8 and 9 we deduce that if we substitute (q, L) (t, .) 1, ... , M - 2, and Itl (M - 2, .) + I~':-l (Xj(M - l)ltj+l (M 2, .) + (Pm + d)(M - l)m(M - 2, .) for r in (q, L, 11, m)(M - I, A, y(M 1), .) we obtain a function,

Step 10:

for x(t), t

=

(q, [, ii, m)(M - 1, .) : n -+ R~

x R_

X

Rk

X R~,

which satisfies the condition $'((q, [, ii, m)(M - 1» c $'(A, y(M - 1), I~~M-1)' Moreover, if we substitute (q, L) (t, .) for x(t), t = 1, ... , M - 2, (q, [) (M - 1, .) for x(M - 1), and fi1 (M - 1, .) + I7,:1 (Xj(M){ij+1 (M -

Chapter 30

832

1, .) + (Pm + d)(M)m(M - 1, .) for r in (q, 1, j1, m)(M, A, y(M), '), we obtain a function,

(q, I, il, m)(M, .) : Q -+ R~ X

R_

X

Rk

X

R~,

which satisfies the condition ~((q, I, il, m)(M)) c ~(A, "I (M), Il;~M)' But if this is so, then it is a routine matter to verify that (q, I, il, m)(M, w) = (q,

L, {l, m)(M, w)

a.e.

and (q,

L, ii, m) (M -

1, w)

=

(q,

L, {l, m) (M -

a.e.

1, w)

Step 11: Proceeding as above, we can establish the existence of a sequence of vector-valued functions (q, L, j1, m) (M - 5, '), 5 = 2, ... , M - 1, that are, respectively, well defined and continuous on the set 1), r) E {range of (A, y(M - 5))}

{(A, y(M - 5), x(I), ... , x(M - 5 X (R~ X

R_ )M-S-I

R:

X

r ~ - cx(M - 5)N/l(M - 5)

- w(M - 5)NL (M - 5)} and maximize the value of fM-s(M - 5, A, y(M - 5), x(I), ... , x(M - 5 1), .) subject to the conditions of r(z(M - 5), r). Moreover, by substituting (q, L) (t, .) for x(t), t = 1, ... , M - 5 - 1, and PI (M - 5 - 1) + cxJM - 5)Pi+I (M - 5 - 1,') + (Pm + d)(M - 5)m(M - 5 1, .) for r in (q, L, j1, m) (M - 5, A, y(M - 5), '), 5 = 2, ... , M - 1 (with P(O, .) = m(O, .) = and A substituted for r when 5 = M - 1), we obtain a sequence of functions,

IJ:l

°

(q,

L, ii, m) (M -

5, .) : Q -+ R~ X

R_

X

Rk

X R~,

5

= 2, ... , M - 1,

that satisfy the conditions (i) ~((q, L, ji, m)(M - 5)) c ~(A, y(M - 5), Il;~M-S) (ii) (q, L, ii, m) (M -

5,

w)

=

(q,

L, P, m) (M -

5,

w)

and

a.e.

These results and the definition of V(·) establish the validity of T 30.2 (ii).

Step 12: The validity of T 30.2 (ii) would be vacuous if T 30.1 were false. In this step we establish the validity of T 30.1. We begin by observing that ~((q, L, j1, m)(I)) c

~(A, "1(1), Il;~I)

Temporary Equilibria under Uncertainty

833

and by letting

(q, l, {I, m) (I, w) =

(q, L, Il, m) (1, A(w), 1'(1, w)),

WEn.

Next we substitute (q, L) (1) for x(I) and {Il (I, .) + L~~f C);i(2) {Ii +1 (1, .) + d)(2)m(I, .) for r in (q, L, Il, m)(2, A, 1'(2), .) to obtain a function,

+

(Pm (q,

l, {I, m) (2,

.) : n ~ R~ x

R_

X

Rk

X R~,

which satisfies the condition !F ((q, l, {I, m) (2)) c !F (A, l' (2), Il;~ 2)' Continuing in the obvious way, we obtain a family of random variables, OCIS

= {(q,l, {I, m)(s, w); 5 =

1, ... , M},

with the following properties: (i) If (q, L) (t, .) is substituted for x(t), t = I, ... , 5, and {Il (5, .) + L~~f C);/s + I){Ii+l (5, .) + (Pm + d) (5 + I)m(s, .) is substituted for r in (q, L, Il, m)(s + I, A, 1'(5 + I), '), we obtain (q, l, {I, m)(s + I, '), 5 = I, ... , M - 1; and (ii) !F((q,

l, {I, m)(s))

c !F(A, 1'(5), Il;~s)'

5=

I, ... ,M.

It is evident that OCIS constitutes an expenditure sequence, and that the arguments we outlined in steps 9-11, with only obvious modifications, suffice to show that OCIS is a consumption-investment strategy. 30.7.2 Proof of T 30.5

The proof of T 30.5 is similar to the proof of T 30.2. Throughout the proof we let z be short for (p y, Px' W, PI' C);, PM' Nil' N M) and we let f(z,K,r)

=

{(y,x,L,d,LIl,M) E R~ x R~ x R+ x R+ x R~ X R~N~ x

[I,NMl :PII

+ d + C);J.l- PMM ~ r + pyy -

pxx

- wL,y ~g(K,x,L)}.

Moreover, we let z(t), C);(t), PM(t), t/J(t), and A denote values assumed, respectively, by (py,Px' W,PI' C);,PM,NwNM)(t, '), C);(t, '), PM(t, '), t/J(t, '), and A ( '), t = I, ... , N + 1. The proof is obtained in several steps as follows: Since g( . ) is continuous and does not vary with z, we can use UT 7 and only obvious additional arguments to demonstrate that t(·) is convex, compact, and continuous on the set Step 1:

E Rv+l+l+h+k+l+k+l X {( z,K,r)+ +

X . R·: / r:>++ " -P MN M - C);N/l }•

R h

834

Chapter 30

Step 2: In this step we shall define three functions, 10('), It(·), and 11 (.), and derive their properties. First 10('): , d(N), M)

10 (A, t/J(N), d(I),

= E{V(d(l),

,d(N),PM(N

+ I)M)IA,t/J(N)}.

From FBS 10 and FBS 12 (ii) it follows that (i) 10 ( . ): {range of (A, t/J (N))} x R~ x R+ ~ R is well defined, bounded, and continuous; and (ii) for each value of (A, t/J (N)), 10 (A, t/J (N), .) is a strictly increasing, strictly concave function. Next It(·): ft*(A, t/J(N), d(I), . .. , d(N -

max

1), K, r) 10(A, t/J(N), d(I), ... , d(N -

1), d, M).

(y,x,L,d,I,!J.,M) E r(z(N),K,r)

Then it follows from the properties of f (.) and from C. Berge's theorems VI. 3.1 and VI. 3.2 that (iii) It(·) is a well-defined, bounded, continuous real-valued function on the set, {(A, t/J(N), u, K, r) E {range of (A, t/J(N))}

x R:

x

R~-1 X R~+

r ~ -

(X, (N)N!J.(N)

- PM(N)NM(N)};

(iv) for each value of (A, t/J(N),ft*(A, t/J(N), .) is a strictly increasing, strictly concave function. Finally 11 ( . ):

ft (A, t/J(N = E

1), d(I), . .. , d(N -

1), K, {L, M)

{t.- (A, ",(N), d(ll, .... dIN -

1), K /11

+

:t.

1X,(N)/1i+l

- PM(NlM)IA,,,,(N - ll}. It follows from the properties of ft*(·) and from FBS 11 and FBS 12 (i) that

(v) 11 (.) is a well-defined, bounded, continuous real-valued function on the set {(A,t/J(N-l),u,K,J.1,M)E {rangeof(A,t/J(N-l))} X R

k

X R1: {L ~ -N!J.(N -

1),M ~ NM(N -

I)};

x

R~-1

x

R~+

Temporary Equilibria under Uncertainty

835

(vi) for each value of (A, t/J(N - 1)),[1 (A, t/J(N - 1), .) is a strictly concave function; (vii) for each value of (A, t/J(N - 1), M), ft (A, t/J(N - 1), ., M) is a strictly increasing function; and (viii) for each value of (A, t/J(N - 1), d(I), ... , d(N - 1), K, J1), ft (A, t/J(N 1), d(l), ... , d(N - 1), K, J1, .) is a strictly decreasing function. If N = 2, we can proceed directly to step 4. Otherwise we must continue as described in step 3.

Step 3: In this step we work backward on the tree structure of events facing the entrepreneur. We begin with 11 (.) and use arguments like those we used above to construct real-valued functions fs('), 5 = 2, ... , N - 1, that satisfy the conditions: (i) fs(') is a well-defined, bounded, continuous function on the set {(A,t/J(N-s),u,K,J1,M)E {rangeof(A,t/J(N-s))} X R

k

X

x

R~-s

x

R~+

R1 : J1 ~ NIl(N - s),M ~ NM(N - s)};

(ii) for each value of (A, t/J(N - s)),fs(A, t/J(N - 5), .) is strictly concave function; (iii) for each value of (A, t/J (N - 5), M), fs(A, t/J (N - 5), ., M) is a strictly increasing function; (iv) for each value of (A,t/J(N - s),d(l), ... ,d(N - s),K,J1),ft(A,t/J(N5), d(I), ... , d(N - 5), K, J1, .) is a strictly decreasing function; and (v) for each vector (A, t/J(N - 5), d(l), ... , d(N - 5), K + 1, J1, M) in the domain of fs(' ),fs(A, t/J(N - 5), d(l), ... , d(N - 5), K + I, J1, M) measures the maximum expected utility which the entrepreneur, conditional upon the event {w En: (A, t/J(N - s))(w) = (A, t/J(N - s))}, could obtain over the remainder of his planning horizon if he chose (d(N - 5), I, J1, M) in period N - 5 after having paid out (d(l), ... , d(N - 5 - 1) in dividends and accumulated a stock of capital equal to K in the preceding periods.

Step 4:

From steps 1-3, it follows that if we let

V(A, t/J(I), d(I),

K + I, J1, M) = IN-l (A, t/J(I), d(I), K + I, J1, M),

then V(·) has all the properties required of it in T 30.5 (i)-(iii). The proof of T 30.5 (iv) is, with only obvious modifications, like the proof of T 30.2 (ii). Those details I leave to the reader.

Chapter 30

836

30.7.3 Proof of T 30.7 In this subsection we shall establish the existence of a temporary equilibrium in a production economy which satisfies the conditions of MCBS 1MCBS3, MFBS I-MFBS4 and MMBS 1. The arguments we need to obtain this result are essentially the arguments I used to establish Proposition III in Stigum 1969 (pp. 551-553). In the economy of T 30.7 a temporary equilibrium is a vector, (p,

w, a, Pm' (x, [, il, m)1, ... , (x, L, il, mf, (y, L, a,I, p, 10.)1, ... , (y, L, a, 1, p, M)S),

which satisfies the following conditions: (i) (p, w,!X, Pm) P=

uE

{

E R~++h+I+1+k+S

R~+h+l+1+k+S :

n P,

where

n+h+I+1+k+S} Ui 1 ;

.L

=

1=1

(ii) for each i (x, [, il,

=

I, ... , T,

m)i

E

Xi;

- m- )i (X,- L-,J1,

E

r i(Ap, w, A a, Pm' A a, ~ - i N i Ni) m, v /l'

~

a

=

~1

AS

and

(a , ... , d),

ri(p, w, (X, Pm' d,

where

mi, Nt, N~) =

{(x, L, J1, m)

E

Xi : pX

+ wL + (XJ1

+ pmm ~ (Pm + d)m i }; Qi(p, w, a, Pm' a, Nt, N~, N~, (x, [, il, m)i)

=

max

(x,L,/l,m) e ri(p, w,a,Pm,a,mi,Nt,N~)

(iii) for each j (y,

pyj

ty E

=

Qi(p,w,a'Pm,a,Nt,N~,N~,x,L,J1,m);

I, ... , 5,

yj;

+ wLj =

max py

+ wL;

(y,L)e yi

(a, 1, p, M)j

E

Zj,

(Ji(p, w, a, Pm' NL, N~, N~, (a, 1, p, M)j) max (d,I,/l,M) e

where

ri(p, w,a,Pm,N~,NI."Mi,xi)

OJ (p, w, a, Pm' NL, N~, N~, d, I, J1, M),

Temporary Equilibria under Uncertainty

837

ri(p, W, \1., Pm' Nj, Nf.t, Mi, x j )

=

{(d, I,

f.1,

M) E Zi : (Pn+l' ... , Pn+h)I

+ dMj + Pm(Mj -

M)

+ \1.f.1 ~ pxj + pyj + wU}; (iv) let 0hn and 0hl be, respectively, h x nand h x 1matrices whose components equal zero; let I h denote the h x h identity matrix; and let

T

S

L + L (Ii D Xi

i=l

T

L

yj -

=

0;

••• ,

S•

xj)

j=l

S

[i -

i=l

I

fj

= 0;

j=l

T

S

L fl + L p,i = i

i=l

0;

and

j=l

T

,,-i_MAj-O L. m -, j

' - I,

] -

i=l

Let E denote the economy we described in sections 30.6.1 and 30.6.2. To establish T 30.7 we must construct two auxiliary economies, E(C) and E(C, b). We begin with E(C). To construct E(C) and for later reference, we first observe that our assumptions concerning the firms' production sets, i.e., MFBS 2 and FBS 9, imply that

Ct, Yi)

c {r E R: r ,;;; q'+h+l+1

for some C E R++.

(30.33)

From this it follows that we can find aCE R+ + which is so large that it satisfies equation 30.33 and the following conditions:

If (y, L) E yj, then S 'Iyd < C,

i = 1, ... , n

+ h + 1;

(30.34)

and T

(n

S

+ h + 1) + i=l L (Nl + IIN;II + IINmll) + j=l I (1lxjll + IIN~II + Nf.t) + (S + T)


yJ. The vector-valued function H('): R~ -+ R~ is called indecomposable if, for any x, y E R~ such that x ~ y, x =I y, and N(x, y) is a proper subset of M, Hi(x) =I Hi(y) for some i ¢ N(x, y). Moreover, H(') is called decomposable if and only if it is not indecomposable.

Two simple economies which illustrate this definition can be obtained as follows: Let Xl (t) denote an economy's stock of labor at time t; X2(t) the same economy's aggregate capital stock at time t; and F('): R; -+ R+ the economy's aggregate production function, which we assume to be linearly homogeneous and increasing. If in the economy, labor is "produced" by labor alone and capital is produced by combining both labor and capital,

Balanced Growth under Uncertainty

841

then the behavior of the economy can be described by the following two equations: Xl (t

+ 1) =

AX l

(t),

and

in which A denotes the rate of growth of the labor force, 5 the average propensity of consumers to save. Applying the above definition to these equations, we see that the economy whose behavior they represent is clearly a decomposable economy. Now let us make a slight switch in our assumptions. Let us think of labor as being produced both by labor and consumption. Then the behavior of the economy would be described by the following system of equations:

and

in which G(') is taken to be a linearly homogeneous, increasing function of its arguments. Our definition tells us that the above equations represent the behavior of an indecomposable economy. Of the preceding models, the first turns out to be the difference-equation analogue of the system of differential equations proposed by Solow in Solow 1956, while the second is just a particular realization of the general systems studied by Solow and Samuelson in Solow and Samuelson 1953. In studying the behavior of solutions to equations 31.1, Samuelson and Solow found that, under "reasonable" conditions on H('), there exists a positive number A, a positive vector V, and a real-valued function y(.) such that, for each nonnegative, nonzero initial vector x(O), lim (x(t)/ At) = y(x(O)) V. This they interpreted to imply that if an economy behaves in accordance with equations 31.1, then regardless of where it starts out, it will, for large t, approach closely a balanced growth pattern. Also this balanced growth configuration will be independent of the economy's initial starting point. In their papers, Samuelson and Solow did not specify whether or not their equation systems would apply to an economy which operates under uncertainty. It seems quite unlikely, however, that the relationship implied

Chapter 31

842

by the system 31.1 would hold exactly for an "uncertainty" economy. The best one could ever hope for would be that it would hold "on the average" in the sense that the conditionally expected value of x(t + I), given x(O), ... , x(t), would equal some linearly homogeneous function H(') of x(t). To see why, let us take a closer look at Solow's model. First the labor equation: In any economy which operates under uncertainty, the size of the labor force in any given period t would depend not just on the size of the labor force in period t - 1 but also on the composition of the population in t - 1. Moreover, this composition would have been determined by many random events that occurred in the past such as wars, droughts, and severe depressions. Also, even if the composition of the population and the size of the labor force in period t - 1 were known, the size of the labor force in period t would still depend on such imponderables as the extent to which students decided to stay in school one more year, the number of highway accidents, the occurrence of an earthquake, and the successful termination of a war in period t - 1. Therefore in an uncertainty economy, Solow's labor equation-seen as a description of the actual growth of the economy's labor force-could at best hold in the sense of "on the average" suggested above. Next the savings-investment equation: It is true that, in an economy without government and international trade, investment in period t ex post, (X2 (t + 1) - X 2 (t)), must equal savings in period t. It is not true, however, that actual investment need equal the average propensity to save times national income. One reason is that, while the savings function 5 = sY (where Y denotes national income) might be a good "long-run" assumption in the sense that over time consumers tend on the average to allocate a constant proportion of their disposable income to savings, it is maybe a poor approximation of consumer behavior in anyone short-run period. Second, even if 5 = sY represented a true short- as well as long-run relationship, there is no reason why in an uncertainty economy investors need choose to invest exactly that amount in each period. Thus-seen as a description of the allocation of savings and investment in an uncertainty economy-Solow's equation is suspect on the one hand because it might not represent the true equilibrium relationship between savings and investment and on the other hand because an economy operating under uncertainty need not ever be in equilibrium. Conclusion: for an uncertainty economy, the best one could ever hope for is that Solow's investmentsavings equation might hold "on the average" in the sense suggested above.

Balanced Growth under Uncertainty

843

If equation 31.1 holds only on the average, then it becomes important to determine whether or not the assumption that this relationship is exact is "crucial", i.e., whether or not Samuelson and Solow's results are sensitive to this feature of their models. In this chapter we test this sensitivity of Samuelson and Solow's result. Specifically, we ask the following question: Suppose that the behavior over time of an economy can be represented by a family of vector-valued random variables {x(t), t = 0, 1, ... }. Moreover, suppose that there exists a linearly homogeneous, nondecreasing vectorvalued function H( . ) such that, for each t = 0, 1, ... ,

+ 1)lx(o), ... , x(t)} = where E{x(t + 1)/x(O), ... , E{x(t

H(x(t))

with probability 1,

x(t)} denotes the expected value of x(t + 1) conditional upon the observed values of x(O), ... , x(t). Then does there exist a random variable g, a positive vector V, and a positive constant A such that

lim (x(t)/ At)

= gV

with probability 1,

t->oo

and such that E{gjx(O)

=

x}

>

°

for all x ~ 0,

x

-=1=

07

If the answer to this equation is "yes," then Samuelson and Solow's assumption that equation 31.1 holds exactly is in fact not crucial. To answer this question, we have to point up one other aspect of Samuelson and Solow's theories which might not be apparent in the general formulation (31.1) but which comes out clearly in Solow's model. This aspect concerns the relationship between flows and stocks. In any period in which the two equations in Solow's model are satisfied, the demand for stocks of labor and capital is equal to the supply of labor and capital. Thus there is "stock equilibrium." Moreover, the supply and demand for output (flows) are equal. Thus there is also a "flow equilibrium." However, the relationship between stocks and flows need not be one of equilibrium. In fact, stocks and flows cannot be in equilibrium vis-a.-vis each other unless the economy travels along the balanced growth path. Solow has shown for his economy that if flows and stocks are not in equilibrium vis-a.-vis each other, then there exists an inexorable force which in each period moves stocks and flows closer to an equilibrium configuration. Such a "gravitational" force also operates in the economy studied in this chapter (see T 31.2 and T 31.4). However, since in our economy there need be neither a stock nor a flow equilibrium in any given period, we must-to establish the existence of a balanced growth path-introduce an

Chapter 31

844

assumption on the variance of the distribution of the x(t)' s that ensures that the economy will, with large probability, move within the "sphere of influence" of this force. Such an assumption is stated in equation 31.31 for the case). > 1. 31.1 Balanced Growth under Certainty

In this section we shall discuss salient characteristics of solutions to equation 31.1. We begin with the case when H(·) is indecomposable. 31.1.1 The Indecomposable Case

The study of solutions to equations 31.1 is essentially a study of the properties of nonnegative, linearly homogeneous, vector-valued functions and their iterates. Nonnegative, linearly homogeneous, vector-valued functions have much in common with nonnegative square matrices. For example, let A be an n x n symmetric, nonnegative matrix and assume that there is an integer t ~ 1 such that the components of At are positive. Then A has a strictly positive eigenvalue p which is simple and exceeds all other eigenvalues in absolute value. Moreover, there exist u, v E R~ + such that

Av

= pv;

uA

= pu;

(31.2)

and u· v

=

(31.3)

1.

Solow and Samuelson showed in Solow and Samuelson 1953 (pp. 415416) that nonegative, linearly homogeneous, vector-valued functions have analogous properties. To wit, they gave us T 31.1: T 31.1 Let H('): R~ --+ R~ be a continuous, nondecreasing function which is homogeneous of degree 1 and indecomposable. Then there exist a A E R+ + and a V E R~ + such that H(V)

=

AV.

(31.4)

The Ais unique and V is determined up to a multiplicative positive constant. If H( .) in addition is differentiable at V and the components of H , (V)

=

(Hij(V))

=

----a;;-

(OHi(V))

are positive, then there is a U E UH'(V)

=

AU;

R~ +

such that

Balanced Growth under Uncertainty

845

and UV

=

1.

The iterates of linearly homogeneous, vector-valued functions also have much in common with the iterates of nonnegative square matrices. For example, let A, v, p, and u be as in equations 31.2 and 31.3. Then there exist constants k and ry,. such that 0 < ry,. < 1 and, for all t = 1, 2, ... and all 1 ~ i, j ~ n, (31.5)

The iterates of a linearly homogeneous, vector-valued function H( . ), which we denote by H t ( • ), have analogous properties. Consider T 31.2: Let H('): R~ ~ R~ be a continuous, nondecreasing function which is homogeneous of degree 1, and indecomposable. Moreover, let A and V be as in equation 31.4 and assume that H(') has a derivative at V, H'(V), whose components are positive. Finally, assume that

T 31.2

H(x)

> 0

x E (R~ -

whenever

{O}).

Then there exist a continuous, nondecreasing function y('): R~ ~ R+ and constants K, K*, and 'Y. such that 0 < 'Y. < 1 and, for all t = 1, 2, ... , and x

I (H t (x)lA t )

y(x) V

-

I

~ K'Y. t Ilxll;2

E R~,

(31.6)

and

I (Ht(x)/IIHt(x) II) - (V/IIVII)II

t

~ K*'Y. .

(31.7)

We shall assume the validity of T 31.1 and sketch a proof of equation 31.6. Proofs of T 31.1 and equation 31.7, respectively, can be found in Solow and Samuelson 1953 and Stigum 1972 (pp. 55-56). In order to establish equation 31.6 we must first establish the existence of a function y('): R~ ~ R+ which satisfies .

Ht(x)

lim -~ = y(x)v' t-+x·

x E R~.

(31.8)

/,,,,

To do that we let .

ry,.(t,x)

= mm

{J(t,x)

=

1 ~i~n

.

m~n

1 ~/~n

(Ht(X))i --t-' ), Vi

(Ht(X))i

~V--, ~,

E R~,

t=

0, 1, ... ,

x

t=

0, 1, ... ,

X E R~.

and

i

Then, for each t = 0, 1, ... , ry,.(t, .) and {J(t, .) are nondecreasing, continuous functions which, for all x E R~, satisfy

846

Chapter 31

a(O, x) ::::; ... ::::; a(t, x) ::::; ... ::::; f3(t, x) ::::; ... ::::; f3(0, x).

We shall show that lim a(t, x) t-+oo

= lim f3(t, x).

(31.9)

t-+oo

In order to establish equation 31.9, we first note that Ht(x)

~

=

a(t,x) V

+ (f3(t, x)

-

(31.10)

a(t,x))y(t,x),

°:: :;

for some vector y(t, x) with Yi(t, x) ::::; ~, i = 1, ... , n, and Yk(t, x) ~ Vk for some k, k = 1, ... , n. Then we observe that, for any nonnegative y, H(V

+ y) = H(V) + H'(V)y + o(y)

and that, for some sufficiently small e > H(V

+ y)

~ H(V)

(31.11)

(see note 3)

°and all Y such that °:: :; Y ::::; eV,

+ (1I2)H'(V)y.

(31.12)

These observations can be used to establish equation 31.9 in the following way: Deduce from equation 31.10 that

= (~)H(a(t,x)V + ~

(I1)

(P A2 , it follows from a theorem of Frank Fisher's that we can find a V 2 E R~-+nl such that (A 1 ,(V 1 , V 2 )) satisfies equation 31.25. Specifically, we have T 31.3: T 31.3 Let H(o) = (H 1 (·),H 2 (.» and assume that H 1 (.) satisfies equation 31.21 and that H 2 (.) satisfies equation 31.22. Moreover, suppose that H 1 (.) and H 2 ( .) are continuous, nondecreasing functions which are homogeneous of degree 1 and suppose that both H 1 (. ) and H 2 (0 1 ,0) are indecomposable. Finally, let (A 1 , V 1 ) and (A2' il 2), be as in equations 31.23 and 31.24, respectively, and

assume that A1

>

A2

(31.26)

and that, for any x, y E R~ with x ~ y, N(x, y) =F ~ and {n 1 + 1, ... , n} not contained in N(x, y), there is an i E {n 1 + 1, ... , n} such that i ~ N(x, y) and

Hl(x) > Hl(y)· Then there exists a unique vector V 2 E R~-:l such that (31.27)

The proof of this theorem is involved. Therefore, since the meaning of the theorem is clear, I omit the proof and refer the reader to Fisher 1963 (pp. 79-81) for the details of Fisher's own proof. In order to study growth in decomposable economies, we must establish analogues of T 31.2 for the iterates of (H 1 ( • ), H 2 ( • ))0 We shall do that for the case A1 > A2 and present two examples from the case A1 = A2 . For details concerning the case A1 < A2 , the interested reader is referred to Kesten and Stigum 1974 (ppo 356-361) for details. T 31.4 Suppose that H(o) and that

=

(H 1 ('),H 2 (o») satisfies the conditions ofT 31.3

850

Chapter 31

XIE(R~I_{O})

implies

H(X I ,X 2

»0.

Suppose also that the matrix {Hi/VI, V 2 )}

=

{oHj(V I , V 2 loxj

}

exists with Hij(V I , V 2 )

>

0,

1 ~ i, j ~ nl

nl ~ i ~ n, 1 ~ j ~ n.

and

Then for each e > 0 and e < (/I VI 11I11 (VI, V 2 ) /I), there exist finite positive constants KI and rx and a continuous, nondecreasing, linearly homogeneous function y('): R~1 ---+ R+ such that 0 < rx < 1 and such that, for all t ~ 1 and all x E (R~ - {O}) for which e ~ (/lx l /l//lxll), II(Ht(x)n~) -

y(XI)(V I , V 2 )II ~ Klrxtllxll.

(31.28)

Moreover, there exists a constant K2 which depends on e but not t and x such that, for (11x l 11/11xll) > e,

I (Ht(x)/IIHt(x) II) -

((VI, V 2 )/II(V I ,

V2 )11)11 ~ K 2 (e)rx t.

(31.29)

Since the meaning of the theorem is clear and since the ideas of the proof are similar to the ideas of the proof of T 31.2, I refer the reader to Kesten and Stigum 1974 (pp. 352-356) for a detailed proof and omit the proof here. For the purposes of this chapter equations 31.28 and 31.29 tell us all we need to know about balanced growth under certainty in a decomposable economy which satisfies equations 31.21-31.24 and 31.26. From them we deduce that lim (x(f)/ A~) = y(x i )(V I , V 2 ), t-+oo

and t-+oo

Hence the direction of growth of a large economy which satisfies equations 31.21-31.24 and 31.26 is nearly independent of its starting point. In addition, there is a gravitational force which drives the economy toward its balanced growth path. The behavior of a decomposable economy for which the AI in equation 31.23 differs from the A2 in equation 31.24 is like the behavior of an indecomposable economy. The behavior of a decomposable economy for which Al = A2 is very different from that of an indecomposable economy. The next two examples will establish that fact. E 31.2

and that

Suppose that A > 0 and that n

=

2. Moreover, suppose that 0


such that lim (L(t)/ At) = q>

a.e.

(31.32)

t-+oo

and E{q>IL(O)} ~ 1.

Whether or not E{q> IL(O)} > 0 depends on the distribution of the L(t)' s. We consider three different cases. 1. Suppose that the probability distributions induced by the L(t)'s and P(·) satisfy the probability law of a Galton-Watson process. Then it can be shown that the distribution of q> has a jump at the origin equal to q E [0, 1] and allows a continuous density function on the set of positive real numbers (see Stigum 1966, p. 697). Moreover, q


= a.e.. In interpreting the preceding result, note that equations 31.34 and 31.35 imply that w. pr. 1 lim limt.... ooL(t,w)/et t = or 00 according as log et >,u or log et 1 and < K < 1,

°

°

lim P({w En: L(t,w)/et t ~ K}) t ....

oo

=1

and

lim P({w En: L(t,w)/et t ~ K}) = 1. t ....

oo

If £Ilog 8(t) 2 = 00 and log et = ,u, the "limiting distribution" of (L(t)/ et t ) either does not exist or is concentrated on and + 00. From these observations it follows that, for the case under study, there is no "right" normalizing constant et which ensures that (L(t)/et t ) converges to a random variable whose distribution is concentrated on [0,00) with some positive mass on

°

1

(0, 00).

3. Suppose that the L(t) satisfy the conditions

t=

0, 1, ... ,

for some finite constants K o and 10gK" -logA. 31.2.2 Balanced Growth When n ~ 2

So much for balanced growth of a univariate economy. Next we discuss balanced growth in an indecomposable multivariate economy. We begin by establishing a theorem concerning the possibility of growth in such economy. Suppose that n ~ 2 and let {x(t, w); t E T} be an n-vector-valued random process on a probability space (O,.?, P( . )). Assume that there exists a continuous, nondecreasing, linearly homogeneous function, H('): R~ ~ R~, such that, for all t E T,

T 31.5

E{x(t

+ l)lx(O), ... , x(t)} = H(x(t))

a.e.

In addition, assume that H(') satisfies the conditions of T 31.2 and that the A of equation 31.4 satisfies A > 1. Finally, assume that, for all t E T, P{x(t

+ 1) ~ H(x(t))lx(O), ... , x(t)} >

Then for each 0 < k < on k and x) such that

P{ Ilx(t) II

~ klx(O)

00

and x

O.

E (R~

(31.41)

- {O}), there exists a finite t (depending

= x} > o.

The proof goes as follows: Let A, V, and y(') be as in equation 31.6 and observe that y(H(x)) V

= lim t~OC!

H(X)) H t ( ------,--;-

=

A

Consequently, for all x

E R~,

A lim t~OC!

Ht+l(X) -------;r+l A

=

Ay(X) V.

Balanced Growth under Uncertainty

=

y(H(x))

857

Ay(X).

(31.42)

Next let qJ(x)

=

P{x(l) ~ H(x)lx(O)

=

xE R~,

x},

and let qJo = qJ(x(O)). From equations 31.42 and 31.41 it follows that P{y(x(l)) ~ Ay(x(O))lx(O)} ~ qJo

>

O.

In addition, let qJ1 (x(I))

=

and let qJ1 qJ1 (x(l))

>

P{x(2) ~ H(x(l))lx(O), x(l)},

=

E{ qJ1 (x(l))lx(l) ~ H(x(O)), x(O)}. Then, by equation 31.41,

0 for all x(l) E ((R~ - {O}) n (range of x(l, .)) and

P{x(2) ~ H 2(x(O))lx(O)} ~ P{x(2) ~ H 2(x(O)), x(l) ~ H(x(O))lx(O)} ~ qJoP{x(2) ~ H(x(l))lx(l) ~ H(x(O)), x(O)}

= qJoE{P{x(2)

~ H(x(l))lx(l),x(O)}lx(l) ~ H(x(O)),x(O)}

= qJoE{ qJ1 (x(l))lx(l)

~ H(x(O)), x(O)}

=

>

qJOqJ1

O.

(31.43)

From equation 31.43 we find that P{y(x(2)) ~ A2y(x(O))lx(O)} ~ qJOqJ1

>

O.

By continuing the process begun above, we can find positive numbers qJ2' ... , qJt such that, for all t ~ 2,

n qJi' t

P{y(x(t)) ~ Aty(x(O))lx(O)} ~

i=O

To conclude the proof of the theorem, we need only observe that if y(x(t)) ~ Aty(X(O)), then

~ I X(t) I :;/

At

y(x(O) ~ y(x(t)/llx(t) 11):;/

K At 14

,

where K 14 = y(x(O))/maxllxlI=l y(x). Because then, for t so large that K 14 At > k, we find that

n qJi > O. t

P{ Ilx(t) I ~ klx(O)} ~ P{y(x(t)) ~ Aty(x(O))lx(O)} ~

i=O

The assertion made in T 31.5 is an analogue of the last half of proposition fl/J. Next I shall introduce a weak Lipschitz condition on H(') and establish a theorem that corresponds to the first half of proposition fl/J.

Chapter 31

858

Suppose that n ~ 2 and let {x(t, w); t E T} be an n-vector-valued random process on a probability space (0, $', P(· ». Assume that there exists a continuous, nondecreasing, linearly homogeneous function, H('): R~ ~ R~, such that, for all t E T,

T 31.6

E{x(t

+ l)!x(O), ... , x(t)} = H(x(t»

a.e.

In addition, assume that H(') satisfies the conditions of T 31.2 and that the A of equation 31.4 satisfies A > 1. Finally, assume that (1) the x(t, .) satisfy the variance condition described in condition iv in section 31.2 and (2) for each s > 0, there is a constant K(s) such that, for all x, y E R~ with Ilxll = I yll = 1 and x ~ s, y ~ s,

liH(x) -

H(y)

I

Ilx - yll.

~ K(s)

(31.44)

Then, with Ks

=(

min Hi(Z») ( max 1 ";;i";;n

Ilzll=l

n

Hi(z»)-l

1 ";;i";;n

Ilzll=l

for each 0 < s < K s and 1] E (0,1), there exists a positive integer K(1], s) such that on the set where s ~ (x(t)/llx(t) II) and Ilx(t) I ~ K(1], s), p{ lim (x(s, w)/ AS)

= g(w) V for some finite g(w) >

Olx(O), ... , x(t)}

s..... oo

~ 1-

1].

With some obvious modifications, this theorem can be established by using arguments similar to those used by H. Kesten in proving theorem 6.1 in Kesten 1970 (pp. 91-98). Since these arguments are involved and since the ideas of the proof generalize upon the ideas of the proof I gave of the first half of flJ, I omit the proof of T 31.6 for brevity's sake. With T 31.5 and T 31.6 well in hand, we can establish the promised theorem on balanced growth in an indecomposable economy. Suppose that n ~ 2 and let {x(t, w); t E T} be an n-vector-valued random process on a probability space (0, $', P(· ». Assume that the x(t,') satisfy the conditions of T 31.6. Then

T 31.7

P{llx(t, w) I will remain bounded or lim x(t, ~) = g(w) V for some finite t--+oo

g(w)

>

OIX(O)}

=

A

1.

(31.45)

If the x(t,') also satisfy equation 31.41, then for all x(O)

p{!~~ (x(t,w)/At) =

g(w) V

E (R~ -

for some finiteg(w) > Olx(O)} > O.

{O}), (31.46)

The proof goes as follows: It is clear that on the n set where lim SUpt.. . oo //x(t, w/l < 00, the random variable g(') defined by

Balanced Growth under Uncertainty

-I.

( )V gw

859

x(t, w)

lm-,-t-

t--+OCJ

It.

is well defined and equal to zero. Therefore, to establish the validity of equation 31.45 we need only show that g(') is well defined, finite, and positive, a.e. on the Q set where lim SUPt--+OCJ Ilx(t, w) II = 00. This we will do by showing that for each '1 > 0 and x ~ 0, x i= 0, x(O) = x,

p{w E Q : lim sup IIx(t, w) II = 00 and there exists no finite, positive t--+OCJ

g(w) such that !~~

x(t, w)

-At - =

g(w) Vlx(O)

}

~ 2'1.

(31.47)

To establish equation 31.47 we begin by fixing x(O) = x, x ~ 0, x i= 0, 0, and 0 < G < Ks . Next, we let "x(t) n. r.l. b." mean that there exists no finite positive g such that lim t --+ OCJ (x(t)/ At) = gV. Finally, we let

'1

>

Ak

= {x: Ilxll

~ k, (x/llxll)

< G for some i},

and &

= {x: Ilxll

~

k, (x/llxll)

~

G},

and note that lim sup Ilx(t, w) II = { w E Q : t--+OCJ c

oo}

{wEQ:x(t,w)ineach&,k= I, ... ,Lo.} U {w

E Q: x(t, w) in each A k , k

=

1,2, ... , Lo.},

(31.48)

where "i.o." means infinitely often. It is easy to show that, for each pair (k, m) with k large and greater than m, we can find a positive constant bk , m such that for t = 0, I, ... , P{x(t

+ 5) E Bm for some 5 ~

Ilx(O), .. . ,x(t), x(t) E A k } ~

bk,m'

Hence (see Breiman 1968, chapter 5, problem 9, p. 97) if we "throw out" a null n set, then {w En: x(t, w) E AkLo.} c {w En: x(t, w) E BmLo.}.

(31.49)

From equation 31.49 it follows that, if we "throw out" a null Q set, then for each fixed m, {w En: x(t, w) in each A k , k = 1,2, ... , i.o.} c {w E Q: x(t, w) E

BmLo.}.

860

Chapter 31

The preceding result can be used to establish equation 31.47 in the following way: First, for the pair (1], e) let k" = K(1], e) be chosen as in T 31.6. Then it follows from T 31.6 that P{x(t) E each ~ P{x(t) E

1\ i.o. but x(t) n.r.1. b.lx(O)} 1\

~

for some t but x(t) n. r.1. b.lx(O)}

00

~

L

P{x(t) E 1\~ for the first time at t = slx(O)}

s=l

x P{x(t)n. r.1. b.lx(O), ... , x(s), x(s)

E

1\J

~ 1],

(31.50)

and P{ x(t) E each A k i. o. but x(t) n. r.1. b.lx(O)} ~ P{x(t) E each A k i. o. but x(t) not in 1\~ i. o./x(O)}

+ P{x(t) E 1\ i. o. and x(t) n. r.1. b.lx(O)} ~ 0 + P{x(t) E 1\~ for some t but x(t) n. r.1. b.lx(O)} ~

~ 1].

(31.51)

The validity of equation 31.47 is an immediate consequence of equations 31.50, 31.51, and 31.48, and the proof of the validity of equation 31.45 under the conditions stated is complete. The validity of equation 31.46 can be established by observing that, under the conditions stated in T 31.7, theorems T 31.5 and T 31.6 imply that, for all x ~ 0, x =I- 0, and x(O) = x,

p{w En: lim sup Ilx(t, w) II = 00 IX(O)} > o. ' .... 00

Theorem T 31. 7 can be interpreted to say: If an uncertainty economy satisfies the conditions of T 31.7, then with positive probability the economy will, for large enough t, come arbitrarily close to balanced growth. Moreover-and this is definitely the most interesting aspect-both the eventual rate of growth and the eventual balanced growth configuration of the uncertainty economy will be identical with those that the certainty theory of Samuelson and Solow would have predicted. The conditions we imposed on the x(t,·) in T 31.7 are in one sense the best possible sufficient conditions for balanced 'growth under uncertainty. Specifically, there are processes which satisfy equations 31.30, 31.41, and 31.44 and for which equation 31.46 is false. Here is one case in point:

Balanced Growth under Uncertainty

861

Let {O(t, co); t = 1,2, ... } be an n-variate, purely random process on a probability space (0, f7, P( . )) and assume that the O(t, .) are nonnegative and have a finite positive mean and a finite diagonal covariance matrix with positive diagonal entries. Moreover, let w(O) E R~ + be a fixed vector; let wi(t + 1, co) = Gi(Oi(t + 1, co), w(t)), t = 0,1, ... , i = 1, ... , n, where w(t) = (WI (t), ... , wn(t)), and assume that Gi (·): R~+1 ~ R+ is continuous, that Gi (·, w(t)) is bounded, and that Gi(Oi(t + 1),·) is increasing, linearly homogeneous, and strictly quasi-concave, i = 1, ... , n. Finally, assume that the O(t,·) are distributed independently of the w(t), and let

E 31.5

Hi(w(t))

=

E{ Gi(Oi(t

+ 1), w(t))lw(t)}, i =

1, ... , n,

and H(w(t)) = (HI (w(t)), ... , Hn(w(t))). Then it is easy to verify that H(·) is increasing, continuous, linearly homogeneous, and strictly quasi-concave and satisfies equation 31.44. It is also clear that the probability distributions of the w(t) satisfy equation 31.41 and that, for all t = 0, 1, ... ,

+ 1)lw(O), ... ,w(t)} = H(w(t))

E{w(t

a.e..

Finally, if the pair (A, V) E R+ + x R~ + is such that H(V) = AV, then the preceding observations and theorem 1 in Stigum 1972a (p. 47) imply that there exists a random variable g such that lim t--.oo

w(t) -:;t

= gV

a.e.,

It.

and E{glw(O)}




0

and

P{g>

Olx(O)} > 0

(31.64)

whenever Xl (0) =I- 0, Ilx(O) II ~ M( IIx (0) 1I/IIx(O) II) for a suitable finite function 1

M('): (0, 1] -+ (0, 00).

If

P{ Ilx 1 (s) 1I/IIx(s) II) ~ e, Ilx(s) II ~ M(e) for some slx(O)} > 0

(31.65)

864

Chapter 31

for some G > 0 and all x(O) with x I (0) =I 0, then ~quation 31.64 holds for all (0) =I o. On the other hand,

Xl

Xl (t)

= 0 for all t and g =

0

a.e. on {Xl (0)

=

O}.

Finally, E{glx(O)}


0. Then H( . ) satisfies the conditions of T 31.4, equation 31.60, and equation 31.62; and equations 31.63 and 31.64 hold (see Kesten and Stigum 1967, pp. 335-336, for a proof) if and only if E{xd1) log Xl (1)/Xl (0)

=

I}


0, x 2 ~ O}. We assume (1) that capital and output are both publicly owned, (2) that workers share equally in national output, each one's share being equal to a fraction of labor's average product, and (3) that the general surplus (i.e. net national product-wage allotments) is used in toto by government to augment the nation's capital stock. Assume in addition that the share of national output credited on the public books to each worker is so ample that a worker is "more likely not to spend it all." If a worker does not fully expend his credit, the balance is turned iT\to general surplus. Under extraordinary circumstances a worker might be allowed to spend more than his allotment but never more than labor's average product. The excess above the usual allotment would be taken out of the general surplus. More precisely we are assuming (4) that the fraction of labor's average product consumed in each period by the ith worker can be represented by a random variable Ci with range (0, 1], and (5) that, if Xl (t) and X 2 (t) denote the labor and capital in period t, then for all t: E 31.10

xtC/)

x 2(t

+ 1) = X2(t) + L

(1 - cJ[F(X l (t),X 2(t))/X l (t)].

i=l

We will also assume (6) that the distribution of Ci is independent of i and constant

866

Chapter 31

over time and that, for each pair (i, j), Ci and Cj are distributed independently of each other and of labor and capital. Finally, we assume (7) that the growth of the labor force can be represented by a Galton-Watson process with mean A > 1 and finite variance (J2. The preceding assumptions allow us to describe the development over time of our utopian economy in terms of a random process, {(x l (t,W),x 2(t,w)); t E T}, with the following properties: E{xt(t

+ 1)lx(O), ... ,x(t)} =

AXl(t)

(31.67)

a.e.,

and E{x 2(t

+ 1)lx(O), .. . ,x(t)} = x 2(t) + sF(x(t))

where

5

E{[xl(t

=

a.e.,

(31.68)

E(1 - cl ). Moreover,

+ 1) -

AX l (t)]2Ix(0), ... ,x(t)}

+ 1) -

X2(t) - sF(x(t)) )2Ix(0), ... , x(t)} ~ K Ilx(t) II

=

a.e.,

(J2 Xl (t),

(31.69)

and E{(x 2(t

a.e.

(31.70)

for a suitable finite constant K.

In concluding the chapter it is interesting to observe that equations 31.63 and 31.64 may obtain even if equation 31.55 is not valid. In order that a vector-valued random process satisfy equations 31.63 and 31.64, it is sufficient that there exist functions H('), HI (.), and H 2 ( .) that satisfy equations 31.59-31.62 and the conditions of T 31.4. To bring this point home, we consider the following variation on the theme of E 31.10, which also comes from Kesten and Stigum 1974. Consider the economy of E 31.10 and modify the assumption concerning the probability distribution of the Ci as follows: Let Yl' Y2' ... be identically and independently distributed, nonnegative random variables and assume that

E 31.11

o < E{ yd} =

J1


1.

Z)-1

J

and

(32.47)

In addition, let B('): C -+ C be defined by m

= L

zmB(z)

bkz m- k

k=O

~

and define

n m

=

(z -

(32.48)

wi)'

)=1

=

g(t, w); t E

T} by

m

= L

~(t, w)

bkx(t - k, w),

WE 0,

t E T.

(32.49)

k=O

Then it is easy to verify that ~ is an orthogonal wide-sense stationary process and that there exist complex constants 1'" [ = 0, 1, ... , and a such that lal < 1, 11',1 < Klal', 1= 0, 1, ... , for some K and 00

x(t, w)

=

L YI~(t -

1, w)

a.e.,

t E T.

1=0

Moreover, m

¢Jr,1 (w)

= -

L bkx(t + 1 k=1

k, w)

a.e.,

t E

T.

Next we record an important corollary of T 32.8, the proof of which I leave to the reader: Let X = {x(t, w); t E T}, ~ = {~(t, w); t E T}, and V = {v(t, w); t E T} be as described in T 32.8; and let ¢Jt.v(·) denote the linear least-squares predictor of

T 32.9

Distributed Lags and Wide-Sense Stationary Processes

x(t

+ v,

.) based on our knowledge of x(s, .) for 5 ~ t. Then, for all t E T, 00

¢Jt,v(w)

895

=L

Yj~(t

+v -

j, w)

+ v(t + v, w)

a.e.;

j=v

and

32.2.4 Kolmogorov's Theorem

The preceding theorem presents one characterization of

t E T.

0,

(32.51)

In the next and last theorem of this section we state necessary and sufficient conditions for the inequality in equation 32.51. This remarkable theorem is due to A. Kolmogorov. A proof of the theorem is given in Doob 1953 (pp. 577-578). T 32.11 Let X = {x(t, w); t E T} be a wide-sense stationary process with spectral distribution function, F('): [ - n, n] ~ R, and assume that T = { ... , - 1, 0, 1, ... } and that Ex(t) = o. Then X satisfies the conditions in equation 32.51 if and only if F' (A) > 0 a.e. in [ -n, n] (Lebesgue measure) and

f:1t log F'(A) d/L(A) >

-

(32.52)

00.

The power of T 32.11 and the import of T 32.10 are exemplified in the following example, which concludes our discussion of prediction with wide-sense stationary processes. E 32.8 Let X = {x(t, w); t E T} and ~ = {~(t, w); t E T} be wide-sense stationary process and assume that T = { ... , -1,0, I, ... }, Ex(t) = 0, t E T, and ~ is an orthogonal random process. In addition, suppose that x(t, w) =

~(t,

w) -

~(t -

1, w),

WEn,

t E T.

Then the spectral distribution function of X, F('): [ -n, n] ~ R, is given by

= £1 ~(tW

f:1t 11 -

where /L('): !?J

~ [0, 2n]

F(A)

f:1t 10gl1 -

i 2 e- .l.1 dJ1(J.)

i e- .l.\2 d/L(A),

is Lebesgue measure. Since

=

0,

X satisfies equation 32.52 and m, Go (') = 0, and if m ~ n, Go (') is of degree m - n. But if this is so, then the arguments needed to establish equation 32.74 and the linearity of [ .]+ imply that

Chapter 32

912

zE

C,

Izi

~ 1,

(32.75) where G('): C --+ C is identically 0 if n > m and a polynomial of degree m - n if m ~ n. From equation 32.75 it follows that there is a 1 ~ k ~ n and a polynomial F( . ): C --+ C of degree n - k such that y(z) [ 1-

G(z)A(z)

] +

pZ-l

+ F(z)

zE

A(z)

C,

Izi

~

1.

Consequently, by condition iii of T 32.12, e(z)

=

a (G(Z)A(Z)

+ HZ))

K 2 p(z)B(z)

,

zE

C,

Izi

~ 1;

and we can conclude the proof by letting D(z) = K 2 p(z)B(z), which is a polynomial of degree (m + 1), land by letting H(z) = a(G(z)A(z) + F(z)), which is a polynomial of degree n - k, if n > m, and of degree m, if m ~ n. Theorem T 32.13 establishes a remarkably simple relationship between the spectral representation of X and the structure of the distributed tag that relates the solution of equation 32.72 to X. This relationship is exemplified in E 32.11, where (n, m) = (0,0) and (r,s) = (1,0), and in E 32.12, where (n, m) = (1,0) and (r,s) = (1,0). Another example is provided by E 32.13, where (n, m) = (0, 1) and (r,s) = (2, 1). E 32.13 Let X = {x(t, OJ); t E T} and ~ = {(t, OJ); t E T} be as described in T 32.12 and suppose that there is a constant fJ E (0, 1) such that x(t, OJ)

=

~(t,

Then for all y(z)

=1-

b(z)

=

OJ) - fJ~(t - 1, OJ), Z

E

OJ E

a,

t E T.

C such that Izi ~ 1,

fJz;

a((1 - pfJ) - fJ z); K 2 (l - pz)

and e(z)

=

H(z) D(z) ,

where H(z)

=

a((1 - pfJ) - fJz)

and

D(z)

= K 2 (I

- pz)(1 - fJz).

In commenting on McLaren's results, H. Liitkepohl gives an example where the kin T 32.13 (iv) is bigger than 1 (LiitkepohI1984, pp. 504-506).

Distributed Lags and Wide-Sense Stationary Processes

913

He also observes that H(·) and D(·) may have common factors even if A(·) and B(·) do not. A k > 1 and common factors of H(·) and D(·) simplify the structure of c(·) but are hard to determine on a priori grounds alone. Hence assuming k = 1 and disregarding common factors of H( . ) and D(· ) are probably the only options available to an econometrician.

Trends, Cycles, and Seasonals in Economic Time Series and Stochastic Difference Equations

33

In this chapter we begin by discussing various ways of modeling trends, cycles, and seasonals in economic time series. All of them can be rationalized by assuming that an economic time series is a partial realization of a family X of random variables which satisfy a stochastic difference equation. If the initial conditions of the difference equation are fixed, X is an ARIMA process. If the initial conditions are random, X is a dynamic stochastic process. Two sections of the chapter are given to detennining the adequacy of different schemes for modeling trends, cycles, and seasonals; we study the salient characteristics first of ARIMA processes and then of dynamic stochastic processes. Our conclusion is that an economic time series may be generated by a dynamic stochastic process, but it is unlikely that it can be generated by an ARIMA process. 33.1 Modeling Trends, Cycles, and Seasonals in Economic Time Series

In this section we consider the behavior over time of a single economic variable x. We assume that a time series of observations on x can be decomposed and written as a sum of four components, xt

=

Pt

+ Ct + St + '7t'

(33.1)

where p, C, 5, and '7, respectively, denote the trend, cycle, seasonal, and random component of the behavior of x. Our objective is to discuss various ways of modeling the behavior over time of p, c, and s. 33.1.1 Trends

The tenns "trend," "cycles," and "seasonals" have a more or less definite meaning in economics. Following E. Malinvaud (see Malinvaud 1966,

Trends, Cycles, and Seasonals

915

p. 440), we say that the trend is "a slow variation in some specific direction which is maintained over a long period of years." The cycle is "a movement, quasi-periodic in appearance, alternately increasing and decreasing." The seasonal movement is "composed of regular weekly, monthly, or yearly variations." Mathematically the trend is often represented as a polynomial in t whose coefficients are either constant or vary exponentially with t; for example,

Pt

=

rt

+ yt,

t E T;

(33.2)

or (more generally), m

Pt

=

nj-I

I I i=1 j=O

rtij(tj

zD,

t E T.

Then there exist an integer n and constants ak , k n = I~=I ni , ao = 1, and

(33.3)

=

0, 1, ... , n, such that

n

L akPt-k = 0, k==O

t E T.

In the case of equation 33.2, the ak are obtained by equating coefficients in 2

L

ak z - k =

1)2;

Z-2(Z -

k==O

and in the case of equation 33.3 we determine the ak from n

L

ak z - k =

n (z m

z-n

k=O

Zi)n j •

i=1

Some econometricians believe that the preceding view of the trend is much too simplistic. For example, instead of equation 33.2, they may postulate the following system of equations: I

Po

=

Pt

=

rt

and

f3I

=

f3;

(33.4)

Pt-I

+ f3/f

t

=

1,2, ... ;

(33.5)

f3t-1

+ ~t'

t=

2,3, ... ,

(33.6)

and f3t

=

where the ~t constitute a purely random process with mean zero and finite variance (Jt Then the f3t determine how the slope of the trend changes from one period to the next, and the trend is characterized by equation 33.4, PI = Po + f3I and

Chapter 33

916

t = 2,3, ... ,

(33.7)

where 5 is a shift operator that shifts Pt to Pt+l' The econometricians, who believe that the equations in 33.4-33.6 represent a preferable alternative to equation 33.2, would also replace equation 33.3 with a system of equations that is similar to equations 33.4-33.6 but contains more variables and more equations. For our purpose here it is not important to know exactly what the preferable alternative to equation 33.3 looks like. What matters to us is that in most cases the alternative to equation 33.3 will detennine (1) a constant d; (2) two polynomials in negative powers of z, M(z), and N(z), whose roots lie, respectively, outside the unit circle and on or inside the unit circle; and (3) a sequence of identically and independently distributed random variables ~t such that, for some positive integer q and constants Pt, t = 0, I, ... , q, the Pt satisfy the equations,

Pt

t

= P"

=

0, I, ... , q,

(33.8)

and

M(5) (1 - 5- 1 )dPt

=

t = q + I, q + 2, ....

N(5)~t,

(33.9)

In comparing equation 33.7 with equation 33.2 and equation 33.9 with equation 33.3, note that the solution to equation 33.7 satisfies equation 33.4, Pi = r:t.. + p, and

Pt =

r:t..

+ pt + U t,

where the equations,

Ut

t = 2,3, ... ,

(33.10)

are random variables whIch constitute a solution to the

t

=

2,3, ... ,

Uo

=

0, and

U1

= o.

Similarly, the Pt in equations 33.8 and 33.9 will satisfy a relation of the fonn, m

Pt

=

nj-l

I I i=l i=O

where the equations,

r:t..ij(tizD Vt

+

v"

t

= q+

I, q

+

are random variables which constitute a solution to the

t = q + I, q + 2, ... , Vt

=

0, t

=

(33.11)

2, ... ,

and

0,... , q.

In section 33.2 below we shall see that the behavior of Pt in equation

Trends, Cycles, and Seasonals

917

32.10 and Pt in equation 33.11 is for large t completely determined by

the behavior of the

Ut

and the

Ut ,

respectively.

33.1.2 Cycles and Seasonals

The cycle and the seasonal components of an economic time series are often represented mathematically as finite trigonometric series whose coefficients are constants if the corresponding components are not trending and polynomials in t whenever the associated components are trending. The period of each component of the cycle series is greater than one year and the period of each of the components of the seasonal series is less than or equal to one year. Suppose first that the cycle and seasonal component of x are nontrending. Then the preceding description of cycles and seasonals can be expressed as follows: There exist complex constants hy j = 1, ... , m, and real numbers Aj E [ - n, n) such that m

=.". t+ t SL

C

j=l

h·eitA.j

)

t E T.

'

(33.12)

Which part of the right-hand sum in equation 33.12 represents the nontrending seasonal component of x and which part represents the nontrending cyclical component of x depends on the length of the period in which time is measured. Suppose that time is measured in months and that - n ~ Al < Az < ... < Am < n. Moreover, let A and B be a partition of {I, ... , m}; that is, AnB=0

and

and suppose that, for all j

A u B = {I, ... , m},

=

1, ... ,

m, Aj

E

(33.13)

A if and only if

and

(33.14)

Then the seasonal component of x can be represented by

t E T,

(33.15)

and the cyclical component by t E T.

(33.16)

The assignment of frequencies to St and Ct is not always as clear-cut as it seems to be in equations 33.13-33.16. To see why, observe first that equations 33.13-33.15 imply that

Chapter 33

918

t E T,

(33.17)

a condition on which most econometricians would insist. Next observe that there may exist a constant c > 12 which is not divisible by 12 and satisfies both (I - S-c)ct

=

t E T,

0,

and

t E T, for some j E A. If such a constant exists, assigning Aj to 5t rather than Ct in order to satisfy equation 33.14 seems arbitrary. It is also important to observe that in many instances some work is required to show that a given representation of the sea50nal component of a time series is a special case of equation 33.15. Here is one case in point: Frequently the seasonal component of a time series is represented by the following equations:

E 33.1

12

5t

=

Ld

t E T,

k 5 k.t'

(33.18)

k=1

where t is measured in months, otherwise, and

5k ,t

equals 1 for t - k divisible by 12 and 0

12

Ld

k

=

(33.19)

O.

k=1

After a little reflection, we see that the 5t in equation 33.18 satisfies equation 33.17. Consequently, there exist constants hj,j = -6, ... , -1,1, ... , 5, such that 5t

=

5

L

t E T,

hj eit(21t j /12),

j=-6

NO

and we can conclude that the representation of a seasonal determined by equations 33.18 and 33.19 is a special case of equation 33.15. Equations 33.12 describe the behavior over time of a nontrending cycle and -seasonal component of an economic time series. If the cycle and seasonal components of x are trending, the equations in 33.12 must be changed to m

nj-l

+ = " "L..

Ct t 5 L..

b.)ilft'eitAj) I

,

t E T.

(33.20)

j=l 1=0

Which part of the right-hand sum of equation 33.20 belongs to $s_t$ and which part to $c_t$ is determined in the same way we decomposed equation 33.12 into equations 33.15 and 33.16. We need not repeat those details here. If the behavior over time of $c_t$ and $s_t$ can be represented by equation 33.20, then there exist an integer $n$ and constants $a_k$, $k = 0, 1, \ldots, n$, such that $a_0 = 1$, $n = \sum_{j=1}^{m} n_j$, and
$$\sum_{k=0}^{n} a_k (c_{t-k} + s_{t-k}) = 0, \qquad t \in T.$$

The $a_k$ are obtained by equating coefficients in
$$\sum_{k=0}^{n} a_k z^{-k} = z^{-n} \prod_{j=1}^{m} (z - e^{i\lambda_j})^{n_j}.$$

Again there are many econometricians who believe that the behavior over time of the cycle and seasonal components of x cannot be described by a deterministic model such as equation 33.20. They agree that the cyclical and seasonal characteristics of economic time series are caused by ever-recurring phenomena, e.g., the yearly round of climatic seasons and religious festivals. However, these recurring phenomena change over time, and this change injects an irregular pattern into the behavior of $c_t$ and $s_t$ that we must account for. To do that when time is measured in months, the econometrician might propose the following model for the cycle:
$$c_t = c_{t-18} + \gamma_t, \quad t = 18, 19, \ldots; \qquad \gamma_t = \gamma_{t-1} + \xi_t, \quad t = 1, 2, \ldots; \tag{33.21}$$
$$\gamma_0 = 0; \qquad c_t = 0, \quad t = 0, 1, \ldots, 17,$$
where the $\xi_t$ constitute a purely random process with mean zero and finite variance $\sigma_\xi^2$. For the seasonal, the econometrician might propose

$$s_t = s_{t-1} + \phi_t, \quad t = 1, 2, \ldots; \qquad \sum_{k=0}^{11} \phi_{t-k} = \eta_t, \quad \eta_t = \eta_{t-1} + \varepsilon_t, \quad t = 12, 13, \ldots; \tag{33.22}$$
$$s_i = \phi_i = 0, \qquad i = 0, 1, \ldots, 11,$$


where the $\varepsilon_t$ constitute a purely random process with mean zero and finite variance $\sigma_\varepsilon^2$. By solving equations 33.21 and 33.22, we find that
$$(1 - S^{-1})(1 - S^{-18})\, c_t = \xi_t, \quad t = 19, 20, \ldots; \quad \text{and} \quad (1 - S^{-1})(1 - S^{-12})\, s_t = \varepsilon_t, \quad t = 13, 14, \ldots.$$
Consequently, there exist constants $h_j$, $j = 0, 1, \ldots, 18$, and $d_k$, $k = 0, 1, \ldots, 12$, and numbers $\lambda_j \in [-\pi, \pi)$, $j = 1, \ldots, 18$, and $\mu_k \in [-\pi, \pi)$, $k = 1, \ldots, 12$, such that
$$c_t = h_0 + \sum_{j=1}^{18} h_j e^{it\lambda_j} + u_t, \qquad t = 0, 1, \ldots,$$
and
$$s_t = d_0 + \sum_{k=1}^{12} d_k e^{it\mu_k} + v_t, \qquad t = 0, 1, \ldots,$$
where the $u_t$ and the $v_t$, respectively, are solutions to the systems of equations
$$(1 - S^{-1})(1 - S^{-18})\, u_t = \xi_t, \quad t = 19, 20, \ldots; \qquad u_t = 0, \quad t = 0, 1, \ldots, 18;$$
and
$$(1 - S^{-1})(1 - S^{-12})\, v_t = \varepsilon_t, \quad t = 13, 14, \ldots; \qquad v_t = 0, \quad t = 0, 1, \ldots, 12.$$
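The difference-operator characterization above is easy to experiment with. The following sketch is an illustrative simulation, not taken from the text, and the variance and sample length are arbitrary. It generates a stochastic seasonal $v_t$ satisfying $(1 - S^{-1})(1 - S^{-12})v_t = \varepsilon_t$ with the stated initial conditions and shows that the annual difference of the simulated series behaves like a random walk, which is the sense in which the seasonal pattern drifts over time.

```python
import numpy as np

rng = np.random.default_rng(1)
T, sigma_eps = 600, 0.5
eps = rng.normal(0.0, sigma_eps, size=T + 1)   # eps[t] for t = 0, ..., T

# v_t = 0 for t = 0, ..., 12; thereafter v_t solves
# (1 - S^-1)(1 - S^-12) v_t = eps_t, i.e. v_t = v_{t-1} + v_{t-12} - v_{t-13} + eps_t.
v = np.zeros(T + 1)
for t in range(13, T + 1):
    v[t] = v[t - 1] + v[t - 12] - v[t - 13] + eps[t]

# The annual difference w_t = v_t - v_{t-12} then satisfies w_t = w_{t-1} + eps_t,
# so it wanders like a random walk instead of repeating a fixed seasonal shape.
w = v[12:] - v[:-12]
print("std of annual difference, first vs last 100 months:",
      round(float(w[:100].std()), 2), round(float(w[-100:].std()), 2))
```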

The models we described in equations 33.21 and 33.22 are two of many possible substitutes for equations 33.15 and 33.16. Most of these substitutes have the following characteristics in common: There exist (1) integers $q_c$ and $q_s$; (2) triples of integers, $(d_c, c, e_c)$ and $(d_s, s, e_s)$; (3) pairs of polynomials in negative powers of z, $(M_c(z), N_c(z))$ and $(M_s(z), N_s(z))$; and (4) purely random processes, $\{\xi_t; t = q_c + 1, q_c + 2, \ldots\}$ and $\{\varepsilon_t; t = q_s + 1, q_s + 2, \ldots\}$, such that
$$(1 - S^{-1})^{d_c}(1 - S^{-c})^{e_c} M_c(S)\, c_t = N_c(S)\, \xi_t, \qquad t = q_c + 1, q_c + 2, \ldots; \tag{33.23}$$
and
$$(1 - S^{-1})^{d_s}(1 - S^{-s})^{e_s} M_s(S)\, s_t = N_s(S)\, \varepsilon_t, \qquad t = q_s + 1, q_s + 2, \ldots. \tag{33.24}$$
Here both c and s depend on the length of the periods in which time is measured; for example, s may be 4 or 12, according as time is measured in quarters or months. Moreover, the roots of $M_c(z)$ and $M_s(z)$ are usually taken to lie inside the unit circle. Finally, when the appropriate initial


conditions are added to equations 33.23 and 33.24, the $c_t$ and $s_t$ constitute two GARIMA processes, where G stands for generalized and ARIMA is short for autoregressive integrated moving average.²

33.1.3 Concluding Remarks

In the preceding sections we have described various ways of modeling trends, cycles, and seasonals in economic time series. It remains to observe that if we substitute the right-hand side of equations 33.3 and 33.12, respectively, for $p_t$ and $(c_t + s_t)$ in equation 33.1 and assume that the irregular components in equation 33.1 constitute a wide-sense stationary process, we obtain a dynamic stochastic representation of the behavior of x. On the other hand, if we (1) substitute the right-hand side of equation 33.11 for $p_t$ in equation 33.1; (2) substitute solutions to equations 33.23 and 33.24, respectively, for $c_t$ and $s_t$ in equation 33.1; and (3) assume that the irregular components in equation 33.1 constitute a wide-sense stationary process, then we obtain a GARIMA-process representation of the behavior over time of x. In the next two sections we shall study the salient characteristics of ARIMA processes and dynamic stochastic processes to determine the appropriateness of using them, or generalizations of them, to model trends, cycles, and seasonals in economic time series.

33.2 ARIMA Processes

In this section we study the asymptotic behavior of so-called autoregressive integrated moving average processes. These processes constitute a large class of stochastic difference equations, which includes, among many other well-known processes, the simple one-dimensional random walk. They were so dubbed by G. E. P. Box and G. M. Jenkins, who found them to provide useful models for studying and controlling the behavior of certain economic variables and various chemical processes (Box and Jenkins 1970, pp. 85-125). An autoregressive integrated moving average process (hereafter an ARIMA process) is defined as follows: Let $X = \{x(t, \omega); t = -n+1, -n+2, \ldots\}$ be a family of real-valued random variables on some probability space $(\Omega, \mathscr{F}, P(\cdot))$. Then X is an ARIMA process if and only if it satisfies the following conditions:
(i) There exist constants $x_t$, $-n+1 \leq t \leq 0$, such that
$$x(t, \omega) = x_t \quad \text{a.e.}, \qquad t = -n+1, \ldots, 0.$$


(ii) There exists on $(\Omega, \mathscr{F}, P(\cdot))$ a real-valued, purely random process $\{\eta(t, \omega); t = \ldots, -1, 0, 1, \ldots\}$ with mean zero and finite positive variance $\sigma_\eta^2$, and two sequences of constants $\{a_k; k = 0, \ldots, n\}$ and $\{\alpha_s; s = \ldots, -1, 0, 1, \ldots\}$ such that $a_0 = \alpha_0 = 1$, $a_n \neq 0$, and
$$\sum_{s=-\infty}^{\infty} \alpha_s^2 < \infty, \tag{33.25}$$
$$\sum_{k=0}^{n} a_k x(t-k, \omega) = \sum_{s=-\infty}^{\infty} \alpha_s \eta(t+s, \omega), \qquad t = 1, 2, \ldots. \tag{33.26}$$
(iii) There exist a positive integer $l_0$, nonnegative integers $l_j$, and complex constants $z_j$, $j = 1, \ldots, l$, such that
$$\sum_{k=0}^{n} a_k z^{n-k} = (z - 1)^{l_0} \prod_{j=1}^{l} (z - z_j)^{l_j}, \tag{33.27}$$
$$|z_j| < 1, \qquad j = 1, \ldots, l. \tag{33.28}$$
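As a concrete illustration of this definition, the sketch below simulates the simplest ARIMA process it covers. It is not taken from the text and all numerical values are arbitrary: $n = 1$, $a_0 = 1$, $a_1 = -1$, so that $\sum_k a_k z^{n-k} = z - 1$ with $l_0 = 1$ and no roots $z_j$, together with a one-sided moving-average right-hand side $\alpha_0 = 1$, $\alpha_{-1} = 0.5$, $\alpha_s = 0$ otherwise.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200
x0 = 2.0                                  # fixed initial condition, per condition (i)
eta = rng.normal(0.0, 1.0, size=T + 2)    # purely random process, condition (ii)

# Condition (ii) with a_0 = 1, a_1 = -1 and alpha_0 = 1, alpha_{-1} = 0.5:
#   x(t) - x(t-1) = eta(t) + 0.5 * eta(t-1),  t = 1, 2, ...
x = np.empty(T + 1)
x[0] = x0
for t in range(1, T + 1):
    x[t] = x[t - 1] + eta[t] + 0.5 * eta[t - 1]

# Condition (iii): the autoregressive polynomial z - 1 has the single root z = 1,
# so l_0 = 1 and there are no roots z_j inside the unit circle to check.
print(x[:5])
```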

In interpreting this definition, note that, when $n = 1$ and $\alpha_s = 0$ for $s \neq 0$, x is a simple one-dimensional random walk. Note also that Box and Jenkins always assume that $n = l_0$, that $\alpha_s = 0$ for $s > 0$, and that $|\alpha_s| \leq K\beta^{|s|}$ for some $\beta \in (0, 1)$ and some suitably large constant K.

33.2.1 The Short and Long Run Behavior of ARIMA Processes

The short-run behavior of an ARIMA process is delineated in T 33.1.

T 33.1 Suppose that $\{x(t, \omega); t = -n+1, -n+2, \ldots\}$ is an ARIMA process and let $\{\eta(t, \omega); t = 1, 2, \ldots\}$ be the associated $\eta$ process. Also let
$$y(t, \omega) = \sum_{s=-\infty}^{\infty} \alpha_s \eta(t+s, \omega), \qquad t = 1, 2, \ldots. \tag{33.29}$$
Then there exist a function $\varphi(\cdot)$ and a sequence of real constants $\gamma_s$ such that
$$\gamma_0 = 1, \tag{33.30}$$
$$\sum_{k=0}^{v} a_k \gamma_{v-k} = 0, \qquad v = 1, \ldots, n-1, \tag{33.31}$$
$$\sum_{k=0}^{n} a_k \gamma_{v-k} = 0, \qquad v = n, n+1, \ldots, \tag{33.32}$$
$$\varphi(t) = x_t, \qquad t = -n+1, \ldots, 0, \tag{33.33}$$
$$\sum_{k=0}^{n} a_k \varphi(t-k) = 0, \qquad t = 1, 2, \ldots, \tag{33.34}$$
$$x(t, \omega) = \varphi(t) + \sum_{s=0}^{t-1} \gamma_s y(t-s, \omega), \qquad t = 1, 2, \ldots. \tag{33.35}$$

The existence of a function $\varphi(\cdot)$ and a set of constants $\gamma_s$ that satisfy equations 33.33 and 33.35 is easy to verify. So we will not prove it here. To establish equations 33.30-33.32 and 33.34 we use equations 33.26, 33.29, and 33.35 to note that, for all $t = 1, 2, \ldots$ and $n' = \min\{n, t-1\}$,
$$y(t, \omega) = \sum_{k=0}^{n} a_k x(t-k, \omega) = \sum_{k=0}^{n} a_k \varphi(t-k) + \sum_{k=0}^{n'} a_k \sum_{s=0}^{t-k-1} \gamma_s y(t-k-s, \omega)$$
$$= \sum_{k=0}^{n} a_k \varphi(t-k) + \sum_{k=0}^{n'} a_k \sum_{v=k}^{t-1} \gamma_{v-k} y(t-v, \omega)$$
$$= \sum_{k=0}^{n} a_k \varphi(t-k) + \sum_{v=0}^{n'-1} \left( \sum_{k=0}^{v} a_k \gamma_{v-k} \right) y(t-v, \omega) + \sum_{v=n'}^{t-1} \left( \sum_{k=0}^{n'} a_k \gamma_{v-k} \right) y(t-v, \omega). \tag{33.36}$$

qJ(t)

=

I j -1

' 0 -1

L L Ajk(tkzj) + k=O L j=1 k=O

~tk,

t = -n

+

1, ....

(33.37)

Since by 33.28, Izjl < 1 for all j, equation 32.37 implies that the "trend line" qJ ( .) satisfies the asymptotic relation (33.38)

the sign '" indicating that the ratio of the two sides in equation 33.38 tends to unity as t --. 00. Next, note that equations 33.30-33.32, 33.26, and 33.27 imply that there exist constants Cjk , j = 1, ... ,1; k = 0, ... , Ij - 1, and dk, k = 0, ... , 10 - 1, such that

924

Chapter 33

(33.39)

5

=

(33.40)

0, 1, ....

Thus YS' for large enough 5, satisfies the approximate relation (33.41)

Finally, note that if we assume that (33.42)

and that the function

A E [-n, n),

(33.43)

is piecewise continuous on (- n, n) and continuous in a neighborhood of A = 0, then, by lemma 1 in Stigum 1974, for all nonnegative integers q,

lim P ( { WEn:

T-+oo

LT tqy(t, w) {T

2 q+1f, (0)}-1/2

2q

t=1

y

+1


1. However, the results of the simulation experiment on an ARIMA process with 10 = 2 presented in E 33.2 below suggest that the chances of estimating q>(t) from time-series observation on x(·) are, if anything, poorer

Chapter 33

930

when 10 > 1 than when 10 = 1. To see why, compare the result of Feller's simultation experiment on the standard random walk (Feller 1957, figure 5, p. 84) with the result of my simulation experiment (see figures 33.1 and 33.2). One striking difference is that Feller's process seems to change sign much more frequently than my process. In fact, the number of changes of sign of an ordinary random walk grows (very roughly speaking) as some constant multiple of Jf, while the number of changes of sign of my process for 10 > 1 grows as some constant multiple of log t. E 33.2 Let {x(t); t equations, x(t) - 2.5x(t x( - 2)

=

1,

In this case n 3

L

ak z 3 -

k

=

1)

= -

+ 2x(t -

x( -1)

=

2, -I, ... } be an ARIMA process which satisfies the

=

2) - 0.5x(t - 3)

0.7,

x(O)

=

=

17(t),

0.5.

3 and

(z -

1)2(z - 0.5).

k=O

Moreover,

= 0.4 - O.lt + 0.1(0.5)', Ys = 0.2 + 1.95 + 0.8(0.5)S, iP = ! (1.9)2(J;.

n.

{G*(5)G(5)}G*(5)x s (w),

(34.43)

On almost all of 0, 2 2 (5, w) converges exponentially fast to y(.), and the associated seT-adjustment procedure, defined by

5> n,

(34.44)

has desirable properties, as evidenced in the following theorem: T 34.4 Let {x(t, w); t E T} be a dynamic stochastic process which satisfies assumptions i-iv and let 2 2 ( • ) and d 2 ( • ) be as defined in equations 34.43 and 34.44. Then there exist a bounded random variable K(w) and a constant Ie E to, 1) such that 1122 (S,w) - y(w)11 ~ K(w)Jc s ,

5> n,

a.e. (Pmeasure).

Consequently, lim d 2 (S)x S (w) S->OCJ

= lim

{us(w)

+ C(S)~(w) + D(S){3(w)},

a.e. (Pmeasure),

S-+OCJ

(34.45)

where the limit in equation 34.45 is componentwise.

Suppose at last that we want to erase in one swoop all trace of seasonal, cyclical, and trend factors from our observations on the x(t,·). Then we define 2 3 (5, w) by 2 3 (5, w)

= (~(5, w), y(5, w)),

5> n

+ m,

(34.46)


where a(') and y(.) are as described in equation 34.30; and we let the corresponding SCT-adjustment procedure, .913 , be defined by

5> n

+ m.

(34.47)

Then $\mathscr{L}_3(S, \cdot)$ converges in probability to $(\alpha(\cdot), \gamma(\cdot))$, and the SCT-adjustment procedure has equally agreeable properties. That is, we have T 34.5:

T 34.5 Let $\{x(t, \omega); t \in T\}$ be a dynamic stochastic process that satisfies assumptions i-iv. In addition, let $\mathscr{L}_3(\cdot)$ and $\mathscr{A}_3(\cdot)$ be as defined in equations 34.46 and 34.47. Then

p lim 2 3 (5, w) = (oc(w), y(w)); and plimd3 (5)x s (w)

=

lim {us(w)

+ D(5)!3(w)},

(34.48)

where the limit in equation 34.48 is componentwise.

34.2 Estimating the Coefficients in a Stochastic Difference Equation:

Consistency

In this section I present various theorems concerning the asymptotic properties of least-squares estimates of the $a_k$ in equation 34.1. My purpose is to contrast the properties of these estimates when the initial conditions are fixed with their properties when the initial conditions are random. The fact that the properties vary with our assumptions about the initial conditions can be intuited from the discussion of stochastic difference equations in section 33.3.

34.2.1 Equations with Fixed Initial Conditions

We begin by assuming that the initial conditions of equation 34.1 are fixed. If this is the case, then we can assert the following: Let $\{x(t); t = -n+1, -n+2, \ldots\}$ be a family of real-valued random variables which satisfy the conditions:

T 34.6

(i) x(t) (ii) 0


n, be a sequence of random vectors which, for each N and "almost all" realizations of the x(t), satisfy N

t~

(nk~l x(t) -

ak(N)x(t - k)

)2

N(x(t)N )2 - k~l (XkX(t - k)

(l1~.i.~(ln t~

Then a(N) converges in probability to p lim a(N)

=

(34.49) a;

that is, (34.50)

-a.

This theorem was orginally established by Mann and Wald (Mann and Wald 1943) under the additional assumption that E1](1)4 < 00 and that the moduli of the roots of the polynomial M(z) = L.k=O akz- k are all less than 1. T. W. Anderson established the theorem for the case when the roots of M(z) are all distinct and have moduli greater than 1 (Anderson 1959), and M. M. Rao proved the theorem for the case when M(z) has two roots, one with modulus less than 1 and one with modulus greater than 1 (Rao 1961). Finally, H. Rubin proved the theorem for n = 1 (Rubin 1950), and T. ]. Muench proved it for an arbitrary n (Muench 1974). The important thing to notice about T 34.6 is that the least-squares estimate of - a is consistent and that the consistency of a(N) is independent of the values of the moduli of the roots of M(z). By making assumptions about the moduli of the roots of M(z), we can obtain a stronger result. T 34.7 Let {x(t); t = - n + 1, - n + 2, ... } be a family of real-valued random variables that satisfies conditions i-iii of T 34.6. Moreover, let M(z) = L~=o ak z - k and assume that the moduli of the roots of M(z) differ from 1. Finally, let a and a(N) be as defined in T 34.6. Then lima(N)

=

-a

w. pr. 1.

(34.51)

Moreover, if the moduli of the roots of M(z) are all greater than 1, then there exist positive constants K and Asuch that A E (0, 1) and Ila(N)

+ all

~

KA N

w. pr. 1,

N

=n+

1, n

+ 2, ....

(34.52)

In interpreting the conditions of T 34.6, Mann and Wald, Anderson, Rao, Rubin, and Muench took the distribution of the 1](t) to be independent of the values assumed by x(t) at t = - n + 1, ... , O. This means that the x(t) should be thought of as representing an experiment which starts at time t = 1 with fixed (i.e., nonrandom) initial conditions x- n + 1 , ••• , X o. Such experiments occur frequently in physics and chemistry. However, they are uncommon in sciences such as astronomy, meteorology, and economics. In the latter, researchers are more likely to observe ongoing" processes than experiments with a fixed initial date. /I

Chapter 34

964

34.2.2 Equations with Random Initial Conditions: Special Cases

Next we shall investigate the asymptotic properties of a(N) when the x(t) represent an ongoing process rather than an experiment with a fixed initial date. This means that we want to ascertain the limit of a(N) if the x(t) satisfy conditions ii and iii of T 34.6 for all t E { ... , - I, 0, I, ... } but not condition i. To do that we must introduce some notation: Let the ak be as in equation (34.1) and let zp and np, p = I, ... , h, be such that n

L

k=O

akz n- k =

h

TI

(z - zp)n p

Moreover, let w p = zp if bk , k = 0, ... , n, by n

L

=

zn M(z).

(34.53)

p=l

bkzn- k =

k=O

Izi

~

1 and let w p = Z;l if IZpl >

1. Then define

h

TI

(34.54)

(z - wp)n p •

p=l

Finally, let (0, $', P(·)) be a probability space, and let {x(t, w); t E T} be a dynamic stochastic process on (0, $', P(·)) with the representation x(t, w)

=

y(t, w)

+ [x(t, w),

t E T,

(34.55)

where {y(t, w); t E T} is a wide-sense stationary process and {fx(t, w); t E T} is a nonstationary stochastic process which satisfies h

[x(t, w))

=

np-l

L L

Apj(w) (tjz~),

t E T,

(34.56)

p=l j=O

and assume that (i) the x(t, . ) are real-valued; and IZpl =1= 1, p = 1, ... , h.

(ii)

Then observe that n

L adx(t k=O

k, w)

=

0,

t E T,

and recall that if condition ii above holds, then the representation of the x(t,·) in equation 34.55 is uniquely determined. To provide the right setting for our main result, we begin by considering two special cases. First the autoregressive case: T 34.8 Let {x(t, w); t E T} be a dynamic stochastic process which satisfies equations 34.55 and 34.56 and conditions i and ii above. Moreover, let

Least-Squares and Stochastic-Difference Equations

965

= (ai' ... ,an)' and b = (b l , ... , bn)', respectively, be as in equations 34.53 and 34.54 and let

a

a(N, w)

=

(B~(w)BN(W))-IB~(w)XN(W),

where xN(w)

=

iU,w)

=

(34.57)

(x(l, w), ... , x(N, w))'; B~(w)

n

(xU,w), ... ,x(t -

=

{i(O, w), ... , i(N -

+ l,w))' t = 0,1, ... , N

-

1, w)}; and

1.

Finally, let n

L akx(t -

=

1JU, w)

k, w),

t E

T,

k=O

and

d = {w

En: fxU'W) = 0, t E

T};

and assume that {1JU, w); t E T} is a purely random process whose variables are distributed independently of the fx(t, w) in equation 34.56 and that P(d) > O. Then lim a(N, w)

= - b a.e. (Pd measure),

(34.58)

N-+oo

where P..91(·) denotes the conditional probability measure on (0, $') given d.

In interpreting the theorem, note that, when .PI = 0, {x(t, w); t E T} is an autoregressive process. When .PI =I=- 0 and the conditions of the theorem are satisfied, then relative to P..91(·), the x(t,') behave as an autoregressive process. Note also that in estimating the parameters of an autoregressive process, statisticians usually assume that the process is stable, i.e., that a = b. Theorem T 34.8 shows that this assumption cannot be tested in any meaningful way if the parameter estimates are leastsquares estimates. Finally, note that E. J. Hannan has established equation 34.58 for the case a = b and .PI = 0 (see Hannan 1970, theorem VI. 1, p. 329). The relation 34.58 for a =I=- b was established in Stigum 1976 (pp. 49-75). Here is an example to fix our ideas: Let {x(t, w); t

E 34.6 xU, w)

+ axU -

1, w)

E

=

T} be a wide-sense stationary solution to the equations, 1JU, w),

T,

t E

where a > 1 and 1J = {1JU, w); t E T} is a purely random process with mean zero and finite variance (J,,2. Also let a(N, .) be as described in equation 34.57. Then a(N, w)

=

t~ xU, w)x(t -

=

-a

+ C~

1, w)

x(t -

l~ x(t -

1, w)1J(t, w)

1, W)2

l~ x(t -

1, W)2).

966

Chapter 34

Moreover (by lemma 3 in Stigum 1976, p. 61), lim N-->oo

(~) f N

1, w)1J(t, w)

x(t -

=

+a- 1(J";

a.e.,

t=1

and lim N-->oo

(~) f x(t N t=1

1, W)2

=

~

a-I

a.e.

Consequently,

a2

-

1

= -a + --- = -a- 1

lim a(N, w)

a.e.

a

N-->oo

Next the purely explosive case: T 34.9 Let {x(t, w); t E T} be a real-valued dynamic stochastic process which satisfies equations 34.55 and 34.56 and let a = (ai' ... ,an) and a(N, w), respectively, be as in equations 34.53 and 34.57. Moreover, suppose that

IZpl>

p

1,

=

1, ... ,

h;

and IApj(w) I > 0

a.e.,

p = 1, ... , (1.,

j

=

0, ... , n p - 1.

Then there exist a constant Aand a bounded random variable K( .) such that 0< A < 1, and

Ila(N, w) + all

~ K(W)A N

a.e. for N

> n.

(34.59)

In interpreting this theorem, note first that in form 34.59 is identical to 34.52. Then observe that if we let n

L

11(t, w) =

akx(t - k, w),

t E T,

k=O

and let 11 = {11 (t, w); t E T}, then 11 is a wide-sense stationary process. However, in contradistinction to the 11(t) of T 34.7 the 11(t,') we defined above need not constitute a purely random process. In fact, 11 need not even have an absolutely continuous spectral distribution function. This might seem surprising. So here is an example to show why it is not strange at all. E 34.7

Let {x(t, w); t E T} be a dynamic stochastic process with the representation 00

x(t, w)

= -

L

a- s 1J(t

+ 5, w) + A(w)a t ,

tE

T

s=1

where a > 1, where {1J(t, w); t E T} is an arbitrary real-valued wide-sense stationary process, and where A(W)2 > 0 a.e. Also let a(N,') be as described in equation 34.57. Then

Least-Squares and Stochastic-Difference Equations

1, w)

x(t, w) - ax(t -

=

967

t E T,

l1(t, w),

and

+

a(N, w) = a

Ct,

I,

x(l -

w)~(I, w)

/t,

x(1

~ I, W)')-

Moreover (by lemma 4 in Stigum 1976, p. 62), Tchebichev's Inequality, Borel's 0-1 Criterion, and some algebra), N

lim (N 1 / 2 a N )-1 N-+oo

L x(t -

1, W)l1(t, w)

=

a.e.,

0

1=1

and lim a- 2N N-+oo

N

L x(t -

1, wf

A(W)2

= -2-- > a-I

1=1

a.e.

0

From this it follows easily that there are a bounded random variable K( . ) and a constant)., E (0, 1) such that ... la(N, w) -

a.l

~ K(W)).,N

a.e. for N

~ 1.

The differences in the behavior of a(N,') in E 34.6 and E 34.7 are interesting. In the next example we record the results of a simulation experiment which show how dramatic these differences can be. Let {l1(t, w); t E T} be a purely random process of normally distributed variables with mean 0 and variance 1. Moreover, let

E 34.8

y(t,W)

= -

400

L

(1.02)-Sl1(t

+ 5,W),

t

=

=

0, 1, ... ;

0, 1, ... ;

(34.60)

s=1

400

w(t, w)

= L

(1.02) -Sl1(t -

5,

W),

t

(34.61)

s=o

= y(t, w) + (1.02)/, t = 0, 1, ... ; and observe that, for t = 1, 2, ... , 700,

(34.62)

x(t, w)

(1.02)y(t -

w(t,w) -

(1.02)-lW(t -

x(t, w) - (1.02)x(t -

= 11(t, w) - (1.02)-400 11 (t + 400); l,w) = l1(t,W) - (1.02)-400 11 (t - 400);

1, w)

y(t, w) -

1, w)

=

y(t, w) -

(1.02)y(t -

1, w).

By a pseudo-random sampling scheme, which Anders Ekeland programmed for me, I obtained one observation on each of the r,(t, .) from t = - 400 to t = 1100. Those observations and equations 34.60-34.62 provided me with one observation on each of the triples (y(t, . ), w(t, . ), x(t, . )) from t = 0 to t = 700 which I used to compute a,(N) =

for z

,t,

= y,

z(I),(1

~ 1)

j,t,

z(1 - I)',

N

~ 1,2, ... , 700,

w, and x. According to T 34.8, ay(N) and aw(N) converge with prob-

968

Chapter 34 1.10 1.08 1.06 1.04

i~A

1.02 I"~--------------­ 1.00 0.98 0.96

fIJ"

0.94 0.92

0.90 0.88 0.86 0.84 0B2

0.8+---r-,...--.--,...--.--,...---.-,--r-,--r-,---.----, o 50 100 150 200 250 300 350 400 450 500 550 600 650 700 N Figure 34.4 Least-squares estimates of ax in r(l) - axr(t - 1) = I/(t), 1 E T.

ability 1 to (1.02) - 1, and according to T 34.9, ax(N) converges exponentially fast to (1.02). These predictions are borne out in figures 34.4-34.6. 34.2.3 Equations with Random Initial Conditions: The Fundamental

Theorem For my main result, I need more notation. Let {x(t, w); t E T} be a realvalued dynamic stochastic process which satisfies equations 34.55, 34.56, and p = 1, ... , h.

(34.63)

Moreover, for a given w, let Q(z, w) be the polynomial of least order with leading coefficient 1 such that Q(S, w)!x(t, w)

= 0,

and let m, mp ' p

=

t E T. 1, ... ,

h; qk' k

k=O

=

n (z -

0, ... , m be such that

h

m

L

(34.64)

qkZ-k = Q(z, w) = z-m

zp)m p •

p=l

Finally, let the zp be numbered so that and

(34.65)

Least-Squares and Stochastic-Difference Equations 1.1 1.08

1.06 1.04 1.02

"til

1.00 0.98 0.96 0.94

0.92

0.90 0.88 0.86

OB4 OB2 0.80+---,-.,..--.---.--.,.---,--,--.---.--.,.---,--.,..--,--. o 50 100 150 200 250 300 350 400 450 500 550 600 650 700 N

Figure 34.5 Least-squares estimates of ay in y(t) - ayy(t - 1) = ,,(t). 1 E T. 1.10

1.08 1.06 1.04

1.02 1.00 0.98 ~

til

0.96 0.94

0.92 0.90 0.88

0.66 0.64

OB2 0.80t---:'::-::r---:T:-::--r:-----r-,---.-,---.----,,.---.-----,--.----, o 50 100 150 200 250 300 350 400 450 500 550 600 650 700 N

Figure 34.6 Least-squares estimates of a.. in w(t) - a.. w(1 - 1) = ,,(1), 1 E T.

969

970

Chapter 34

t/J(Z)

m L t/JkZ-k k=O

=

=

n h

Z-m"

(z - zp)m p ,

and

p=h\ +1

(34.66)

With the preceding notation, I can state my main result as in T 34.10. In reading it, note that equation 34.63 implies that M(z) is the minimal polynomial with leading coefficient 1 such that M(S) fx(t, w) = 0 a.e. Thus, if m #- nand m

z(t, w)

=

L

qkX(t -

k, w),

t E T,

k=O

then {z(t, w); t E T} is not a wide-sense stationary process. Similarly, if M(z) #- L~=o CkZ -k, and if n

=

v(t, w)

L ckx(t k=O

then {v (t, w); t

E

k, w),

t E T,

T} is not a wide-sense stationary process.

Let {x(t, w); t E T} be a real-valued dynamic stochastic process which satisfies equations 34.55, 34.56, and 34.63; define {tf(t, w); t E T} by

T 34.10

'J(t, w)

=

n

L akx(t - k, w), k=O

t E T;

(34.67)

= 0, 1, ... , m, be as defined in equations 34.64 and 34.65; and let = (c 1 , ••• , cn), where the Ck are as in equation 34.66. Moreover, let !/ = {w EO: L'::=Oqkfx(t - k,w) = 0, t E T, and there is no polynomial N(z) of lower order than m such that N(S)fx(t, w) = 0; t E T}. let qk' k

C'

Finally, let a(N,') be as described in equation 34.57 and assume that (i) the 'J(t, .) constitute a purely random process and are distributed indepen-

dently of the fx(t, . ); and (ii) P(Y)

> o.

Then lim a(N, w)

=

-c

a.e. (P.'/' measure),

(34.68)

where p.'/'(.) denotes the conditional probability measure on (0,31') given Y.

In interpreting this theorem, note that when h1 = h, C = a. For this case, E. ]. Hannan established equation 34.68 in Hannan 1970 (chapter VI,

Least-Squares and Stochastic-Difference Equations

971

theorem VI.l, p. 329). Note also that when !/ = nand M(z) = a. Finally, note that T 34.10 suggests that n can be partitioned into a finite number of sets on which a(N,·) converges a.e. to different vectors. For instance, on the set {w: Ajk(w) = O,j = hi + 1, ... , h, k = 0,1, ... , nj - I}, a(N,·) converges to - h, where h is as defined in theorem T 34.8. On the set {w: Ajk(w) =1= O,j = hi + 1, ... ,h,k = 0,1, ... , nj - I}, a(N, .) converges to - a. All intermediate cases are described by the definition of !/ and c. Thus even when a(N,·) is inconsistent in the sense that p lim N -+ oo a(N, w) =1= - a, a(N, .) always converges on each of the sets of the partition to something definite. L~=o qk Z -k, then again c =

Let {x(t, w); t

E 34.9 x(t, w)

=

y(t, w)

E

T} be a dynamic stochastic process with the representation

+ A(w)zt + B(w)w t,

t E T,

where z > w > 1. Moreover, let a'

=

(at,a z )

=

(-(z

+ w),zw),

and 'l(t, w)

=

x(t, w)

+ atx(t -

1, w)

+ az(t -

2, w),

t E

T.

Finally, let

9'z

= =

/;//3

= {w E Q : A(w) = 0, B(w) #

9'4

=

9't

{w E Q : A(w)

# 0, B(w) # O},

{w E Q : A(w)

# 0, B(w) = O},

{w E Q: A(w)

=

O,B(w)

O},

= O},

and b'

=

(b t , b z )

=

[-(1/z

+ 1/w),1/zw].

Assume now that {'l(t, w); t E T} is a purely random process which is indenpendently distributed of A(·) and B(·). Assume also that PCC/;) > 0, i = I, ... , 4, and let a(N,·) be as described in equation 34.57. Then

-a

lim a(N,w)

N-->oo

= {

a.e. on 9't,

+ (1/w)),z/w] ((liz) + w), wlz]

-[-(z

a.e. on 9'z,

- [-

a.e. on 9'3' and

- b

a.e. on 9'4.

34.2.4 Concluding Remarks

From all appearance it looks like the processes considered in theorems T 34.7 and T 34.10 differ from one another only in that the initial conditions in one are fixed (i.e., nonrandom) while the initial conditions in

972

Chapter 34

the other are random. This and the fact that the limit in equations 34.51 is completely independent of the values assumed by x(t) for t = - n + 1, ... , 0, make it hard to intuit the reason the conclusions drawn in the two theorems are so different. The following simple arguments will show why. Let {x(t); t = -n + 1, -n + 2, ... } be as specified in theorem T 34.7, and let X == {x(t, w); t E T} be as specified in theorem T 34.10. Then observe that condition iii of theorem T 34.6 and equation 34.67 imply that there exist constants Ys' 5 = 0, 1, ... , and real-valued functions q>(.) on {- n + 1, - n + 2, ... } and iP(.) on {- n + 1, - n + 2, ... } x n such that t-1

(i) a. x(t) = q>(t)

b. q> (t)

=

Xu t

+L

Ysl](t - 5), t = 1, 2, ... ;

s=O

= -

n

+ 1, ... , 0; and

n

c.

L akq>(t k=O

k)

=

0, t

=

1, 2, ....

t-1

(ii) a. x(t, w) b. cp(t,w)

= =

cp(t, w)

+L

Ys17(t -

=

+

5,

w), t

s=O

x(t,w), t

-n

=

1, 2, ... ;

1, ... ,0; and

n

C.

L

k=O

akCP(t -

k, w) = 0

for all

WEn

and t = 1, 2, ....

Assertions i and ii here make precise what it means to say that the two processes in theorems T 34.7 and T 34.10 differ in that one has fixed initial conditions while the other has random initial conditions. Next observe that there is a wide-sense stationary solution to equation 34.67. This solution is unique in the sense that any other solution can differ from it only on an n set of P-measure o. Thus we may pick the Y process in equation 34.55 as the wide-sense stationary solution of equation 34.67. If we do, we can also find constants Ys, 5 E T such that 00

y(t, w)

=

L

Ysl](t - 5, w)

a.e.,

for all t E T,

s=-oo

and assert that x(t, w)

= fx(t, w) +

00

L

s=-oo

Ysl](t -

5,

w)

a.e.,

for all t E T.

Least-Squares and Stochastic-Difference Equations

973

If the zp in equation 34.56 satisfy IZpl < I, p = I, ... , h, )'S = 0 for 5 < o. If the Zp satisfy IZpl > I, p = I, ... , h, )'S = 0 for 5 ~ o. Otherwise )'S -# 0 for 5 ~ 0 and for 5 ~ - k where k is a positive integer which depends on the multiplicity of the zp with IZpl > 1. Finally, observe that the last equation and conditions ii above imply that ii>U, w)

= [xU, w) +

00

I s=

)'s1]U -

5,

w)

for t

= -

n

+ I, ... , 0,

-00

and that there exist random variables Apk ('), p np - I, such that

=

I, ... , h, k

=

0,

for all WEn and

t = - n + I, - n + 2, .... From the last two equations and from (iib) we can draw the following conclusions: 1. If IZpl < 1 for all p = I, ... , j, the A pk (') are linear functions of the [xU, '), t = - n + I, ... , 0, and of the 1]U, .) for t ~ O. 2. If IZpl > 1 for all p = I, ... , h, the A pk (') are linear functions of the [xU, '), t = -n + I, ... , 0, and the 1]U, .) for t + n > k, where k is a positive integer which depends on the multiplicity of the various zp'

3. If some of the zp have moduli smaller than 1 and some greater than I, the A pk (') are linear functions of the [xU, '), t = - n + I, ... , 0, and of the

1]U, .) for t ~ 0 and for t + n > k, where k is a positive integer which depends on the multiplicity of the zp with IZpl > 1.

The preceding observations allow us to point out one other way in which the processes in theorem T 34.7 and T 34.10 differ. In theorem T 34.7 the 1] U), t = I, 2, ... , are implicitly assumed to be independently distributed of cpU), t = - n + I, - n + 2, ... , 0, and hence of cpU) for t ~ 1 as well. In theorem T 34.10, the 1](t,') are assumed to be independently distributed of the [xU, '). This means that, if IZpl < 1 for all p, the 1] U, .) for t ~ 1 are independently distributed of the ii> U, . ) for t = - n + I, - n + 2, .... In all other cases the distribution of the 1] U, .) for t ~ 1 is not independent of the values assumed by ii>(t,') for t = - n + I, ... , o. Evidently, in these cases the process considered in theorem T 34.10 differs in a second fundamental way from the process considered in theorem

974

Chapter 34

T 34.7. It is, therefore, revealing to observe that the conclusions of the two theorems differ only when at least one of the Zp has modulus greater than 1. In interpreting the conclusions of theorem T 34.10, note also that the set

[/ is defined in terms of values assumed by the Apq(')'s (and not the

A pq (' )'51). 34.3 Estimating the Coefficients in a Stochastic Difference Equation: Limiting Distributions

In this section I present theorems concerning the asymptotic distribution of least-squares estimates of the ak in equation 34.1. My primary aim is to contrast the properties of the limiting distributions when the initial conditions are fixed with their properties when the initial conditions are random. 34.3.1 Equations with Fixed Initial Conditions

We begin by assuming that the initial conditions are fixed. For this case we can assert the following theorem: Let {x(t); t = - n + 1, - n + 2, ... } be a family of real-valued random variables which satisfy conditions i-iii of T 34.6. Moreover, let a(N) and M(z), respectively, be as in equations 34.49 and 34.53. Then the following assertions are true:

T 34.11

(i) If the moduli of the roots of M(z) are all less than 1, the matrix 11

=

lim N~oo

(N- t 1

Ex(t -

i)x(t -

j))

(34.69) l,;;;i,j';;;n

t=l

is well defined (i.e., the limit exists) and positive-definite. Moreover, the random vectors ,jN(a(N) + a) converge in distribution to an n x 1 normally distributed vector '!J with mean zero and covariance matrix (J; 111, where (J,,2 = E", (t)2 . (ii) If the moduli of the roots of M(z) are all greater than 1, if Ok denotes the zero

vector in R k , if x(t) fI(t)

and

= (x(t), ... , x(t = (",(t), 0n-1)',

- n

+ 1))', t

=

1, ... ;

t=

0, 1, ... ;

Least-Squares and Stochastic-Difference Equations

A

=

1

o

0

1

o

o

975

(34.70)

1

and if

u= t~ A-t(x(O) + V~l A-V1J(V») (X(O) + k~l A-k~(k»)'A'-t, and w

= p ;~ t~ A-t(x(O) + m~l

A-m~(m»)'1(T - t + 1),

then U is nonsingular and positive-definite w. pr. 1 and

p lim AN (a(N)

+ a)' =

U-1w.

N--+oo

The first half of this theorem was proved by Mann and Wald under the additional assumption that all the moments of 17(1) exist and are finite. Anderson established the first half as stated and the second half under the additional assumption that U is nonsingular w. pr. 1. Finally, Muench showed that U must be nonsingular w. pro 1 if the distribution of 17(t) is nondegenerate. 1 In the next theorem, we determine the asymptotic distribution of j"N(a(N) + a) when the roots of M(z) lie on both sides of the unit circle. For that purpose, we let M(z) be as in equation 34.53 and assume that

p = 1, ... , hI;

p

= hI +

(34.71)

1, ... ,

h.

(34.72)

We also make use of several new symbols: ljJ(z)

D(z)

= =

m

L

ljJkZ-k

k=O n-m

=

n p=h +1 h

z-m

(z -

zp)n p ,

(34.73)

1

_

M(z)

L dkz k = ~( ), k=O Z I(J

and the n x n matrix

(34.74)

Chapter 34

976

1

0

0

0

1

0

0

0

d1

1

0

0

t/JI

1

a

0

d1

1

0

t/Jl

1

1

t/Jm

d1

0

t/Jm

R=

t/Jm

1

dn- m 0

dn- m

t/Jl

dn- m

o

o

o

o

o ...

t/Jm (34.75)

With the additional notation, our result can be stated as follows: T 34.12 Let {xff); f = - n + 1, - n + 2, ... } be a family of real-valued random variables which satisfy conditions i-iii of T 34.6. Moreover, let a(N) and M(z) be as defined in equations 34.49 and 34.53, and assume that M(z) satisfies equations 34.71 and 34.72. Finally, let t/!(z), D(z), and R be as defined in equations 34.73-34.75 and let m

u(t)

= L t/!kx(f -

= -

t

k),

(n - m)

+ 1, ... ,

k~O

and

r 2 ==

;~ (N- ,~ Eu(t -

Then

r2

1

j)u(t -

j»)l.;;.i.;n-m

(34.76)

is well defined and positive-definite. Also the random vectors + a) converge in distribution to

jN(a(N)

i

=

(Om' ,§)R',

where '§ = ('§I' .. " '§n-m) is an (n - m)-dimensionaI. normally distributed random vector with mean zero and covariance matrix (J~2r2-1.

As illustration of the preceding theorem consider the following example: E 34.10 Let {x(t); t assume that

= -

n

+ 1,

- n

+ 2, ... }, a, and a(N) be as in T 34.12 and

Least-Squares and Stochastic-Difference Equations

M(z)

=

977

Z-3(Z - Zl)(Z - Z2)(Z - Z3),

where 0 < Zl < 1 < Z2 < Z3' Also let 1/1 ( .), D('), and R. be as defined in equations 34.73-34.75, and let the u(t) be as described in T 34.12. Then in the present case m = 2,

=1D(z) = 1 I/I(z)

(Z2

+ Z3)Z-1 + Z2Z3Z-2,

(34.77)

ZlZ-l,

(34.78)

and

o (34.79)

1

Moreover, u(t)

=

x(t) - (Z2

+ Z3)X(t -

1)

+ Z2Z3X(t -

t = 0, I, ... ,

2),

and (34.80)

Finally, by theorem T 34.12, there is a normally distributed random variable with mean zero and variance (J,,2 r 2- 1 such that the random vectors jN'(a(N) + a) converge in distribution to.i = (0,0, t§)R.'

=

t§. (I, - (Z2

+ Z3)' Z2Z3)'

34.3.2 Equations with Random Initial Conditions

Next we shall consider the limiting distribution of the least-squares estimate of - a when the x(t) in equation 34.1 represent an ongoing process rather than an experiment with a fixed initial date. We begin with the purely explosive case: Let {x(t, w); t E T} be a real-valued dynamic stochastic process which satisfies equations 34.55 and 34.56 and let a = (a l , ... , an) and t1(N, .), respectively, be as in equations 34.53 and 34.57. Moreover, suppose that T 34.13

Izjl >

j

1,

= 1, ... , h;

and l~i(W)1 =j:.

0

a.e.,

j

=

1, ... ,

h,

i

= nj

-

1;

and let n

'1(t, w)

=

L

akx(t - k, w),

t E T.

k=O

Finally, let lx(t, w) = ([x(t, w), ... ,fx(t - n + I, w))'; t E T; let L 00 (w) = {lx(O, w), lx( -I, w), ... }; let A be as in equation 34.70, and assume that the '1(t,.) consti-

978

Chapter 34

tute a purely random process whose variables are distributed independently of the fxU,') with mean zero and positive finite variance. Then Loo(w) is of full rank a.e., and for all z ERn, lim P( {w En: AN(a(N, w) N-oo

+ a)' ~ z})

This theorem is a complete analogue of the second half of T 34.11. To see why, let x(t, w) = (x(t, w), ... , x(t - n + I, w))' and ~(t, w) = (1](t, w), 0, ... ,0)' and observe that x(t, w)

=

Ai(t -

=

Alx(t -

I, w)

+ ~(t, w),

t E T,

and lx(t, w)

I, w),

t E T.

Consequently, if we let g(t, w)

=

(y(t, w), ... , y(t t-1

g(t, w)

+ At lx(O, w) = Atx(O, w) + L

s=o

n

+ I, w))', t E

T, then

As~(t - s, w).

From this it follows by obvious arguments that with probability 1 OCJ

lx(O, w)

=

x(O, w)

+ L A -V~(v, w), v=1

and _

OCJ

=

fx(-t,w)

A-t{x(O,w)

+ L A-V~(v,w)},

t

=

1,2, ... ,

v=1

which suffices to establish the validity of our assertion. In order to state our main theorem, we need still more notation: Let p

=

I, ... , h;

and Q(z)

=

r

h

k=O

p=1

L qk z - r = z-r fl

(z - zp)m p •

(34.81)

Moreover, redefine (0 t/J (.) in terms of Q(.) as t/J(z)

=

m'

h

k=O

p=h t +1

L t/JkZ-k = z-m' fl

(z -

zp)m p ,

(34.82)

979

Least-Squares and Stochastic-Difference Equations

and let D(z)

=

M(z)/t/J(z) as in equation 34.74 with t/J(z) as in equation

34.82. Finally, let

=

C(z)

n

L

n (z p=1 hi

Ck Z- k = z-n+m"t/J(z)

k=O

zp)n p

n (z p=h +1 h

Z;1 )np-m p.

(34.83)

l

When hl = h, t/J(z) is taken to equal 1 and C(z) to equal M(z). When m = 0 and h1 =1= h, Ck = bk, k = 1, ... , n, where the bk are as defined in equation 34.54. Jf

Let {x(t, w); t E T} be a real-valued dynamic stochastic process which satisfies equations 34.55 and 34.56; let M(z) and the Zp be as in equation 34.53, 34.71, and 34.72; and let a = (a l , ... , an) and a(N,'), respectively, be as in equations 34.53 and 34.57. Moreover, let

T 34.14

n

=

1](t, w)

L akx(t k=O

k, w),

t E T,

and assume the 1](t,') constitute a purely random process whose variables are distributed independently of the !x(t,') with mean zero and positive variance. Finally, let Q('), t/J('), D('), and C(·) be as specified above in equations 34.8134.83; let n

~(t, w)

=

L bky(t k=O

((t, w)

=

L cky(t k=O

k, w),

t E T;

k, w),

t E

n

T;

and m·

Y2(t, w)

=

L t/Jky(t k=O

k, w),

t E

T;

let

Y =

{w En: L~=o qk!x(t k Lk=O itk Z - such

k, w) = 0, t E T, and there is no polynomial N(z) = that 5 < r, 'lJ=o itdx(t - k, w) = 0, t E T, and its =1= O};

and assume that P(Y) > 0. Then the following assertions are true. (i) If mJf

lim

=

0, for all z ERn,

p({wEn:yIf::j(a(N,w)

+ b)

~ z}IY)

N-+oo

=

(21t0-l}-n/2W3I l / 2

where 0-1

r3 =

=

E~(t, W)2;

{E{y(t -

r

exp - (1I20-1)(uT3 u) du,

J{UER':U~Z} Ir3 1is the determinant of r 3 ;

i,w)y(t - j,w)IY}L~i.j~n;

and E{ 'IY} denotes the expected valued of (.) with respect to P{ ·IY}. (ii) If < m < n, if o-l = E((t, W)2, and if

°

Jf

(34.84)

Chapter 34

r4 =

980

{E{Y2(t - i,W)Y2(t - j,w)I9'} }l~i.j~n-m"

(34.85)

then there exists a normally distributed vector '!i = ('!iI' ... , 'fJn - m .) with mean zero and covariance matrix ol r;l such that the vector jN"(a(N, w) + c) converge in distribution to i

=

(am" '!i)R',

where R is as defined in equation 34.75 with 1jJ(') and D(') as specified in equations 34.82 and 34.74, and where c = (c l , ••. , cn) with the Ck as defined in equation 34.83.

The first half of T 34.14 is related to the first half of T 34.11 in the following way. Suppose that IZpl < 1 for all p = 1, ... , h. Then tjJ(z) == 1, m* = 0, bk = ak , k = 1, ... , n, and ~(t, w) = 1J(t, w) for all t E T. Moreover, it is easy to show that, if the variance of the 1J(t.') is the same as the variance of the 1J(t) of T 34.11, then r 3 as defined in equation 34.84 equals r 1 as defined in equation 34.69. Consequently, if IZpl < 1, p = 1, ... , h, and if the variance of the 1J(t,') and the 1J(t) are identical, the limiting distribution of the vector ~(a(N, w) + a) is the same as the limiting distribution of the random vector ~(a(N) + a) of T 34.11 (i). The second half of T 34.14 is related to the conclusions of T 34.12 in the following way. Suppose that Q(z) = M(z), and that M(z) satisfies equations 34.71 and 34.72. Then < m* < n, and

°

g

= {w En; Apq(W) =I-

0, p

=

1,

, h, q

=

np - I}.

Moreover m* = m, Ck = ak , k = 1, , n, ((t, w) = 1J(t, w) for all t E T, and the tjJ(.), D('), and R. of T 34.14 are identical with the ljJ('), D('), and R. of T 34.12. Finally, it is easy to show that, if the variance of 1J(t,') is the same as the variance of the 1J(t) of T 34.12, then r 4 as defined in equation 34.85 equals r 2 as defined in equation 34.76. Consequently, if (2(z) = M(z), if M(z) satisfies equations 34.71 and 34.72, and if the variance of 1J(') is identical with the variance of 1J(t), then the limiting distribution of the vectors j]J(a(N, w) + a) is the same as the limiting distribution of the random vectors ~(a(N) + a) of T 34.12. E 34.11 Let {x(t, w); t suppose that M(z)

=

Z-3(Z -

Zl)(Z -

E

T}, {11(t, w); t

Z2)(Z -

E

T}, a, and a(N,') be as in T 34.14 and

Z3),

where 0< Zl < 1 < Z2 < Z3' Also let Q('), 1jJ('), and C(·) and D(') and R, respectively, be as specified in equations 34.81-34.83 and 34.74 and 34.75 and let ((t,'), Y2(t, '), and r 4 be as described in T 34.14. Then

Least-Squares and Stochastic-Difference Equations

=

x(t, w)

981

+ A lO (w)zf + A20(W)Z~ + A30(W)Z~,

y(t, w)

and there are two interesting special cases to consider.

Case 1:

Suppose that Q(z)

=

n : AiO(w) i= O,i

Y'

{w E

= M(z). Then m'" = = 1,2,3},

2,

C

= a,

and 1jJ('), 0('), and R are as described in equations 34.77-34.79. Moreover,

= 1](t, w), Y2(t, w) = y(t, w)

t E T;

((t, w)

+ Z3)y(t -

- (Z2

1, w)

+ Z2z3y(t -

2, w),

t E T;

and Y2(t, w) - Z1Y2(t -

= 1](t, w),

1, w)

t E T.

From this it follows that 00

Y2(t, w)

L z:1](t -

=

5,

t E T,

W),

s=O

and hence that

r4

(J2

=-"-2'

1 -

Z1

which equals r 2 as defined in equation 34.80 if the variance of the 1](t,') equals the variance of the 1](t) of E 34.10. Finally, from T 34.14 it follows that the limiting distribution of the vectors jN(a(N, w) + a) equals the distribution of the vector (0,0, '§)R', where '§ is normally distributed with mean zero and variance (J,,2 r';-1. This limiting distribution is the same as the limiting distribution of the jN(a(N) + a) of E 34.10 if the variance of 1](f) equals the variance of ,,(t,').

Case 2: Q(z)

=

Suppose that Z-2(Z - Z1)(Z - Z2)'

Then m'" Y'

=

=

{w En: AiO(w)

=1O(z) = 1 ljJ(z)

C(z)

1,

=1-

i=

0, i

=

1,2; A 30 (w)

=

Z2Z-1, (Z1 (Z1

+ Z3)Z-1 + Z1Z3Z-2, + Z2 + Z3 1)Z-1 + (Z2Z31 + Z1(Z2 + Z3 1»Z-2

and 1

R= (_(z,l+ z,) Z1 Z3

-Z2

0

0 1 )-Z2

Moreover, Y2(t, w)

and

O},

= y(t, w)

- z2y(t -

1, w),

t E T,

- Z1Z2Z31Z-3,

982

Chapter 34

=

((t, w)

y(t, W) -

+ Z2 + Z3 1)y(t + Zl (Z2 + Z3 1 ) )y(t -

(Zl

+ (Z2Z31

1, w)

2, w) -

Zl Z2Z31

y(t -

3, w).

Finally, if we let ¢(t, w)

= Y2 (t, w)

-

(Zl

+ Z3 1)Y2 (t -

1, w)

+ Zl Z3 1Y2 (t -

2, w),

and A""

=

((Zl

+ Z3 1)

1 -Zl Z3 ))

1

0'

then it is easy to see that

r4 =

f

A,,"s((Jl

0) A

° °

s=o

,,",S.

T 34.14 now implies that there exists a normally distributed random vector 1 r'§ = (r'§1' r'§2) with mean zero and covariance matrix (Jlri such that jN(a(N, w) + c) converges in distribution to

x=

(0, ~)R'

=

(r'§1' r'§2 -

Evidently, if x =

x3 = -z1x1 -

Z2~1' - Z2~2)'

(X 1,X2 ,X 3 ),

Z2X2

then

w. pro 1.

34.3.3 A Simulation Experiment

The conclusions of T 34.14 in the generality in which they are stated were first established in Stigum 1974 (pp. 360-361). Note, however, that the first half of T 34.14 under the additional assumption that IZpl < 1, p = 1, ... , h, is an immediate consequence of a theorem which is proved by Hannan and by Anderson (see Hannan 1970, theorem VI.l, p. 329, and Anderson 1958, theorem 5.5.7, p. 200). When Ey/(f, W)4 < 00 and IZpl < 1, p = 1, ... , h, the first half of theorem T 34.14 is an immediate consequence of a theorem of Grenander and Rosenblatt (see Grenander and Rosenblatt 1957, pp. 111-114). The condition m'" < n is crucial for the validity of theorem T 34.14, as can be seen by rereading the statement of T 34.13. Since the preceding results have an important bearing on the possibility of using finite samples to test hypotheses concerning the roots of M(z), we conclude this section by recording the results of a simple simulation experiment. Let {", (t); t = - 41, ... , - 1, 0, 1, ... , 60} be independent, normally distributed random variables with mean and variance 1. Moreover, let

E 34.12

°

Least-Squares and Stochastic-Difference Equations

U(t)

40

L (0.5)s1](t -

= -

983

t = - I, 0, I, ... , 20;

5),

s=o V(t) y(t) x(t)

= = =

40

L (1.5)-s1](t + 5),

t= -

s=l

I, 0, I, ... , 20;

+ 1.5 v (t), t = -I, 0, I, ... , 20; + 2(0.5)t + (1.5/, t = -I, 0, I, ... , 20,

0.5 u(t)

(34.86)

y(t)

(34.87)

and observe that both y( .) and x( .) approximately satisfy the equation z(t)

+ alz(t -

1)

+ azz(t -

2)

=

1](t)

(34.100)

with a l = - 2 and a z = 0.75. Finally, let t1 x (N) and t1 y(N), respectively, denote the least-squares estimate of (aI' a z ) based on observations on x( . ) and y( . ). According to T 34.14, lim N t1 x (N) = (2, -0.75) and lim N t1 y(N) = (7/6, -113); for large N the distribution of t1 x (N) is concentrated on the line az = -I.5a l + 2.25. By a pseudo-random sampling scheme, which Dr. Jaffar AI-Abdulla programmed for me, I obtained fifty independently distributed sets of observations on the 1](t) from t = - 41 to t = 60. Those observations and equations 34.86 and 34.87 were used to generate fifty independently distributed sets of observations on the pairs (y(t), x(t)) from t = - 1 to t = 20. Finally, I used each set of observations on (y(t), x(t)) to compute fifty values of (t1 y (20), t1 x (20)). The t1 y (20) values are displayed in figure 34.7 and the t1 x (20) values are displayed in figure 34.8. 34.4 Concluding Remarks In the last two sections we have only paid lip service to the case when M(z) has roots of modules equal to 1. We have also ignored the possibility that the difference equation in 34.1 might contain a constant tenn; and we

have steered clear of all the estimation problems that arise when the 17 (t) do not constitute a purely random process. Hence a few remarks concerning these omissions are called for. First, roots of modulus equal to 1: We know from Lai and Wei 1983 (theorem I, pp. 2-3) that if we add to the assumptions of T 34.6 the condition "EI17(t)1 2H < 00 for some b > 0," the convergence in equation 34.50 will happen with probability 1. We also know from Chan and Wei 1988 that if we add the same condition to the assumptions of T 34.11, we can determine the limiting distribution of the a(N) when the roots of M(z) are all less than or equal to 1 in absolute value. However, in this case both the sequence of nonnalizing constants and the limiting distributions are too involved to describe in a few words. Next, constant tenns: Judging from the results described in Fuller, Hasza, and Goebel 1981, adding a constant tenn to T 34.6 (iii) would not affect the limiting behavior of the least-squares estimate of a in any essential way.

984

Chapter 34



0.20

• 0.10

• 0.00





.•

-0.10

• • •

C\l

l\l

• •

-0.20





-0.30



• -0.40

• •







..



-0.50

• • -0.60



• •



• •

Figure 34.7 $\hat{a}_y(20)$ values.



